Data Science – TheDataMonk

# Category Archives: Data Science

## Data Science puzzles in interview questions

In the current scenario, getting your first break into analytics can be difficult. Around 30% of analytics companies (especially the top ones) evaluate candidates on their prowess at solving puzzles. Solving them well shows that you are logical, creative and good with numbers.

The ability to bring a unique perspective to solving business problems can provide you a huge advantage over other candidates. Such abilities can only be developed with regular practice and consistent efforts.

Below are some common puzzles which are generally asked during interviews.

Two trains X and Y (80 km from each other) are running towards each other on the same track, each at a speed of 40 km/hr. A bird starts from train X and travels towards train Y at a constant speed of 100 km/hr. Once it reaches train Y, it turns and starts moving towards train X. It does this till the two trains collide with each other. Find the total distance traveled by the bird.

Solution : Velocity of approach for the two trains = (40 + 40) km/hr

Total time the trains will take to collide = 80km/80km/hr = 1 hour

Total distance travelled by the bird = 100km/hr * 1hr = 100 km.
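As a sanity check on the shortcut above, here is a minimal sketch that adds up the bird's back-and-forth legs one by one and compares the total with the closed-form answer. The function name and the number of legs simulated are illustrative choices, not part of the original puzzle.

```python
def bird_distance(gap_km=80.0, train_kmh=40.0, bird_kmh=100.0, legs=60):
    """Add up the bird's individual legs between the two approaching trains."""
    total = 0.0
    for _ in range(legs):
        # On each leg the bird and the oncoming train close the gap together.
        t = gap_km / (bird_kmh + train_kmh)
        total += bird_kmh * t
        # Meanwhile both trains have moved toward each other.
        gap_km -= 2 * train_kmh * t
    return total

# Shortcut from the solution: bird speed * time until the trains collide.
closed_form = 100.0 * (80.0 / (40.0 + 40.0))
simulated = bird_distance()
```

Each leg is 3/7 the length of the previous one, so the sum converges quickly to the same 100 km.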

You have two beakers – one of 4 liters and other of 5 liters. You are expected to pour exactly 7 liters in a bucket. How will you complete the task?

Step 1 : Fill the 5-liter beaker and pour from it into the empty 4-liter beaker until the latter is full. You are left with 1 liter in the 5-liter beaker. Pour this 1 liter into the bucket, then empty the 4-liter beaker.

Step 2 : Repeat step 1 and you will have 2 liters in the bucket.

Step 3 : Fill the 5-liter beaker and add it to the bucket. You now have 7 liters in the bucket.
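The steps above can be replayed in a few lines. This is only an illustrative sketch; the function and variable names are made up for the example.

```python
def measure_seven():
    """Replay the pouring steps and return the litres that end up in the bucket."""
    small_cap, big_cap = 4, 5
    bucket = 0
    for _ in range(2):                 # steps 1 and 2
        big = big_cap                  # fill the 5-litre beaker
        small = min(small_cap, big)    # pour into the empty 4-litre beaker
        big -= small                   # 1 litre remains in the 5-litre beaker
        bucket += big                  # pour that litre into the bucket
        small = 0                      # empty the 4-litre beaker for the next round
    bucket += big_cap                  # step 3: a full 5-litre beaker goes straight in
    return bucket
```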

There are 5 pirates on a ship. The pirates have a hierarchy: C1, C2, C3, C4 and C5. C1 is the highest designation and C5 is the lowest. These pirates have three characteristics: a. Every pirate is so greedy that he can even take lives to make more money. b. Every pirate desperately wants to stay alive. c. They are all very intelligent. There are a total of 100 gold coins on the ship. The person with the highest designation on the deck is expected to make the distribution. If the majority of the deck does not agree to the distribution proposed, the highest designation pirate will be thrown off the ship (or simply killed). Only the person with the highest designation can be killed at any moment. What distribution of the coins should the captain propose so that he is not killed and makes the maximum amount?

The solution to this problem lies in thinking through what would happen if the pirates were thrown off one by one, and then reasoning in reverse order.

Let us name the pirates A, B, C, D and E in order of hierarchy (A being the highest).

If only D and E are left at the end, D will simply give 0 coins to E and still survive, because E alone cannot form a majority against him. Hence, E will vote for any earlier distributor who gives him even 1 coin.

If C, D and E are on the deck, C will simply give one coin to E to get his vote, and D gets nothing. Hence, D will vote for any earlier distributor who gives him even 1 coin.

If B, C, D and E are on the deck, B will simply give one coin to D to get his vote. C and E get nothing.

If A, B, C, D and E are on the deck, A simply gives 1 coin each to C and E to get their votes.

Hence, in the final solution A gets 98 coins and only C and E get 1 coin each.
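The reverse reasoning can be written as a short backward induction. The sketch below assumes, as the solution does, that a tied vote lets the proposer survive and that a pirate votes yes only when he gets strictly more than he would if the proposer were thrown out. The function name is hypothetical.

```python
def pirate_split(n_pirates=5, coins=100):
    """Return the proposer's split, listed from highest to lowest rank."""
    split = [coins]                      # one pirate left: he keeps everything
    for n in range(2, n_pirates + 1):
        fallback = split                 # what the n-1 juniors get if the proposer is thrown out
        votes_needed = (n + 1) // 2 - 1  # votes besides the proposer's own; a tie passes
        # Buying a vote costs one coin more than that pirate's fallback share,
        # so the proposer buys the cheapest votes available.
        cheapest = sorted(range(n - 1), key=lambda i: fallback[i])[:votes_needed]
        bribes = [fallback[i] + 1 if i in cheapest else 0 for i in range(n - 1)]
        split = [coins - sum(bribes)] + bribes
    return split

print(pirate_split())   # shares for A, B, C, D, E
```

Working upwards from the two-pirate case reproduces each intermediate proposal described above before arriving at A's split.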

There are 3 mislabeled jars. One contains only apples, one contains only oranges, and the third contains a mixture of apples and oranges. You can pick as many fruits as required to precisely label each jar. Determine the minimum number of fruits to be picked in the process of labeling the jars.

This is another tricky puzzle where you must really churn your brain. The key fact is that every jar is wrongly labelled, so the misplacement is circular. In particular, the jar labelled A+O (apples and oranges) contains either only apples or only oranges, but not both. The candidate picks one fruit from the jar labelled A+O, and let's assume he gets an apple. He relabels that jar as Apple. The jar labelled Apple cannot hold the mixture, because then the jar labelled Orange would have to hold oranges and its label would be correct. So the jar labelled Apple holds the oranges, and the remaining jar, labelled Orange, should be relabelled A+O. Basically, picking only one fruit is enough to label all three jars correctly.
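To see why a single fruit is enough, the sketch below enumerates every arrangement in which all three labels are wrong and checks that one draw from the jar labelled "Mixed" leaves exactly one possibility. The names `LABELS`, `possible_contents` and `relabel` are illustrative.

```python
from itertools import permutations

LABELS = ("Apple", "Orange", "Mixed")

def possible_contents():
    """All ways to fill the jars so that every label is wrong."""
    return [p for p in permutations(LABELS)
            if all(content != label for content, label in zip(p, LABELS))]

def relabel(fruit_from_mixed_jar):
    """Map each current label to the true contents, given one fruit from the 'Mixed' jar."""
    # The jar labelled 'Mixed' holds a single fruit type, so one draw reveals it.
    matches = [p for p in possible_contents() if p[2] == fruit_from_mixed_jar]
    assert len(matches) == 1   # exactly one arrangement survives
    return dict(zip(LABELS, matches[0]))

print(relabel("Apple"))
```

There are only two fully mislabeled arrangements, and they differ in what the "Mixed" jar holds, which is why one draw settles everything.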

To crack Data Science/Business Analyst interviews, you need to be good at puzzles and case studies.

You can take a look at my book for more such puzzles before your interview:

100 Puzzles and case studies to crack data science interview

## Data Science Interview Questions

Data Science is not an easy field to get into. This is something all data scientists will agree on. Apart from having a degree in mathematics/statistics or engineering, a data scientist also needs to go through intense training to develop all the skills required for this field. Apart from the degree/diploma and the training, it is important to prepare the right resume for a data science job and to be well versed with the data science interview questions and answers. So we have put some important questions below.

How would you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this question is to mention that it is good to check with the business owner and understand their objectives before categorizing the data. Having done this, follow an iterative approach: pull new data samples, improve the model accordingly, and validate it for accuracy by soliciting feedback from the business stakeholders. This helps ensure that your model produces actionable results and improves over time.

Python or R – Which one would you prefer for text analytics?

The best possible answer for this would be Python, because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.

Which technique is used to predict categorical responses?

Classification techniques are used to predict categorical responses and are widely used in data mining for classifying data sets.

What is logistic regression? Or State an example when you have used logistic regression recently.

Logistic regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win the election or not. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (Lose/Win). The predictor variables here would be the amount of money spent on election campaigning for a particular candidate, the amount of time spent campaigning, etc.
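A minimal sketch of the election example, using only the standard library and a single predictor. The spend figures and outcomes are made up for illustration, and plain gradient descent is just one simple way to fit the model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(win) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical data: campaign spend (arbitrary units) and outcome (1 = win, 0 = lose).
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
won = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(spend, won)
p_low, p_high = sigmoid(w * 1 + b), sigmoid(w * 8 + b)
```

The fitted model returns a probability between 0 and 1; thresholding it (commonly at 0.5) gives the binary Win/Lose prediction.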

What are Recommender Systems?

A subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used in movies, news, research articles, products, social tags, music, etc.

Why does data cleaning play a vital role in analysis?

Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process: as the number of data sources increases, the time taken to clean the data grows rapidly, driven by both the number of sources and the volume of data they generate. Cleaning alone might take up to 80% of the time, which makes it a critical part of the analysis task.

Differentiate between univariate, bivariate and multivariate analysis.

These are descriptive statistical analysis techniques which can be differentiated based on the number of variables involved at a given point in time. For example, the pie charts of sales based on territory involve only one variable and can be referred to as univariate analysis.

If the analysis attempts to understand the relationship between 2 variables at a time, as in a scatterplot, then it is referred to as bivariate analysis. For example, analyzing sales volume against spending can be considered an example of bivariate analysis.

What do you understand by the term Normal Distribution?

Data is usually distributed in different ways, with a bias to the left or to the right, or it can all be jumbled up. However, data can also be distributed around a central value without any bias to the left or right. In that case it follows a normal distribution: the values of the random variable form a symmetrical bell-shaped curve.

What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
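For a single predictor, the fitted line has a closed form. The sketch below computes the ordinary least squares slope and intercept; the data points are made up and lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical points lying on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept   # predict Y for a new X
```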

What are Interpolation and Extrapolation?

Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value by extending a known set of values or facts beyond their range.
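The difference is easiest to see with a straight line through two known points. This is a minimal sketch with made-up points; real data may call for a different curve.

```python
def line_through(p0, p1):
    """Return f(x) for the straight line through two known points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)

f = line_through((2, 20.0), (4, 40.0))
inside = f(3)    # interpolation: x lies between the known points
outside = f(6)   # extrapolation: x lies beyond them
```

Extrapolated values deserve more caution, since nothing in the known data confirms that the trend continues outside the observed range.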

What is power analysis?

An experimental design technique for determining the sample size required to detect an effect of a given size with a given degree of confidence.

What is Collaborative filtering?

The process of filtering used by most of the recommender systems to find patterns or information by collaborating viewpoints, various data sources, and multiple agents.

Are expected value and mean value different?

They are not different, but the terms are used in different contexts. Mean is generally used when talking about a probability distribution or a sample population, whereas expected value is generally used in the context of a random variable.

Do gradient descent methods always converge to the same point?

No, they do not, because in some cases gradient descent reaches a local minimum (a local optimum) and never reaches the global optimum. The result depends on the data and the starting conditions.
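A small sketch makes the dependence on starting conditions concrete. The function below is an arbitrary example with two valleys, chosen only for illustration; the learning rate and step count are likewise assumptions.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent on f, starting from x."""
    for _ in range(steps):
        grad = 4 * x**3 - 6 * x + 1   # derivative of f
        x -= lr * grad
    return x

from_right = descend(2.0)    # settles in the shallower valley near x = 1.13
from_left = descend(-2.0)    # settles in the deeper valley near x = -1.30
```

Both runs converge, but only the one started on the left finds the global minimum.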

For more such questions, do give this book a try

100 Questions to crack data science interview

100 Questions to crack business analyst interview

## Is data science a risky career opportunity?

Many people see a data science career as an easy path to wealth, fame, and glory. The reality is that data science is hard to master, but it is also one of the most worthwhile things to do.

• You need to know some math and statistics, since you’ll be analysing data.
• You need some programming skills, since you’ll be writing programs or at least composing queries and scripts to perform that analysis.
• You need communications skills, since your work is likely to be highly collaborative and cross-functional.

Here's a complex flowchart showing what a data scientist basically does:

Before you dive into the data science wonderland, you must know that you are going to deal with these concepts. The image is blurred because I don't want you to get demotivated thinking about the hefty work ahead.

1. Fundamentals
2. Statistics
3. Programming
4. Machine Learning
5. Text Mining / Natural Language Processing
6. Data Visualisation
7. Big Data
8. Data Ingestion
9. Data Munging
10. Toolbox

Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn / master / understand in a progressive fashion. The idea is you pick a line, catch a train and go through all the stations (topics) till you reach the final destination (or) switch to the next line.

Data-savvy youngsters who are thinking about where to take their skills may want to take note. If you want more job opportunities, and perhaps more job security, becoming a data scientist might be a better career choice. Remember, though, that unless analytics drives business impact, it is not analytics; it is just statistics.

With this phenomenal growth, the significance of big data will only get bigger. The stack of data will keep growing at a quick pace, and it is anticipated that our capability to turn big data into structured information that businesses can use will likewise improve dramatically in the upcoming years.

It is risky because the data science field is relatively young and evolving fast, which could potentially make some skills obsolete / less useful in rather disruptive ways.

It is not risky because the demand for data scientists (with different skills) from many different industries will remain very strong in the near future.

If the job were easy, there wouldn't be such a demand for people who can do it. But if you are a fresher with the right aptitude and attitude, the rest can all be learned.

And if you have the skills and have already proven yourself, just knuckle down and get the work done.

Keep Learning.

## What is data science and why is it important for you?

Data science is a multidisciplinary blend of data inference, algorithm development, and technology, used to solve relatively complex problems.

At the core is data. Stacks of raw information, streaming in and stored in enterprise data warehouses. Much to learn by mining it. Advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value and eventually make something out of it.

Data science and discovery of data insight

This aspect of data science is all about uncovering findings from data: diving in at a raw level to mine and understand complex behaviours, trends, and inferences. It's about surfacing hidden insight that can help companies make smarter business decisions and increase their profit. For example:

Netflix mines movie viewing patterns to understand what drives user interest, and uses that to decide which Netflix original series to produce and which to make sequels to.

Target identifies the major customer segments within its base and the unique shopping behaviours within those segments, which helps to guide messaging to different market audiences.

Procter & Gamble utilises time series models to understand future demand more clearly, which helps it plan production levels more optimally.

How do data scientists mine out insights? It starts with data exploration. When given a challenging question, data scientists become detectives. They investigate leads and try to understand patterns or characteristics within the data. This requires a big dose of analytical creativity.

Then, as needed, data scientists may apply quantitative techniques in order to get a level deeper, e.g. inferential models, segmentation analysis, time series forecasting, synthetic control experiments, etc. The intent is to scientifically piece together a forensic view of what the data is really saying.

This data-driven insight is central to providing strategic guidance. In this sense, data scientists act as consultants, guiding business stakeholders on how to act on findings.

How data mining and sorting algorithms find and engineer your decisions

Amazon's recommendation engines suggest items for you to buy, determined by their complex algorithms. Netflix recommends movies to you. Spotify recommends music to you, and so on.

Gmail's spam filter is a data product: an algorithm behind the scenes processes incoming mail and decides whether a message is junk or not.

Computer vision used for self-driving cars is also a data product: machine learning algorithms are able to recognise traffic lights, other cars on the road, pedestrians, and any other obstacle.

Data scientists play a central role in developing data products. This involves building out algorithms, as well as testing, refinement, and technical deployment into production systems. In this sense, data scientists serve as technical developers, building assets that can be leveraged at wide scale.

## How to crack a business analyst interview?

Have you ever wondered who manages the problems of a firm or a project, who is in charge of finding solutions to those problems, and who is responsible for making a single project flourish? That person is the business analyst.

• Behavioral skills
• Communication skills
• Analytical and problem solving skills
• Requirements related skills
• Data analysis skills
• Domain knowledge
• SQL

These are the skill requirements to become a business analyst. Now let's discuss the degree and the field to be studied. Any bachelor's degree is sufficient to become a business analyst, but a master's degree is recommended too, as employers prefer postgraduates to undergraduates.

The degree field should be business administration, management, accounting, marketing, economics, statistics, or computer and information science. Experience in these fields is also required to be employed by a reputed company. Usually 2-3 years of experience is mandatory to be hired as a business analyst. Graduates can look for entry-level positions in business management, human resources, information technology and related fields to gain work experience. Working under the supervision of a senior analyst or a team of consultants can help you grow and understand the basics and the different tactics used by the top companies.

The goal should be to keep doing this until you have gained enough experience to work independently. Getting certified in this field after gaining experience can be a good boost to your resume. Thedatamonk.com is a good place to understand and learn more about this topic. Especially if you have a business analyst interview coming up and would like to know the questions interviewers commonly ask, here's a book which will help you ace your interview!

Do check these books out yourself and drop your valuable reviews and comments on thedatamonk website- http://thedatamonk.com/

## Average pay of a data analyst is way more than I.T. standard. How to be a data analyst?

Data analytics is the process of examining and extracting meaning from raw data (historical or new) using specialized computer systems. It is used by various organizations and companies to make better decisions and also to verify or disprove existing theories or models. The most important skills in data analytics are not technical skills but thinking skills: being able to separate good data from bad data and knowing exactly where you can add value. Now let's talk about the skills required to become a data analyst.

• Programming skills
• Statistical skills and mathematics
• Machine learning skills
• Data wrangling skills
• Communication and Data visualization skills
• Data intuition

These are the skills required to become a data analyst. Anyone can learn them; you just need some basic knowledge of programming, mathematics and statistics. At most, the programming languages you need to learn are R and Python, and both are very important. It is also very important for a professional to be able to think like a data analyst. That means being able to take raw data and convert it into a format that allows for more convenient consumption.

Data analysts work mostly in industries like healthcare, travel, gaming and energy management. The need for data analysts is growing, and there are not many data analysts in the world. The package for a data analyst is handsome and there are many jobs up for grabs. Thedatamonk.com is a website which provides the right platform to get going in the field of data analysis. It has published many books, available on Amazon, that provide the guidance you need to establish yourself in the modern world.

The path followed by most data analysts is more or less the same:

1. Start with any query language (SQL preferred). Try to get the basics clear on the very first go. Keep updating your knowledge.
2. Try to clear your statistical concepts. A data scientist is of no use unless he/she knows the stats behind the numbers.
3. Solve case studies, be it pharmaceutical, web analytics, telecom, airlines, retail, etc. Just try to understand how people approach a problem.

To help you in this preparation, we have – 100 Questions to solve data science interview
Check it out! You can also leave your review and comments on the amazon site after purchasing the book!

## What is data science? Why is it the sexiest job of 21st century?

Data Science, the "Sexiest Job of the 21st century", is becoming a renowned job in the top MNCs. Data is everywhere and is rapidly growing throughout the world!

What is Data Science?

The preparation, cleansing and analysis of data is what we call Data Science. Data science is the combination of statistics, programming, mathematics, problem solving and capturing data in numerous ways. To become a data scientist one must have the following skills:

-Python coding
-SQL database/coding
-Working with unstructured data
-In-depth knowledge in SAS
-In-depth knowledge in R.

The pay scale of this job is huge and no one should miss a chance to bring it into their lives! Data science will become a must-have for enterprise companies looking to scale their data operations. The number of smart cities will continue to grow, and the need for data science in government will grow with it. These smart cities are making data publicly available to promote innovation. The intention behind these projects is to encourage engineers, scientists and policymakers to work together in order to improve life. Everything from CCTV cameras on the roads to the nodes on traffic signals collects real-time data on environment, infrastructure and activity.

There is far more data out there than you can imagine, and most companies, as well as governments, have come to see the importance of data science. Data science is not new, but it has gained a lot of attention and popularity in this century. thedatamonk.com brings you an opportunity by providing a book that will help you understand the concept of data science easily and also understand how to build a career in data science. The book's name is 'How to start a career in Data Science' and it is available on Amazon for purchase. Here's the link to the book

The book 100 Questions to crack data science interview has helped 100s of professionals as well as final year students in cracking their data science interview. Do give it a try 🙂

https://www.amazon.in/How-Start-Career-Data-Science-ebook/dp/B01M98U3OJ/ref=sr_1_4?s=digital-text&ie=UTF8&qid=1495401123&sr=1-4

Do check out the book and drop off your reviews and comments at thedatamonk.com. You can also leave your review and comments on the amazon site after purchasing the book!

## 7 Most important Interview Questions to crack Data Science Interview

Hey all,
Data Science has always been around. We have been crunching numbers to get more and more insight out of data. If you want to experience this challenging job, then you need to be prepared for these 7 questions:-

1. Introduce yourself – A cliché, but a very important question. You need to know how to pitch yourself and where to leave a loose end to nudge the interviewer into asking the questions you want him to ask. If I were in your place, I would introduce myself something like this:-

"Hey, this is John Doe. I studied at XYZ and have been a part of ABC company for the last 1 year. I love to play Clash of Clans and Counter-Strike. I have been part of 2 important analytical projects. If you want, we can discuss these projects for more context." (This is like an open-ended question, and the interviewer will definitely ask you about your projects; you just have to pick it up from here.)

2. What tools and technologies do you use in your current organization?
Ans. – Again, you have to put your positive points upfront. Basically, you should cover the following buckets:-
Any query language – SQL (this is the most important query language). You can learn SQL here: 112 questions to crack Business analyst interview using SQL
For analysis – R/SAS/Python – 100 questions to learn R in 6 hours

Ans.) This should be a winning stroke for you. You have to tell them about the best analytical project you have done. In case you don't have any experience in analysis, I strongly suggest going through Complete analytical project before data science interview. It contains a complete linear regression project from head to toe.

4. Any case study –
Example – You work at a restaurant and have customer data, i.e. age, name, address, pincode, item ordered, etc. How will you recommend something to a customer from your database?
Ans – You can go for co-occurrences. Take the age of the customer and look for all customers of the same age. Then filter on those customers who ordered the same item, and find the top 5 co-occurring items. Now look at the historic data of the customer and see if anything matches. If something matches, recommend that; otherwise recommend the top product from the co-occurrence list.
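The co-occurrence approach described above can be sketched as follows. The data layout (tuples of customer id, age and item), the function name and the sample orders are all hypothetical, and a real system would likely use age bands rather than exact ages.

```python
from collections import Counter

def recommend(orders, age, current_item, history, top_n=5):
    """orders: list of (customer_id, age, item) tuples."""
    # Build one basket per customer of the same age.
    baskets = {}
    for cust, cust_age, item in orders:
        if cust_age == age:
            baskets.setdefault(cust, set()).add(item)
    # Count items that co-occur with the current item in those baskets.
    counts = Counter()
    for items in baskets.values():
        if current_item in items:
            counts.update(items - {current_item})
    top = [item for item, _ in counts.most_common(top_n)]
    # Prefer a co-occurring item the customer has ordered before; else the top one.
    for item in top:
        if item in history:
            return item
    return top[0] if top else None

orders = [
    ("a", 30, "pizza"), ("a", 30, "coke"),
    ("b", 30, "pizza"), ("b", 30, "coke"), ("b", 30, "fries"),
    ("c", 30, "pizza"), ("c", 30, "fries"),
    ("e", 30, "pizza"), ("e", 30, "coke"),
    ("d", 45, "pizza"), ("d", 45, "salad"),
]
with_history = recommend(orders, 30, "pizza", history={"fries"})
without_history = recommend(orders, 30, "pizza", history=set())
```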

For more case studies –

100 puzzles and case studies to crack data science interview

5. The interviewer will definitely ask some logical problem to check your thinking. Example – How would you know whether the light of a refrigerator stays on or not when you shut its door?
Ans.) a. Put a video camera or phone inside and shut the door. Replay the video to see the result.
b. Put a glow-in-the-dark (phosphorescent) object inside the fridge and then take it out after 10 minutes in a dark room. If it's glowing, then there was light inside.

For more logical puzzles –  100 puzzles and case studies to crack data science interview

6. Why do you want to leave your company?
Ans. It's most likely a trap to see if you are trustworthy. You should have answers to such questions ready in your mind. You can blame the slow learning curve or lack of exposure, but never blame the company or your boss.

In order to get the answers to all the HR questions try 100 puzzles and case studies to crack data science interview

7. Any question on SQL – These are the most asked questions :-
a. Dense rank
b. rank over partition by
c. where vs having
d. group by, order by

Ans. These are the hot topics. Go through them at least once. If you want to learn more important questions, then do try
112 questions to crack Business analyst interview using SQL
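To make the SQL topics listed above concrete, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table and its rows are made up, and the window functions assume a reasonably recent SQLite (3.25 or later).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Ann", "East", 500), ("Bob", "East", 500), ("Cid", "East", 300),
    ("Dee", "West", 400), ("Eve", "West", 100),
])

# RANK leaves a gap after ties; DENSE_RANK does not.
ranks = con.execute("""
    SELECT rep,
           RANK()       OVER (PARTITION BY region ORDER BY amount DESC),
           DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC)
    FROM sales
    WHERE region = 'East'
    ORDER BY amount DESC, rep
""").fetchall()

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
big_regions = con.execute("""
    SELECT region, SUM(amount)
    FROM sales
    WHERE amount >= 300
    GROUP BY region
    HAVING SUM(amount) > 1000
    ORDER BY region
""").fetchall()
```

In the first query Ann and Bob tie for first place, so Cid gets rank 3 but dense rank 2.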

Do comment if you need an article on a particular topic.

Thanks for hearing us out,
The Data Monk

## Data Science Interview Question

Data Science is not an easy field to get into. This is something all data scientists will agree on. Apart from having a degree in mathematics/statistics or engineering, a data scientist also needs to go through intense training to get into the industry.
Consider our top Data Science Interview Questions and Answers as a starting point for your data scientist interview preparation. Following are our top questions:

Differentiate between univariate, bivariate and multivariate analysis.

These are descriptive statistical analysis techniques which can be differentiated based on the number of variables involved at a given point of time. For example, the pie charts of sales based on territory involve only one variable and can be referred to as univariate analysis.

If the analysis attempts to understand the relationship between 2 variables at a time, as in a scatterplot, then it is referred to as bivariate analysis. For example, analysing sales volume against spending can be considered an example of bivariate analysis.

Analysis that deals with the study of more than two variables to understand the effect of variables on the responses is referred to as multivariate analysis.

What does P-value signify about the statistical data?

The p-value is used to determine the significance of results after a hypothesis test in statistics. It helps readers draw conclusions and always lies between 0 and 1. With the conventional threshold of 0.05:

• P-value > 0.05 denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected.

• P-value < 0.05 denotes strong evidence against the null hypothesis, which means the null hypothesis can be rejected.

• P-value very close to 0.05 is marginal, indicating it could go either way.
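As a worked illustration, the sketch below computes an exact two-sided p-value for a coin-flip experiment, where the null hypothesis is that the coin is fair. The function name is hypothetical, and the "outcomes no more likely than the observed one" rule is one common convention for two-sided exact tests.

```python
from math import comb

def binomial_p_value(heads, flips, p=0.5):
    """Two-sided exact p-value: total probability of outcomes
    no more likely than the observed one under the null hypothesis."""
    pmf = [comb(flips, k) * p**k * (1 - p)**(flips - k) for k in range(flips + 1)]
    observed = pmf[heads]
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

strong = binomial_p_value(9, 10)   # 9 heads in 10 flips
weak = binomial_p_value(6, 10)     # 6 heads in 10 flips
```

Nine heads out of ten gives a p-value below 0.05, so the fair-coin hypothesis would be rejected, while six heads out of ten gives a large p-value and no grounds for rejection.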

What is the difference between Supervised Learning and Unsupervised Learning?

If an algorithm learns from labelled training data, i.e. data with a known response variable, so that the knowledge can be applied to unseen test data, then it is referred to as supervised learning. Classification is an example of supervised learning. If the algorithm has no response variable to learn from and must find structure in the data on its own, then it is referred to as unsupervised learning. Clustering is an example of unsupervised learning.

How can outlier values be treated?

Outlier values can be identified by using univariate or any other graphical analysis method. If the outliers are few, they can be assessed individually, but for a large number of outliers the values can be substituted with either the 99th or the 1st percentile values. Not all extreme values are outliers. The most common ways to treat outlier values are:

1) To change the value and bring it within a range

2) To just remove the value.
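The first option, capping values at the 1st and 99th percentiles, can be sketched as below. The nearest-rank percentile used here is one of several percentile definitions, and the function name is illustrative.

```python
import math

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Clamp values to the chosen percentiles (nearest-rank method)."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(pct):
        # Nearest rank: smallest value with at least pct% of the data at or below it.
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [min(max(v, low), high) for v in values]

# Hypothetical data: 1..100 plus one extreme value.
capped = cap_outliers(list(range(1, 101)) + [1000])
```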

During analysis, how do you treat missing values?

First identify the variables with missing values, then the extent of what is missing. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If there are no patterns identified, then the missing values can be substituted with mean or median values (imputation) or they can simply be ignored. There are various factors to be considered when answering this question:

• Understand the problem statement and the data before giving the answer. A default value can be assigned, which can be the mean, minimum or maximum value. Getting into the data is important.
• If it is a categorical variable, the missing value is assigned a default value.
• If the data follows a normal distribution, use the mean value.
• Whether we should treat missing values at all is another important point to consider. If 80% of the values for a variable are missing, then you can answer that you would drop the variable instead of treating the missing values.
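A minimal sketch of these rules for a single numeric column, where `None` marks a missing value. The 80% cutoff comes from the text above; the choice of the median as the fill value and the function name are assumptions.

```python
from statistics import median

def impute(column, max_missing=0.8):
    """Fill None with the median; return None if the column is too sparse to keep."""
    present = [v for v in column if v is not None]
    n_missing = len(column) - len(present)
    if n_missing >= max_missing * len(column):
        return None   # drop the variable instead of treating it
    fill = median(present)
    return [fill if v is None else v for v in column]

filled = impute([1, None, 3, 100])
dropped = impute([None] * 9 + [5])
```

The median is used here because a single extreme value such as 100 would pull the mean well away from the typical value.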

How can you deal with different types of seasonality in time series modelling?

Seasonality occurs when a time series shows a repeated pattern over time. For example, stationery sales decrease during the holiday season, and air conditioner sales increase during the summer.

Seasonality makes your time series non-stationary because the average value of the variable differs across time periods. Differencing a time series is generally known as the best method of removing seasonality from it. Seasonal differencing can be defined as the numerical difference between a particular value and the value one periodic lag earlier (i.e. a lag of 12 if monthly seasonality is present).
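Seasonal differencing is a one-liner. The series below is made up: a repeating 12-month pattern plus steady growth, so the effect of the differencing is easy to see.

```python
def seasonal_difference(series, lag=12):
    """Subtract from each value the value one season (lag periods) earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly sales: repeating yearly pattern plus a linear trend.
pattern = [5, 3, 2, 4, 6, 9, 12, 11, 8, 6, 4, 7]
sales = [pattern[m % 12] + m for m in range(36)]
diffed = seasonal_difference(sales)   # the seasonal pattern cancels out
```

After differencing, only the year-over-year change remains, which is constant in this toy series.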

Can you explain the difference between a Test Set and a Validation Set?

The validation set can be considered a part of the training set, as it is used for parameter selection and to avoid overfitting of the model being built. The test set, on the other hand, is used for testing or evaluating the performance of a trained machine learning model.

In simple terms, the differences can be summarized as:

• Training set is to fit the parameters, i.e. weights.
• Test set is to assess the performance of the model, i.e. evaluating its predictive power and generalization.
• Validation set is to tune the hyperparameters, i.e. the settings chosen before fitting.
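The three roles above are usually set up with a single shuffled split. This is a minimal sketch; the 60/20/20 proportions, the fixed seed and the function name are illustrative choices.

```python
import random

def three_way_split(rows, val_share=0.2, test_share=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    n_val = int(len(shuffled) * val_share)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rows = list(range(100))
train, val, test = three_way_split(rows)
```

The test part should be touched only once, after all tuning on the validation part is finished.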

For more questions, stay tuned

## How to crack Data Science Interview

Hey, this is TheDataMonk, and we are a group of subject matter experts in the field of Business Analysis.
Data Science is crowned as the "sexiest job of the 21st Century", and we get a lot of requests to write about the path one should follow in order to start a career in data science or to switch to a better analytics firm.
So, here we are with a road map to success in a data science interview.

The whole road map will be divided into parts:-
1. Who is a Data Scientist?
2. What does a regular day look like for a Data Scientist?
3. What are the technical requirements? A complete guide on tools and technologies
4. Types of analysis we do
5. How to use Statistics?
6. Dark secrets of Data Science