Moonfrog Labs Data Science Interview Question

Moonfrog Labs is a Bangalore-based start-up that makes mobile-first games for the masses. It was established in 2013 and has raised a total of $16M in funding.

Location – Bangalore
Job Title – Data Scientist
Experience Required – 2-3 Years
Number of Rounds – 4 

This was one of those companies where the interview was to the point and statistics-heavy.

Round 1 – Open-book SQL and/or R test
There were 10 SQL questions to be solved in 30 minutes, plus a separate R section. The R section was not mandatory for the role, but attempting it counted as an advantage in the later rounds.

Consider the following database schema for an e-commerce website:
Event table (no index / key) – contains event data recorded when an event occurs:
event_name [VARCHAR],
user_id [VARCHAR],
time [TIMESTAMP]
Transaction table (no index / key) – contains transaction data:
item_id [VARCHAR],
quantity [INTEGER],
price [NUMERIC],
user_id [VARCHAR],
time [TIMESTAMP]
Provide detailed queries to retrieve the answers to the following questions:


– Average time between first visit and first purchase across users. [consider event name ‘visit’]
– Median and 80th percentile time between first visit and first purchase across users. [consider event name ‘visit’]
– Average time between first visit and purchase per item_id.
– Median and 80th percentile time between first visit and purchase per item_id.
– Given an item_id [consider item_id ‘shoe_01’], retrieve an ordered list of the items most likely to be purchased by a user who purchased that item.
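A minimal sketch of one way to approach the first two questions and the last one, written in Python with the standard-library sqlite3 module so it can be run locally. The table name `transactions`, the sample rows, the `nearest_rank` helper and the use of SQLite's `julianday` function are all assumptions made for illustration. The interview did not name a database engine, and this sketch does not check that the purchase happened after the visit.

```python
import math
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (event_name VARCHAR, user_id VARCHAR, time TIMESTAMP);
-- 'transactions' is a hypothetical table name (TRANSACTION is an SQL keyword).
CREATE TABLE transactions (item_id VARCHAR, quantity INTEGER, price NUMERIC,
                           user_id VARCHAR, time TIMESTAMP);
-- Made-up sample rows, only to exercise the queries.
INSERT INTO event VALUES
  ('visit', 'u1', '2020-01-01 00:00:00'),
  ('visit', 'u1', '2020-01-03 00:00:00'),
  ('visit', 'u2', '2020-01-01 00:00:00'),
  ('visit', 'u3', '2020-01-02 00:00:00');
INSERT INTO transactions VALUES
  ('shoe_01', 1, 50, 'u1', '2020-01-02 00:00:00'),
  ('sock_01', 2, 5,  'u1', '2020-01-05 00:00:00'),
  ('shoe_01', 1, 50, 'u2', '2020-01-04 00:00:00'),
  ('sock_01', 1, 5,  'u2', '2020-01-04 00:00:00'),
  ('hat_01',  1, 20, 'u2', '2020-01-06 00:00:00'),
  ('hat_01',  1, 20, 'u3', '2020-01-04 00:00:00');
""")

# Hours between each user's first visit and first purchase.
HOURS_SQL = """
WITH first_visit AS (
  SELECT user_id, MIN(time) AS t
  FROM event WHERE event_name = 'visit' GROUP BY user_id
),
first_purchase AS (
  SELECT user_id, MIN(time) AS t FROM transactions GROUP BY user_id
)
SELECT v.user_id, (julianday(p.t) - julianday(v.t)) * 24.0 AS hours
FROM first_visit AS v
JOIN first_purchase AS p ON p.user_id = v.user_id
"""
hours = sorted(row[1] for row in conn.execute(HOURS_SQL))


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: value at position ceil(pct% of n), 1-based."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


avg_hours = statistics.mean(hours)       # average across users
median_hours = statistics.median(hours)  # median across users
p80_hours = nearest_rank(hours, 80)      # 80th percentile across users

# Items bought by the same users who bought the given item, most buyers first.
CO_PURCHASE_SQL = """
SELECT other.item_id, COUNT(DISTINCT other.user_id) AS buyers
FROM transactions AS base
JOIN transactions AS other
  ON other.user_id = base.user_id AND other.item_id <> base.item_id
WHERE base.item_id = ?
GROUP BY other.item_id
ORDER BY buyers DESC, other.item_id
"""
co_purchased = conn.execute(CO_PURCHASE_SQL, ("shoe_01",)).fetchall()
print(avg_hours, median_hours, p80_hours, co_purchased)
```

The percentiles are computed in Python here because SQLite has no built-in percentile function; on an engine that offers one, the same `HOURS_SQL` result could feed it directly. Counting distinct co-buyers is only one possible reading of "most likely to be purchased".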

There were 4 questions on R, covering the following topics:
1. ggplot
2. Linear Regression implementation
3 and 4. Based on the table and question below


Write an automated script in Excel VBA or in R (assume the input is a data.frame) to generate a stacked area chart of Inflow vs. Date across different Sources, such that the source with the maximum total inflow is closest to the X (Date) axis and the source with the least total inflow is farthest from it.

Assume the date, source and inflow values to be dynamic.
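The core of this question is the ordering of the layers, which can be sketched independently of the charting tool. The example below is in Python rather than the R or VBA the test asked for, and the function name `stack_layers`, the tuple layout and the sample rows are assumptions made for illustration (the original table is not reproduced here).

```python
from collections import defaultdict


def stack_layers(rows):
    """Return (dates, sources, series) for a stacked area chart.

    rows: iterable of (date, source, inflow) tuples.
    Sources are ordered by total inflow, largest first, so the first
    series is the layer that sits closest to the X (Date) axis.
    """
    rows = list(rows)  # allow generators, since rows is scanned twice
    totals = defaultdict(float)
    by_key = defaultdict(float)
    for date, source, inflow in rows:
        totals[source] += inflow
        by_key[(date, source)] += inflow
    dates = sorted({date for date, _, _ in rows})
    # Largest total first; ties broken by source name for a stable order.
    sources = sorted(totals, key=lambda s: (-totals[s], s))
    # Missing (date, source) combinations count as zero inflow.
    series = [[by_key.get((d, s), 0.0) for d in dates] for s in sources]
    return dates, sources, series


sample = [
    ("2020-01-01", "Web", 10), ("2020-01-01", "App", 40),
    ("2020-01-02", "Web", 20), ("2020-01-02", "App", 30),
    ("2020-01-02", "Store", 5),
]
dates, sources, series = stack_layers(sample)
print(sources)  # ['App', 'Web', 'Store']
```

Because the order is recomputed from the data on every call, new dates, sources or inflow values need no code change. If matplotlib is available, `plt.stackplot(dates, *series, labels=sources)` should draw the first series at the bottom of the stack.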


Round 2 – Case Study
The case study round was about solving a real-life business problem. I was asked about my hobbies and answered “playing Clash of Clans”, so that became the case study topic:

“How will you change the UI of the game to increase the number of people buying coins from the shop section?”

Home screen of Clash of Clans

This was a business case study, and we discussed the following points:
1. Show the offer directly on the home page instead of requiring a click on the shop button
2. Put the price to buy elixir and gold near the collectors (the pink and yellow images)
3. After every attack resulting in a loss, give an option to buy back the points
4. Get country-wise data, find the times when players are most engaged, and show the offers accordingly
5. Give players a variable discount based on their day-to-day performance, to entice them to buy coins on the days they performed well

It was a non-elimination, open-ended discussion. The points above received good feedback from the interviewer and were forwarded to the next round. You can think of many more ideas along these lines.

Round 3 – Technical Interview

This was a tough round. It lasted 45 minutes and leaned heavily towards conceptual statistics questions and confusing SQL queries. The following questions were asked (reconstructed from memory):

1. Explain Normal Distribution
2. Give a real life example of normal distribution.
3. Which statistical tests are you familiar with?
4. What is a p-value?
5. How do you generate a row number for each row of a table without using ROWNUM?

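-- For each row, count the rows whose name sorts at or before it.
-- This gives 1, 2, 3, ... only when names are unique.
SELECT o.name, o.sal,
       (SELECT COUNT(*) FROM EMPLOYEE i WHERE i.name <= o.name) AS row_num
FROM EMPLOYEE o
ORDER BY row_num;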
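The query above can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the EMPLOYEE rows are made up, and the numbering is gap-free only because the sample names are unique (rows with the same name would share a number).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (name VARCHAR, sal INTEGER);
-- Hypothetical sample rows, inserted out of alphabetical order on purpose.
INSERT INTO EMPLOYEE VALUES ('Chitra', 70), ('Arun', 50), ('Bala', 60);
""")

# Correlated subquery: the row number is the count of names <= this name.
numbered = conn.execute("""
SELECT o.name, o.sal,
       (SELECT COUNT(*) FROM EMPLOYEE i WHERE i.name <= o.name) AS row_num
FROM EMPLOYEE o
ORDER BY row_num
""").fetchall()
print(numbered)  # [('Arun', 50, 1), ('Bala', 60, 2), ('Chitra', 70, 3)]
```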

6. What are the differences among ROWNUM, RANK and DENSE_RANK? (Standard Question)
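A small sketch of how the three behave on tied values. It uses the standard window function ROW_NUMBER() as a stand-in for ROWNUM (which is an Oracle pseudocolumn), assumes SQLite 3.25 or later behind Python's sqlite3 module, and the EMPLOYEE rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (name VARCHAR, sal INTEGER);
-- Hypothetical rows: B and C tie on salary.
INSERT INTO EMPLOYEE VALUES ('A', 100), ('B', 90), ('C', 90), ('D', 80);
""")

ranked = conn.execute("""
SELECT name, sal,
       ROW_NUMBER() OVER (ORDER BY sal DESC, name) AS row_num,
       RANK()       OVER (ORDER BY sal DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY sal DESC)       AS dense_rnk
FROM EMPLOYEE
ORDER BY sal DESC, name
""").fetchall()
for row in ranked:
    print(row)
# ('A', 100, 1, 1, 1)
# ('B', 90, 2, 2, 2)
# ('C', 90, 3, 2, 2)
# ('D', 80, 4, 4, 3)
```

ROW_NUMBER() always increments, RANK() gives ties the same value and then skips (2, 2, 4), and DENSE_RANK() gives ties the same value without skipping (2, 2, 3).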

7. What is the probability that the product of the two numbers is even when two dice are thrown?
Ans – 3/4 (the product is odd only when both numbers are odd, which has probability 1/2 × 1/2 = 1/4, so the product is even with probability 1 − 1/4 = 3/4)
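The answer can be double-checked by brute force, listing all 36 outcomes of two fair six-sided dice. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
# Keep the outcomes whose product is even.
even = [(a, b) for a, b in outcomes if (a * b) % 2 == 0]
p_even = Fraction(len(even), len(outcomes))
print(len(even), len(outcomes), p_even)  # 27 36 3/4
```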

8. What is the probability that all three cards drawn are red, when there are 5 black, 6 red and 7 blue cards?
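No answer was recorded for this one. A sketch of the calculation, assuming exactly three cards are drawn at random without replacement from the 18 cards (the question as remembered does not state this):

```python
from fractions import Fraction
from math import comb

black, red, blue = 5, 6, 7
total = black + red + blue  # 18 cards in all

# Ways to choose 3 red cards divided by ways to choose any 3 cards.
p_all_red = Fraction(comb(red, 3), comb(total, 3))
print(p_all_red)  # 5/204

# Same result drawing one card at a time: 6/18 * 5/17 * 4/16.
p_sequential = Fraction(6, 18) * Fraction(5, 17) * Fraction(4, 16)
```

Under that assumption the probability is 20/816 = 5/204, roughly 2.5%.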

9. How would you increase the marketing of online games?
A. Always start with the most obvious points and then move towards the unorthodox solutions. We started the discussion with digital marketing, then moved to offline marketing, and then to discounts for different categories of players, etc.

Round 4 – HR and Leadership interview
This was also a tough round to crack; some candidates were rejected after 2 hours of this intensive final round. It mostly revolved around the mistakes you made in the three previous rounds, along with some sound questions on Supervised Learning using either R or Python. You can go through the most asked Supervised Learning questions at the link below.

Supervised Learning interview questions

The HR round was cool, nothing fancy 🙂

About TheDataMonk (Grand Master)

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)
