The world’s largest brewer is Anheuser-Busch InBev. Their vision is to be recognized as the Best Beer Company, bringing people together for a better world.
For centuries, the act of sharing a beer has been known to unite people. Even in today’s hyper-connected, always-on world, this simple gesture retains its significance. AB InBev, a multinational beverage and brewing company headquartered in Leuven, Belgium, is committed to driving growth that enhances the lives of individuals worldwide. This is achieved through brands and experiences that foster connectivity, dedication to brewing top-quality beer with premium ingredients, and support for the growth of farmers, retailers, entrepreneurs, and communities.
AB InBev is building a company to stand the test of time, not just for a decade but for the next 100 years. Through their brands and investments in communities, they aim to unite more people, ensuring that the company becomes an integral part of consumers’ lives for generations to come. Their diverse portfolio of over 400 beer brands includes globally recognized names like Budweiser, Corona, and Stella Artois, international brands such as Leffe and Hoegaarden, and local favorites like Haywards and Knockout. The tradition of brewing the finest beer stretches back over 600 years, spanning continents and generations.
Skills Required
SQL
Statistics
Python
Algorithms
Probability
Machine Learning
Analytics
A/B Testing
Questions Asked
SQL Questions
Question: The CFO of a large corporation wants to analyze the expenses in all departments for the previous financial year (2022). Write a SQL query to calculate the total expenditure for each department in 2022. Additionally, include the average expense across all departments in 2022 for comparison. Ensure that the output includes the department name, total expense, and average expense (rounded to 2 decimal places). Sort the data in descending order by total expenditure.
Input Tables:

departments table:
Column | Type
id | INTEGER
name | VARCHAR

expenses table:
Column | Type
id | INTEGER
department_id | INTEGER
amount | FLOAT
date | DATE

Output Table:
Column | Type
department_name | VARCHAR
total_expense | FLOAT
average_expense | FLOAT
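A possible PostgreSQL-style solution, assuming the schema above (the window function repeats the all-department average on every row for comparison):

```sql
SELECT
    d.name AS department_name,
    ROUND(SUM(e.amount)::numeric, 2) AS total_expense,
    ROUND((AVG(SUM(e.amount)) OVER ())::numeric, 2) AS average_expense
FROM departments d
JOIN expenses e ON e.department_id = d.id
WHERE e.date >= DATE '2022-01-01'
  AND e.date <  DATE '2023-01-01'   -- restrict to financial year 2022
GROUP BY d.name
ORDER BY total_expense DESC;
```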
Question: Design a function named rain_days to determine the probability of rain on the nth day following today, taking into account the current and previous day’s weather conditions. If it is raining both today and yesterday, there’s a 20% likelihood of rain tomorrow. If it rained on one of the past two days, there’s a 60% chance of rain tomorrow. If neither day saw rain, the probability of rain tomorrow is 20%.
Example:
Input: n = 5
Output: 0.39968
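A minimal Python sketch of the recurrence. The question does not state the starting weather, so this assumes it is raining both today and yesterday, which reproduces the sample output:

```python
def rain_days(n: int) -> float:
    """Probability of rain on the nth day after today.

    Assumption (not stated in the question): it is raining both
    today and yesterday.
    """
    # Probability mass over the state (rained_yesterday, rained_today).
    dist = {(True, True): 1.0}
    p_rain = 0.0
    for _ in range(n):
        nxt = {}
        p_rain = 0.0  # P(rain) on the day computed in this step
        for (y, t), p in dist.items():
            if y and t:
                p_next = 0.2   # rained on both of the last two days
            elif y or t:
                p_next = 0.6   # rained on exactly one of the last two days
            else:
                p_next = 0.2   # rained on neither of the last two days
            p_rain += p * p_next
            nxt[(t, True)] = nxt.get((t, True), 0.0) + p * p_next
            nxt[(t, False)] = nxt.get((t, False), 0.0) + p * (1 - p_next)
        dist = nxt
    return round(p_rain, 5)

print(rain_days(5))  # 0.39968
```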
Behavioral Questions
At Anheuser-Busch, we advocate for employees surpassing their personal benchmarks. How do you push your limits and strive for growth in your professional environment?
Discuss a recent setback you encountered at work and elaborate on the lessons you derived from it.
Describe a recent learning experience you encountered at work, specifically focusing on a situation where things didn’t go as planned. What insights did you gain from this experience?
What strategies do you employ to maintain your motivation, particularly during challenging times?
Anheuser-Busch values employees who embrace change and innovation. Can you provide examples of how you’ve successfully adapted to changes in your previous workplace?
Describe an instance where you spearheaded a workplace initiative or improvement.
Could you share any recent insights or information you’ve come across regarding one of Anheuser-Busch’s brands?
Should Anheuser-Busch consider incorporating nutritional information on its beer labels?
What strategies or approaches do you believe Anheuser-Busch could adopt to enhance innovation within the company?
Are there any inquiries you’d like to raise regarding this position or opportunities within Anheuser-Busch InBev?
The Data Monk services
We are well known for our interview books and have 70+ e-books across Amazon and The Data Monk e-shop page. Below are the best-seller combo packs and services that we currently provide:
YouTube channel covering all the interview-related important topics in SQL, Python, MS Excel, Machine Learning Algorithm, Statistics, and Direct Interview Questions Link –The Data Monk Youtube Channel
Website – ~2000 completed solved Interview questions in SQL, Python, ML, and Case Study Link –The Data Monk website
E-book shop – We have 70+ e-books available on our website and 3 bundles covering 2000+ solved interview questions. Do check it out Link –The Data E-shop Page
Instagram Page – It covers only Most asked Questions and concepts (100+ posts). We have 100+ most asked interview topics explained in simple terms Link –The Data Monk Instagram page
Company – OYO
Designation – Data Analyst
Year of Experience required – 0 to 4 years
Technical expertise – SQL, Python, Case Study, and Statistics
Salary offered – 10 to 18 LPA (no stocks, 10% variable) – 80% hike
Number of Rounds – 4
There were 4 to 5 rounds in the recruitment process; the exact number depends on the candidate’s performance in the technical rounds.
Round 1 – Written SQL round
Round 2 – SQL (based on the questions asked in the first round)
Round 3 – Project-based questions and statistics (basic)
Round 4 – Case Study
Round 5 – Hiring Manager as well as case study
Round 1 – Written SQL round
There were 5 SQL questions (mostly from HackerRank) to be solved in 1 hour. Question split:
– 2 easy
– 1 medium
– 2 hard
For easy questions, concentrate on the basics of rank, lead, lag, and aggregate functions.
Round 2 – SQL Interview
This round was mostly about the written questions asked in the previous round and the approach behind your solutions. You need at least 3 correct answers to reach this round.
Tips – Concentrate on communicating your approach. The questions in this round are based entirely on the 5 written questions, so revise the approach and concepts behind those questions before the second round.
Round 3 – Project Based Questions and Statistics
I had a project in Machine Learning (ARIMA forecasting), so the questions were mostly around the problem we were trying to solve, plus some statistics concepts:
– What is the p-value?
– What is correlation? Give an example of a negative correlation.
– Complete walk-through of the ARIMA model
– What is multicollinearity?
– Difference between regression and classification models
– What is a degree of freedom?
Questions were mostly based on the type of project that you had written in your resume and the statistics or concepts associated with it.
So, for this round, prepare your project in as much detail as possible.
Round 4 – Case Study
The technical rounds were the most important rounds. If you have performed decently in the first 3 rounds then there is a high chance of converting the role.
Case study asked to me – How can Netflix increase its revenue by 50% in the next couple of years? It was a mix of a guesstimate and a business case study.
So, I started with some approx numbers and their current split. For Example – Netflix has a total revenue of $100 Million and they are currently in 4 verticals and 10 countries. The current verticals are Hollywood, Bollywood, TV Series, and Other Country shows. The 10 countries are India, the USA, the UK, and 7 more small population countries. Assumption – India has 60% of the total revenue and 100% of the revenue is coming from Bollywood movies.
After a set of assumptions, we discussed the approach. The important points we covered were:
– Moving into, or acquiring, an already-performing OTT platform or its most-watched series
– The advertisement-to-screen-time ratio: either increase the advertisement length or its frequency within a show or movie
– Reducing the number of users who can use one subscription in parallel
– Capturing the phone numbers associated with an account at the time a user buys the subscription, which would reduce subscription sharing
There were discussions on each of these points, you just need to bring as many diverse points in the discussion as possible. Do comment your approach in the comment box below.
Round 5 – Hiring Manager Round
This round was mostly about cultural fit, checking the candidate’s previous experience and the work culture they came from. But I was asked one more question: to decide the price of a micro stay in OYO Rooms. OYO Rooms was moving to a micro-stay model where you can book a room for 6-12 hours, so the question was to build a dynamic rate charter for booking a room.
My approach was to use a linear regression model to predict the room rate. The independent variables I suggested were:
– Daily price of the room
– Day of booking
– Price of the adjacent rooms
– Time of booking
– Customer lifetime value of the person booking the room
– Number of rooms and number of booked rooms for that day
– Holiday season impact
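As a toy illustration of that approach, the sketch below fits an ordinary-least-squares model on synthetic data for three of the suggested features (all numbers here are made up for demonstration, not OYO data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features: daily price, hour of booking, fraction of rooms booked
X = np.column_stack([
    rng.uniform(1000, 3000, 200),   # daily price of the room (Rs.)
    rng.integers(0, 24, 200),       # hour of booking
    rng.uniform(0.0, 1.0, 200),     # occupancy: fraction of rooms booked
])
true_w = np.array([0.4, 15.0, 800.0])       # invented "true" coefficients
y = X @ true_w + 120 + rng.normal(0, 20, 200)  # micro-stay price + noise

# Fit ordinary least squares with an intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)   # ≈ [120, 0.4, 15, 800]
```

With richer real data, the same idea extends to the full feature list above (day of booking, CLV, holiday flags, and so on).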
OYO SQL Interview Questions
There were 10+ SQL questions: 6-7 easy/theoretical, a couple of medium problems, and 1 hard problem. The hard problem was picked directly from HackerRank, so practice all the problems there. The medium-difficulty problems were like the one given below:
Question 1: You have data on people who have applied for a lottery ticket. The data consists of their name and ticket number. You have to choose winners by selecting the people present in alternate rows (the first winner starting from row number 3). Write a query to make it easy to select the winners.
Answer:
SELECT name
FROM (
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY ticket_no) AS sr_no
    FROM db
) t
WHERE t.sr_no % 2 = 1
  AND t.sr_no >= 3;   -- odd rows only, starting from row 3
Question 2: Find all the students who either are male or live in Mumbai (have Mumbai as a part of their address).
Answer:
SELECT name
FROM students
WHERE LOWER(gender) IN ('male', 'm')
   OR LOWER(address) LIKE '%mumbai%';   -- LIKE, not =, for a pattern match
Question 3: Can you join two tables without any common column?
Answer: Yes, we can do a cross join without any common column. E.g., we have the Roll Number and Name of students in Table A and their Class (let’s say 5th) in Table B. We will use a cross join to append the class against each student.
SELECT B.class, A.id, A.name
FROM A
CROSS JOIN B;
Question 4:
SELECT CASE WHEN NULL = NULL THEN 'Amit' ELSE 'Rahul' END FROM dual;
What will be the output of the above query?
Answer: NULL is not equal to anything, not even another NULL: the comparison NULL = NULL evaluates to UNKNOWN rather than TRUE, so the CASE expression falls through to the ELSE branch. The output will be 'Rahul'.
Question 5: List the different types of relationships in SQL.
There are different types of relations in the database:
One-to-One – A connection between two tables in which each record in one table corresponds to at most one record in the other.
One-to-Many and Many-to-One – The most frequent connection, in which a record in one table is linked to several records in another.
Many-to-Many – Used when defining a relationship that requires several instances on each side.
Self-Referencing Relationships – When a table has to declare a connection with itself, this is the method to employ.
Question 6: What are the differences between OLTP and OLAP?
Answer: OLTP stands for online transaction processing; OLAP stands for online analytical processing. OLTP systems handle large numbers of short, day-to-day transactions (inserts, updates, deletes), whereas OLAP systems answer complex analytical queries over large volumes of historical data.
Question 7: What is the usage of the NVL() function?
Answer: You may use the NVL function to replace null values with a default value. The function returns the value of the second parameter if the first parameter is null. If the first parameter is anything other than null, it is left alone.
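For instance, with a hypothetical employees table (NVL is Oracle syntax; the ANSI-standard equivalent is COALESCE):

```sql
-- Pay out a default commission of 0 when the column is NULL
SELECT employee_name,
       NVL(commission, 0) AS commission
FROM employees;
-- Portable equivalent: COALESCE(commission, 0)
```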
OYO Case Study Questions
Case Study – Suggest as many important KPIs as possible that you want to put on the CXOs dashboard
Following were the suggested KPIs
Average Daily Rate (ADR)
Occupancy rate
Revenue per Available Room (RevPAR)
Gross Operating Profit per Available Room (GOPPAR)
Average Length of Stay (ALOS)
Customer Acquisition Cost (CAC)
Customer Lifetime Value (CLV)
Net Promoter Score (NPS)
Online Reputation Score (ORS)
Room Revenue Contribution by Channel
Website Conversion Rate
Direct Booking Ratio
Repeat Guest Ratio
Housekeeping Productivity Ratio
Employee Turnover Rate
Revenue per Employee (RPE)
Cost per Occupied Room (CPOR)
Cost per Available Room (CPAR)
Total Revenue by Property
Total Expenses by Property
OYO Statistics and Python Interview Questions
I had a couple of projects on Machine Learning, so a few questions were asked on statistics
1. What is Skewness?
Skewness is a measure of the asymmetry of a distribution. This value can be positive or negative.
A negative skew indicates that the tail is on the left side of the distribution, which extends towards more negative values.
A positive skew indicates that the tail is on the right side of the distribution, which extends towards more positive values.
A value of zero indicates that there is no skewness in the distribution at all, meaning the distribution is perfectly symmetrical.
2. What is Kurtosis?
Kurtosis is a measure of whether or not a distribution is heavy-tailed or light-tailed relative to a normal distribution.
• The kurtosis of a normal distribution is 3.
• If a given distribution has a kurtosis less than 3, it is said to be platykurtic, which means it tends to produce fewer and less extreme outliers than the normal distribution.
• If a given distribution has a kurtosis greater than 3, it is said to be leptokurtic, which means it tends to produce more outliers than the normal distribution.
3. How are covariance and correlation different from one another?
Covariance measures how two variables are related to each other and how one would vary with respect to changes in the other variable. If the value is positive, it means there is a direct relationship between the variables and one would increase or decrease with an increase or decrease in the base variable respectively, given that all other conditions remain constant.
Correlation quantifies the strength and direction of the linear relationship between two random variables; its value always lies between -1 and 1.
A value of 1 denotes a perfect positive relationship, -1 denotes a perfect negative relationship, and 0 denotes no linear relationship between the two variables.
4. What is Multicollinearity?
Multicollinearity occurs when two or more independent variables are highly correlated with one another in a regression model. This means that an independent variable can be predicted from another independent variable in a regression model.
5. What is VIF?
Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables. In general, a VIF above 5 indicates high correlation and is cause for concern. Some authors suggest a more conservative level of 2.5 or above and it depends on the situation.
6. What is a confusion matrix and why do you need it?
A confusion matrix is a table frequently used to illustrate the performance of a classification model on a set of test data for which the true values are known. It lets us visualize how the model performs, easily identify where classes are confused with one another, and serves as a compact summary of the model’s predictions from which performance measures are derived.
7. What do you mean when you say “Strings are immutable”?
Strings in Python are immutable, i.e., you cannot change a defined string or any part of it; operations that appear to modify a string actually create a new one.
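A quick illustration:

```python
s = "data"
try:
    s[0] = "D"                 # item assignment on a string fails
except TypeError:
    print("strings are immutable")
s = "D" + s[1:]                # build a new string instead
print(s)                       # Data
```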
8. Are lists mutable?
Lists are mutable, i.e., you can change the values already present in the list.
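For example:

```python
vals = [10, 20, 30]
vals[0] = 99        # in-place update works: lists are mutable
vals.append(40)     # so does growing the list
print(vals)         # [99, 20, 30, 40]
```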
9. Is a dictionary zero-indexed? Can we pull something like Team[0] from the above example?
The whole purpose of a dictionary is that you define your own index, i.e., the key. So, to answer the question, a dictionary is not zero-indexed.
You cannot use positional indexing; for example, Team[0] will not pull the first value, because you have already assigned a key to every value.
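A small illustration with a hypothetical Team dictionary (standing in for the example referenced above):

```python
Team = {"captain": "Amit", "keeper": "Rahul"}

try:
    Team[0]                     # 0 is just another key, and it isn't present
except KeyError:
    print("dictionaries are looked up by key, not by position")

print(Team["captain"])          # Amit
```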
10. What is the range() function?
range(10) will give you the numbers from 0 to 9, but you need to put this range into some data type, for example a list.
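For example:

```python
nums = list(range(10))          # materialize range(10) into a list
print(nums)                     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(2, 10, 2)))    # [2, 4, 6, 8]  (start, stop, step)
```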
There were 2-3 more questions on Python, mostly around for loops and pattern printing.
The Atlassian interview process comprises various stages, starting with an initial phone screening and progressing through technical assessments, coding challenges, and multiple rounds of in-person or virtual interviews. These interviews encompass a mix of technical inquiries, behavioral assessments, and interactive discussions with potential team members or managers. Reflecting Atlassian’s emphasis on collaboration, the process may involve group activities or presentations to gauge candidates’ teamwork skills. Adequate preparation for both technical and behavioral questions is crucial, along with showcasing a solid understanding of Atlassian’s products and core values.
At Atlassian, proficiency in SQL is essential for extracting and analyzing user behavior data to improve software tools and for managing databases to ensure efficient retrieval and updating of product information. Consequently, SQL-related questions are a common feature in interviews for Data Science, Data Engineering, and Data Analytics positions at Atlassian.
If you’re preparing for a SQL interview, here are a few Atlassian SQL interview questions for you to practice.
Questions
Provide an example of ticket closure and outline the steps involved in conducting hypothesis testing for the given scenario.
What strategies would you employ to enhance the efficiency of the ticket closure process?
How would you address a situation where collaboration and support from colleagues and managers are lacking in your work environment?
SQL Questions
1. Why is the UNIQUE constraint used in SQL?
Answer:
The UNIQUE constraint in SQL serves the crucial function of ensuring that all values within a specified column are distinct. This constraint is commonly utilized in conjunction with other constraints, such as NOT NULL, to impose specific rules on the data.
For instance, let’s consider a scenario in a university database where student information is stored:
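A sketch of such a table (names follow the surrounding prose; column sizes are illustrative):

```sql
CREATE TABLE student_records (
    student_id    INTEGER PRIMARY KEY,
    student_name  VARCHAR(100) NOT NULL,
    student_email VARCHAR(255) NOT NULL UNIQUE  -- no two students may share an email
);
```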
In this example, the UNIQUE constraint is applied to the “student_email” field to guarantee that each student’s email address is unique. This helps maintain data integrity within the student_records table, preventing the possibility of having two students with the same email address. The UNIQUE constraint acts as a safeguard against potential errors and duplicates in the dataset, ensuring the reliability of the information stored.
2. Explain the concept of a database index and provide examples of various index types.
Answer: A database index is employed to enhance database performance by reducing the search scope required to retrieve a record.
There exist different types of indexes within a database:
Primary Index: Utilizes a unique identifier for direct row access.
Unique Index: Enforces uniqueness for the indexed columns in a table.
Composite Index: Established on multiple columns, expediting searches across various columns.
Clustered Index: Dictates the physical data order within a table.
3. Scenario: Calculate the Average Processing Time for Orders Across Regions
Imagine a scenario where a company manages orders across various regions using a database with two tables: regions and orders. The regions table contains information about different regions and their region_id, while the orders table includes details such as order_id, the region in which it was processed, and the timestamps for order creation and fulfillment.
Objective: Write a SQL query to determine the average processing time for orders in each region.
regions Example Input:
region_id | region_name
1 | North Region
2 | South Region
3 | East Region
orders Example Input:
order_id | region_id | created_at | fulfilled_at
1 | 1 | 2021-07-05 08:00:00 | 2021-07-07 12:00:00
2 | 1 | 2021-07-06 10:00:00 | 2021-07-08 14:00:00
3 | 2 | 2021-07-07 12:00:00 | 2021-07-09 16:00:00
4 | 2 | 2021-07-08 14:00:00 | 2021-07-10 18:00:00
5 | 3 | 2021-07-09 16:00:00 | 2021-07-11 20:00:00
Answer: A SQL query can be formulated to join the two tables on region_id, calculate the time difference between fulfilled_at and created_at for each order, and then group by region_id to compute the average processing time.
Here’s a potential PostgreSQL query:
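A version consistent with the explanation below (table and column names as in the sample input):

```sql
SELECT r.region_name,
       AVG(EXTRACT(EPOCH FROM (o.fulfilled_at - o.created_at)) / 3600.0)
           AS avg_processing_time_hours
FROM orders o
JOIN regions r ON r.region_id = o.region_id
GROUP BY r.region_name
ORDER BY r.region_name;
```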
This query joins the two tables, computes each order’s processing time, and groups by region, with the output providing a list of regions and their corresponding average processing time in hours. Adjust the divisor as needed for different time units.
4. Scenario: Monthly Average User Engagement
Imagine Atlassian is keen on understanding user engagement across its products, Jira and Confluence. For this analysis, a database table named user_activity_logs is used, documenting entries whenever a user interacts with either product. The table comprises columns for log_id (a unique identifier), user_id (identifying the user), product (indicating the tool used – ‘Jira’ or ‘Confluence’), timestamp (the date and time of the activity), and duration (the time spent on the activity in minutes).
Objective: Construct a SQL query to determine the average user engagement duration for each product on a monthly basis for the year 2022.
user_activity_logs Example Input:
log_id | user_id | product | timestamp | duration
1001 | 101 | Jira | 01/05/2022 09:30:00 | 25
1002 | 102 | Confluence | 02/10/2022 12:45:00 | 40
1003 | 103 | Jira | 03/15/2022 14:00:00 | 20
1004 | 104 | Confluence | 04/20/2022 10:00:00 | 30
1005 | 105 | Jira | 05/25/2022 16:30:00 | 35
1006 | 106 | Confluence | 06/30/2022 11:00:00 | 15
1007 | 107 | Jira | 07/05/2022 13:45:00 | 18
1008 | 108 | Confluence | 08/12/2022 10:30:00 | 28
Example Output Table:
month | product | avg_duration
1 | Jira | 25.00
2 | Confluence | 40.00
3 | Jira | 20.00
4 | Confluence | 30.00
5 | Jira | 35.00
6 | Confluence | 15.00
7 | Jira | 18.00
8 | Confluence | 28.00
Answer: To determine the average user engagement duration for each product monthly in the year 2022, a SQL query can be crafted as follows:
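One PostgreSQL-style version, consistent with the explanation below (the timestamp column is quoted because it collides with a type name):

```sql
SELECT EXTRACT(MONTH FROM "timestamp") AS month,
       product,
       ROUND(AVG(duration), 2) AS avg_duration
FROM user_activity_logs
WHERE EXTRACT(YEAR FROM "timestamp") = 2022   -- only 2022 activity
GROUP BY month, product
ORDER BY month, product;
```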
This query filters activities from the year 2022, groups the data by the month and product, and calculates the average duration for each group. The final result is ordered by month and then product. Adjust column names and specifics based on the structure of your user_activity_logs table.
5. Scenario: Determining Most Frequently Used Software Tools
Imagine you have a database that tracks user interactions with various software tools provided by Atlassian. Your goal is to identify the software tool that was most frequently used over the last month.
To achieve this, you have a ‘User’ table storing user data and a ‘Usage’ table recording instances of users interacting with the software tools.
users Sample Input:
user_id | user_name
101 | Alice
102 | Bob
103 | Carol
104 | David
usage Sample Input:
usage_id | user_id | product_name | date
1 | 101 | Jira | 07/22/2022
2 | 102 | Jira | 07/15/2022
3 | 101 | Confluence | 07/02/2022
4 | 103 | Bitbucket | 07/15/2022
5 | 104 | Trello | 07/25/2022
6 | 103 | Bitbucket | 07/30/2022
7 | 101 | Jira | 07/19/2022
8 | 102 | Jira | 07/15/2022
9 | 103 | Bitbucket | 07/15/2022
10 | 101 | Confluence | 07/23/2022
Task: Write a query to retrieve the name of the software tool that was most frequently used in the last month and provide the total number of usages.
Answer:
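A PostgreSQL-style version consistent with the explanation below (quote usage and date if your database treats them as reserved words):

```sql
SELECT product_name,
       COUNT(*) AS total_usages
FROM usage
WHERE date >= DATE_TRUNC('month', CURRENT_DATE)  -- current month only
GROUP BY product_name
ORDER BY total_usages DESC
LIMIT 1;
```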
This query groups records in the ‘usage’ table by ‘product_name’, counts occurrences for each group, and limits the result to the tool with the highest count. The WHERE clause restricts the records to the current month; in PostgreSQL, DATE_TRUNC('month', CURRENT_DATE) returns the first day of the current month. If there are ties for the most frequent usage, this query will return only one of them; remove the LIMIT clause if you want all of them.
6. Scenario: Analyzing Issue Resolution Duration
Imagine you’re tasked with analyzing issue resolution patterns for projects at Atlassian, a software development company that extensively uses issue tracking systems like Jira. Your objective is to determine the average duration in days it takes to resolve issues for each project.
issues Sample Input:
issue_id | project_id | status | reported_date | resolved_date
201 | 987 | resolved | 03/12/2022 | 04/05/2022
302 | 654 | resolved | 04/15/2022 | 05/10/2022
453 | 987 | open | 05/20/2022 | null
604 | 321 | resolved | 07/01/2022 | 07/25/2022
705 | 654 | resolved | 08/10/2022 | 08/28/2022
806 | 321 | resolved | 09/05/2022 | 10/02/2022
Example Output Table:
project_id | avg_duration
987 | 24.50
654 | 18.50
321 | 19.00
Task: Write a query to calculate the average duration in days it takes to resolve issues for each project.
Answer:
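A PostgreSQL-style version consistent with the explanation below (in PostgreSQL, subtracting two dates yields an integer number of days):

```sql
SELECT project_id,
       ROUND(AVG(resolved_date - reported_date), 2) AS avg_duration
FROM issues
WHERE status = 'resolved'          -- exclude open issues (null resolved_date)
GROUP BY project_id;
```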
In this query, we first filter issues by their resolution status, excluding those that aren’t resolved. The subtraction of reported date from resolved date calculates the duration it took to resolve each issue. The results are then grouped by project_id, and the AVG function computes the average duration for each project. The final output provides insight into the average resolution time for issues in different projects. Adjust the column names and specifics based on the structure of your actual dataset.
7. Scenario: Analyzing Monthly Sales per Product
In your role as a data analyst at Atlassian, you’re tasked with determining the average monthly sales per product to gain insights into product performance. This involves grouping the data by both month and product and calculating the average sales using the AVG function.
sales Sample Input:
sale_id | product_id | sale_date | unit_price | quantity
101 | Jira | 08/05/2022 12:00:00 | 120 | 4
102 | Confluence | 08/10/2022 14:30:00 | 180 | 3
103 | Bitbucket | 08/18/2022 11:45:00 | 40 | 8
104 | Jira | 09/12/2022 10:00:00 | 110 | 6
105 | Bitbucket | 09/22/2022 09:15:00 | 45 | 5
Example Output Table:
month | product | avg_sales
8 | Jira | 480
8 | Confluence | 540
8 | Bitbucket | 320
9 | Jira | 660
9 | Bitbucket | 225
Task: Write a query to calculate the average monthly sales per product.
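A PostgreSQL-style version consistent with the explanation below (sales per row = unit_price * quantity):

```sql
SELECT EXTRACT(MONTH FROM sale_date) AS month,
       product_id AS product,
       AVG(unit_price * quantity) AS avg_sales
FROM sales
GROUP BY month, product_id
ORDER BY month, product;
```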
This query first extracts the month from each sale date using the EXTRACT function. It then groups the data by month and product_id and calculates the average sales for each group using the AVG function. The ORDER BY clause ensures the results are ordered first by month and then by product for better readability. Adjust column names and specifics based on the structure of your actual dataset.
8. Scenario: Distinctive Features of Cross Join and Natural Join
In the realm of SQL JOIN operations, both cross join and natural join serve unique purposes. A cross join, also known as a cartesian join, combines every row from one table with every row from another, creating a new table. Conversely, a natural join merges rows from two or more tables based on shared columns, reflecting a natural relationship.
Example of Cross Join: Suppose you’re part of the Marketing Analytics team at Atlassian, tasked with correlating advertising copy and creative images. A cross join between the ad_copy and ad_creative tables could look like this:
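A sketch consistent with that description (table names as given in the prose):

```sql
SELECT *
FROM ad_copy
CROSS JOIN ad_creative;   -- every copy paired with every creative
```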
This query generates combinations of each ad copy with every ad creative, producing a comprehensive set of possibilities.
Example of Natural Join: Consider sales data exported from Atlassian’s Salesforce CRM stored in a data warehouse with tables for sales and atlassian_customers. An INNER JOIN, a type of natural join, could link these tables based on the common customer_id field:
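A sketch consistent with that description, joining on the shared customer_id:

```sql
SELECT s.*, c.*
FROM sales s
INNER JOIN atlassian_customers c
        ON s.customer_id = c.customer_id;
```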
This query retrieves matching rows from both tables, using the shared customer_id as the linkage. Notably, natural joins necessitate common columns for merging.
Key Differences:
Common Columns Requirement:
Cross Join: Does not require any common columns between the tables being joined.
Natural Join: Requires shared columns for merging.
Resultant Table Size:
Cross Join: Can create large tables, especially if the input tables have a substantial number of rows.
Natural Join: Returns a table with rows equal to the number of matching rows in the input tables.
These differences underscore the distinct characteristics and use cases of cross joins and natural joins in SQL. Adjustments to the examples can be made based on your actual dataset structure and requirements.
9. Scenario: Customer and Product Analysis
In the realm of customer and product analysis, you have two tables – customers and products. The customers table contains customer_id, region, and signup_time, while the products table holds product_id, customer_id, and purchase_time. Your task is to craft a SQL query to identify the count of customers from each region who made their first purchase within one month of signing up.
customers Example Input:
customer_id | region | signup_time
101 | APAC | 2022-01-03 00:00:00
102 | AMER | 2022-02-08 00:00:00
103 | EMEA | 2022-03-12 00:00:00
104 | AMER | 2022-04-18 00:00:00
105 | APAC | 2022-05-25 00:00:00
products Example Input:
product_id | customer_id | purchase_time
1001 | 101 | 2022-01-25 00:00:00
2002 | 102 | 2022-03-10 00:00:00
3003 | 103 | 2022-03-20 00:00:00
4004 | 104 | 2022-04-05 00:00:00
5005 | 102 | 2022-04-10 00:00:00
Task: Compose a SQL query to determine the count of customers from each region who made their first purchase within one month of signing up.
Answer:
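A PostgreSQL-style version consistent with the explanation below (if any purchase falls within one month of signup, the customer's first purchase necessarily does too, so a distinct count suffices):

```sql
SELECT c.region,
       COUNT(DISTINCT c.customer_id) AS customer_count
FROM customers c
JOIN products p
  ON p.customer_id = c.customer_id
WHERE p.purchase_time <= c.signup_time + INTERVAL '1 month'
GROUP BY c.region;
```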
This query combines the customers and products tables based on customer_id, filtering rows where the purchase_time falls within one month of the signup_time. The data is then grouped by region, and the count of distinct customer_ids is calculated for each region. This provides insights into the number of customers from each region who made their initial purchase within the first month of signing up. Adjust the column names and specifics based on your actual dataset structure.
Behavioral Questions:
Describe a significant challenge you’ve faced in your life and discuss how you successfully navigated through it.
In what ways do you believe your personal values resonate with the culture at Atlassian?
Share an instance when you successfully managed a workplace conflict.
Describe a situation in which you anticipated potentially missing a deadline. How did you handle and resolve the challenge?
Established in 2007, Flipkart, an e-commerce company, initiated its sales journey by offering online books. Over the years, it has expanded its product range to encompass over 100 million items across 120+ categories. With a user base exceeding 120,000 registered users, Flipkart achieved a significant milestone in 2016 by becoming the first Indian app to surpass 50 million downloads.
Skills Required for Data Analyst Role
Technical Skills:
Proficiency in Python, R, or any programming language with a strong foundation in Data Structures and Algorithms.
Strong analytical skills and a willingness to learn Data Analytics.
Familiarity with the Software Development Life Cycle (SDLC) and a solid understanding of Data Structures and Database Management Systems (DBMS).
Expertise in working with Excel, including proficiency in Macros.
In-depth knowledge of Power BI and QlikView.
Familiarity with Data Warehousing & Mining.
Hands-on experience with Microsoft Word and Microsoft PowerPoint.
Expertise in data modeling using Entity Relationship (ER).
Behavioral Skills:
A collaborative team player.
Strong communication skills with fluency in written and verbal English.
Logical thinking ability.
Capable of handling multiple tasks simultaneously.
Strong verbal and presentation skills.
Interview Process
Interviews at Flipkart may differ based on the role and team; however, Data Analyst interviews generally follow a standardized process covering specific question topics.
Aptitude Test Round – Basic logical and quantitative aptitude test
Technical Round – Questions on SQL, Python, Excel, and Power BI, with a few on regression and hypothesis testing
Technical Round – Data Interpretation, Probability, Statistics, Time & Work
Case Study Round – A case study based on your experience
Hiring Manager Round – Personal & Cultural Fit
Questions Asked
Aptitude Test Round Questions-
The length of a rectangle is increased by 25% and its breadth is decreased by 20%. What is the effect on its area?
Three men start together to travel the same way around a circular track of 11 kilometres in circumference. Their speeds are 4, 5 and 8 kilometres per hour respectively. When will they next meet at the starting point?
An order was placed for a carpet whose breadth was 6 m and whose length was 1.44 times the breadth. What would be the cost of a carpet whose length and breadth are 40% and 25% greater, respectively, than those of the first carpet, given that the rate of the carpet is Rs. 45 per sq m?
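These three aptitude answers can be sanity-checked with a few lines of Python (assuming the usual reading of each problem):

```python
from math import gcd
from functools import reduce

# Q1: length +25%, breadth -20%  ->  area changes by this factor
area_factor = 1.25 * 0.80            # 1.0, i.e. the area is unchanged

# Q2: lap times are 11/4, 11/5 and 11/8 hours; they all meet at the start
# after the LCM of those times. LCM of fractions = LCM(numerators)/GCD(denominators).
meet_hours = 11 / reduce(gcd, [4, 5, 8])   # 11 / 1 = 11 hours

# Q3: first carpet is 6 m x (1.44 * 6) m; the second is 25% / 40% larger
b1, l1 = 6, 1.44 * 6
b2, l2 = b1 * 1.25, l1 * 1.40
cost = b2 * l2 * 45                  # Rs. 4082.40

print(area_factor, meet_hours, cost)
```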
Technical Round Questions-
A can do a work in 15 days and B in 20 days. If they work on it together for 4 days, then the fraction of the work that is left is?
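A quick way to verify time-and-work answers like the one above is to work in rates (fraction of the job done per day); a small check in Python:

```python
from fractions import Fraction

rate_a = Fraction(1, 15)        # A alone finishes in 15 days
rate_b = Fraction(1, 20)        # B alone finishes in 20 days
done = 4 * (rate_a + rate_b)    # work completed in 4 days working together
left = 1 - done
print(left)                     # 8/15 of the work is left
```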
Data Interpretation – A bar graph was given, and we were asked to answer several questions based on it, for example:
What is the ratio of the total sales of branch B2 for both years to the total sales of branch B4 for both years?
Total sales of branch B6 for both the years is what percent of the total sales of branches B3 for both the years?
Project – A brief discussion about your project, with some basic questions related to it.
What is Excel Array Formula?
The different classes of relations created by the technique for preventing modification anomalies are called?
When the values in one or more attributes being used as a foreign key must exist in another set of one or more attributes in another table, we have created a?
Name two ways to extract unique values in Excel.
How do you concatenate two strings?
How do you join two tables?
What are the types of SQL commands?
What is a pivot table in SQL?
What is a variable in Python?
What is the difference between a list and a tuple?
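For the last question above, the list-vs-tuple distinction is easiest to show with a small snippet; mutability is the key difference (tuples are also hashable, so they can serve as dictionary keys):

```python
nums_list = [1, 2, 3]    # mutable: can be changed in place
nums_tuple = (1, 2, 3)   # immutable: any "change" builds a new tuple

nums_list.append(4)      # works
try:
    nums_tuple += (4,)   # allowed, but creates a brand-new tuple object
    nums_tuple[0] = 9    # not allowed: tuples don't support item assignment
except TypeError as e:
    error = str(e)

# Immutability makes tuples usable as dict keys; lists are not hashable.
d = {nums_tuple: "ok"}
print(nums_list, nums_tuple, error)
```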
Hiring Manager Round Questions –
Can you describe a time when you needed to balance multiple projects?
What are your salary expectations?
Which of your projects relate to the role you have applied for?
Why do you want to join Flipkart?
India’s leading fashion e-commerce company, Myntra, is dedicated to making fashion and lifestyle products accessible to everyone. We create solutions that disrupt the ordinary and help make the world a happier, more fashionable place.
A company that is constantly evolving into newer and better forms seeks individuals who are prepared to evolve with it. From its origins as a customization company in 2007 to being at the forefront of technology and fashion today, Myntra is going places, and you are encouraged to join the journey.
Skills Required
Apart from qualifications, the essential skills required for a Data Analyst position at Myntra include:
Mastery of database fundamentals with a demonstrated ability to translate diverse business requirements into effective SQL queries.
Proficient skills in Excel and PowerBI at an advanced level.
Valued hands-on experience in R, Python, Tableau, Qlikview, and Data Studio, especially in roles related to customer growth or customer analytics.
Demonstrated adaptability and the ability to excel in a dynamic and fast-paced work environment.
A collaborative team player who is comfortable engaging with individuals from various professional backgrounds.
Interview Process
The Myntra interview process comprises the following stages:
Application and Resume Screening: Applicants submit their online applications, and HR or recruiters review them to confirm qualifications and experience.
Technical Assessment: Candidates undergo a technical assessment, which includes exercises in data analysis and SQL to assess their technical proficiency.
Technical Interviews: Shortlisted candidates participate in technical interviews, where their experience, problem-solving skills, and proficiency in tools such as Excel, PowerBI, R, and Python are evaluated by experienced data professionals.
Case Study/Scenario-Based Interviews: Some candidates are presented with a real-world data analysis problem or scenario. They are then asked to articulate their approach and methodology for solving it during the interview.
Final Round Interviews: In certain instances, there may be a final round of interviews with senior team members or management to assess a candidate’s strategic thinking and alignment with the company’s goals.
Questions Asked
Create a pivot table and sort the data in ascending order.
Use a lookup on the given product data to find the required values.
Write SQL queries to perform operations such as joining, filtering, and aggregating data from multiple tables.
Describe your approach to utilizing data analysis for resolving business challenges and how you communicate your findings through the use of data visualization tools.
Which statistical methods and tools do you use in your data analysis practices?
Case Study Question: How many cars are sold in your city in a month?
Walmart stands as one of the world’s leading discount department store chains, boasting a global presence with thousands of stores that provide a diverse array of products at budget-friendly prices. The company offers competitive salaries, attractive incentives like stock options and 401(k) matching, and the opportunity to tackle intriguing business challenges. With Walmart’s strategic emphasis on boosting online sales while maintaining its commitment to affordable pricing, the demand for data analysts has surged. These professionals play a crucial role in optimizing pricing strategies, enhancing operations and supply chain efficiency, establishing robust data architecture, and monitoring key success metrics. In this comprehensive interview guide, we will navigate you through the Walmart data analyst interview process, explore important questions, and provide valuable tips to help you secure your ideal position with the retail giant.
Nature of Questions Asked in Walmart Data Analyst Interviews
Walmart Data Analyst interviews are tailored to assess a combination of problem-solving abilities, critical thinking skills, and proficiency in essential technologies such as SQL and reporting tools. Familiarity with machine learning, statistics, and coding in languages like Python or R is essential, and experience with big data technologies is considered advantageous.
It’s crucial to align your preparation with the specific role you’re applying for, whether it’s related to product analysis, risk assessment, or staff analytics. The advertised position may require expertise in building data architecture, analyzing user behavior, or managing information security. For example, if the role is within the transportation analytics team, understanding business operations and solving supply chain case study problems should be part of your interview preparation.
A valuable tip is to thoroughly read the job description, gaining insights into your daily responsibilities, the tools you’ll be using, and the specific business challenges the team aims to address. This understanding will guide your interview strategy effectively. Additionally, Walmart provides a helpful guide on their careers page to assist candidates in excelling during the interview process.
Data Analyst Interview Process
The Walmart Data Analyst interview process is structured to assess candidates’ technical proficiency, critical thinking skills, and alignment with the company culture. The key stages include:
Preliminary Screening: Initiated by a recruiter, this step aims to understand the candidate’s background and potential fit for the role. It’s an opportunity for candidates to inquire about the position and strategically highlight their skills.
Technical Interviews: Following the screening, candidates undergo technical rounds via phone or video calls. Questions may cover SQL, Excel, Tableau, and include behavioral and case study inquiries. The focus is on evaluating both technical competence and problem-solving abilities.
Onsite Interview: Successful candidates from the technical interviews proceed to onsite interviews, typically with a panel from the intended team. This stage combines technical and behavioral questions, allowing the team to assess the candidate’s suitability for the specific role.
It’s important to note that while the overall interview process follows this general format, the questions asked are tailored to the specific role and team. The list of popular analyst questions provided below is derived from actual Walmart interviews and similar roles and companies. For additional preparation, candidates can explore a comprehensive collection of interview questions.
Behavioral Questions
During Walmart interviews, expect to encounter several behavioral questions designed to evaluate your soft skills, gauge your future performance, and assess your ability to collaborate and adapt to dynamic situations.
What draws you to our organization and why do you want to work with us?
Share an instance where you went above and beyond expectations in a project.
Describe your approach to resolving conflicts within a team.
How do you manage and prioritize multiple deadlines effectively?
SQL Interview Questions
SQL proficiency is a crucial requirement for the Walmart data analyst role, so thorough preparation for these questions is essential.
Create a SQL query to fetch the latest transaction for each day from a bank transactions table, which includes columns such as id, transaction_value, and created_at representing the date and time for each transaction. Ensure the output contains the ID of the transaction, the transaction datetime, and the transaction amount, with transactions ordered by datetime.
Develop a SQL query to assess user ordering patterns between their primary address and other addresses. Provide a solution based on tables containing transaction and user data.
As the accountant for a local grocery store, you’re assigned the responsibility of determining the cumulative sales amount for each product since its last restocking. Utilizing three tables – products, sales, and restocking – where products provide information about each item, sales document sales transactions, and restocking tracks restocking events, compose a SQL query to fetch the running total of sales for each product since its most recent restocking event.
Formulate a SQL query to pinpoint customers who conducted more than three transactions in both the years 2019 and 2020. Emphasize the logical condition: Customer transactions > 3 in 2019 AND Customer transactions > 3 in 2020.
Write a SQL query to retrieve neighborhoods with zero users based on two provided tables: one containing user demographic information, including the neighborhood they reside in, and another dedicated to neighborhoods. The goal is to identify and return all neighborhoods that currently have no users.
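As one illustration of these, the both-years condition in the fourth question can be expressed with conditional aggregation in a single GROUP BY. A minimal sketch using SQLite from Python (the table and column names here are assumptions for the demo, not Walmart's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (customer_id INT, created_at TEXT)")
rows = [(1, f"2019-0{m}-01") for m in range(1, 5)]    # customer 1: 4 txns in 2019
rows += [(1, f"2020-0{m}-01") for m in range(1, 5)]   # customer 1: 4 txns in 2020
rows += [(2, f"2019-0{m}-01") for m in range(1, 5)]   # customer 2: 2019 only
con.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

# SUM over a boolean condition counts the rows per year for each customer.
query = """
SELECT customer_id
FROM transactions
GROUP BY customer_id
HAVING SUM(strftime('%Y', created_at) = '2019') > 3
   AND SUM(strftime('%Y', created_at) = '2020') > 3
"""
result = [r[0] for r in con.execute(query)]
print(result)   # only customer 1 satisfies both conditions
```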
Coding Questions
Explain the implementation of k-Means clustering using Python
Provide a comprehensive guide on constructing a logistic regression model in Python.
Describe the process of reconstructing a user’s flight journey.
Create a function to extract high-value transactions from two provided dataframes: transactions and products. The transactions dataframe includes transaction IDs, product IDs, and the total amount of each product sold, while the product dataframe contains product IDs and corresponding prices. The objective is to generate a new dataframe containing transactions with a total value surpassing $100, and to include the calculated total value as a new column in the resulting dataframe.
Describe your approach to finding the longest substring of a given string that satisfies a given condition (for example, the longest substring without repeating characters).
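For the first coding question, here is a minimal from-scratch sketch of k-Means (standard Lloyd's algorithm in plain Python; the naive first-k initialization is an assumption for brevity, and a real answer would normally mention k-means++ or scikit-learn's KMeans):

```python
def kmeans(points, k, iters=50):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(points[:k])          # naive init: first k points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
```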
Case Study Interview Questions
Outline the process for forecasting revenue for the upcoming year.
Describe the steps you would take to address the issue of underpricing for a product on an e-commerce site.
Which key performance indicators (KPIs) would you monitor in a direct-to-consumer (D2C) e-commerce company?
Outline the process of architecting end-to-end infrastructure for an e-commerce company.
What approach would you take to identify the most profitable products for a Black Friday sale, optimizing for maximum profit?
Statistics and Probability Interview Questions
Walmart data analysts frequently engage in quantitative tasks such as statistical modeling, sampling, and extensive analysis of datasets, charts, and model metrics. Possessing robust quantitative skills, especially in statistics and probability, is crucial for excelling in these responsibilities.
Walmart aims to assess customer satisfaction with a recently introduced in-store service. Outline your approach to crafting a survey that ensures a representative sample of customers. Additionally, explain the choice of sampling techniques and their rationale.
What is the drawback of the R-squared (R^2) method when analyzing the fit of a model that aims to establish a relationship between two variables. Discuss the limitations of the R-squared metric, situations in which it is appropriate, and propose alternative strategies. Support your response with examples.
Walmart is interested in examining whether there is a substantial disparity in customer spending between weekdays and weekends. Describe the statistical test you would employ for this analysis and elucidate your approach to interpreting the outcomes.
Outline strategies to minimize the margin of error in a study with an initial sample size of n, where the current margin of error is 3. If the goal is to reduce the margin of error to 0.3, discuss the additional samples required for this reduction. Emphasize the importance of seeking clarifications about the business context and explicitly state any assumptions made, as deviations can impact the margin of error.
Elaborate on the distinctions between a normal distribution and a binomial distribution. Offer instances where each distribution is relevant within a retail context, illustrating their applicability.
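The sample-size arithmetic behind the margin-of-error question follows from the fact that, other things being equal, the margin of error scales as 1/sqrt(n): cutting it by a factor of 10 (from 3 to 0.3) requires 100x the sample, i.e. 99n additional observations. A quick numeric check (n = 400 is an arbitrary illustration):

```python
import math

def margin(n, c=1.0):
    """Margin of error up to a constant c: moe = c / sqrt(n)."""
    return c / math.sqrt(n)

n = 400
c = 3 * math.sqrt(n)                  # calibrate the constant so margin(n) == 3
factor = (margin(n, c) / 0.3) ** 2    # multiplier on the sample size needed
extra = factor * n - n                # additional samples required: 99n
print(factor, extra)
```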
Topic – SQL Questions for Business Analyst
Number of questions – 10
Topics – Basic and Tricky
Probability of getting asked in an interview – 8/10
Let’s start with the SQL Questions for Business Analyst
Though we have mentioned that the probability of these questions coming up in your interview is 8/10, we are being modest there 😛. You are very likely to face these concepts in your interviews. SQL is a funny language: there is a great amount of restriction as far as concepts and functions are concerned. You only have a handful of functions like LEAD(), LAG(), RANK(), ROW_NUMBER(), COUNT(), etc. But that is also the beauty of this language. You can convert a complex-looking 200 lines of R/Python code into 50 lines of SQL code with better (okay!! far better) performance.
Coming to the concepts – we will deal with 10 questions in each blog. If a concept is hard or complex to understand, we will also include a video. Every link will be provided in the blogs.
Let’s start with SQL interview questions (1-10)
1) Suppose there are two tables, X and Y. X has just one column, A, and Y has just one column, B:

X(A): 1, 2, 3, 4
Y(B): 1, 2, 3, 5

Question – How many rows will be returned if you do:
– X left join Y
– X inner join Y
– X cross join Y
– X right join Y

The answers are 4, 3, 16, and 4. An inner join keeps only the matching values (1, 2, 3); a left join keeps every row of X, so the unmatched 4 appears once with a NULL; a right join keeps every row of Y, so the unmatched 5 appears once with a NULL; and a cross join pairs every row of X with every row of Y (4 × 4 = 16). Remember – matching only restricts an inner join; outer joins keep the unmatched rows too.
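The join counts for these two tables can be verified with SQLite from Python (older SQLite builds lack RIGHT JOIN, so it is emulated here by swapping the tables):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE X (A INT)")
con.execute("CREATE TABLE Y (B INT)")
con.executemany("INSERT INTO X VALUES (?)", [(1,), (2,), (3,), (4,)])
con.executemany("INSERT INTO Y VALUES (?)", [(1,), (2,), (3,), (5,)])

count = lambda q: con.execute(f"SELECT COUNT(*) FROM ({q})").fetchone()[0]
left  = count("SELECT * FROM X LEFT JOIN Y ON A = B")
inner = count("SELECT * FROM X INNER JOIN Y ON A = B")
cross = count("SELECT * FROM X CROSS JOIN Y")
# RIGHT JOIN emulated as the mirrored LEFT JOIN:
right = count("SELECT * FROM Y LEFT JOIN X ON A = B")
print(left, inner, cross, right)   # 4 3 16 4
```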
2. In a table ABC with columns id and salary, write a query to find the difference between each employee’s salary and the average salary. First take the average and name it a, then cross join this single value with the complete table ABC and take the difference:
select y.id, (y.salary - a) as diff
from (select avg(salary) as a from abc) x
cross join abc y;
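A quick check of this query with SQLite from Python (a hypothetical abc table with three salaries):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE abc (id INT, salary INT)")
con.executemany("INSERT INTO abc VALUES (?, ?)", [(1, 100), (2, 200), (3, 300)])

# Cross join the single-row average with the full table, then subtract.
query = """
SELECT y.id, (y.salary - a) AS diff
FROM (SELECT AVG(salary) AS a FROM abc) x
CROSS JOIN abc y
"""
diffs = dict(con.execute(query).fetchall())
print(diffs)   # average is 200, so diffs are -100, 0, +100
```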
3. What is RDBMS? How is it different from DBMS?
A relational database management system (RDBMS) is a set of applications and features that allow IT professionals and others to develop, edit, administer, and interact with relational databases. Most commercial relational database management systems use Structured Query Language (SQL) to access the database, which is stored in the form of tables. The RDBMS is the most widely used database system in businesses all over the world. It offers a stable means of storing and retrieving massive amounts of data.
Databases, in general, hold collections of data that may be accessed and used in other applications. The development, administration, and use of database platforms are all supported by a database management system.
A relational database management system (RDBMS) is a type of database management system (DBMS) that stores data in a row-based table structure that links related data components. An RDBMS contains functions that ensure the data’s security, accuracy, integrity, and consistency. This is not the same as the file storage utilized by a database management system.
The following are some further distinctions between the two: a DBMS typically stores data as files, with no relationships enforced between data elements, and is suited to single users, whereas an RDBMS stores data in related tables, supports multiple concurrent users, and enforces integrity constraints and normalization.
4. What is a Self-Join? A self-join is a join of a table with itself, and is thus a unary relationship. In a self-join, each row of the table is combined with itself and with the other rows of the same table according to the join condition. A self-join is mostly used to combine and compare rows from the same database table.
5. What are the differences between OLTP and OLAP? OLTP stands for online transaction processing, whereas OLAP stands for online analytical processing. OLTP is an online database modification system, whereas OLAP is an online database query response system.
6. What is the difference between the RANK() and DENSE_RANK() functions? The RANK() function defines the rank of each row within your ordered partition. If multiple rows share the same rank, the next rank is the previous rank plus the number of duplicates. For example, if we have three records at rank 4, the next rank shown is 7.
The DENSE_RANK() function assigns a distinct rank to each row within a partition based on the provided column value, with no gaps. It always indicates a ranking in order of precedence. This function will assign the same rank to the two rows if they have the same rank, with the next rank being the next consecutive number. If we have three records at rank 4, for example, the next level indicated is 5.
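The gap behaviour is easy to see with SQLite window functions (available in SQLite 3.25+, which ships with recent Python versions); a small sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INT)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("a", 50), ("b", 40), ("c", 40), ("d", 40), ("e", 30)])

# Three rows tie at rank 2: RANK() skips to 5 afterwards, DENSE_RANK() moves to 3.
query = """
SELECT name,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS drnk
FROM scores
"""
ranks = {name: (rnk, drnk) for name, rnk, drnk in con.execute(query)}
print(ranks)
```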
7. NoSQL vs SQL
In summary, the following are the five major distinctions between SQL and NoSQL:
Relational databases are SQL, while non-relational databases are NoSQL.
SQL databases have a specified schema and employ structured query language. For unstructured data, NoSQL databases use dynamic schemas.
SQL databases scale vertically, but NoSQL databases scale horizontally.
NoSQL databases are document, key-value, graph, or wide-column stores, whereas SQL databases are table-based.
SQL databases excel in multi-row transactions, while NoSQL excels at unstructured data such as documents and JSON.
8. How to remove duplicate rows in SQL?
If the SQL table has duplicate rows, the duplicate rows must be removed.
Let’s assume the following table as our dataset:
ID   Name   Age
1    A      21
2    B      23
2    B      23
4    D      22
5    E      25
6    G      26
5    E      25
The following SQL query removes the duplicate rows while keeping one copy of each, using a unique row identifier such as SQLite’s rowid (other databases have equivalents, e.g. ctid in PostgreSQL):
DELETE FROM table_name
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM table_name
    GROUP BY ID, Name, Age
);
Note that the often-quoted form DELETE ... WHERE ID IN (SELECT ID ... GROUP BY ID HAVING COUNT(ID) > 1) is wrong: it deletes every copy of a duplicated row, not just the extras.
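A runnable check of the deduplication logic using SQLite's implicit rowid (the table mirrors the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (ID INT, Name TEXT, Age INT)")
con.executemany("INSERT INTO people VALUES (?, ?, ?)",
                [(1, "A", 21), (2, "B", 23), (2, "B", 23), (4, "D", 22),
                 (5, "E", 25), (6, "G", 26), (5, "E", 25)])

# Keep the first physical copy (lowest rowid) of each distinct row.
con.execute("""
DELETE FROM people
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM people GROUP BY ID, Name, Age
)
""")
ids = [r[0] for r in con.execute("SELECT ID FROM people ORDER BY rowid")]
print(ids)   # one copy of each row remains
```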
9. What are Constraints? Constraints in SQL are used to specify limits on the data in a table. They can be specified while creating or altering the table. Some examples of constraints are: NOT NULL, CHECK, DEFAULT, UNIQUE, PRIMARY KEY, and FOREIGN KEY.
10. What are Entities and Relationships? Entities: A person, place, or thing in the real world about which data can be stored in a database. Tables store data that represents one type of entity. For example – A bank database has a customer table to store customer information. The customer table stores this information as a set of attributes (columns within the table) for each customer.
Relationships: Relations or links between entities that have something to do with each other. For example, the customer name is related to the customer account number and contact information, which might be in the same table. There can also be relationships between separate tables (for example, customer to accounts).
Now, if you are confused about starting SQL as a beginner and reaching an interview-ready state, do read this blog on How to prepare for SQL in 15 days
In this article, we will deal with ‘almost’ all types of SQL questions asked in any Analytics interview. These questions are mostly asked from candidates in the experience range of 0 to 6 years.
SQL questions are asked mostly in the 1st and 2nd rounds of your interviews and good feedback in it will definitely help you sail through the screening round.
How to use this tutorial?
There are five sections to this article (the last one optional):
1. A general introduction to the types of questions asked
2. A few solved examples, which you should first attempt yourself and then check against the answers
3. Links to a few YouTube videos created on the most asked SQL concepts
4. A few unsolved questions, which you should attempt after going through 1, 2, and 3
5. Optional – A few books listed on our website with the most asked SQL interview questions; if you want to master the concepts further, please go through them
What is the expectation of the SQL interviewer?
In a SQL interview, several types of questions are asked:
1. Simple definition questions, like: what is a primary key, what is 1NF, 2NF, BCNF, etc.
Thumb rule – Don’t worry too much about these questions; a quick read of any ‘SQL most asked questions’ article will give you the answers. Those articles are flooded with subjective questions, so do give them a read. We at The Data Monk concentrate only on tricky and advanced interview questions, so we will not go down that road. Having said that, answering these well is definitely an added advantage. Some sample questions are given below.
Weightage in interview -> 10-15%
2. Write query type questions
You will be given a set of tables and asked to write a query that produces a given output. A very simple example would be:
‘Write a query to extract the 2nd most ordered item from the hotel with the least revenue’
Well, these are simple questions; all you need to do is play around with ranking, GROUP BY, ORDER BY, etc.
Weightage in interview -> 40-50%
In these types of questions, you are expected to provide the correct answer either on a live server or by writing the complete query in a shared document/notepad.
We will look into these questions in the section below.
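A sketch of that sample question against a toy orders table (the schema and names are invented for illustration), using SQLite from Python:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (hotel TEXT, item TEXT, amount INT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("H1", "pizza", 10), ("H1", "pizza", 10), ("H1", "cola", 5),
    ("H2", "soup", 3), ("H2", "soup", 3), ("H2", "soup", 3),
    ("H2", "bread", 2), ("H2", "bread", 2), ("H2", "salad", 1),
])

query = """
WITH low_hotel AS (                       -- hotel with the least revenue
    SELECT hotel FROM orders
    GROUP BY hotel ORDER BY SUM(amount) ASC LIMIT 1
),
ranked AS (                               -- that hotel's items by order count
    SELECT item, RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM orders WHERE hotel IN (SELECT hotel FROM low_hotel)
    GROUP BY item
)
SELECT item FROM ranked WHERE rnk = 2
"""
second_item = con.execute(query).fetchone()[0]
print(second_item)   # H2 earns the least; its 2nd most ordered item is bread
```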
3. Solving a real project problem
Every interviewer has solved a handful of complicated problems using SQL. These projects or queries were written over a span of a month or so; the interviewer will very wisely pick a snippet or subset of the problem and have a discussion about your approach to solving it.
Here, you are not expected to come up with a full solution, but you do need to think aloud through your logic. Most probably there will be multiple levels to the problem, and you need to work through them methodically.
Example:-
I am working for a multiplex and need to create the logic that decides where to allot seats to a group of people. The solution needs to be dynamic: suppose four people come to buy tickets, how will your code offer seat options to them?
I launched an application and overnight it got 100,000 clicks. The data grows every day, and we currently process all of it at once. How would you design the database infrastructure? (These questions are mostly asked of candidates with 3+ years of experience.)
4. Creating or designing a complete dashboard pipeline
SQL is not the only thing practiced in the analytics domain. Analysts are required to create complete pipelines that feed a working dashboard or report.
There are multiple tools to create pipelines, like Airflow, Oozie, etc. SQL is the backbone of all these pipelines: you write queries that refresh every day. These queries contain variables like date between ‘D-1’ and ‘D-2’, where D-1 and D-2 stand for yesterday and the day before yesterday; this way the code runs every day and refreshes the numbers for the last two days.
Now, in the interview, you might be asked to create a simple pipeline where there are three queries A, B, and C that must run in that sequence.
Try to practice such questions (at least two)
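The A-then-B-then-C requirement can be sketched without any scheduler: run each step only after the previous one succeeds. A minimal stdlib-only stand-in for what Airflow/Oozie would manage (the step "queries" here are placeholders):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Three dependent steps: each reads what the previous one wrote,
# so they must run strictly in the order A -> B -> C.
steps = {
    "A": "CREATE TABLE raw AS SELECT 1 AS x UNION ALL SELECT 2",
    "B": "CREATE TABLE agg AS SELECT SUM(x) AS total FROM raw",
    "C": "CREATE TABLE report AS SELECT total * 10 AS kpi FROM agg",
}
ran = []
for name in ["A", "B", "C"]:      # a real scheduler would topologically sort this
    con.execute(steps[name])
    ran.append(name)

kpi = con.execute("SELECT kpi FROM report").fetchone()[0]
print(ran, kpi)
```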
5. Understanding how SQL works
You should know the importance of optimization, partitioning, table and view creation, etc. The interviewer will ask a few questions to check whether you have worked on Big Data, because there you need to understand partitioning and optimization.
For example, if you want to read only the last week of data, you add clauses like year >= 2022 and month >= 2 (if you are in March 2022); these columns should be part of the table’s partitioning scheme.
Notice that with one question the interviewer can check whether you know the importance of scanning only the relevant data, that scanning less data requires a partitioned table, and that you know how to create such a table. Boom!! The interviewer now knows how much SQL you know.
Now, if you are confused about starting SQL as a beginner and reaching an interview-ready state, do read this blog on How to prepare for SQL in 15 days
Cross Join Unnest and Lateral View Explode
This is the growing age of data, and it is a challenge to store data efficiently. What would you do if you had to store phone numbers or courses in a table against every student in your class? One student may well have 1, 2, 3, or 4 phone numbers.
Will you create 4 or 5 columns named Phone_number_1, Ph_2, Ph_3? Or will you just ask for the most used phone number?
Right now, across industries, people are storing such data in arrays and maps. When you create an array-type column, you can put as many values in it as you need.
Ex – Phone[12345, 13443, 5456]
But how to extract these values while querying the table?
We will learn this in Cross Join Unnest and Lateral View Explode
The column Subject is of array type. How will you query the above table? What if you want to know which person is reading Perl?
Select * from table where Subject = ‘Perl’ won’t work, for obvious reasons!
Depending upon what tool you use, you can explore the command. In this article, we will be taking PrestoSQL and Hive into account
Basically, we need to flatten this column Subject and we will be creating as many rows as there are Subject elements in the filed. Our final output for the above should be
| Name   | Emp_id | expertise |
|--------|--------|-----------|
| Rizwan | 123    | C         |
| Rizwan | 123    | JAVA      |
| Rizwan | 123    | SQL       |
| Nitin  | 4231   | Hive      |
| Nitin  | 4231   | Presto    |
| Gaurab | 322    | Ruby      |
| Gaurab | 322    | Perl      |
| Amod   | 421    | Python    |
| Amod   | 421    | R         |
Now you can easily query the flattened table.
CROSS JOIN UNNEST (PrestoSQL)
Select Name, Emp_id, expertise from Employee CROSS JOIN UNNEST(Subject) as t(expertise)
If there are multiple arrays:
Select Name, Emp_id, expertise, phone_num from Employee CROSS JOIN UNNEST(Subject, Phone) as t(expertise, phone_num)
What we are doing here is cross joining each row of the table with the elements of its own array, so UNNEST produces one output row per array element. Note that when you pass multiple arrays to a single UNNEST, Presto expands them positionally (first element with first element, and so on), padding the shorter array with NULLs rather than taking a cartesian product.
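The single-array UNNEST above can be sketched in plain Python (a toy model of the semantics, not Presto itself; the rows are copied from the example table):

```python
# Toy rows mirroring the Employee table from the article:
# (Name, Emp_id, Subject array)
employees = [
    ("Rizwan", 123, ["C", "JAVA", "SQL"]),
    ("Nitin", 4231, ["Hive", "Presto"]),
    ("Gaurab", 322, ["Ruby", "Perl"]),
    ("Amod", 421, ["Python", "R"]),
]

# CROSS JOIN UNNEST(Subject) AS t(expertise):
# emit one output row per element of each row's Subject array,
# repeating the scalar columns alongside each element
flattened = [
    (name, emp_id, expertise)
    for name, emp_id, subjects in employees
    for expertise in subjects
]

for row in flattened:
    print(row)
```

Each input row with an n-element array becomes n output rows, which is exactly the flattened table shown above.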
HIVE
Select Name,Emp_id,expertise from Employee LATERAL VIEW explode(Subject) myTable1 as expertise
If there are two columns to be exploded, then:
Select Name, Emp_id, expertise, Phone from Employee LATERAL VIEW explode(Subject) myTable1 as expertise LATERAL VIEW explode(Phone) myTable2 as Phone
When you just explode a column, like EXPLODE(Subject), it returns the list of all elements present in the column: C, JAVA, SQL, etc.
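The difference between the two engines matters when there are two arrays: chaining two LATERAL VIEW explode() calls in Hive produces the cartesian product of the arrays within each row. A plain-Python sketch of that behaviour (toy values, not Hive itself; the phone numbers are illustrative):

```python
# One toy row with two array columns, Subject and Phone
# (phone numbers here are made-up illustrative values)
employees = [
    ("Rizwan", 123, ["C", "JAVA"], [12345, 13443]),
]

# Two chained LATERAL VIEW explode() calls behave like a
# cartesian product of the two arrays, row by row
exploded = [
    (name, emp_id, expertise, phone)
    for name, emp_id, subjects, phones in employees
    for expertise in subjects
    for phone in phones
]

for row in exploded:
    print(row)
```

With 2 subjects and 2 phone numbers, the single input row becomes 4 output rows, unlike Presto's positional multi-array UNNEST, which would give only 2.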