Order of Execution of SQL commands is among the most asked questions in Data Science or Data Analytics interview. You need to understand how the command flows in order to write optimized queries.
Why do we need to optimize our queries?
When you are dealing with a dummy dataset then you don’t actually care about optimizing your queries, but once you start hitting real-time data with millions of rows then each query you trigger costs some dollar to your company. The more optimized your query is, the lesser is its cost to the company.
To start optimizing your queries, you need to understand the flow of execution. And this is why it’s one of the most asked question in any Analytics interview (between 0 to 5 years of experience)
Order of Execution of SQL commands
Each SQL query starts with finding the data first and then moves to filter it based on conditions specified. Below is the order of execution of SQL commands
1. From and Joins : since these two forms the basis of the query
2. Where : Filters out the rows
3. Group By : Grouping values based on the column specified in the Group By clause
4. Having : Filters out the grouped rows
5. Select
6. Distinct : Rows with duplicate values in the column marked as Distinct are discarded
7. Order By : Rows are sorted based on Order By clause
8. Limit, Offset : Finally the limit or offset is applied
Query order of execution
1. FROM and JOINs
The FROM clause, and subsequent JOINs are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause, and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.
2. WHERE
Once we have the total working set of data, the first-pass WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded. Each of the constraints can only access columns directly from the tables requested in the FROM clause. Aliases in the SELECT part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet executed.
3. GROUP BY
The remaining rows after the WHERE constraints are applied are then grouped based on common values in the column specified in the GROUP BY clause. As a result of the grouping, there will only be as many rows as there are unique values in that column. Implicitly, this means that you should only need to use this when you have aggregate functions in your query.
4. HAVING
If the query has a GROUP BY clause, then the constraints in the HAVING clause are then applied to the grouped rows, discard the grouped rows that don’t satisfy the constraint. Like the WHERE clause, aliases are also not accessible from this step in most databases.
5. SELECT
Any expressions in the SELECT part of the query are finally computed.
6. DISTINCT
Of the remaining rows, rows with duplicate values in the column marked as DISTINCT will be discarded.
7. ORDER BY
If an order is specified by the ORDER BY clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in the SELECT part of the query have been computed, you can reference aliases in this clause.
8. LIMIT / OFFSET
Finally, the rows that fall outside the range specified by the LIMIT and OFFSET are discarded, leaving the final set of rows to be returned from the query.
There are a few more answers to this question, you can add your approach as well on this link – https://thedatamonk.com/question/what-is-the-order-of-execution-of-sql-commands/#comments
The Data Monk Interview Books – Don’t Miss
Now we are also available on our website where you can directly download the PDF of the topic you are interested in. At Amazon, each book costs ~299, on our website we have put it at a 60-80% discount. There are ~4000 solved interview questions prepared for you.
10 e-book bundle with 1400 interview questions spread across SQL, Python, Statistics, Case Studies, and Machine Learning Algorithms – Ideal for 0-3 years experienced candidates
23 E-book with ~2000 interview questions spread across AWS, SQL, Python, 10+ ML algorithms, MS Excel, and Case Studies – Complete Package for someone between 0 to 8 years of experience (The above 10 e-book bundle has a completely different set of e-books)
12 E-books for 12 Machine Learning algorithms with 1000+ interview questions – For those candidates who want to include any Machine Learning Algorithm in their resume and to learn/revise the important concepts. These 12 e-books are a part of the 23 e-book package
Individual 50+ e-books on separate topics
Important Resources to crack interviews (Mostly Free)
There are a few things which might be very useful for your preparation
The Data Monk Youtube channel – Here you will get only those videos that are asked in interviews for Data Analysts, Data Scientists, Machine Learning Engineers, Business Intelligence Engineers, Analytics managers, etc.
Go through the watchlist which makes you uncomfortable:-
All the list of 200 videos
Complete Python Playlist for Data Science
Company-wise Data Science Interview Questions – Must Watch
All important Machine Learning Algorithm with code in Python
Complete Python Numpy Playlist
Complete Python Pandas Playlist
SQL Complete Playlist
Case Study and Guesstimates Complete Playlist
Complete Playlist of Statistics