SQL Query Optimization Techniques
SQL Query Optimization Techniques
Query optimization is an important skill for SQL developers and database administrators (DBAs). In order to improve the performance of SQL queries, developers and DBAs need to understand the query optimizer and the techniques it uses to select an access path and prepare a query execution plan. Query tuning involves knowledge of techniques such as cost-based and heuristic-based optimizers, plus the tools an SQL platform provides for explaining a query execution plan. The best way to tune performance is to try to write your queries in a number of different ways and compare their reads and execution plans. In this paper, I proposed various techniques that you can use to try to optimize your database queries.
1. Use Column Names Instead of * in a SELECT Statement – If you are selecting only a few columns from a table there is no need to use SELECT *. Though this is easier to write, it
will cost more time for the database to complete the query. By selecting only the columns you need, you are reducing the size of the result table, reducing the network traffic, and also in turn boosting the overall performance of the query.
2. Avoid including a HAVING clause in SELECT statements – The HAVING clause is used to filter the rows after all the rows are selected and it is used like a filter. It is quite useless in a SELECT statement. It works by going through the final result table of the query parsing out the rows that don’t meet the HAVING condition.
3. Eliminate Unnecessary DISTINCT Conditions – Considering the case of the following example, the DISTINCT keyword in the original query is unnecessary because the table_name contains the primary key p.ID, which is part of the result set.
Original query:
SELECT DISTINCT * FROM SH.sales s
JOIN SH.customers c
ON s.cust_id= c.cust_id
WHERE c.cust_marital_status = ‘single’;
Improved query:
SELECT * FROM SH.sales s JOIN
SH.customers c
ON s.cust_id = c.cust_id
WHERE c.cust_marital_status=’single’;
4. Un-nest sub-queries – Rewriting nested queries as joins often lead to more efficient execution and more effective optimization. In general, sub-query un-nesting is always done for correlated sub-queries with, at most, one table in the FROM clause, which is used in ANY, ALL, and EXISTS predicates. An uncorrelated sub-query, or a sub-query with more than one table in the FROM clause, is flattened if it can be decided, based on the query semantics, that the sub-query returns at most one row.
5. Consider using an IN predicate when querying an indexed column – The IN-list predicate can be exploited for indexed retrieval and also, the optimizer can sort the IN-list to match the sort sequence of the index, leading to more efficient retrieval. Note that the IN-list must contain only constants, or values that are constant during one execution of the query block, such as outer references.
6. Use EXISTS instead of DISTINCT when using table joins that involve tables having one-to-many relationships. The DISTINCT keyword works by selecting all the columns
in the table and then parsing out any duplicates. Instead, if you use subquery with the EXISTS keyword, you can avoid having to return an entire table.
Original query:
SELECT DISTINCT c.country_id, c.country_name
FROM SH.countries c,SH.customers e
WHERE e.country_id = c.country_id;
Improved query:
SELECT c.country_id, c.country_name
FROM SH.countries c
WHERE EXISTS (SELECT ‘X’ FROM SH.customers e
WHERE e.country_id = c.country_id);
7. Try to use UNION ALL in place of UNION. The UNION ALL statement is faster than UNION, because UNION ALL statement does not consider duplicates, and the UNION statement does look for duplicates in a table while selecting of rows, whether or not they exist.
8. Avoid using OR in join conditions. Any time you place an ‘OR’ in the join condition, the query
will slow down by at least a factor of two.
Original query:
SELECT *
FROM SH.costs c
INNER JOIN SH.products p ON c.unit_price =
p.prod_min_price OR c.unit_price = p.prod_list_price;
Improved query:
SELECT *
FROM SH.costs c
INNER JOIN SH.products p ON c.unit_price =
p.prod_min_price
UNION ALL
SELECT *
FROM SH.costs c
INNER JOIN SH.products p ON c.unit_price =
p.prod_list_price;
9. Avoid functions on the right-hand side of the operator – Functions or methods are used very often with their SQL queries. Rewriting the query by removing aggregate functions will increase the performance tremendously
Original query:
SELECT *
FROM SH.sales
WHERE EXTRACT (YEAR FROM TO_DATE (time_id, ‘DDMON-RR’)) = 2001 AND EXTRACT (MONTH FROM
TO_DATE (time_id, ‘DD-MON-RR’)) =12;
Improved query:
SELECT * FROM SH.sales
WHERE TRUNC (time_id) BETWEEN
TRUNC(TO_DATE(‘12/01/2001’, ’mm/dd/yyyy’))
10. Remove any redundant mathematics – There will be times when you will be performing
mathematics within an SQL statement. They can be a drag on the performance if written improperly. Each time the query finds a row it will recalculate the math. So eliminating
any unnecessary math in the statement will make it perform faster.
Now we are also available on our website where you can directly download the PDF of the topic you are interested in. On Amazon, each book costs ~299, on our website we have put it at a 60-80% discount. There are ~4000 solved interview questions prepared for you.
10 e-book bundle with 1400 interview questions spread across SQL, Python, Statistics, Case Studies, and Machine Learning Algorithms – Ideal for 0-3 years experienced candidates
23 E-book with ~2000 interview questions spread across AWS, SQL, Python, 10+ ML algorithms, MS Excel, and Case Studies – Complete Package for someone between 0 to 8 years of experience (The above 10 e-book bundle has a completely different set of e-books)
12 E-books for 12 Machine Learning algorithms with 1000+ interview questions – For those candidates who want to include any Machine Learning Algorithm in their resume and to learn/revise the important concepts. These 12 e-books are a part of the 23 e-book package
Individual 50+ e-books on separate topics
Important Resources to crack interviews (Mostly Free)
There are a few things that might be very useful for your preparation
The Data Monk Youtube channel – Here you will get only those videos that are asked in interviews with Data Analysts, Data Scientists, Machine Learning Engineers, Business Intelligence Engineers, Analytics managers, etc.
Go through the watchlist which makes you uncomfortable:-
Complete Python Playlist for Data Science
Company-wise Data Science Interview Questions – Must Watch
All important Machine Learning Algorithm with code in Python
Complete Python Numpy Playlist
Complete Python Pandas Playlist
Case Study and Guesstimates Complete Playlist
Complete Playlist of Statistics
Keep Learning !!