Introduction to SQL
Structured Query Language is the base of your Data Science career. You will always be surrounded with SQL queries and it’s very important to understand the basics of the language.
We will not waste time on defining terminologies associated with the language. If you want to start from the very basic then do go through www.tutorialspoint.com.
We will start with the most asked question in a Data Science interview.
“Okay Aman, tell me the sequence of SQL command execution”
And Aman starts remembering what he has by-hearted last night 😛
Let’s make this easy to understand.
Where does your data comes from ?
How many tables? Any number, depending on the number of joins.
So, first of all the query will execute the “FROM” and all the “JOIN” tables, because you need to extract some data to process it, right?
So, the first command to run is FROM and JOIN.
Now, i have the mammoth looking data dump(10,000 rows), what next?
I will apply a WHERE condition, remember, Where condition cuts down the number of “rows” from the table.
The second command to get executed is “WHERE”.
Now i have all the rows which i need, Group by simply groups the data on one or more field. Cool?
I have 50,000 rows of 50 students. I apply a group by and the number of rows is reduced to 1000.
GROUP BY is the next command to be executed after WHERE clause.
FROM,JOIN -> WHERE -> GROUP BY
What else do we have?
Ummm, We have HAVING, LIMIT,SELECT,ORDER BY
We all know that HAVING is always used along with Group by to apply condition on aggregated value. If you are unsure about the difference between WHERE and HAVING command then, God bless you 😛
Well WHERE is applied on rows and HAVING is applied on aggregate value like – HAVING sum(Marks) > 1000
So, HAVING further applies condition on your group by. Thus HAVING follows Group by
FROM,JOIN -> WHERE -> GROUP BY -> HAVING
Now we have SELECT,LIMIT and ORDER BY
We have filtered out everything, now is the time to SELECT the columns which you need.
You have the selected columns, which you will order and then apply Limit to it.
There is no point ordering 10 columns and then selecting 4 ordered columns, right?
You first limit the columns, then apply the order and then if there is a need to apply a limit to the fetched record, apply a LIMIT
Thus, the final order is
Read it twice if you are unable to get the concept. You should be able to recite the whole story even at 3 in the morning 😛
2. What is fetch and offset?
This might look fancy but the dictionary meaning will do the trick. Offset is like, “how many rows you want to chuck” and fetch is like “how many rows do you want to include”
So If my table has data of top 100 students.
OFFSET 10 FETCH 5 will get you the 11th to 15th rank student.
Where will these two clause fall in the execution ?
FETCH and OFFSET are parallel to the LIMIT command
3. What is the difference between ‘HAVING’ CLAUSE and a ‘WHERE’ CLAUSE?
HAVING clause can be used only with SELECT statement.
It is usually used in a GROUP BY clause and whenever GROUP BY is not used, HAVING behaves like a WHERE clause.
Having Clause is only used with the GROUP BY function in a query
whereas WHERE Clause is applied to each row before they are a part of the GROUP BY function in a query.
4. What is self join?
– The SQL SELF JOIN is used to join a table to itself as if the table were two tables, temporarily renaming at least one table in the SQL statement.
SELECT a.column_name, b.column_name…
FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
SELECT – select records from a table
INSERT – insert new records
UPDATE – update/Modify existing records
DELETE – delete existing records
CREATE – create a new Table, database, schema
ALTER – alter existing table, column description
DROP – delete existing objects from database
GRANT – allows users to read/write on certain database objects
REVOKE – keeps users from read/write permission on database objects
6. A field with a NULL value is a field with no value. It is very important to understand that a NULL value is different than a zero value or a field that contains spaces.
Yahan light nai le lena interview main
7. Like clause is an important concept in SQL. Whenever you want to apply a like clause in a dataset with varchar datatype then do cast everything in either lower or upper case.
lower(name) like ‘nitin kamal’
is always preferred over
name like ‘Nitin Kamal’
Your code should be able to minimize margin of errors.
You should definitely try SQLZOO to practice SQL queries. The link is given below:-
What ?? Go and Google it daa 😛
Keep Learning 🙂
The Data Monk