Top 20 Pandas functions in Python
This post walks through 20 of the most commonly used functions in the Pandas library, all in one place. Please comment with any other function you think we should include next time.
Pandas is without doubt one of the most important libraries for any analytics professional. In roughly 8 out of 10 cases your data will be stored in a data frame, and you need a good understanding of what Pandas can do to make sense of that data.
Let’s go through the Top 20 Pandas functions in Python
I always advise people to create their own data set instead of relying on data sets already available on the internet.
Doing this will also make you fluent in Excel.
By creating data, I mean using Excel functions like RANDBETWEEN, RAND, IF, etc.
Use them extensively 🙂
So, I created my own data in Excel, which looks like this
1. Read a file using read_csv()
import pandas as pd
xyz = pd.read_csv('/Users/Nitin/Downloads/Kaggle/THD_Inventory.csv')
2. Get the top and bottom rows of the data set
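The snippet for this step is not shown here, so below is a minimal sketch using head() and tail(). The small xyz table is a made-up stand-in for the real data; only the column names are borrowed from later steps.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
})

top = xyz.head(3)     # first 3 rows; head() with no argument returns 5
bottom = xyz.tail(2)  # last 2 rows; tail() with no argument returns 5
print(top)
print(bottom)
```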
3. Get the complete information about all the columns in the table
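The call for this step is most likely info(); that is my assumption, since the snippet is not shown. A minimal sketch on a made-up table:

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430],
    "Markdown": ["Yes", "No", "Yes"],
})

# Prints each column's name, non-null count and dtype, plus memory usage.
# info() prints its report and returns None.
xyz.info()
```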
4. You can also take a look at the shape and size of the dataset. We have 100 rows and 6 columns in the original dataset
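A minimal sketch of shape and size on a made-up table (the real data set would report 100 rows and 6 columns instead):

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
})

rows, cols = xyz.shape  # shape is a (rows, columns) tuple
print(xyz.shape)        # (6, 2) for this stand-in
print(xyz.size)         # 12, i.e. rows * columns
```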
5. Choose n random samples from the dataset
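A minimal sketch using sample(), again on a made-up table. The random_state value is my own addition so the same rows come back on every run; drop it if you want a fresh sample each time.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
})

# Pick 3 rows at random, without replacement
picked = xyz.sample(n=3, random_state=42)
print(picked)
```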
6. Get the standard summary statistics (count, mean, std, min, quartiles, max) for each numeric column of the data set
Using xyz.describe().T transposes the table, so each column of the data becomes a row
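Here is a small sketch of describe() and its transposed form. The table is a made-up stand-in; note that the text column is skipped by default.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

summary = xyz.describe()      # statistics as rows, numeric columns as columns
summary_t = xyz.describe().T  # one row per numeric column
print(summary_t)
```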
7. Find the number of distinct values in each column – This is an important function, as columns with only a few distinct values are usually your categorical variables
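I am assuming the function meant here is nunique(). A minimal sketch on a made-up table:

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

distinct = xyz.nunique()  # number of distinct values per column
print(distinct)
```

Markdown has only 2 distinct values here, which marks it as a likely categorical column, while Sales has 6.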
8. How to find if there is any variable/column with missing values in it?
Use isna().any() – This checks, column by column, whether there is at least one null value
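A quick sketch of isna().any(). The table is made up, and I have deliberately put one missing value in it so that the output shows both True and False.

```python
import pandas as pd

# Made-up table with one deliberately missing Sales value (hypothetical data)
xyz = pd.DataFrame({
    "Sales": [250, None, 430],
    "Size": [10, 25, 15],
})

has_missing = xyz.isna().any()  # one boolean per column
print(has_missing)
```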
9. isnull() – This function returns a boolean for every cell in the data frame. True means the value is null, False means it is not null
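A minimal sketch of the cell-by-cell mask, using a made-up table with one missing value added on purpose:

```python
import pandas as pd

# Made-up table with one deliberately missing Sales value (hypothetical data)
xyz = pd.DataFrame({
    "Sales": [250, None, 430],
    "Size": [10, 25, 15],
})

mask = xyz.isnull()  # same shape as xyz, True where a value is missing
print(mask)
```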
10. Find the number of null values in each column; this set of functions tells you whether you can ignore a column or not
Unfortunately, we do not have any NULL values in our data set
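The usual way to get these counts is to chain isnull() with sum(); I am assuming that is the combination meant here. The table below is made up and has one missing value added so the count is not all zeros.

```python
import pandas as pd

# Made-up table with one deliberately missing Sales value (hypothetical data)
xyz = pd.DataFrame({
    "Sales": [250, None, 430],
    "Size": [10, 25, 15],
})

null_counts = xyz.isnull().sum()  # True counts as 1, so this is nulls per column
print(null_counts)
```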
11. Get the names of all the columns
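A minimal sketch using the columns attribute on a made-up table:

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120],
    "Size": [10, 25],
    "Markdown": ["Yes", "No"],
})

print(xyz.columns)                # an Index object holding the column names
column_names = list(xyz.columns)  # plain Python list, if you prefer one
print(column_names)
```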
12. Get the nsmallest or nlargest values from a column
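A small sketch of both functions on a made-up table. The choice of the Sales column and of n = 2 is mine, just for illustration.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
})

top2 = xyz.nlargest(2, "Sales")   # the 2 rows with the highest Sales
low2 = xyz.nsmallest(2, "Sales")  # the 2 rows with the lowest Sales
print(top2)
print(low2)
```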
13. Now come loc and iloc – A few interviewers try to check your basics with loc and iloc
loc – When you use loc, you need to specify the names (labels) of the columns
This will fetch row number 1,2,3,4 for the columns Sales and Size
iloc – When you use iloc, you need to specify the integer positions of the columns
This will fetch row number 1 to 4 from the 2nd and 3rd column index (Remember Python is zero indexed)
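The two snippets are not shown here, so this is a sketch of what they could look like on a made-up table. I am reading "2nd and 3rd column index" as positions 2 and 3, which is an assumption. The key difference: a loc slice includes its end label, an iloc slice excludes its end position.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
    "inventory": [40, 55, 30, 70, 65, 20],
})

# loc works on labels: rows labelled 1 to 4 (end included), columns by name
by_label = xyz.loc[1:4, ["Sales", "Size"]]

# iloc works on positions: rows 1 to 4 (5 is excluded), columns at 2 and 3
by_position = xyz.iloc[1:5, 2:4]

print(by_label)
print(by_position)
```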
14. Slice the data – It means cutting the dataset vertically or horizontally
This will fetch row number 1,2,3,4 from all the columns
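A sketch of both kinds of cut on a made-up table. The row slice xyz[1:5] is my guess at the missing snippet; the column selection is added to show the vertical cut.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
    "inventory": [40, 55, 30, 70, 65, 20],
})

rows_1_to_4 = xyz[1:5]             # horizontal cut: rows 1 to 4, all columns
two_columns = xyz[["Sales", "Size"]]  # vertical cut: all rows, chosen columns
print(rows_1_to_4)
print(two_columns)
```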
15. Group by in Pandas – Very useful pandas function
xyz[['Markdown', 'inventory']].groupby(['Markdown']).mean()
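To see the group by run end to end, here is a sketch on a made-up table. The Yes/No values in Markdown are my assumption; the real column may hold something else.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
    "inventory": [40, 55, 30, 70, 65, 20],
})

# Average inventory for each distinct Markdown value
avg_inventory = xyz[["Markdown", "inventory"]].groupby(["Markdown"]).mean()
print(avg_inventory)
```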
16. Sort the complete data frame according to one column
xyz.sort_values(by='Sales', ascending=False)
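A runnable sketch of the same sort on a made-up table. Note that sort_values returns a new data frame and leaves xyz itself unchanged.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Size": [10, 25, 15, 30, 20, 5],
})

# Highest Sales first; the original row labels travel with their rows
sorted_xyz = xyz.sort_values(by="Sales", ascending=False)
print(sorted_xyz)
```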
17. Query in data frame
xyz.query('Population > 10000')[:5]
This gets the first 5 complete rows with Population greater than 10000
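A runnable sketch of the query on a made-up table. Only 4 rows pass the filter in this stand-in, so the [:5] slice returns all 4 of them.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Population": [12000, 8000, 15000, 9500, 20000, 11000],
})

# Keep rows where Population exceeds 10000, then take at most the first 5
big_markets = xyz.query("Population > 10000")[:5]
print(big_markets)
```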
18. Get unique values from a column
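A minimal sketch using unique() on a made-up table. The choice of the Markdown column and its Yes/No values are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

# Distinct values in order of first appearance
values = xyz["Markdown"].unique()
print(values)
```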
19. If you want to know how much memory each column is taking up on your computer, use memory_usage
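A small sketch of memory_usage() on a made-up table. The deep=True variant is my own addition; it also counts the actual contents of text columns instead of only the references to them.

```python
import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430, 310, 90, 275],
    "Markdown": ["Yes", "No", "Yes", "No", "No", "Yes"],
})

usage = xyz.memory_usage()                # bytes per column, plus the index
deep_usage = xyz.memory_usage(deep=True)  # fuller count for text columns
print(usage)
print(deep_usage)
```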
20. Write a file to csv
This will write the data frame xyz to a file with the given name. Don’t forget to put the .csv extension in the file name.
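The snippet for this step is not shown, so here is a sketch using to_csv(). The file name xyz_output.csv and the temporary folder are hypothetical choices of mine to keep the example tidy; in practice you would pass your own path. index=False is optional and simply leaves the row labels out of the file.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the article's xyz data frame (values are hypothetical)
xyz = pd.DataFrame({
    "Sales": [250, 120, 430],
    "Size": [10, 25, 15],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "xyz_output.csv")  # hypothetical name
    xyz.to_csv(out_path, index=False)   # write without the row labels
    round_trip = pd.read_csv(out_path)  # read it back to check the result

print(round_trip)
```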
Please comment with more uses of Pandas functions in Python.