Top 20 Pandas functions in Python

This list of the Top 20 Pandas functions in Python will help you understand the most commonly used functions in the Pandas library. We have put everything in one place. Please comment with any other function you think we should include next time.

Pandas is without doubt one of the most important libraries for any analytics professional. In 8 out of 10 cases your data will be stored in a data frame, and you need a good understanding of what Pandas can do to make sense of that data.

Let’s go through the Top 20 Pandas functions in Python

I always advise people to create their own data set. Don't go after data sets that are already available on the internet.
By doing this you will also get fluent in Excel.

By creating data, I mean using functions like RANDBETWEEN, RAND, IF, etc.
Use them extensively 🙂

So, I created my own data in Excel for this post.

1. Read a file using read_csv()

    import pandas as pd
    xyz = pd.read_csv('/Users/Nitin/Downloads/Kaggle/THD_Inventory.csv')

2. Get the top and bottom rows of the data set

    xyz.head(5)
    xyz.tail(5)

3. Get the complete information about all the columns in the table

    xyz.info()
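
If you want to try steps 1 to 3 without my file, here is a minimal sketch. The CSV text is made up; only some of the column names echo this post, and the Store column is purely hypothetical. It assumes that reading from an in-memory string behaves like reading from a file path.

```python
import io
import pandas as pd

# Made-up stand-in for the CSV file; the values are not from the real data set.
csv_text = """Store,Sales,Size,Markdown,inventory,Population
A,120,Small,Yes,40,8000
B,340,Large,No,75,15000
C,210,Medium,Yes,55,12000
D,90,Small,No,30,5000
E,400,Large,Yes,95,20000
F,150,Medium,No,60,9000
"""

# read_csv accepts a file path or any file-like object.
xyz = pd.read_csv(io.StringIO(csv_text))

top = xyz.head(2)      # first 2 rows
bottom = xyz.tail(2)   # last 2 rows
print(top)
print(bottom)

# info() prints the column names, non-null counts and dtypes.
xyz.info()
```

Both head() and tail() default to 5 rows when you do not pass a number.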

4. You can also take a look at the shape and size of the data set. shape returns (rows, columns) and size returns the total number of cells. We have 100 rows and 6 columns in the original data set, so shape is (100, 6) and size is 600

    xyz.shape
    xyz.size
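
To see how shape and size relate, here is a small sketch on a hypothetical toy frame (the values are made up).

```python
import pandas as pd

# Hypothetical toy frame: 4 rows, 3 columns.
xyz = pd.DataFrame({
    "Sales": [120, 340, 210, 90],
    "Size": ["Small", "Large", "Medium", "Small"],
    "inventory": [40, 75, 55, 30],
})

rows, cols = xyz.shape
print(xyz.shape)  # (rows, columns)
print(xyz.size)   # total number of cells

# size is always rows * columns, and len() gives the row count.
print(xyz.size == rows * cols, len(xyz) == rows)
```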

5. Choose n random rows from the data set

    xyz.sample(n=5)
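
A sample changes on every run unless you fix the seed. The sketch below uses made-up data and the random_state and frac parameters of sample(); the seed value 42 is arbitrary.

```python
import pandas as pd

# Made-up data: 10 rows.
xyz = pd.DataFrame({"Sales": range(100, 110), "inventory": range(10)})

# Same seed, same rows: handy when you want a repeatable sample.
first = xyz.sample(n=5, random_state=42)
second = xyz.sample(n=5, random_state=42)
print(first.equals(second))

# frac samples a share of the rows instead of a fixed count.
half = xyz.sample(frac=0.5, random_state=42)
print(len(half))
```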

6. Get the standard summary statistics (count, mean, std, min, quartiles, max) of each numeric column of the data set

    xyz.describe()

Using xyz.describe().T will transpose the table, giving one row per column
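
Here is a short sketch of describe() on made-up data. It assumes the default behaviour of summarising only numeric columns; include="all" is the option for text columns as well.

```python
import pandas as pd

# Made-up data with two numeric columns and one text column.
xyz = pd.DataFrame({
    "Sales": [100, 200, 300, 400],
    "inventory": [10, 20, 30, 40],
    "Size": ["Small", "Large", "Medium", "Small"],
})

summary = xyz.describe()   # numeric columns only by default
print(summary)
print(summary.T)           # transposed: one row per column

# include="all" also summarises text columns (count, unique, top, freq).
print(xyz.describe(include="all"))
```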

7. Find the number of distinct values in each column – This is an important function, as a column with only a few distinct values is likely a categorical variable

    xyz.nunique()
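
One way to use nunique() for spotting categorical columns is sketched below. The data is made up, and the cut-off of 3 distinct values is only an assumption for illustration; pick one that suits your data.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Sales": [120, 340, 210, 90, 400, 150],
    "Size": ["Small", "Large", "Medium", "Small", "Large", "Medium"],
    "Markdown": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

counts = xyz.nunique()   # distinct values per column
print(counts)

# Hypothetical rule of thumb: at most 3 distinct values means categorical.
categorical = counts[counts <= 3].index.tolist()
print(categorical)
```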

8. How do you find out if any variable/column has missing values in it?
Use isna().any() – This will check whether a column has even one null value

    xyz.isna().any()

We don't have any null values in this data set, so every column shows False

9. isnull() – This function returns a data frame of the same shape with a boolean for every cell. True means the cell is null, False means it is not. isnull() is an alias of isna()

    xyz.isnull()

10. Find the number of null values in each column – This combination of functions helps you decide whether a column has too many missing values to be useful
We do not have any null values in our data set, so every count is 0

    xyz.isnull().sum()
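
Since my data has no nulls, the three checks above are easier to see on a frame that does have some. This is a sketch on made-up data with missing values put in on purpose.

```python
import numpy as np
import pandas as pd

# Made-up data with deliberate gaps.
xyz = pd.DataFrame({
    "Sales": [120, np.nan, 210, 90],
    "Size": ["Small", "Large", None, None],
    "inventory": [40, 75, 55, 30],
})

has_null = xyz.isna().any()        # one boolean per column
null_mask = xyz.isnull()           # one boolean per cell
null_counts = xyz.isnull().sum()   # number of nulls per column
print(has_null)
print(null_counts)

# Share of missing values per column, useful when deciding whether to drop one.
print(xyz.isnull().mean())
```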

11. Get the names of all the columns

    xyz.columns

12. Get the rows with the n smallest or n largest values of a column

    xyz.nsmallest(10, 'Sales')
    xyz.nlargest(10, 'Sales')
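
A quick sketch of both functions on made-up data (the Store column is hypothetical). The result comes back already sorted by the chosen column.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C", "D", "E"],
    "Sales": [120, 340, 210, 90, 400],
})

lowest = xyz.nsmallest(2, "Sales")   # rows with the 2 lowest Sales
highest = xyz.nlargest(2, "Sales")   # rows with the 2 highest Sales
print(lowest)
print(highest)
```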

13. Now comes loc and iloc – Quite a few interviewers try to check your basics with loc and iloc

loc – When you use loc, you select rows by index label and columns by name

    xyz.loc[1:5, ['Sales','Size']]

This will fetch the rows labelled 1,2,3,4,5 for the columns Sales and Size. With loc, both ends of the slice are included

iloc – When you use iloc, you select rows and columns by integer position

    xyz.iloc[1:5,2:4]

This will fetch row number 1 to 4 from the columns at position 2 and 3 (Remember Python is zero indexed). With iloc, the end of the slice is excluded
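
The difference is easiest to check by running both on the same frame. The sketch below uses made-up data (the Store column is hypothetical); note that a loc slice includes its end label while an iloc slice stops one position before its end.

```python
import pandas as pd

# Made-up data with the default 0..6 row labels.
xyz = pd.DataFrame({
    "Store": list("ABCDEFG"),
    "Sales": [120, 340, 210, 90, 400, 150, 275],
    "Size": ["Small", "Large", "Medium", "Small", "Large", "Medium", "Small"],
    "inventory": [40, 75, 55, 30, 95, 60, 80],
})

by_label = xyz.loc[1:5, ["Sales", "Size"]]   # labels 1..5, end included
by_position = xyz.iloc[1:5, 2:4]             # positions 1..4, end excluded
print(by_label.index.tolist())
print(by_position.index.tolist())
print(by_position.columns.tolist())

# With a non-default index, loc uses the labels while iloc still counts positions.
relabelled = xyz.set_index("Store")
print(relabelled.loc["B":"D", "Sales"].tolist())
print(relabelled.iloc[1:4, 0].tolist())
```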

14. Slice the data – It means cutting the data set vertically or horizontally

    xyz[1:5]

This will fetch row number 1,2,3,4 from all the columns
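
Both kinds of cut can be sketched with plain square brackets. The data is made up; a slice of numbers cuts rows, and a list of column names cuts columns.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Sales": [120, 340, 210, 90, 400, 150],
    "inventory": [40, 75, 55, 30, 95, 60],
})

rows = xyz[1:5]          # horizontal cut: rows at positions 1..4, all columns
cols = xyz[["Sales"]]    # vertical cut: all rows, only the listed columns
print(rows)
print(cols)
```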

15. Group by in Pandas – Very useful pandas function

    xyz[['Markdown','inventory']].groupby(['Markdown']).mean()

This gives the average inventory for each value of Markdown
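
Here is the same group by on made-up data, followed by a sketch of agg() for when you want more than one statistic per group. The output names mean_inventory and total_sales are my own choice, not anything from the original data.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No", "Yes"],
    "inventory": [40, 70, 60, 30, 80],
    "Sales": [120, 340, 210, 90, 400],
})

mean_inventory = xyz[["Markdown", "inventory"]].groupby(["Markdown"]).mean()
print(mean_inventory)

# agg lets you compute several statistics per group in one call.
summary = xyz.groupby("Markdown").agg(
    mean_inventory=("inventory", "mean"),
    total_sales=("Sales", "sum"),
)
print(summary)
```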

16. Sort the complete data frame according to one column. ascending = False sorts from the largest value to the smallest

    xyz.sort_values(by='Sales', ascending=False)
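
sort_values() also takes a list of columns. A sketch on made-up data (the Store column is hypothetical):

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C", "D"],
    "Size": ["Small", "Large", "Small", "Large"],
    "Sales": [120, 340, 210, 90],
})

by_sales = xyz.sort_values(by="Sales", ascending=False)
print(by_sales)

# Several keys, each with its own direction.
by_size_then_sales = xyz.sort_values(by=["Size", "Sales"], ascending=[True, False])
print(by_size_then_sales)
```

sort_values() returns a new data frame, so xyz itself stays in its original order unless you assign the result back.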

17. Query in data frame

    xyz.query('Population > 10000')[:5]

This gets the first 5 rows with Population more than 10000, with all their columns
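
query() gives the same rows as a boolean filter, which the sketch below checks on made-up data. The threshold variable is only there to show the @ syntax for using a Python variable inside the query string.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C", "D", "E"],
    "Population": [8000, 15000, 12000, 5000, 20000],
    "Size": ["Small", "Large", "Medium", "Small", "Large"],
})

big = xyz.query("Population > 10000")
same = xyz[xyz["Population"] > 10000]   # equivalent boolean filter
print(big.equals(same))

# Conditions can be combined, and @ refers to a Python variable.
threshold = 10000
big_large = xyz.query("Population > @threshold and Size == 'Large'")
print(big_large)
```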

18. Get unique values from a column

    xyz['Column Name'].unique()
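
A sketch on made-up data. value_counts() is not part of the list above, but it is a close relative that also tells you how often each value occurs.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({"Size": ["Small", "Large", "Medium", "Small", "Large", "Small"]})

values = xyz["Size"].unique()   # distinct values in order of first appearance
print(values)

# value_counts additionally counts the occurrences of each value.
counts = xyz["Size"].value_counts()
print(counts)
```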


19. If you want to know how much memory each column takes up on your computer, use memory_usage. The values are in bytes

    xyz.memory_usage()
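
A sketch on made-up data. The deep=True option asks pandas to also measure the contents of text columns, so its numbers are usually larger for those columns; the exact byte counts depend on your pandas version and platform.

```python
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Sales": [120, 340, 210, 90],
    "Size": ["Small", "Large", "Medium", "Small"],
})

shallow = xyz.memory_usage()         # bytes per column, plus the index
deep = xyz.memory_usage(deep=True)   # also measures the contents of text columns
print(shallow)
print(deep)
print(deep.sum())                    # total bytes for the whole frame
```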

20. Write a file to csv

    xyz.to_csv('File Name.csv')

This will write the data frame xyz to a file with the given name. Don't forget to put the .csv extension in the file name.
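
By default to_csv() also writes the row index as an extra first column. The sketch below uses made-up data and writes to an in-memory buffer instead of a file so that it leaves nothing on disk; index=False is the option that leaves the index out.

```python
import io
import pandas as pd

# Made-up data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C"],
    "Sales": [120, 340, 210],
})

# Writing to an in-memory buffer here; pass a file name to write to disk instead.
buffer = io.StringIO()
xyz.to_csv(buffer, index=False)   # index=False leaves out the row numbers
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```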


Please comment with more uses of Pandas functions in Python

About TheDataMonk (Grand Master)

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)
