Top 20 Pandas functions in Python

This list of the top 20 Pandas functions in Python covers the most used functions in the Pandas library, all in one place. Please comment with any other function you think we should include next time.

Pandas is without doubt one of the most important libraries for any analytics professional. In 8 out of 10 cases your data will be stored in a data frame, so you need a good understanding of what Pandas can do to make sense of that data.

Let’s go through the Top 20 Pandas functions in Python

I always advise people to create their own data set rather than using one that is already available on the internet.
Doing this will also make you fluent in Excel.

By creating data, I mean using functions like RANDBETWEEN, RAND, IF, etc.
Use them extensively 🙂

So, I created my own data in Excel for the examples that follow.

  1. Read a file using read_csv()

    import pandas as pd
    xyz = pd.read_csv('/Users/Nitin/Downloads/Kaggle/THD_Inventory.csv')

  2. Get the top and bottom rows of the data set

    xyz.head(5)
    xyz.tail(5)

  3. Get the complete information about all the columns in the table

    xyz.info()
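If you do not have the original file, the first three steps can still be tried on generated data. The sketch below is only an assumption about what such a table might look like: the columns Sales, Size, Markdown, inventory and Population are named later in this post, while `Store`, the value ranges and the in-memory CSV are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for THD_Inventory.csv: 100 rows, 6 columns.
# "Store" and all value ranges are assumptions, not the author's data.
rng = np.random.default_rng(seed=42)
n_rows = 100
demo = pd.DataFrame({
    "Store": np.arange(1, n_rows + 1),
    "Sales": rng.integers(1_000, 50_000, size=n_rows),
    "Size": rng.choice(["Small", "Medium", "Large"], size=n_rows),
    "Markdown": rng.choice(["Yes", "No"], size=n_rows),
    "inventory": rng.integers(0, 500, size=n_rows),
    "Population": rng.integers(5_000, 100_000, size=n_rows),
})

# Round-trip through CSV text held in memory; read_csv() parses it
# the same way it would parse a file on disk.
csv_text = demo.to_csv(index=False)
xyz = pd.read_csv(io.StringIO(csv_text))

print(xyz.head(5))   # first five rows
print(xyz.tail(5))   # last five rows
xyz.info()           # column names, dtypes and non-null counts
```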

4. You can also take a look at the shape and size of the dataset. The original dataset has 100 rows and 6 columns, so shape returns (100, 6) and size returns 600, the total number of cells

xyz.shape
xyz.size

5. Choose n random rows from the dataset

xyz.sample(n=5)
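A minimal sketch of shape, size and sample() on a tiny made-up frame (the values are hypothetical). Passing `random_state` is optional, but it makes the random sample repeatable between runs.

```python
import pandas as pd

# Small hypothetical frame; the post's real data has 100 rows and 6 columns.
xyz = pd.DataFrame({
    "Sales": [120, 340, 560, 780, 910, 150],
    "inventory": [10, 20, 30, 40, 50, 60],
})

rows, cols = xyz.shape     # shape is a (rows, columns) tuple
total_cells = xyz.size     # size is rows * columns

# Draw 3 distinct rows at random; the seed fixes which rows are drawn.
picked = xyz.sample(n=3, random_state=0)

print(rows, cols, total_cells)
print(picked)
```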

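6. Get the standard summary statistics (count, mean, standard deviation, min, quartiles and max) for each numeric column of the data set

xyz.describe()

Using xyz.describe().T transposes the table, so each column of the data set becomes one row of statistics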
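As a small illustration on made-up values, describe() skips the text column by default and `.T` flips the result so that each numeric column gets its own row:

```python
import pandas as pd

# Hypothetical values, chosen only to keep the output easy to check.
xyz = pd.DataFrame({
    "Sales": [100, 200, 300, 400],
    "inventory": [5, 15, 25, 35],
    "Markdown": ["Yes", "No", "Yes", "No"],
})

summary = xyz.describe()   # numeric columns only, by default
wide = summary.T           # one row per column, one column per statistic

print(summary)
print(wide)
```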

7. Find the number of distinct values in each column – This is an important function because columns with only a few distinct values are usually the categorical variables in your dataset

xyz.nunique()
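One way to turn the nunique() counts into a list of likely categorical columns is sketched below. The data and the cut-off of 3 distinct values are arbitrary assumptions; pick a threshold that suits your own data.

```python
import pandas as pd

# Hypothetical data: one numeric column and two low-cardinality columns.
xyz = pd.DataFrame({
    "Sales": [100, 200, 300, 400, 500, 600],
    "Size": ["Small", "Large", "Small", "Medium", "Large", "Small"],
    "Markdown": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

distinct = xyz.nunique()   # number of distinct values per column

# Assumed rule of thumb: 3 or fewer distinct values -> treat as categorical.
likely_categorical = distinct[distinct <= 3].index.tolist()

print(distinct)
print(likely_categorical)
```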

8. How do you find out whether any variable/column has missing values?
Use isna().any() – This checks, column by column, whether there is at least one null value

xyz.isna().any()

Every column returns False, so we don’t have any NULL values

9. isnull() – This function returns a data frame of the same shape with a boolean in every cell. True means the value is null, False means it is not. isnull() is an alias of isna()

xyz.isnull()

10. Find the number of null values in each column. This tells you whether a column has too many gaps to be useful
Our data set does not have any NULL values, so every count comes out as 0

xyz.isnull().sum()
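Because the data set in this post has no nulls, the three checks above are easier to see on a frame that does. The sketch below uses made-up values with a few gaps added on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data with deliberately missing entries.
xyz = pd.DataFrame({
    "Sales": [100.0, np.nan, 300.0, 400.0],
    "inventory": [5, 15, 25, 35],
    "Size": ["Small", None, None, "Large"],
})

has_missing = xyz.isna().any()      # True for columns with at least one null
missing_count = xyz.isnull().sum()  # number of nulls per column
missing_share = xyz.isna().mean()   # fraction of nulls per column

print(has_missing)
print(missing_count)
print(missing_share)
```

The share of nulls per column is often the more useful number when deciding whether to drop a column.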

11. Get the name of all the columns

xyz.columns

12. Get the rows with the n smallest or n largest values in a column

xyz.nsmallest(10,'Sales')
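nlargest() works the same way as nsmallest(). A short sketch on made-up store data (the `Store` column is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C", "D", "E"],
    "Sales": [500, 120, 910, 340, 60],
})

lowest = xyz.nsmallest(2, "Sales")   # two rows with the smallest Sales
highest = xyz.nlargest(2, "Sales")   # two rows with the largest Sales

print(lowest)
print(highest)
```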

13. Now comes loc and iloc – A few interviewers try to check your basics with loc and iloc

loc – loc is label-based, so you need to specify the names of the columns
xyz.loc[1:5, ['Sales','Size']]
This will fetch the rows labelled 1,2,3,4,5 for the columns Sales and Size. Unlike ordinary Python slicing, a loc slice includes its end label

iloc – iloc is position-based, so you need to specify the integer positions of the columns
xyz.iloc[1:5,2:4]
This will fetch rows 1 to 4 from the columns at positions 2 and 3, i.e. the third and fourth columns (Remember Python is zero indexed)

14. Slice the data – It means cutting the dataset vertically or horizontally

xyz[1:5]

This will fetch rows 1,2,3,4 with all the columns
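The three ways of selecting rows in items 13 and 14 are easy to mix up, so here is a small side-by-side sketch. The frame is made up and uses the default 0, 1, 2, ... index, where labels and positions happen to coincide.

```python
import pandas as pd

# Hypothetical data with a default integer index.
xyz = pd.DataFrame({
    "Store": list("ABCDEFG"),
    "Markdown": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
    "Sales": [500, 120, 910, 340, 60, 770, 230],
    "Size": ["Small", "Large", "Medium", "Small", "Large", "Medium", "Small"],
})

by_label = xyz.loc[1:5, ["Sales", "Size"]]  # labels 1..5, end label included
by_position = xyz.iloc[1:5, 2:4]            # positions 1..4, end excluded
plain_slice = xyz[1:5]                      # positions 1..4, all columns

print(len(by_label), len(by_position), len(plain_slice))
```

So the same `1:5` returns five rows with loc but four rows with iloc and with plain slicing.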

15. Group by in Pandas – Very useful pandas function

xyz[['Markdown','inventory']].groupby(['Markdown']).mean()
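A minimal sketch of the same group by on made-up numbers, together with an equivalent spelling that selects the column after grouping:

```python
import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No"],
    "inventory": [10, 20, 30, 40],
})

# Average inventory per Markdown value, as a data frame.
avg_inventory = xyz[["Markdown", "inventory"]].groupby(["Markdown"]).mean()

# Same numbers as a Series: group first, then pick the column.
per_group = xyz.groupby("Markdown")["inventory"].mean()

print(avg_inventory)
print(per_group)
```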

16. Sort the complete data frame according to one column

xyz.sort_values(by = 'Sales', ascending = False)
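One detail worth knowing is that sort_values() returns a new, sorted data frame and leaves the original untouched unless you assign the result back. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C"],
    "Sales": [200, 500, 100],
})

# Highest Sales first; xyz itself keeps its original order.
ranked = xyz.sort_values(by="Sales", ascending=False)

print(ranked)
print(xyz)
```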

17. Query in data frame

xyz.query('Population > 10000')[:5]

This gets the first 5 complete rows with Population more than 10000
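For comparison, the sketch below runs the same filter with query() and with an ordinary boolean mask on made-up data. Both give the same rows, so the choice is mostly about readability.

```python
import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Store": list("ABCD"),
    "Population": [8000, 15000, 22000, 9500],
})

# Filter with a query string, then keep at most the first 5 matches.
big = xyz.query("Population > 10000")[:5]

# Equivalent filter written as a boolean mask.
same = xyz[xyz["Population"] > 10000].head(5)

print(big)
```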

18. Get unique values from a column

xyz['Column Name'].unique()
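A small sketch on a made-up Size column. unique() lists the distinct values in order of first appearance, and value_counts() is a handy companion when you also want to know how often each one occurs.

```python
import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Size": ["Small", "Large", "Small", "Medium", "Large"],
})

sizes = xyz["Size"].unique()         # distinct values, first-seen order
counts = xyz["Size"].value_counts()  # occurrences of each distinct value

print(sizes)
print(counts)
```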

19. If you want to know how much memory each column takes up on your computer, use memory_usage()

xyz.memory_usage()
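memory_usage() reports bytes per column plus one entry for the index. For text columns the default figure can understate the real footprint, so the sketch below (made-up data) also passes `deep=True`, which asks pandas to measure the stored strings themselves.

```python
import pandas as pd

# Hypothetical data: one integer column and one text column.
xyz = pd.DataFrame({
    "Sales": [100, 200, 300],
    "Size": ["Small", "Medium", "Large"],
})

shallow = xyz.memory_usage()        # bytes per column, plus the index
deep = xyz.memory_usage(deep=True)  # also counts the string contents
total_bytes = deep.sum()            # whole data frame, in bytes

print(shallow)
print(deep)
print(total_bytes)
```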

20. Write a file to csv

xyz.to_csv('File Name.csv')

This will write the data frame xyz to a file with the given name. Don’t forget to include the .csv extension in the file name.
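By default to_csv() also writes the row index as an extra first column. If you do not want that, pass `index=False`. The sketch below writes made-up data to a temporary folder (an assumption, so the example cleans up after itself) and reads it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data.
xyz = pd.DataFrame({
    "Store": ["A", "B", "C"],
    "Sales": [200, 500, 100],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "File Name.csv")
    xyz.to_csv(path, index=False)   # skip the row index column
    restored = pd.read_csv(path)    # read the file back to check it

print(restored)
```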


Please comment with more uses of Pandas functions in Python

About TheDataMonk

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience: 3+ years at Mu Sigma, 2 years at OYO, and 1 year and counting at The Data Monk. I am an active trader and a logically sarcastic idiot :)