Top 20 Pandas functions in Python
This list of the top 20 Pandas functions in Python covers the most commonly used functions in the Pandas library, all in one place. Please comment with any other function you think we should include next time.
Pandas is without doubt one of the most important libraries for any analytics professional. In 8 out of 10 cases your data will be stored in a data frame, and you need a good understanding of what Pandas can do to make sense of that data.
Let’s go through the Top 20 Pandas functions in Python
I always advise people to create their own data set rather than going after one already available on the internet.
Doing this will make you fluent in Excel.
By creating data, I mean using functions like RANDBETWEEN, RAND, IF, etc.
Use them extensively 🙂
So, I created my own data in Excel.
1. Read a file using read_csv()
import pandas as pd
xyz = pd.read_csv('/Users/Nitin/Downloads/Kaggle/THD_Inventory.csv')
2. Get the top and bottom rows of the data set using head() and tail()
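Here is a minimal sketch of looking at the top and bottom rows. The small data frame is a made-up stand-in for the article's file, so the values are assumptions and only the calls matter.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275, 60, 330],
    "Size": [10, 25, 8, 30, 18, 5, 22],
})

top = xyz.head()      # first 5 rows by default
bottom = xyz.tail(3)  # last 3 rows

print(top)
print(bottom)
```

Both functions accept the number of rows you want, so xyz.head(10) would return the first 10 rows.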
3. Get the complete information about all the columns in the table using info()
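A short sketch of info() on a made-up data frame (the column values are assumptions). By default info() prints straight to the screen; here the report is also captured into a string through the buf parameter so it can be inspected.

```python
import io

import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410],
    "Markdown": ["Yes", "No", "Yes", "No"],
})

# info() reports column names, non-null counts and dtypes.
buffer = io.StringIO()
xyz.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```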
4. You can also take a look at the shape and size of the dataset using the shape and size attributes. We have 100 rows and 6 columns in the original dataset.
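To see what these two attributes return, here is a sketch on a placeholder data frame with the same dimensions as the article's data (100 rows and 6 columns). The column names A to F and the zero values are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the same dimensions as the article's data.
xyz = pd.DataFrame(np.zeros((100, 6)), columns=list("ABCDEF"))

rows, cols = xyz.shape  # shape is a (rows, columns) tuple
cells = xyz.size        # size is rows * columns

print(xyz.shape)  # (100, 6)
print(cells)      # 600
```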
5. Choose n random samples from the dataset using sample()
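A minimal sketch of random sampling on made-up data. The random_state argument is my addition so that the same rows come back on every run; leave it out if you want a fresh sample each time.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275, 60, 330],
    "Size": [10, 25, 8, 30, 18, 5, 22],
})

# Pick 3 rows at random, without replacement.
sampled = xyz.sample(n=3, random_state=42)
print(sampled)
```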
6. Get the standard summary statistics for each numeric column of the data set using describe()
Using xyz.describe().T will transpose the table
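The sketch below shows describe() and its transposed form on a made-up data frame (the values are assumptions). In the default output each column of the data is a column of the summary, and after .T each column of the data becomes a row.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [100, 200, 300, 400],
    "Size": [10, 20, 30, 40],
})

summary = xyz.describe()       # statistics as rows, columns as columns
transposed = xyz.describe().T  # one row per column of the data

print(summary)
print(transposed)
```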
7. Find the number of distinct values in each column using nunique() – This is an important function, as a low count of distinct values is a quick hint that a column is a categorical variable
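As a small illustration, assuming a Markdown column that only holds Yes or No (the actual values in the article's file are not shown, so these are made up):

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No", "Yes"],
    "Sales": [120, 340, 90, 410, 275],
})

distinct = xyz.nunique()  # number of distinct values per column
print(distinct)
```

Here Markdown has 2 distinct values and Sales has 5, which suggests Markdown is the categorical one.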
8. How to find if there is any variable/column with missing values in it?
Use isna().any() – This checks, for each column, whether it contains at least one null value
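A quick sketch of the check. I have deliberately put one missing value into a made-up data frame so that the result shows both outcomes.

```python
import numpy as np
import pandas as pd

# Toy data with one missing Sales value, added on purpose.
xyz = pd.DataFrame({
    "Sales": [120, np.nan, 90],
    "Size": [10, 25, 8],
})

has_missing = xyz.isna().any()  # one True/False per column
print(has_missing)
```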
9. isnull() – This function returns a Boolean for every cell in the data frame. True means the value is null, False means it is not null
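For illustration, here is isnull() on a made-up data frame with one missing value added on purpose. The result has the same shape as the original data.

```python
import numpy as np
import pandas as pd

# Toy data with one missing Sales value, added on purpose.
xyz = pd.DataFrame({
    "Sales": [120, np.nan, 90],
    "Size": [10, 25, 8],
})

mask = xyz.isnull()  # True where the cell is null
print(mask)
```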
10. Find the number of null values in each column using isnull().sum() – this tells you whether you can ignore a column or not
Unfortunately we do not have any NULL values in our data set
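Since the original data has no nulls, the sketch below uses a made-up data frame with two missing values added on purpose, just to show what the counts look like.

```python
import numpy as np
import pandas as pd

# Toy data with two missing Sales values, added on purpose.
xyz = pd.DataFrame({
    "Sales": [120, np.nan, 90, np.nan],
    "Size": [10, 25, 8, 30],
})

null_counts = xyz.isnull().sum()  # number of nulls per column
print(null_counts)
```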
11. Get the names of all the columns using the columns attribute
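A tiny sketch on a made-up data frame. Note that columns is an attribute, not a function, so there are no brackets after it.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340],
    "Size": [10, 25],
    "Markdown": ["Yes", "No"],
})

names = list(xyz.columns)  # convert the Index object to a plain list
print(names)
```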
12. Get the n smallest or n largest values from a column using nsmallest() or nlargest()
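As a sketch, assuming we want the 2 highest and 2 lowest Sales rows from a made-up data frame:

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275],
    "Size": [10, 25, 8, 30, 18],
})

top2 = xyz.nlargest(2, "Sales")   # rows with the 2 largest Sales values
low2 = xyz.nsmallest(2, "Sales")  # rows with the 2 smallest Sales values

print(top2)
print(low2)
```

Both calls return complete rows, ordered by the chosen column.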
13. Now comes loc and iloc – A few interviewers try to check your basics with loc and iloc
loc – When you use loc, you need to specify the names (labels) of the rows and columns
This will fetch rows 1, 2, 3 and 4 for the columns Sales and Size
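One loc call that matches this description is sketched below on a made-up data frame with the default 0, 1, 2... row labels. The exact call used in the original screenshot is not shown, so treat this as an assumption.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275, 60],
    "Size": [10, 25, 8, 30, 18, 5],
    "Markdown": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# loc slices by label and includes BOTH ends, so 1:4 means rows 1, 2, 3, 4.
subset = xyz.loc[1:4, ["Sales", "Size"]]
print(subset)
```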
iloc – When you use iloc, you need to specify the integer positions of the rows and columns
This will fetch rows 1 to 4 from the 2nd and 3rd column index (remember, Python is zero-indexed)
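The original call is not shown, so here is one possible reading as a sketch: rows at positions 1 to 4 and the columns at positions 2 and 3. The data frame and its values are made up.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275, 60],
    "Size": [10, 25, 8, 30, 18, 5],
    "Markdown": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "inventory": [15, 40, 12, 55, 30, 9],
})

# iloc slices by position and EXCLUDES the end, so 1:5 means positions 1 to 4.
subset = xyz.iloc[1:5, [2, 3]]
print(subset)
```

The key difference from loc is the end of the slice: loc includes it, iloc does not.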
14. Slice the data – It means cutting the dataset vertically or horizontally
This will fetch rows 1, 2, 3 and 4 from all the columns
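A plain row slice that gives this result is sketched below on made-up data. As the original call is not shown, the exact slice is an assumption.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275, 60],
    "Size": [10, 25, 8, 30, 18, 5],
})

# Square-bracket slicing works on row positions and excludes the end.
rows = xyz[1:5]
print(rows)
```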
15. Group by in Pandas – Very useful pandas function
xyz[['Markdown','inventory']].groupby(['Markdown']).mean()
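To make the result concrete, here is the same group by on a tiny made-up data frame. The Yes and No values for Markdown are assumptions, since the article's data is not shown.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No"],
    "inventory": [10, 20, 30, 40],
})

# Average inventory for each distinct Markdown value.
avg = xyz[["Markdown", "inventory"]].groupby(["Markdown"]).mean()
print(avg)
```

The result has one row per Markdown value: 30.0 for No and 20.0 for Yes.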
16. Sort the complete data frame according to one column
xyz.sort_values(by = 'Sales', ascending = False)
17. Query a data frame
xyz.query('Population > 10000')[:5]
This gets the first 5 complete rows with Population more than 10000
18. Get unique values from a column using unique()
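A short sketch, again assuming a Markdown column with made-up Yes and No values:

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Markdown": ["Yes", "No", "Yes", "No", "Yes"],
})

# unique() keeps the values in order of first appearance.
values = xyz["Markdown"].unique()
print(values)
```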
19. If you want to know how much memory each column takes up on your computer, use memory_usage()
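Here is a minimal sketch on made-up data. I have forced the int64 dtype so that each value takes 8 bytes on any machine; that choice is mine and not from the article.

```python
import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90, 410, 275],
    "Population": [8000, 12000, 15000, 9000, 20000],
}, dtype="int64")

usage = xyz.memory_usage()                 # bytes per column, plus the index
total = xyz.memory_usage(deep=True).sum()  # total bytes for the data frame

print(usage)
print(total)
```

With 5 rows of int64, each column takes 40 bytes. Passing deep=True also counts the real size of string columns.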
20. Write a data frame to a CSV file using to_csv()
This will write the data frame xyz to the given file name. Don’t forget to put the .csv extension in the file name.
Please comment with more uses of Pandas functions in Python
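The sketch below writes a made-up data frame to a temporary folder and reads it back. The file name xyz_output.csv is hypothetical, and index=False is my addition to keep the row numbers out of the file.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the article's data; the values are made up.
xyz = pd.DataFrame({
    "Sales": [120, 340, 90],
    "Size": [10, 25, 8],
})

# Hypothetical output file inside a temporary folder.
out_path = os.path.join(tempfile.mkdtemp(), "xyz_output.csv")
xyz.to_csv(out_path, index=False)

# Read the file back to confirm that nothing was lost.
round_trip = pd.read_csv(out_path)
print(round_trip)
```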