Before you start implementing these techniques, load the data of your choice into your working environment. For this post, I’m using the Iris data set.
Start by importing the necessary libraries, such as pandas and NumPy, and loading your data set. Once that is done, dive into the techniques below.
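As a minimal sketch of that setup step, the snippet below imports both libraries and reads a small table with `pd.read_csv`. The inline CSV text and the column names (`sepal_length`, `sepal_width`, and so on) are my own stand-in for an Iris-style file so that the snippet runs on its own; in practice you would point `pd.read_csv` at your actual file.

```python
import io

import numpy as np
import pandas as pd

# A few Iris-style rows kept inline (an assumption for illustration);
# normally you would pass a file path or URL to pd.read_csv instead.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first rows, to confirm the load worked
print(df.dtypes)    # numeric columns should be float64, species should be object

# NumPy works directly on the underlying values of a column.
print(np.mean(df["sepal_length"].to_numpy()))
```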
1. Split data using pandas
Here we split the data in two: we draw a random sample of rows, then remove those rows from the original data by dropping their index values.
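One way this split might look is sketched here with `DataFrame.sample` and `DataFrame.drop`. The tiny data frame, the 75% fraction and the variable names are assumptions chosen for illustration, not values from the original post.

```python
import pandas as pd

# Small stand-in frame (hypothetical values) so the sketch runs on its own.
df = pd.DataFrame(
    {
        "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 2.9],
        "species": ["setosa"] * 8,
    }
)

# Draw 75% of the rows at random; random_state makes the draw repeatable.
sample_part = df.sample(frac=0.75, random_state=1)

# Drop the sampled rows from the original by their index labels
# to obtain the remaining rows.
remainder = df.drop(sample_part.index)

print(len(sample_part), len(remainder))
```

Because the second piece is built by dropping index labels, this only works cleanly when the index values are unique.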
2. Binning Data
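Binning is a technique to group your data into multiple buckets (bins), which is very helpful if you are dealing with continuous numeric data. In pandas you can bin the data using the functions cut and qcut: cut splits the value range into bins of equal width, while qcut picks the bin edges from quantiles so that each bin holds roughly the same number of rows. First check the shape of your data, i.e. the number of rows and columns.
Then bin your data using qcut. For the sepal_width column, the number of rows falling into each bin is: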
(2.7, 3.0] 50
(1.999, 2.7] 33
(3.1, 3.4] 31
(3.4, 4.4] 24
(3.0, 3.1] 12
Name: sepal_width, dtype: int64
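The binning steps can be sketched as follows. The sixteen `sepal_width` values and the choice of four bins are my own assumptions for a small runnable example, so the counts will not match the output shown above.

```python
import pandas as pd

# Hypothetical sepal_width values, just enough to show both functions.
df = pd.DataFrame(
    {
        "sepal_width": [
            2.0, 2.2, 2.3, 2.5, 2.7, 2.8, 2.9, 3.0,
            3.0, 3.0, 3.1, 3.2, 3.4, 3.5, 3.8, 4.4,
        ]
    }
)

# Shape is (number of rows, number of columns).
print(df.shape)

# cut: four bins of equal width across the value range.
width_bins = pd.cut(df["sepal_width"], bins=4)

# qcut: four bins whose edges are taken from the quartiles of the data.
quantile_bins = pd.qcut(df["sepal_width"], q=4)

# Count how many rows landed in each bin.
print(width_bins.value_counts())
print(quantile_bins.value_counts())
```

Note that `qcut` can raise an error when heavily repeated values produce duplicate bin edges; its `duplicates="drop"` option is one way around that.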
3. Slicing using loc and iloc functions
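You can do position-based and label-based slicing using the iloc and loc indexers respectively. Keep in mind that an iloc slice excludes its end position, while a loc slice includes its end label.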
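Here is a short sketch of both kinds of slicing. The frame and its string index labels (`a` to `d`) are made up for illustration, so that the difference between positions and labels is easy to see.

```python
import pandas as pd

# Hypothetical frame with string labels as the index.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, 3.3],
    },
    index=["a", "b", "c", "d"],
)

# iloc: integer positions; the end position is excluded.
# Rows at positions 0 and 1, column at position 0.
by_position = df.iloc[0:2, 0:1]

# loc: index and column labels; the end label is included.
# Rows labelled a through c, column sepal_width.
by_label = df.loc["a":"c", ["sepal_width"]]

print(by_position)
print(by_label)
```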
4. Mean Imputation and Interpolate method
Mean Imputation is a technique in which the missing value is replaced by the mean of available data in the chosen column.
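First, check whether your data has any missing values.
Then calculate the mean of the column and use it to replace the missing values.
The interpolate method takes a different approach: it estimates each missing value from its neighbouring values, using linear interpolation by default.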
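Both approaches can be sketched on a tiny column with two gaps. The column name `value` and the numbers are assumptions picked so that the results are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries.
df = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan, 8.0]})

# Step 1: count the missing values in the column.
missing_count = df["value"].isna().sum()

# Step 2: mean of the available values (NaN is skipped): (1 + 3 + 8) / 3 = 4.0
column_mean = df["value"].mean()

# Step 3: replace every missing value with that mean.
mean_filled = df["value"].fillna(column_mean)

# Alternative: linear interpolation between the neighbouring values.
interpolated = df["value"].interpolate()

print(mean_filled.tolist())    # [1.0, 4.0, 3.0, 4.0, 8.0]
print(interpolated.tolist())   # [1.0, 2.0, 3.0, 5.5, 8.0]
```

Mean imputation gives every gap the same value, while interpolation follows the trend between neighbours, which tends to suit ordered data such as time series better.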
5. Combining Data using Concat and Join
Much like np.concatenate() in NumPy, the pd.concat() function concatenates Series or DataFrame objects in pandas, either row-wise or column-wise.
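A minimal sketch of `pd.concat` in both directions follows. The two small frames and the `widths` Series are hypothetical examples, not data from the post.

```python
import pandas as pd

# Two hypothetical frames with the same columns.
top = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})
bottom = pd.DataFrame({"sepal_length": [7.0], "species": ["versicolor"]})

# Row-wise concatenation (the default, axis=0).
# ignore_index=True renumbers the rows 0..n-1 instead of keeping 0, 1, 0.
stacked = pd.concat([top, bottom], ignore_index=True)

# Column-wise concatenation (axis=1) aligns on the index
# and adds the Series as a new column.
widths = pd.Series([3.5, 3.0, 3.2], name="sepal_width")
side_by_side = pd.concat([stacked, widths], axis=1)

print(stacked)
print(side_by_side)
```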
Merging and joining data is one of the most important skills in data science. Understanding it and implementing it correctly is crucial for analyzing data well.
In this section, we will implement the following joins:
- Inner Join: keep only the rows that match in both tables/data frames, based on the specified merge condition.
- Full Join: keep all the rows from the left table and the right table, with matched rows combined wherever possible and NaNs elsewhere.
- Left Join: keep all the rows from the left table; wherever the right table has no matching row, fill its columns with NaNs, based on the specified merge condition.
- Right Join: keep all the rows from the right table; wherever the left table has no matching row, fill its columns with NaNs, based on the specified merge condition.
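The four joins listed above can be sketched with `pd.merge` and its `how` parameter, which is one common way to express them in pandas. The `left` and `right` frames and the `key` column are made-up examples.

```python
import pandas as pd

# Hypothetical tables sharing a "key" column; keys 2 and 3 appear in both.
left = pd.DataFrame({"key": [1, 2, 3], "left_value": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_value": ["x", "y", "z"]})

# Inner join: only keys present in both tables (2 and 3).
inner = pd.merge(left, right, on="key", how="inner")

# Full outer join: every key from either table, NaN where there is no match.
full = pd.merge(left, right, on="key", how="outer")

# Left join: every row of left; right_value is NaN for key 1.
left_join = pd.merge(left, right, on="key", how="left")

# Right join: every row of right; left_value is NaN for key 4.
right_join = pd.merge(left, right, on="key", how="right")

print(inner)
print(full)
print(left_join)
print(right_join)
```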
#Full outer join