Data Analysis

7 Techniques to Clean and Structure Data for Analysis

Top hacks to save time. ⏰

Frederik Bussler
Sep 4, 2020 · 8 min read
Image by Comfreak from Pixabay

1. Data Quality Analysis

import pandas as pd

# Load the dataset and preview the first five rows
df = pd.read_csv('Space.csv')
df.head(5)
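Beyond eyeballing the first rows, a quick per-column summary can surface quality problems early. The following is a minimal sketch, not the author's method: the `sample` frame, its `cost` column, and the `quality_report` helper are hypothetical names invented for illustration, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the loaded data; all values are invented.
sample = pd.DataFrame({
    "Location": ["A, X, USA", "B, Y, China", "A, X, USA", None],
    "cost": [50.0, None, 50.0, 65.0],
})


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise dtype, missing values, and distinct values for each column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


report = quality_report(sample)
duplicate_rows = int(sample.duplicated().sum())  # fully identical rows

print(report)
print("duplicate rows:", duplicate_rows)
```

Columns with a high `missing_pct` or an unexpected `dtype` are usually the first candidates for the tidying steps that follow.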

2. Tidying the Data

# Drop the leftover index columns from the CSV export
df = df.drop(columns=["Unnamed: 0", "Unnamed: 0.1"])

# Datum is whitespace-separated: token 0 is the day, token 1 the month, token 3 the year
date_parts = df['Datum'].str.split()
df['day'] = date_parts.str[0]
df['month'] = date_parts.str[1]
df['year'] = date_parts.str[3]
df['Date'] = df['month'] + ' ' + df['year']

# Location is comma-separated; strip whitespace so "USA" matches exactly below
location_parts = df['Location'].str.split(',')
df['Country'] = location_parts.str[-1].str.strip()
df['center'] = location_parts.str[1].str.strip()
df['Country'] = df['Country'].replace({"USA": "United States"})
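As an alternative to splitting the date string by position, the column can be parsed into real timestamps, which makes malformed rows visible instead of silently producing wrong tokens. This is a sketch under assumptions: the `launches` frame is hypothetical, and the format string assumes values shaped like `Fri Aug 07, 2020 05:12 UTC`, which matches the token positions used by the split() calls above but is not confirmed by the source.

```python
import pandas as pd

# Hypothetical sample rows; the last one is deliberately malformed.
launches = pd.DataFrame({"Datum": [
    "Fri Aug 07, 2020 05:12 UTC",
    "Fri Oct 04, 1957 19:28 UTC",
    "not a date",
]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(
    launches["Datum"],
    format="%a %b %d, %Y %H:%M UTC",  # assumed layout of the Datum column
    errors="coerce",
)

launches["year"] = parsed.dt.year          # float column because of the NaT row
launches["month"] = parsed.dt.month_name()
bad_rows = int(parsed.isna().sum())        # rows that did not match the format

print(launches)
print("unparseable rows:", bad_rows)
```

Counting `bad_rows` before deriving further columns gives a cheap check that the assumed format really fits the data.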

3. Data Merging and Transfer

# Join df onto precincts wherever the Country values match
precincts = precincts.merge(df, on="Country")
# Stack Y2018 beneath Y2003 to form one DataFrame
df = pd.concat([Y2003, Y2018])
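Merges can silently drop or duplicate rows, so it may help to make the join type explicit and check which keys failed to match. The sketch below uses small hypothetical frames (`launches`, `regions`, `y2003`, `y2018`) with invented values; only `merge` and `concat` themselves come from the snippet above.

```python
import pandas as pd

# Hypothetical inputs, invented for illustration.
launches = pd.DataFrame({
    "Country": ["United States", "China", "Kazakhstan"],
    "launches": [3, 2, 1],
})
regions = pd.DataFrame({
    "Country": ["United States", "China"],
    "region": ["North America", "Asia"],
})

# how="left" keeps every launch row; indicator=True adds a _merge column
# showing whether each row found a partner; validate raises if Country
# is duplicated on the right-hand side.
merged = launches.merge(
    regions, on="Country", how="left", indicator=True, validate="many_to_one"
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()

# ignore_index=True renumbers the rows so the stacked frame has a clean index.
y2003 = pd.DataFrame({"Country": ["China"], "launches": [1]})
y2018 = pd.DataFrame({"Country": ["China"], "launches": [2]})
stacked = pd.concat([y2003, y2018], ignore_index=True)

print(merged)
print("unmatched countries:", unmatched)
print(stacked)
```

An empty `unmatched` list is a reasonable sanity check before relying on the merged columns.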

4. Data Cleaning Libraries

import dabl
df_clean = dabl.clean(df_original, verbose=1)  # returns the cleaned DataFrame

5. Missing Value Libraries

import missingno as mi
mi.matrix(df, figsize=(12, 8))  # plot where values are missing, row by row
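Once the matrix shows which columns are sparse, one possible follow-up is to drop columns above a missing-share threshold and fill the remaining gaps. This is only a sketch of one common approach, not something the article prescribes: the `data` frame, its column names, the 0.5 threshold, and the fill values are all hypothetical choices.

```python
import pandas as pd

# Hypothetical data with gaps; values are invented.
data = pd.DataFrame({
    "cost": [50.0, None, 65.0, 40.0],
    "notes": [None, None, None, "ok"],
    "status": ["Success", "Failure", None, "Success"],
})

max_missing_share = 0.5  # hypothetical cut-off; tune it for the data at hand

# Share of missing values per column, between 0 and 1.
missing_share = data.isna().mean()

# Keep only the columns at or below the threshold.
kept = data.loc[:, missing_share <= max_missing_share].copy()

# Fill numeric gaps with the median and categorical gaps with a sentinel.
kept["cost"] = kept["cost"].fillna(kept["cost"].median())
kept["status"] = kept["status"].fillna("Unknown")

print(missing_share)
print(kept)
```

Whether dropping or imputing is appropriate depends on why the values are missing, so the threshold and fill strategy deserve a deliberate choice per dataset.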

6. Use Live Data


7. Determine Meaningful Attributes


Written by Frederik Bussler, AutoML enthusiast and no-coder at Obviously.AI.

Towards AI

Towards AI is the world’s leading multidisciplinary science publication. Towards AI publishes the best of tech, science, and engineering. Read by thought-leaders and decision-makers around the world.
