ETL Using Python’s Petl

Elijah Ayeeta
Sep 26, 2019

ETL stands for Extract, Transform, and Load. There are a number of ETL tools on the market. They are mostly used to move data from one database or data warehouse to another, cleaning and reshaping it along the way so that it ends up consistent. In other words, ETL is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s), or in a different context than the source(s).

In this post, we’ll build our own simple ETL tool that consumes a free public API endpoint. The Python library we’ll use to build the ETL pipeline is petl. Let’s get started…

You could start by cloning or downloading my GitHub repo. We’ll import our libraries; the main ones are petl, pandas and plotly. If you cloned or downloaded the repo, you can also delete the users.csv file; we’ll see how it gets created later.

First, we’ll load our data from the API endpoint.
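As a rough sketch of this step, the snippet below fetches a JSON list of users using only the standard library. The URL is an assumption: the post does not name its endpoint, and JSONPlaceholder’s /users route is used here only because it returns users with nested address and company objects. The helper names parse_users and fetch_users are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the post does not name its endpoint; this public test API
# is used purely for illustration.
USERS_URL = "https://jsonplaceholder.typicode.com/users"


def parse_users(payload):
    """Decode a JSON payload and check that it is a list of user dicts."""
    users = json.loads(payload)
    if not isinstance(users, list) or not all(isinstance(u, dict) for u in users):
        raise ValueError("expected a JSON array of user objects")
    return users


def fetch_users(url=USERS_URL, timeout=10):
    """Download the payload from `url` and return the parsed users."""
    with urlopen(url, timeout=timeout) as response:
        return parse_users(response.read())
```

In a notebook, `users = fetch_users()` would then give a list of dictionaries, one per user.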

Now that we can view this data, let’s Extract. petl provides a number of functions for extracting data from different sources; we’ll use fromdicts() to extract it into a users_table variable.

[Image: contents of users_table]

Above is what our users_table variable holds. Notice the nested dictionary objects in address and company. Let’s focus on address: our aim is to see where our users come from, but in a more organized way.

So we Transform. petl provides a number of functions for transforming tabular data; we’ll use unpackdict(), cut() and rename().

Result:

[Image: the transformed table]

We now have a decent-looking table.

Lastly, we’ll Load. petl provides a number of functions for loading data, but we’ll use tocsv().

If you check the directory from which you are running your Jupyter notebook, you’ll notice a users.csv file.

We can also do some exploratory analysis on our CSV file. Our aim is to find out where our users are located.
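One possible way to start that analysis with pandas is sketched here. The CSV text is hypothetical sample data with assumed column names (name, city, latitude, longitude), read from memory so the example is self-contained; with the real file you would call pd.read_csv("users.csv") instead.

```python
import io

import pandas as pd

# Hypothetical sample standing in for users.csv
SAMPLE_CSV = """name,city,latitude,longitude
Alice Example,Springfield,12.5,-45.25
Bob Example,Springfield,-30.0,100.75
Carol Example,Rivertown,48.1,11.6
"""

users_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# How many users does each city have?
users_per_city = users_df["city"].value_counts()
print(users_per_city)

# Quick summary of the coordinate columns
print(users_df[["latitude", "longitude"]].describe())
```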

[Image: output of the exploratory analysis]

Let’s plot each user on a map using their longitude and latitude.

[Image: map of user locations]

Since this data came from a random API, those location points are to be expected; otherwise, it looks like most of our users are mermaids :)

Thank you for following along. I welcome your feedback via LinkedIn or Twitter: @ElijahAyeeta

Data Driven Investor

empowering you with data, knowledge, and expertise

Written by Elijah Ayeeta

Software Developer | Bassist | Aspiring Data Scientist | Tech Blogger
