ETL Using Python’s Petl

Elijah Ayeeta
Sep 26 · 3 min read

ETL stands for Extract, Transform and Load. There are a number of ETL tools on the market; you can see for yourself here. ETL tools are mostly used for transferring data from one database or data warehouse to another, manipulating it along the way so that it stays consistent. In other words, ETL is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s), or in a different context than the source(s).

In this blog, we’ll build our own simple ETL pipeline to consume a random free API endpoint. The Python library we’ll use for the ETL steps is petl. Let’s get started…

You can start by cloning/downloading my GitHub repo. We’ll import our different libraries; our main focus is on petl, pandas and plotly. If you cloned or downloaded the repo, you can also delete the users.csv file; we’ll see how it comes about later…
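The imports below are a minimal sketch of what the rest of the walkthrough assumes (requests is used to call the API; install anything missing with pip):

```python
# Libraries used in this walkthrough: requests to call the API,
# petl for the ETL steps, pandas and plotly for the exploration at the end.
import requests
import petl as etl
import pandas as pd
import plotly.express as px
```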

We’ll load our data from the API endpoint.
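Something along these lines works. Here I’m assuming JSONPlaceholder’s free /users endpoint, which returns a list of user dictionaries with nested address and company fields; the repo may point at a different URL.

```python
# Assumed free endpoint: JSONPlaceholder's /users resource,
# which returns a list of user dictionaries with nested address/company fields
URL = "https://jsonplaceholder.typicode.com/users"

response = requests.get(URL)
response.raise_for_status()
users = response.json()

users[0]  # peek at a single record before extracting
```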

Now that we can view this data, let’s Extract. petl provides a number of methods for extracting data from different sources; we’ll use fromdicts() to extract our list of dictionaries into a users_table variable.
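Continuing from the snippet above, the extraction step is a one-liner:

```python
# Extract the list of user dictionaries into a petl table
users_table = etl.fromdicts(users)

# petl tables render as a truncated, tabular preview in a notebook
users_table
```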

Above is what our users_table variable holds. Notice the nested dictionary objects in address and company. Let’s take an interest in address; our aim is to see where our users come from, but in a more organized way.

So we Transform. petl provides a number of methods to transform tabular data; here we’ll use unpackdict(), cut() and rename().
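A rough sketch of the transform step is below. The field names are assumed from the JSONPlaceholder payload (address holds street, city, zipcode and a nested geo dict with lat/lng); adjust them to whatever your endpoint actually returns.

```python
# Unpack the nested 'address' dict into its own columns,
# then unpack the 'geo' dict it contains into lat/lng columns
transformed = etl.unpackdict(users_table, 'address')
transformed = etl.unpackdict(transformed, 'geo')

# Keep only the columns we care about...
transformed = etl.cut(transformed, 'name', 'city', 'lat', 'lng')

# ...and give the coordinate columns friendlier names
transformed = etl.rename(transformed, {'lat': 'latitude', 'lng': 'longitude'})

transformed
```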

Result:

We now have a decent looking table.

Lastly, we’ll Load. petl provides a number of methods to load data; we’ll use tocsv().
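Loading the transformed table is another one-liner:

```python
# Write the transformed table out as users.csv in the current working directory
etl.tocsv(transformed, 'users.csv')
```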

If you check the directory from which you are running your Jupyter notebook, you’ll notice a users.csv file.

We can also do exploratory analysis on our CSV file. Our aim is to find out where our users are located.
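For example, reading the file back with pandas lets us count users per city (column names follow the transform step above):

```python
# Read the CSV we just produced back in for analysis
df = pd.read_csv('users.csv')

# How many users come from each city?
df['city'].value_counts()
```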

Let’s plot each user on a map using their longitude and latitude points.
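A quick sketch with plotly express, assuming the latitude/longitude column names from the transform step:

```python
# Plot each user on a world map using their coordinates
fig = px.scatter_geo(df, lat='latitude', lon='longitude', hover_name='name')
fig.show()
```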

Since this data came from a random API, those location points are to be expected; otherwise, it looks like most of our users are mermaids :)

Thank you for following through; I welcome your feedback. Find me on LinkedIn and Twitter: @ElijahAyeeta.


