A use case for the Python Faker library

Ketan Sahu
Plumbers Of Data Science
3 min read · Apr 27, 2021


In this article, I will show you how I used the Python Faker library to enrich the App dataset.

The story began when I needed a dataset for my Data Engineering project. My use case was to understand how random customers use different apps, depending on their age and location. However, I couldn’t find any dataset on the internet that met this requirement. What I did find was the Google Play Store Apps dataset on Kaggle.

First, let’s have a look at the App dataset and the data in it.

Android Apps Dataset

So, to meet my requirement, I had to tweak the App dataset: I used the Faker library to create a dataset of fake customer details and merged it into the App dataset. Before going into more detail, let’s have a look at what the Faker library is.

The Faker library for Python helps you generate fake data, such as fake names, addresses and locations, and dates of birth. Check out the Faker library and its installation process.

For the sake of simplicity, let’s say we want to understand how 1000 customers use Android apps, based on their age and their location in the US. Hence, I need 1000 fake customers located in the US and aged between 18 and 65. I used the following steps to create them.

Install and import the Faker library.


Create 1000 fake customers.

Faker data

Assign a customer_id to each customer. The customer_id will be the key for merging the fake customers dataset into the App dataset.

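A minimal sketch of this step, assuming the customers live in a pandas DataFrame. The three-row table below is a hypothetical stand-in for the 1000 generated customers, and numbering the IDs from 1 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for the fake customers table from the previous step
customers = pd.DataFrame(
    {
        "name": ["Alice Smith", "Bob Jones", "Carol White"],
        "state": ["Texas", "Ohio", "Nevada"],
        "age": [34, 52, 19],
    }
)

# Give every customer a unique, 1-based integer ID as the first column;
# this ID is the join key used later when merging with the App dataset
customers.insert(0, "customer_id", range(1, len(customers) + 1))
print(customers)
```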

To merge the fake customers dataset into the App dataset, I randomly distributed the 1000 customer_ids across the rows of the App dataset. This is how I did it.

Import the App dataset.

App Dataset
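Loading the App dataset is a single `pandas.read_csv` call. The sketch below assumes the Kaggle download is a CSV file (the file name in the comment is an assumption), and it reads a tiny inline sample with placeholder values instead, so it runs without the download. The column names in the sample are likewise only illustrative.

```python
import io

import pandas as pd

# With the real download you would do something like:
#   apps = pd.read_csv("googleplaystore.csv")  # file name assumed
# A tiny inline sample with placeholder values stands in for it here.
SAMPLE_CSV = """App,Category,Rating
App A,GAME,4.5
App B,TOOLS,3.9
App C,FAMILY,4.1
App D,GAME,4.7
"""

apps = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(apps.shape)  # (number of rows, number of columns)
print(apps.head())
```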

Assign a customer_id to each row of the App dataset.

App Dataset with Customer_id
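One way to distribute the IDs randomly, sketched with NumPy under the assumption that each app row gets one customer_id drawn uniformly from 1 to 1000. The 10-row table is a hypothetical stand-in for the App dataset. Because the draw is with replacement, a customer can be linked to several apps, and when the App dataset has more rows than there are customers, most customers will appear more than once.

```python
import numpy as np
import pandas as pd

NUM_CUSTOMERS = 1000  # IDs 1..1000, matching the fake customers table

# Hypothetical stand-in for the App dataset
apps = pd.DataFrame(
    {
        "App": [f"App {i}" for i in range(10)],
        "Category": ["GAME", "TOOLS"] * 5,
    }
)

rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility
# integers() excludes the upper bound, so +1 makes 1000 a possible ID
apps["customer_id"] = rng.integers(1, NUM_CUSTOMERS + 1, size=len(apps))

print(apps)
print("distinct customers:", apps["customer_id"].nunique())
```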

Merge the App dataset and the fake customers dataset to create the customers app dataset.

customers_app_dataset
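The merge can be sketched with pandas `DataFrame.merge` on the shared `customer_id` column. The two tiny tables below are hypothetical stand-ins for the real datasets. A left join keeps every app row, and `validate="many_to_one"` makes pandas raise an error if a `customer_id` were ever duplicated in the customers table.

```python
import pandas as pd

# Hypothetical stand-ins for the fake customers and the App dataset
customers = pd.DataFrame(
    {
        "customer_id": [1, 2, 3],
        "name": ["Alice Smith", "Bob Jones", "Carol White"],
        "state": ["Texas", "Ohio", "Nevada"],
        "age": [34, 52, 19],
    }
)
apps = pd.DataFrame(
    {
        "App": ["App A", "App B", "App C", "App D"],
        "customer_id": [2, 1, 2, 3],
    }
)

# Many app rows can point to the same customer, so the join is many-to-one
customers_app_dataset = apps.merge(
    customers, on="customer_id", how="left", validate="many_to_one"
)
print(customers_app_dataset)
```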

Here we are: the final dataset is ready to go.

In this example, I have used only three Faker library functions, but many more are available, depending on your requirements.

If you want to play with my example (dataset and code), you can fork it from my GitHub repo. You can also connect with me on LinkedIn.

Note: I would not advise using the Faker library for real-life cases; please use it only to learn and to experiment with your own projects.
