Python: Extracting Data using API

Karl Christian
Published in Analytics Vidhya
6 min read · Mar 17, 2020

A beginner's read to understand more about using APIs in Python; not aimed at pros.

Data can be extracted in many ways, like exporting to flat files, downloading as JSON, etc. Those are the outputs. But what if, with a script, you could connect directly to your data, extract it, and transform it into your desired format?

WHAT IS AN API?

illustration: API extract data from database

API stands for Application Programming Interface: a set of protocols used for communication between application services.

I believe some of you will feel uncomfortable (scared, even) when you hear this term, and think:

APIs are so difficult. An analyst like me doesn't need to know about this; it's for engineers or scientists only.

Well, that thought isn't wrong at all. Using an API is only one of many ways to get your data; the simplest is downloading/exporting your data to CSV manually (by clicking a button).

The difference is that you can programmatically connect to API services, export the data, apply some transformations, and load it as output. Basically, you can create a script that does all of those things instead of performing 3–5 manual steps (download, transform, load, etc.). You can even refresh your data to its latest state just by running the script again, whereas the traditional way requires repeating those 3–5 steps every time you want an update.
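To make that concrete, here is a minimal sketch of the extract-transform-load pattern. The JSON payload and field names are made up for illustration; a real script would fetch the payload from an API over HTTP instead:

```python
import csv
import json

# Extract: pretend this JSON came back from an API call
# (in a real script: requests.get(url).json()).
raw = json.loads('[{"date": "2019-01-01", "users": 120},'
                 ' {"date": "2019-01-02", "users": 135}]')

# Transform: keep only the fields we care about, renaming as needed.
rows = [{"date": r["date"], "active_users": r["users"]} for r in raw]

# Load: write the cleaned rows to a CSV file, ready for analysis.
with open("active_users.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "active_users"])
    writer.writeheader()
    writer.writerows(rows)
```

Rerunning this one file refreshes the output end to end, which is exactly the advantage over the manual click-and-export routine.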

CASE: Extracting Active Users from App Annie

Active Users has become one of the standard metrics companies use to gauge their market position relative to competitors. One common way to learn a competitor's active users is through App Annie.

App Annie is a platform that estimates active users, downloads, or ads based on apps' historical mobile data.

IMAGINE THIS! You want to compare the daily active users of the three biggest online content providers (YouTube, Instagram, and TikTok) for all time, on both Android and iOS devices.

Technically, you can extract the data either by clicking Export or by using the API. I will walk you through both methods.

Method 1: Clicking Export in App Annie Interface

App Annie interface: creating compare reports and export to csv

The steps are really simple:

  1. Create a compare report and add the platforms you want to compare
  2. Choose Single Apps + Websites so Android and iOS can be distinguished
  3. Select the timeframe (I chose from Jan 2019 until the end of 2019)
  4. Select the aggregation you want (I want daily)
  5. Click Export in the top-right corner and continue until your CSV is ready

The drawback is that when you follow these steps at a specific time, the report is only updated up to that time; to update it again, you need to run all five steps again. You will also need to clean your data, either in Excel or another spreadsheet reader, or in whatever other tooling you have that makes this easier (trust me, this cleaning part is tiring to do manually in a spreadsheet).
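For a sense of what that cleaning looks like in code rather than in a spreadsheet, here is a small pandas sketch. The column names and date format are made up; App Annie's actual export layout may differ:

```python
import io
import pandas as pd

# Stand-in for the exported file; a real script would use pd.read_csv("export.csv").
raw_csv = io.StringIO(
    "Date,App,Active Users\n"
    "Jan 02 2019,Instagram,1350\n"
    "Jan 01 2019,Instagram,1200\n"
)

df = pd.read_csv(raw_csv)
# Tidy the headers: lowercase, underscores instead of spaces.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
# Parse the dates and sort chronologically.
df["date"] = pd.to_datetime(df["date"], format="%b %d %Y")
df = df.sort_values("date").reset_index(drop=True)
```

A few reusable lines like these replace the repetitive find-and-replace work you would otherwise redo in the spreadsheet after every export.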

Method 2: Exporting Data using App Annie API

First, you need to make sure your App Annie access also includes API access.

checking API key for your account

You can do this by following these steps:

  1. Point to the account-shaped button in the top-right corner and click API Key
  2. If you have access to the App Annie API, you should be able to view this page; click Show Key to reveal your secret API key (don't share it with anyone)
  3. Please be mindful of the API usage limit; you won't be able to hit the API again once you go over it

Second, explore the App Annie Help Center. The API documentation covers everything you need to connect to the API and extract the data, for example: the URL structure you need, the response format, the definition of each attribute in the response, etc.

In this case, I follow the steps from this LINK, in the Usage App History section.

You need to specify the right URL structure based on this doc and fill in the required parameters to successfully get your data (please see the picture below).

well defined API document

Starting Code !!!

  1. Import required libraries
import pandas as pd
import json
import requests

2. Construct the key into the required format (the API key is different for each user)

api_key = "<your_secret_api_key>"  # the key revealed on the App Annie API Key page
key = f"bearer {api_key}"

3. Create a function that loads the JSON response into a pandas dataframe (reason: more readable). You can then change the product_id parameter to switch apps, the market parameter for iOS/Android, and the country parameter to change the country. After that, just call this function to get the result.

def extract_active_user_appannie(product_id, market, country, app, granularity="daily", start_date="2019-01-01"):
    # market = "ios" or "all-android"
    # country = 2-digit country code ("ID", "SG", "TH", etc.)
    # granularity = "daily", "weekly", or "monthly"
    url = (f"https://api.appannie.com/v1.3/intelligence/apps/{market}"
           f"/app/{product_id}/usage-history"
           f"?countries={country}&granularity={granularity}&start_date={start_date}")
    response = requests.get(url, headers={"Authorization": key})
    # Keep only the columns we need, then tag each row with country and app name.
    df = pd.DataFrame(response.json()["list"])[["device", "date", "active_users"]]
    df["country_code"] = country
    df["app"] = app
    return df

You can find product_id by clicking each app and scrolling down the page; you will find the App ID in grey font.

click on the app's name to get to the app's page and find the App ID
copy the App ID as product_id in the code above

4. The exciting part: use the function to extract the data for each app

#android
android_instagram = extract_active_user_appannie(product_id=<instagram_android_app_ID>, app="instagram", market="all-android", country="ID")
android_tiktok = extract_active_user_appannie(product_id=<tiktok_android_app_ID>, app="tiktok", market="all-android", country="ID")
android_youtube = extract_active_user_appannie(product_id=<youtube_android_app_ID>, app="youtube", market="all-android", country="ID")
#ios
ios_instagram = extract_active_user_appannie(product_id=<instagram_ios_app_ID>, app="instagram", market="ios", country="ID")
ios_tiktok = extract_active_user_appannie(product_id=<tiktok_ios_app_ID>, app="tiktok", market="ios", country="ID")
ios_youtube = extract_active_user_appannie(product_id=<youtube_ios_app_ID>, app="youtube", market="ios", country="ID")

# combine all results, keep only android/ios device rows, and order the columns
df = pd.concat([android_instagram, android_tiktok, android_youtube, ios_instagram, ios_tiktok, ios_youtube])
df = df.loc[df["device"].isin(["android", "ios"])]
df = df[["date", "app", "device", "country_code", "active_users"]]
sample of end output

❗️WARNING: the numbers above are dummy numbers I generated myself and don't represent the real active users of the respective apps at all (the actual active users are CONFIDENTIAL, copyright App Annie)
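From here, a natural next step is reshaping the combined frame so each app becomes its own column, which makes side-by-side comparison easy. A sketch with dummy rows standing in for the concatenated output above:

```python
import pandas as pd

# Dummy data in the same shape as the combined App Annie dataframe.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "app": ["tiktok", "youtube", "tiktok", "youtube"],
    "device": ["android"] * 4,
    "country_code": ["ID"] * 4,
    "active_users": [100, 300, 110, 310],
})

# One row per date/device, one column per app.
wide = df.pivot_table(index=["date", "device"], columns="app", values="active_users")
```

This wide layout is also what most charting tools expect when plotting the three apps against each other over time.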

Next time you want to update the data to recent dates, you just run the Python script again, or automate it with cron/Airflow. This can clearly save you a lot of time, freeing you to do other things.
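As an example of that automation, a single crontab line can rerun the script every morning. The path, schedule, and log location below are placeholders, not anything App Annie prescribes:

```shell
# Run the extraction script every day at 07:00; append output to a log for debugging.
0 7 * * * /usr/bin/python3 /home/you/extract_active_users.py >> /var/log/appannie_extract.log 2>&1
```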

CONCLUSION

Based on my experience, you will need both of the above methods, each under certain conditions.

When you can apply method 1:

  1. You just need a one-time ad-hoc analysis that doesn't require the data to be updated all the time.
  2. You need the data quickly (e.g., your stakeholder requests the data in the morning and expects it ready by noon/afternoon the same day 👌). In that case, I don't expect you to extract the data using the API, because you would run out of time creating and testing the script, and you still need to clean the data.

When you can apply method 2:

  1. You need the data updated daily/weekly/monthly (e.g., to feed a dashboard). Unless you want to waste your life 😩, I suggest you automate the data extraction process all the way to the final output.
  2. You have plenty of time to explore this method. I know it is never easy, but I'm pretty sure you will benefit from it, as you will understand more about APIs themselves and how to interact with their output. It will be a life-changing moment. After all, this method is so much cooler than exporting manually. (HAHAHA 😂😂)

— — — — — — — — — —END OF STORY — — — — — — — — — —

Notes: This article only covers the journey from data extraction until the data is ready. The more interesting part is how we turn that data into useful insights for stakeholders. I will cover that in another story. PLEASE STAY TUNED!!!!



Data Engineering | Data Warehouse | Analytics Engineering