Python: Extracting Data using API
A beginner-friendly read on using APIs in Python — not aimed at the pros
Data can be extracted in many ways: exporting to flat files, downloading in JSON format, and so on. Those are the outputs. But what if, using a script, you could connect directly to your data, extract it, and transform it into your desired format?
WHAT IS API?
API stands for Application Programming Interface: a set of protocols used for communication between application services.
I believe some of you feel uncomfortable (even scared) when you hear this term, and think:
APIs are so difficult. An analyst like me doesn't need to know about this; this stuff is for engineers or scientists only.
Well, that thought isn't entirely wrong. Using an API is only one of many ways to get your data; the simplest is downloading/exporting your data to CSV manually (by clicking a button).
The difference is that you can programmatically connect to API services, export the data, apply some transformations, and load it as output. Basically, you can create one script that does all of this instead of performing 3–5 manual steps (download, transform, load, etc.). You can even refresh your data to the latest state just by running the script again, whereas the traditional way requires repeating those 3–5 steps every time you want an update.
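To make that concrete, the download-transform-load flow can be sketched as one small script. The function bodies here are hypothetical placeholders, just to show the shape of the pipeline:

```python
# Hypothetical shape of the extract-transform-load script described above.

def extract():
    # In practice: call an API and return the raw records.
    return [{"date": "2019-01-01", "active_users": 100}]

def transform(records):
    # In practice: keep and reshape only the fields you need.
    return [(r["date"], r["active_users"]) for r in records]

def load(rows):
    # In practice: write to a CSV or database; here we just return the rows.
    return rows

result = load(transform(extract()))
```

Rerunning this one script refreshes everything end to end, which is the whole point of the API approach.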
CASE: Extracting Active Users from App Annie
Active Users has become one of the standard metrics some companies use to gauge their market position relative to competitors. One common way to learn a competitor's active users is from App Annie.
App Annie is a platform that estimates active users, downloads, and ads based on apps' historical mobile data.
IMAGINE THIS! You want to compare the daily active users of the 3 biggest online content providers — YouTube, Instagram, and TikTok — for all time, on both Android and iOS devices.
Technically, you can extract the data either by clicking export or by using the API. I will walk you through both methods.
Method 1: Clicking Export in App Annie Interface
The steps are really simple:
- Create a compare report and add the platforms you want to compare
- Choose "Single Apps + Websites" so Android and iOS can be distinguished
- Select the timeframe (I chose from Jan 2019 until the end of 2019)
- Select the aggregation you want (I want daily)
- Click "Export" in the top right, then continue until your CSV is ready
The drawback is that the report is only current as of the moment you run these steps; to refresh it, you have to run all five steps again. You will also need to clean the data, either in Excel/another spreadsheet reader or with whatever stack makes it easier (trust me, doing this cleaning manually in a spreadsheet is tiring).
Method 2: Exporting Data using App Annie API
First, make sure your App Annie account also includes API access.
You can do this by following these steps:
- Point to the account-shaped button in the top right and click "API Key"
- If you have access to the App Annie API, you should be able to view this page; click "Show Key" to reveal your secret API key (don't share it with anyone)
- Please be mindful of the API usage limit; you won't be able to hit the API again once you're over the limit
Second, explore the App Annie Help Center. The API documentation covers everything needed to connect to the API and extract the data, for example: the URL structure you need, the response format, the definition of each attribute in the response, etc.
In this case, I followed the steps from this LINK, in the Usage App History part.
You need to specify the right URL structure based on this doc and fill in the required parameters to successfully get your data.
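As a sketch of what that URL structure looks like once the parameters are filled in (the endpoint path follows the Usage App History pattern used later in this article; the `v1.3` version segment and the placeholder values here are assumptions for illustration):

```python
# Placeholder values, for illustration only.
market = "all-android"
product_id = 123456789          # hypothetical App ID
country = "ID"
granularity = "daily"
start_date = "2019-01-01"

# Assemble the usage-history URL from its parts.
url = (
    f"https://api.appannie.com/v1.3/intelligence/apps/{market}"
    f"/app/{product_id}/usage-history"
    f"?countries={country}&granularity={granularity}&start_date={start_date}"
)
print(url)
```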
Starting Code !!!
1. Import the required libraries
import pandas as pd
import json
import requests
2. Construct the key in the required format (the API key is different for each user)
key = f'bearer {api_key}'
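Rather than pasting the secret key literally into the script, a safer pattern (my own suggestion, not an App Annie requirement — the variable name `APPANNIE_API_KEY` is my choice) is to read it from an environment variable:

```python
import os

# Read the secret API key from an environment variable instead of
# hardcoding it (safer if the script is shared or committed to git).
# Falls back to a placeholder if the variable is not set.
api_key = os.environ.get("APPANNIE_API_KEY", "<your_api_key>")
key = f"bearer {api_key}"
```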
3. Create a function that converts the JSON response into a pandas DataFrame (reason: more readable). Then you can simply change the parameter product_id to switch apps, the parameter market for iOS/Android, and the parameter country to change the country, and call the function to get the result.
def extract_active_user_appannie(product_id, market, country, app, granularity="daily", start_date="2019-01-01"):
    # market = "ios" or "all-android"
    # country = 2-digit country code (ID, SG, TH, etc.)
    # granularity = "daily", "weekly", or "monthly"
    url = f"https://api.appannie.com/v1.3/intelligence/apps/{market}/app/{product_id}/usage-history?countries={country}&granularity={granularity}&start_date={start_date}"
    response = requests.get(url, headers={"Authorization": key})
    df = pd.DataFrame(response.json()["list"])[["device", "date", "active_users"]]
    df["country_code"] = country
    df["app"] = app
    return df
You can find the product_id by clicking each app and scrolling down the page; you will find the App ID in a grey font.
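One caveat: the function above assumes every request succeeds. Since App Annie enforces a usage limit, a defensive variant is worth sketching. HTTP 429 is the standard "Too Many Requests" status; the exact error body App Annie returns is not something I have verified, so this only reacts to the status code. The `getter` parameter is my own addition so the sketch can be exercised without network access:

```python
def safe_get(url, key, getter=None):
    # Fetch a URL and fail loudly instead of crashing later on bad JSON.
    # `getter` is injectable for testing; by default it uses requests.get.
    if getter is None:
        import requests
        getter = requests.get
    response = getter(url, headers={"Authorization": key})
    if response.status_code == 429:
        # 429 Too Many Requests: we probably hit the API usage limit.
        raise RuntimeError("Rate limit reached; wait before retrying.")
    response.raise_for_status()  # any other 4xx/5xx becomes an exception
    return response.json()
```

Swapping `requests.get` for a wrapper like this inside the extraction function makes failures obvious instead of producing a confusing KeyError on the missing "list" field.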
4. The exciting part: use the function to extract the data for each app
#android
android_instagram = extract_active_user_appannie(product_id=<instagram_android_app_ID>, app="instagram", market="all-android", country="ID")
android_tiktok = extract_active_user_appannie(product_id=<tiktok_android_app_ID>, app="tiktok", market="all-android", country="ID")
android_youtube = extract_active_user_appannie(product_id=<youtube_android_app_ID>, app="youtube", market="all-android", country="ID")

#ios
ios_instagram = extract_active_user_appannie(product_id=<instagram_ios_app_ID>, app="instagram", market="ios", country="ID")
ios_tiktok = extract_active_user_appannie(product_id=<tiktok_ios_app_ID>, app="tiktok", market="ios", country="ID")
ios_youtube = extract_active_user_appannie(product_id=<youtube_ios_app_ID>, app="youtube", market="ios", country="ID")

df = pd.concat([android_instagram, android_tiktok, android_youtube, ios_instagram, ios_tiktok, ios_youtube])
df = df.loc[df["device"].isin(["android", "ios"])]
df = df[["date", "app", "device", "country_code", "active_users"]]
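Once everything is concatenated into one tidy frame, a convenient way to compare the three apps side by side is to pivot it wide. Here is a self-contained sketch with made-up numbers, mirroring the column layout of the combined frame:

```python
import pandas as pd

# Tiny dummy frame with the same columns the extraction produces.
df = pd.DataFrame({
    "date": ["2019-01-01"] * 3,
    "app": ["instagram", "tiktok", "youtube"],
    "device": ["android"] * 3,
    "country_code": ["ID"] * 3,
    "active_users": [100, 80, 120],   # dummy numbers, not real data
})

# One row per date/device, one column per app.
wide = df.pivot_table(index=["date", "device"],
                      columns="app",
                      values="active_users").reset_index()
```

This wide shape drops straight into a line chart or a spreadsheet comparison without further wrangling.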
❗️WARNING: the numbers above are dummy numbers I generated myself and do not represent the real active users of these apps at all (the actual active user figures are CONFIDENTIAL, copyright App Annie)
Next time you want to update the data to recent dates, you just run the Python script again, or automate it using cron/Airflow. This clearly saves you a lot of time and frees you up for other things.
CONCLUSION
Based on my experience, you will need both of the methods above, depending on the situation.
When you can apply method 1:
- You just need a one-time ad hoc analysis that doesn't require the data to be updated regularly.
- You need the data quickly (e.g., your stakeholder requests the data in the morning and expects it ready by noon/afternoon the same day 👌). In that case, I don't expect you to extract the data via the API, because you would run out of time creating and testing the script, and you would still need to clean the data too.
When you can apply method 2:
- You need the data updated daily/weekly/monthly (e.g., to feed a dashboard). Unless you want your life wasted 😩, I suggest you automate the extraction process all the way to the final output.
- You have plenty of time to explore this method. I know it's never easy, but I'm pretty sure you will benefit from it, as you will understand more about APIs themselves and how to interact with their output. It could be a life-changing moment. After all, this method is far cooler than exporting manually. (HAHAHA 😂😂)
— — — — — — — — — —END OF STORY — — — — — — — — — —
Notes: This article only covers data extraction up to the point where the data is ready. The more interesting part is how we turn that data into useful insights for stakeholders. I will cover that in another story. PLEASE STAY TUNED!!!!