Scrape TikTok Data Using Python

Alec Kunkel
Published in Made by McKinney
6 min read · Jan 28, 2022

Today we’re going to chat about something slightly different from 3D development and design! This week we’re going to break down how to scrape data from TikTok using Python. Specifically, we’re going to scrape all the video data on a user’s timeline. To give you an idea of where we’re headed, I’ve broken the project down into 3 steps:

  1. Set up a virtual environment
  2. Install the required dependencies
  3. Scrape and clean the data

Just a heads up: this tutorial assumes you’re already set up with Python and at least a little familiar with writing code.

Step 1: Setting up your virtual environment

Before we jump into the code, we’re going to kick off by creating a virtual environment. Open up your terminal and navigate to a place where you want to keep your files organized.

Pro Tip: Use “mkdir” to create folders and “touch FILENAME” to create files directly from your terminal.

Start with the following line to create your virtual environment.

python3 -m venv NAME

After you’ve created your venv, activate it using the following command (on Windows, run NAME\Scripts\activate instead).

source NAME/bin/activate

This gives you an environment isolated from your base Python installation. Setting up a virtual environment is a great way to test new code without the risk of breaking anything globally. If you have any additional questions or issues, I recommend checking out the documentation here.
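If you want to double-check from inside Python that the venv is actually active, one quick test (a small sketch of my own, not part of the original project) is to compare sys.prefix with sys.base_prefix. Inside a venv created with python3 -m venv, the two point at different folders.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # In a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
```

Run it with the venv activated and again after running deactivate to see the value flip.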

Step 2: Installing the required Dependencies

The next bit of setup is installing the dependencies into our venv. To scrape the data, we’re going to use TikTokApi, an unofficial TikTok API package. While we’re here, let’s also go ahead and install pandas.

pip install TikTokApi Pandas

Important Note: At the time of writing, I ran into issues with this exact package. If you hit a captcha bug, uninstall it and install the fork below instead.

pip install git+https://github.com/Daan-Grashoff/TikTok-Api

If you cloned this project’s repo (linked at the end of this post), you can install the dependencies using “pip install -r requirements.txt”.
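Before writing any scraping code, it can save some head-scratching to confirm that both packages are importable from the active venv. Here is a small sketch using only the standard library. It assumes the Daan-Grashoff fork installs under the same top-level module name, TikTokApi, as the original package; the helper name missing_modules is my own.

```python
import importlib.util

# Top-level module names this tutorial imports (assumption: the fork
# installs under the same "TikTokApi" module name as the original package)
REQUIRED_MODULES = ["TikTokApi", "pandas"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```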

Step 3: Writing Our Code

Finally, with everything installed, we’re ready to start writing some code! Create a new file, “Main.py”. This will serve as the entry point for our scraper. Open the file in your preferred text editor. I’m a huge fan of VS Code, but use what works for you.

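The first thing to do is import our dependencies and instantiate TikTokApi. The api variable will be used throughout our code. Note that newer releases of TikTokApi have changed how the client is created, so pin a version that matches this code if you’re following along.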

from TikTokApi import TikTokApi
import pandas as pd
api = TikTokApi.get_instance()

Within our file, I’ve broken down the code into three functions: one that asks for a username, one that scrapes the data, and finally a function to clean the data.

3a. Asking for user input

The first function, “inputUserID”, kicks off by asking for a username. Once we’ve received the username, we use the TikTok API for the first time: we pull and return the user’s “userID” and “secUID”. Both values are required when pulling profile feed data.

def inputUserID():
    userName = input("Enter Username: ")
    userInfo = api.get_user(userName)
    userID = userInfo['id']
    secUID = userInfo['secUid']
    return userID, secUID
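If the API hands back something unexpected (for example, when a captcha blocks the request), a bare userInfo['id'] fails with a fairly unhelpful KeyError. A small defensive variant could look like the sketch below. The helper name extract_user_ids and the sample dictionary are hypothetical; the only assumption about the response is the one inputUserID already makes, namely a dict carrying "id" and "secUid" keys.

```python
def extract_user_ids(user_info):
    """Pull (userID, secUID) out of a user dict, failing with a clear message."""
    # Hypothetical helper: assumes the same 'id' / 'secUid' keys used in inputUserID
    missing = [key for key in ("id", "secUid") if key not in user_info]
    if missing:
        raise ValueError(f"User info is missing keys: {missing}")
    return user_info["id"], user_info["secUid"]


# Hypothetical response shape, for illustration only (not real TikTok data)
sample = {"id": "123", "secUid": "MS4wLjABAAAA-example", "uniqueId": "someuser"}
user_id, sec_uid = extract_user_ids(sample)
print(user_id, sec_uid)
```

Inside inputUserID, the last three lines would then collapse to return extract_user_ids(userInfo).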

3b. Scrape the data

Let’s start by defining a few variables we will use throughout the function. Don’t worry about these values just yet. As we walk through the remainder of the function I’ll explain the purpose of each.

def getUserVideos(userDefID, secDefUID):
    # Sets up a few variables that will be used throughout the code
    cursorValue = 0
    hasMore = True
    df = pd.DataFrame()

Out of the box, the API returns at most 30 posts per request. To get around that, we’ll use two of the variables we set up at the beginning of the function: cursorValue and hasMore.

The cursorValue tells the API where the previous batch of 30 videos ended.

The hasMore value tells us whether there are more videos on the user’s timeline.

Using a while loop, as long as hasMore is true, we’ll keep pulling more data starting from the current cursor position.

while hasMore:
    ...  # pull the next page of data here
else:
    print("No more data")
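To see the cursor/hasMore pattern in isolation, here is an offline sketch that pages through fake data. fetch_page is a hypothetical stand-in for api.user_page; it returns dicts with the same itemList, cursor and hasMore keys our loop reads, and the page size of 30 mirrors the limit described above. The max_pages cap is an extra safeguard of my own, not part of the original code.

```python
# Hypothetical in-memory "timeline" of 75 videos standing in for a real profile
FAKE_TIMELINE = [{"id": str(n)} for n in range(75)]
PAGE_SIZE = 30  # mirrors the 30-posts-per-call limit described above


def fetch_page(cursor):
    """Hypothetical stand-in for api.user_page: one page starting at `cursor`."""
    items = FAKE_TIMELINE[cursor:cursor + PAGE_SIZE]
    next_cursor = cursor + len(items)
    return {
        "itemList": items,
        # Returned as a string here so the int() cast below mirrors the tutorial
        "cursor": str(next_cursor),
        "hasMore": next_cursor < len(FAKE_TIMELINE),
    }


def fetch_all(max_pages=100):
    """Follow the cursor until hasMore is False or max_pages is reached."""
    cursor_value = 0
    has_more = True
    videos = []
    pages = 0
    while has_more:
        page = fetch_page(cursor_value)
        videos.extend(page["itemList"])
        # Feed the returned cursor into the next request
        cursor_value = int(page["cursor"])
        has_more = page["hasMore"]
        pages += 1
        if pages >= max_pages:
            break  # a break skips the else clause below
    else:
        print("No more data")
    return videos


print(len(fetch_all()))  # prints "No more data", then 75
```

The else branch of a while loop only runs when the loop ends because its condition became false, which is why it is a natural home for the “No more data” message.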

Within this while loop, we start by making a call to the API. A few values are passed in:

  • userID/secUID: These came from our initial function
  • cursorValue: The current cursor place on the page

After each iteration, we update cursorValue and hasMore from the latest response, so the next call picks up where the last one left off.

# Pull video data until every page has been fetched
while hasMore:
    # Request the next page of videos, starting at the current cursor
    TikTokList = api.user_page(userID=userDefID,
                               secUID=secDefUID,
                               cursor=cursorValue)
    # Clean the data after it's scraped
    data = cleanData(TikTokList['itemList'])
    # DataFrame.append was removed in pandas 2.0; pd.concat works across versions
    df = pd.concat([df, data])
    # Update our variables based on the latest scrape
    cursorValue = int(TikTokList['cursor'])
    hasMore = TikTokList['hasMore']
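Growing df one page at a time (whether with df.append, which pandas 2.0 removed, or with pd.concat) copies the accumulated data on every iteration, so it slows down on long timelines. A common pandas pattern is to collect each page’s frame in a list and concatenate once at the end. The sketch below assumes each page has already been cleaned into a DataFrame; the pages list is hypothetical sample data.

```python
import pandas as pd

# Hypothetical cleaned pages, standing in for cleanData(TikTokList['itemList'])
pages = [
    pd.DataFrame({"id": ["1", "2"], "stats_playCount": [100, 250]}),
    pd.DataFrame({"id": ["3"], "stats_playCount": [40]}),
]

frames = []
for page_df in pages:
    # Appending to a Python list is cheap; no DataFrame copy happens here
    frames.append(page_df)

# One concatenation at the end; ignore_index=True renumbers rows 0..n-1
df = pd.concat(frames, ignore_index=True)
print(df)
```

Without ignore_index=True, the index would restart at 0 for every page, since cleanData numbers its rows with enumerate.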

That code will run until “hasMore” comes back false. Once we’ve got all of our data, the last step is to write it out to a CSV.

print("No more data")
df.to_csv('UserVideos.csv')
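By default, to_csv also writes the DataFrame index as an unnamed first column. If you don’t want that extra column in your spreadsheet, pass index=False. A quick round trip through a temporary folder shows the difference; the sample frame here is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the scraped videos
df = pd.DataFrame({"id": ["1", "2"], "desc": ["first video", "second video"]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")
    df.to_csv(with_index)  # default: the index is written as the first column
    df.to_csv(without_index, index=False)
    # Read back just the header line of each file
    with open(with_index) as f:
        header_with = f.readline().strip()
    with open(without_index) as f:
        header_without = f.readline().strip()

print(header_with)     # ,id,desc
print(header_without)  # id,desc
```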

And voilà! Here is our final function to scrape user data.

def getUserVideos(userDefID, secDefUID):
    # Sets up a few variables that will be used throughout the code
    cursorValue = 0
    hasMore = True
    df = pd.DataFrame()
    # Pull video data until every page has been fetched
    while hasMore:
        # Request the next page of videos, starting at the current cursor
        TikTokList = api.user_page(userID=userDefID,
                                   secUID=secDefUID,
                                   cursor=cursorValue)
        # Clean the nested response into a flat DataFrame
        data = cleanData(TikTokList['itemList'])
        df = pd.concat([df, data])  # DataFrame.append was removed in pandas 2.0
        # Update our variables based on the latest scrape
        cursorValue = int(TikTokList['cursor'])
        hasMore = TikTokList['hasMore']
    else:
        print("No more data")
    df.to_csv('UserVideos.csv')

At this point, we’ve implemented an input to ask for a username and a function that pulls all of the profile data, but if you were to look at the result, you’d see a nested object not fit for analysis. Let’s convert that object into a nicely formatted CSV.

3c. Processing our data

You probably noticed that our while loop above calls a “cleanData” function. That’s what we’re going to create here.

Our API response contains a whole bunch of data, and a lot of it is nested. Here is an example of what that data looks like:

It starts with our itemList, and within that, we have information for each video. Each video also has its own nested elements (those are the items with a plus next to them). What we need is a function that flattens these elements so that each nested child gets written out as its own column in that video’s row.
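To make the idea concrete, here is a trimmed, hypothetical video item and what flattening one level of nesting does to it. The field names inside stats and author are illustrative assumptions, not a complete or verified TikTok schema.

```python
# Hypothetical, trimmed video item; real responses carry many more fields
item = {
    "id": "7000000000000000000",
    "desc": "my first video",
    "stats": {"playCount": 120, "diggCount": 15},
    "author": {"uniqueId": "someuser", "nickname": "Some User"},
}

NESTED_KEYS = {"stats", "author"}

row = {}
for key, value in item.items():
    if key in NESTED_KEYS:
        # Each nested child becomes its own prefixed column, e.g. stats_playCount
        for child_key, child_value in value.items():
            row[f"{key}_{child_key}"] = child_value
    else:
        # Plain values are copied over unchanged
        row[key] = value

print(sorted(row))
```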

When our data is run through the function below, each pulled item is processed into a flattened dictionary, which then gets converted to a pandas DataFrame. While converting the data, we also check whether each key appears in our nested_values list. If it does, we expand its children into their own elements.

def cleanData(data):
    nested_values = ['video', 'author', 'music', 'stats', 'authorStats']
    # Creates a dictionary for our df to be stored in
    flattened_data = {}
    for idx, value in enumerate(data):
        flattened_data[idx] = {}
        # Loop through each element
        for prop_id, prop_value in value.items():
            # Check if nested
            if prop_id in nested_values:
                for nested_id, nested_value in prop_value.items():
                    flattened_data[idx][prop_id + '_' + nested_id] = nested_value
            # If it's not nested, add it back to the flattened dictionary
            else:
                flattened_data[idx][prop_id] = prop_value
    return pd.DataFrame.from_dict(flattened_data, orient='index')
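pandas also ships a built-in flattener, pd.json_normalize, that does something similar. One difference worth knowing: cleanData only expands the keys listed in nested_values, while json_normalize expands every dict-valued key. Setting max_level=1 limits it to a single level of nesting, like cleanData, and sep="_" reproduces the stats_playCount-style column names. The items below are hypothetical; this is an alternative to consider, not what the original project uses.

```python
import pandas as pd

# Hypothetical items shaped like entries in TikTokList['itemList']
items = [
    {"id": "1", "desc": "first", "stats": {"playCount": 120, "diggCount": 15}},
    {"id": "2", "desc": "second", "stats": {"playCount": 80, "diggCount": 4}},
]

# Flatten one level of nesting, joining parent and child keys with "_"
df = pd.json_normalize(items, sep="_", max_level=1)
print(df.columns.tolist())
```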

Now every time we want to clean our data, we just run it through this function. What comes back is a pandas DataFrame that we can then write to a CSV.

With our last function written, all that’s left is to call them at the end.

userInfo = inputUserID()
getUserVideos(userInfo[0], userInfo[1])

Give it a try with python Main.py. You’ll be prompted for a username, and then you’ll get a CSV (UserVideos.csv) with all of that user’s video data!

Final Thoughts

At this point, you should have a working scraper that pulls data from TikTok. It’s a great start, but it can absolutely be taken further. Maybe you want to deploy your code and automate it to run. Or connect your script to the Google Sheets API so that your data can be sent directly to a sheet. This code can be used as a building block for your own reporting dashboard!

If you’re interested in just getting the code, you can find that in the github repo here: https://github.com/iamkunkel/TikTok-Performance-Scraper
