The spread of mis- and disinformation through social media poses significant risks to public safety and democracy. Understanding how these campaigns are initiated and spread is critical to stemming their negative effects. Twitter has created an application programming interface (API) that allows users to pull tweets and other data from Twitter’s servers for research.
At the CITRIS Policy Lab, headquartered at UC Berkeley, we use Twitter data to research the effects of automated accounts in spreading disinformation, harassment, and political divisiveness on contentious political issues.
If you’re interested in studying Twitter, this step-by-step guide will lead you through the process of scraping tweets and other relevant data from Twitter’s API. The instructions in this guide are tailored for Mac users, but will also be helpful for PC users.
Step 1. Apply for a Twitter Developer License & Obtain Credentials
You’ll need a Twitter developer license to scrape tweets. You can apply through Twitter’s developer site (developer.twitter.com). It may take a day or two to get approval.
Once you’re approved, log into your account at developer.twitter.com and go to Apps under your name in the top right-hand corner of the page. Click Create an App, also in the top right-hand corner. Fill out the form to create your app.
After you create your app, go to Keys and Tokens in the menu. Your API Key and API Secret Key will already be listed. Click Generate to get your Access Token and Access Token Secret. Do not share these with others. They are made for your use only.
Copy your API Key, API Secret Key, Access Token, and Access Token Secret. You will use these in your Python code.
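Because these four values should stay private, one option is to keep them out of the Python file altogether and read them from environment variables instead. The following is only a sketch of that idea, not part of the script used later in this guide. The variable names (TWITTER_API_KEY and so on) and the load_credentials function are hypothetical names chosen for illustration.

```python
import os

# Hypothetical variable names; use any names you like, as long as they match
# the ones you set in your Terminal before running the script.
CREDENTIAL_NAMES = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_NAMES}
```

To use it, you would set each variable in the Terminal first (for example, export TWITTER_API_KEY="your key here") and then call load_credentials() at the top of your script.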
Step 2. Install Python
This tutorial uses scripts written in the Python programming language, so you’ll need to have Python installed.
If you use a Mac or Linux, you probably have Python pre-installed, but it may not be the right version. To see which version you have, open the Terminal application, type python -V, and press Enter. If that reports Python 2 or an error, try python3 -V, since the later steps use the python3 command. On a Mac, you can open the Terminal by pressing Command + Space and typing Terminal into the search bar. If you have Python 3, you should be okay (the venv command used in Step 5 needs Python 3.3 or later). Otherwise, go to python.org/downloads and follow the instructions to download and install Python 3.8.2 for Mac.
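You can also check the version from inside Python itself. The snippet below is a minimal sketch: the function name python_is_recent_enough is made up for this example, and the minimum of 3.3 is an assumption based on when the venv module used in Step 5 was added to Python. Recent releases of Tweepy and pandas need a newer Python than that, so treat 3.3 as a floor rather than a recommendation.

```python
import sys

# Assumed minimum: the venv module used in Step 5 was added in Python 3.3.
MINIMUM = (3, 3)


def python_is_recent_enough(version_info=sys.version_info, minimum=MINIMUM):
    """Compare the (major, minor) part of a version against the minimum."""
    return tuple(version_info[:2]) >= minimum


# sys.version starts with the version number, e.g. "3.8.2 (default, ...)".
print("Running Python", sys.version.split()[0])
if not python_is_recent_enough():
    print("This Python is too old for the steps in this guide.")
```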
Windows does not come with Python pre-installed. You can download and install Python 3.8.2 at python.org/downloads.
Step 3. Install Sublime Text or Another Code Editor
If you don’t already have a preferred code editor, I recommend you download and install Sublime Text at sublimetext.com. Code editors are often easier to work with than simple text editors because they do “syntax highlighting” and have other helpful features. You can use a plain text editor like Gedit or TextEdit if you prefer.
Step 4. Create a Folder & Save Python File in Folder
Create a folder for this project. For example, you can name your folder “ScrapeTweets.”
Open Sublime Text and create a new file (File → New File). Make sure the file is using the Python syntax. You can check this by going to View → Syntax → Python. Save the file with a name ending in .py inside the “ScrapeTweets” folder.
Step 5. Open Terminal & Set Up Virtual Environment
Right click on the “ScrapeTweets” folder you created and select New Terminal at Folder. To right click on a Mac, press control while clicking on the folder.
You can check that you’re in the right location within your Terminal window by typing pwd, which means “print working directory.” The last element of the path that is printed should be “ScrapeTweets” or whatever name you gave your folder.
To set up a virtual environment, type python3 -m venv venv in the Terminal. Press Enter.
Type source venv/bin/activate in the Terminal. Press Enter. You have now activated your virtual environment. It should now say (venv) on the left side of your Terminal window.
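If you are not sure whether the virtual environment is really active, Python can tell you. This is a small sketch, with in_virtual_environment being a name invented for this example. It relies on the fact that inside a venv, sys.prefix points at the venv folder while sys.base_prefix still points at the original Python installation.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Inside a venv the two prefixes differ; outside they are the same."""
    return prefix != base_prefix


print("Virtual environment active:", in_virtual_environment())
```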
Type pip3 install tweepy in the Terminal. Press Enter.
Type pip3 install pandas in the Terminal. Press Enter.
DO NOT CLOSE YOUR TERMINAL WINDOW.
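Before moving on, it can help to confirm that both installs worked. The sketch below asks Python whether it can find each package without actually importing it. The function name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names=("tweepy", "pandas")):
    """Return the names Python cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("Tweepy and pandas are both installed.")
```

If a package is reported as missing, check that (venv) is still showing in your Terminal and run the matching pip3 install command again.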
Step 6. Add the Python Code
Copy the Python code shared via GitHub into the file you saved inside the “ScrapeTweets” folder in Step 4.
Paste your Twitter keys and tokens into the file in Sublime Text. Save your file.
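The GitHub script itself is not reproduced in this guide, so the following is only a sketch of what the credentials portion of a Tweepy script commonly looks like, not the actual code you copied. The function name build_api and the placeholder strings are assumptions for illustration. The Tweepy calls used (OAuthHandler, set_access_token, and API with wait_on_rate_limit) belong to Tweepy's interface for version 1.1 of the Twitter API.

```python
def build_api(api_key, api_secret_key, access_token, access_token_secret):
    """Create an authenticated Tweepy client from the four credentials."""
    # Imported here so this file can be loaded before Tweepy is installed.
    import tweepy

    auth = tweepy.OAuthHandler(api_key, api_secret_key)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes Tweepy pause instead of failing when
    # Twitter's rate limit is reached.
    return tweepy.API(auth, wait_on_rate_limit=True)


# Placeholder values: replace each string with your own key or token, then
# uncomment the call.
# api = build_api("API_KEY", "API_SECRET_KEY", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
```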
You’ll need to update the file save path and file name at the bottom of the code in your Sublime Text file.
You can identify the appropriate file save path by typing pwd into the Terminal and pressing Enter. Copy the path that is printed, paste it into your Sublime Text file, and add the name of the CSV file you want to create to the end of it.
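As an alternative to copying the path by hand, Python can build it for you from the folder the script is run in. This is a sketch only: output_path is a hypothetical helper, and tweets.csv is an example file name, not one taken from the GitHub code.

```python
from pathlib import Path


def output_path(file_name="tweets.csv", folder=None):
    """Join a folder and a file name; default to the current directory."""
    folder = Path(folder) if folder is not None else Path.cwd()
    return folder / file_name


# Path.cwd() is the same location that pwd prints in the Terminal.
print(output_path())
```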
SAVE YOUR FILE in Sublime Text. You must save your Python file after every change in order for the updates to run through the Terminal.
Step 7. Scraping Tweets
Go to your Terminal. Type python filename.py. Press Enter. Don’t forget that you need to replace “filename” with the file name you gave your Sublime Text file.
Your code should run and create a new CSV file in your folder with the name you gave it. The CSV file should contain data for the 7 fields specified in the code: (1) created_at, (2) tweet_id, (3) tweet_text, (4) screen_name, (5) name, (6) account_creation_date, and (7) urls.
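To make the seven fields more concrete, here is a sketch of how a single tweet object could be flattened into one CSV row. It is not the GitHub script: the function names are invented, and it uses Python's built-in csv module rather than pandas so that it runs without any extra installs. The attribute names (id_str, text, user.screen_name, entities["urls"]) follow the tweet objects returned by version 1.1 of the Twitter API, and a stand-in object with made-up values is used in place of a real tweet.

```python
import csv
from types import SimpleNamespace

FIELDS = [
    "created_at", "tweet_id", "tweet_text", "screen_name",
    "name", "account_creation_date", "urls",
]


def tweet_to_row(tweet):
    """Pick the seven fields out of one tweet object."""
    urls = [url["expanded_url"] for url in tweet.entities.get("urls", [])]
    return {
        "created_at": tweet.created_at,
        "tweet_id": tweet.id_str,
        "tweet_text": tweet.text,
        "screen_name": tweet.user.screen_name,
        "name": tweet.user.name,
        "account_creation_date": tweet.user.created_at,
        "urls": " ".join(urls),  # several links share one cell
    }


def write_rows(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# A stand-in tweet with invented values, shaped like a real tweet object.
example_tweet = SimpleNamespace(
    created_at="2020-03-01 12:00:00",
    id_str="123",
    text="Example tweet text",
    user=SimpleNamespace(
        screen_name="example_user",
        name="Example User",
        created_at="2010-01-01 00:00:00",
    ),
    entities={"urls": [{"expanded_url": "https://example.com"}]},
)
example_row = tweet_to_row(example_tweet)
```

With pandas, the equivalent of write_rows would be to build a DataFrame from the list of rows and call its to_csv method with index=False.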
Woot, Woot! You Did It!
Terminal Cheat Sheet & Extras
To stop something running: Ctrl + C
To see your current working directory: pwd
To set up a virtual environment: python3 -m venv venv then source venv/bin/activate
To install Tweepy: pip3 install tweepy
Tweepy is a Python library for accessing the Twitter API.
To install Pandas: pip3 install pandas
Pandas is a software library written for Python for data manipulation and analysis.
To see a list of the files in your directory: ls
To autocomplete a file name: Type the first few letters of the file name and press Tab
To repeat the last command you ran: Press the up arrow key, then press Enter
Thank you to Shauna Gordon-McKeon, Founder of Galaxy Rise Consulting, for her guidance and training.
Thank you to Rory Smith at First Draft, whose guide helped me develop this one.
The CITRIS Policy Lab, headquartered at CITRIS and the Banatao Institute at UC Berkeley, supports interdisciplinary research, education, and thought leadership to address core questions regarding the role of formal and informal regulation in promoting innovation and amplifying its positive effects on society.