Collecting Twitter Data Using R

Abigail Hipolito
Dec 5, 2017


I’m currently working on a text analysis project, and I wrote a simple R script that collects Twitter data through Twitter’s API. I also created a cron job to automate the process, so the script collects data on an hourly basis.

I’m pretty much a beginner in using R, Terminal shell, and the Twitter API, so if you have any suggestions on how to make this collection process better, please let me know.

Tools Used

Step 1: Create a Twitter Application

Follow this tutorial on how to create a Twitter application and how to generate keys. You will need these 4 keys and tokens from your Twitter application (they are unique to each user, so keep them secret):

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret
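
Since these four values need to stay secret, one option is to keep them out of the script file entirely. The sketch below reads them from environment variables with base R's Sys.getenv(). The variable names are hypothetical (my own choice, not something Twitter or twitteR requires), and one place to define them is a ~/.Renviron file, which R reads at startup.

```r
# Hypothetical environment variable names; use whatever names you prefer.
key_names <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
               "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")

# Read the four credentials and fail early if any of them is not set.
read_twitter_keys <- function(vars = key_names) {
  keys <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(keys) | keys == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  keys
}

# Usage (commented out because it needs the twitteR package and real keys):
# keys <- read_twitter_keys()
# setup_twitter_oauth(keys[[1]], keys[[2]], keys[[3]], keys[[4]])
```

With this approach the script can be shared or committed without exposing the keys.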

Step 2: Run R Script

Running this script manually in RStudio works perfectly fine, so if you only want to get data on a one-time basis you can stop here. However, if you’re looking to collect data multiple times, follow the next step on how to run a cron job.

#!/usr/local/bin/Rscript
setwd("/Users/ahipolito94/Capstone_2/Data")
library(twitteR)
setup_twitter_oauth("consumer-key", "consumer-secret",
                    "access-token", "access-secret")
terms <- c("iphonex", "iPhonex", "iphoneX", "iPhoneX", "iphone10", "iPhone10",
           "iphone x", "iPhone x", "iphone X", "iPhone X", "iphone 10", "iPhone 10",
           "#iphonex", "#iPhonex", "#iphoneX", "#iPhoneX", "#iphone10", "#iPhone10")
terms_search <- paste(terms, collapse = " OR ")
iphonex <- searchTwitter(terms_search, n=1000, lang="en")
iphonex <- twListToDF(iphonex)
write.table(iphonex,"/Users/ahipolito94/Capstone_2/Data/iphonex.csv", append=T, row.names=F, col.names=T, sep=",")

Here’s a line by line explanation:

#!/usr/local/bin/Rscript — tells the shell to run the script with Rscript. It has to be the first line of the file. To find where Rscript is stored on your system, type which Rscript in Terminal.

setwd("/Users/ahipolito94/Capstone_2/Data") — sets your working directory. This line allows us to save/append to a CSV file in your working directory.

terms <- c("iphonex", ... "#iPhone10") — variable that stores the keywords and hashtags you want to search for.

terms_search <- paste(terms, collapse = " OR ") — inserts OR between each term. This is the syntax used in searchTwitter() for multiple search terms.
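
To make the query-building step concrete, here is a small sketch with a shortened term list. It goes slightly beyond the script above in two ways, both of which are my assumptions rather than part of the original: it drops case variants (assuming Twitter search ignores letter case), and it wraps multi-word terms in double quotes so they are searched as exact phrases.

```r
# Shortened, illustrative term list (not the full list from the script).
terms <- c("iphonex", "iPhoneX", "iphone x", "#iphonex", "#iPhoneX")

# Assumption: search is case-insensitive, so keep one copy of each term.
terms <- unique(tolower(terms))

# Wrap terms that contain a space in double quotes (exact-phrase search).
quoted <- ifelse(grepl(" ", terms, fixed = TRUE),
                 paste0('"', terms, '"'),
                 terms)

# Join everything with OR, the same way the script does.
terms_search <- paste(quoted, collapse = " OR ")
cat(terms_search, "\n")
# iphonex OR "iphone x" OR #iphonex
```

A shorter query is also easier to read when checking what was actually sent to the API.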

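iphonex <- searchTwitter(terms_search, n=1000, lang="en") — uses the twitteR function to search for up to 1000 tweets in English. n is an upper bound: the standard search API only covers recent tweets (roughly the past week), so a search can return fewer. The 3,200-tweet limit I had in mind applies to user timelines, not to search.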

iphonex <- twListToDF(iphonex) — uses twitteR function to convert list of tweets to a dataframe.

write.table(iphonex, "/Users/ahipolito94/Capstone_2/Data/iphonex.csv", append=T, row.names=F, col.names=T, sep=",") — saves the dataframe into a CSV file in your working directory. append=T tells R to add rows to the file instead of overwriting the existing data. Note that with col.names=T the header row is written again on every run, so the file ends up with repeated header lines.
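
Because the script runs every hour with append=T and col.names=T, the header row gets written on each run. A possible variation, sketched below, writes the header only when the file does not exist yet. The helper name append_tweets is hypothetical, and qmethod = "double" is my addition so that quotation marks inside tweet text are escaped the way read.csv() expects.

```r
# Append a data frame to a CSV, writing the header only on the first write.
append_tweets <- function(df, path) {
  first_write <- !file.exists(path)
  write.table(df, path,
              append = !first_write,    # add rows once the file exists
              row.names = FALSE,
              col.names = first_write,  # header only for a new file
              sep = ",",
              qmethod = "double")       # escape embedded quotes as ""
}

# Small demonstration with made-up rows and a temporary file.
demo_path <- tempfile(fileext = ".csv")
demo <- data.frame(id = c("1", "2"),
                   text = c("plain tweet", "tweet with \"quotes\""),
                   stringsAsFactors = FALSE)
append_tweets(demo, demo_path)  # creates the file with a header
append_tweets(demo, demo_path)  # appends rows without a second header
demo_back <- read.csv(demo_path, colClasses = "character")
```

In the collection script, the write.table() line could then become append_tweets(iphonex, "/Users/ahipolito94/Capstone_2/Data/iphonex.csv").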

Step 3: Run Cron Job

A cron job schedules a command or script to run automatically at a specified time and date. If you’re using Mac OS X (or Linux), you can follow this step to schedule cron jobs. If you’re using Windows, I believe the equivalent of a cron job is a Scheduled Task.

I followed this guide on how to run a cron job.

  1. Open Terminal Window
  2. Make the R script executable: type chmod u+x /Users/ahipolito94/Capstone_2/Data/Get_Data.R and press enter. Just replace my file path with yours.
  3. Add a new cron job to your crontab: type crontab -e and press enter. This opens the crontab in the vi editor. Personally, I found vi hard to navigate, so I switched to the nano editor. To use nano, type export EDITOR=nano and press enter before running crontab -e.
  4. Create the cron command: type 0 * * * * /Users/ahipolito94/Capstone_2/Data/Get_Data.R and save it. To save in nano, hit ctrl+x, then y, then enter. In this example, the cron job runs at the top of every hour (ex: 12:00pm, 1:00pm, etc). To change the schedule, change the five time fields (minute, hour, day of month, month, day of week). Follow this guide for more information on how to do this.
  5. Check that the cron job is scheduled: type crontab -l and press enter. This lists the cron jobs in your crontab, so it should return the cron command that you typed earlier, in my case 0 * * * * /Users/ahipolito94/Capstone_2/Data/Get_Data.R .
  6. To stop the cron job, type crontab -e and add # before your cron command, which comments it out. So in my case, #0 * * * * /Users/ahipolito94/Capstone_2/Data/Get_Data.R .
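
A cron job runs without a visible console, so if the script fails (for example because of a rate limit or a network problem) nothing shows up on screen. One way to keep track is to have the script write a short log line on each run. This is only a sketch: the function names and the log location are hypothetical, and the two calls at the end stand in for the real collection code.

```r
# Hypothetical log location; a real script would use a permanent path.
log_path <- file.path(tempdir(), "get_data.log")

# Append one timestamped line to the log file.
log_msg <- function(msg, path = log_path) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  cat(paste0(stamp, " ", msg, "\n"), file = path, append = TRUE)
}

# Run a task, logging "OK" on success or the error message on failure.
run_logged <- function(task, path = log_path) {
  tryCatch({
    task()
    log_msg("OK", path)
    TRUE
  }, error = function(e) {
    log_msg(paste("ERROR:", conditionMessage(e)), path)
    FALSE
  })
}

# Stand-ins for the real collection step: one success, one failure.
ok  <- run_logged(function() invisible(NULL))
bad <- run_logged(function() stop("rate limit hit"))
```

In Get_Data.R, the search and write steps could be wrapped in a function and passed to run_logged(), so each hourly run leaves one line in the log.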

Step 4: Check Working Directory for the Data!

If everything ran smoothly, you should see data being added to the file in your working directory.

Step 5: Read CSV File on RStudio

This script reads and views the CSV file on RStudio:

iphonex_csv <- read.csv("/Users/ahipolito94/Capstone_2/Data/iphonex.csv", header = TRUE, encoding = "UCS-2LE")
View(iphonex_csv)
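
A file built up by hourly appends usually needs some cleanup after reading. With append=T and col.names=T, the header row is repeated for every run, and overlapping searches can return the same tweet more than once. The sketch below simulates two runs with made-up rows and then removes both kinds of extra rows. The helper name clean_tweets is hypothetical; it relies on the id column that twListToDF() produces, and it reads every column as text so that long tweet ids are not rounded.

```r
# Drop repeated header rows and duplicate tweets from an appended CSV.
clean_tweets <- function(df) {
  # Repeated headers appear as data rows whose id column holds the text "id".
  df <- df[df$id != "id", , drop = FALSE]
  # Keep only the first copy of each tweet id.
  df[!duplicated(df$id), , drop = FALSE]
}

# Simulate two hourly runs appended to a temporary file (made-up data).
path <- tempfile(fileext = ".csv")
run1 <- data.frame(id = c("101", "102"), text = c("first", "second"),
                   stringsAsFactors = FALSE)
run2 <- data.frame(id = c("102", "103"), text = c("second", "third"),
                   stringsAsFactors = FALSE)
for (run in list(run1, run2)) {
  # Same write.table() settings as the collection script; R warns about
  # appending column names, which is silenced here.
  suppressWarnings(write.table(run, path, append = TRUE, row.names = FALSE,
                               col.names = TRUE, sep = ","))
}

raw <- read.csv(path, header = TRUE, colClasses = "character")
tweets <- clean_tweets(raw)
```

Here raw still contains the stray header row and the duplicated tweet, while tweets keeps one row per id.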

[Screenshot: output on RStudio]