Day 6: Realtime Tweets Analysis using Spark Streaming with Scala

#100DaysOfCode

Project (1 Hour): Create a Twitter app and use its API to stream a real-time Twitter feed using Spark Streaming with Scala.

All the code for this project can be found on my GitHub.

Step 1: Download and Setup Spark and Scala IDE

Make sure a JDK is already set up and verify it with the commands below. If it isn't, download and install one, then set your JAVA_HOME environment variable.

$ java -version
java version "1.8.0_91"
$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home
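To confirm from inside the JVM which Java version and JAVA_HOME your Scala code will actually see, a small check like the one below can help. The object name JdkCheck is hypothetical, and the version parser assumes the usual java.version formats ("1.8.0_91" on Java 8, "11.0.2" or "17" on newer JDKs).

```scala
object JdkCheck {
  // Extract the major version from java.version strings such as
  // "1.8.0_91" (legacy 1.x scheme) or "11.0.2" / "17" (JDK 9+ scheme).
  def majorVersion(version: String): Int = {
    val parts = version.split("[._-]")
    if (parts(0) == "1") parts(1).toInt else parts(0).toInt
  }

  def main(args: Array[String]): Unit = {
    val version = System.getProperty("java.version")
    // sys.env.get returns None when the environment variable is not set
    val javaHome = sys.env.get("JAVA_HOME")
    println(s"java.version = $version (major ${majorVersion(version)})")
    println(s"JAVA_HOME    = ${javaHome.getOrElse("<not set>")}")
  }
}
```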

Download Spark from: http://spark.apache.org/downloads.html

Run a quick Scala test from the downloaded directory using the Spark shell:

./bin/spark-shell
scala> sc.parallelize(1 to 1000).count()
res1: Long = 1000

You can also test the installation by running one of the bundled examples:

./bin/run-example SparkPi
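SparkPi estimates π by Monte Carlo sampling: it throws random points at the square [-1, 1] × [-1, 1], counts how many land inside the unit circle, and spreads that sampling across partitions. As a rough illustration of the idea, here is a Spark-free local version; LocalPi and estimatePi are hypothetical names, and the fixed seed is only there to make runs repeatable.

```scala
import scala.util.Random

object LocalPi {
  // The fraction of uniformly random points in [-1, 1] x [-1, 1] that fall
  // inside the unit circle approaches pi / 4, so pi is about 4 * inside / samples.
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(f"Pi is roughly ${estimatePi(1000000)}%.4f")
}
```

With a million samples the estimate typically lands within a few thousandths of π; SparkPi performs the same arithmetic but sums the hits from each partition with a reduce.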

Finally, install Scala IDE, which is built on top of Eclipse, from: http://scala-ide.org/download/sdk.html
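If you'd rather build from the command line with sbt than from the IDE, a build definition along these lines should pull in what a Spark Streaming Twitter program needs. The project name and all version numbers are assumptions, so align them with the Spark release you downloaded. Note that since Spark 2.0 the Twitter receiver is no longer part of Spark itself; it lives in the Apache Bahir project.

```scala
// build.sbt: a minimal sketch; versions are assumptions, match them to your Spark download
name := "twitter-spark-streaming"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "2.1.0",
  "org.apache.spark" %% "spark-streaming" % "2.1.0",
  // Twitter receiver (TwitterUtils), published by Apache Bahir; brings in twitter4j
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.1.0"
)
```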

Step 2: Create the project with Twitter App credentials

Create a Twitter app using https://api.twitter.com/ and then record the following credentials in a text file:

consumerKey 
consumerSecret
accessToken
accessTokenSecret
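One way to get those credentials into the application is to parse the text file and hand the values to twitter4j (the Twitter client library that Spark's Twitter receiver uses) as system properties, which twitter4j reads under the "twitter4j.oauth." prefix. This sketch assumes each line holds a name and its value separated by whitespace, for example "consumerKey abc123"; the file name twitter.txt and the object TwitterCredentials are hypothetical.

```scala
import scala.io.Source

object TwitterCredentials {
  // The four OAuth settings recorded in the text file.
  val RequiredKeys = Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

  // Parse lines of the assumed form "name value", skipping blank or malformed lines.
  def parse(lines: Iterator[String]): Map[String, String] =
    lines.map(_.trim).filter(_.nonEmpty).map(_.split("\\s+", 2)).collect {
      case Array(key, value) => key -> value.trim
    }.toMap

  // Expose each value as twitter4j.oauth.<name> so twitter4j can build its OAuth config.
  def install(creds: Map[String, String]): Unit = {
    val missing = RequiredKeys.filterNot(creds.contains)
    require(missing.isEmpty, s"Missing credentials: ${missing.mkString(", ")}")
    RequiredKeys.foreach(k => System.setProperty(s"twitter4j.oauth.$k", creds(k)))
  }

  def main(args: Array[String]): Unit = {
    val path = args.headOption.getOrElse("twitter.txt") // hypothetical default file name
    val source = Source.fromFile(path)
    try install(parse(source.getLines()))
    finally source.close()
  }
}
```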

Set up a Scala project in the IDE and write Scala code that uses Spark Streaming to print live tweets to the console as they arrive.
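A minimal sketch of such a program, assuming Spark 2.x with Apache Bahir's spark-streaming-twitter module on the classpath, the twitter4j.oauth.* system properties already set, and an app that still has access to the Twitter streaming endpoint twitter4j connects to. TwitterUtils.createStream comes from that module; the object name TwitterStreamSketch and the formatTweet helper are hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterStreamSketch {
  // One console line per tweet: language tag, author handle, text with newlines flattened.
  def formatTweet(lang: String, user: String, text: String): String =
    s"[$lang] @$user: ${text.replaceAll("\\s+", " ")}"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TwitterStreamSketch")
      .setMaster("local[2]") // the receiver occupies one thread; processing needs another
    val ssc = new StreamingContext(conf, Seconds(5))

    // None: twitter4j builds OAuth credentials from the twitter4j.oauth.* system properties.
    val tweets = TwitterUtils.createStream(ssc, None)

    // print() shows the first few elements of every 5-second batch.
    tweets
      .map(status => formatTweet(status.getLang, status.getUser.getScreenName, status.getText))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```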

Building and running the project should continuously stream tweets to your console. English doesn't seem to be the most popular language at this hour!
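To check the language observation with numbers rather than by eyeballing the console, this sketch tallies tweets per language code in each batch using twitter4j's Status.getLang, counting tweets with no language tag under "und". It makes the same Spark 2.x and Bahir assumptions as a plain streaming program; TweetLanguages and topLanguages are hypothetical names, and the 10-second batch interval is arbitrary.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TweetLanguages {
  // Rank language codes by count, most frequent first; ties broken alphabetically.
  def topLanguages(counts: Map[String, Long], n: Int): Seq[(String, Long)] =
    counts.toSeq.sortBy { case (lang, count) => (-count, lang) }.take(n)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TweetLanguages").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val tweets = TwitterUtils.createStream(ssc, None)

    tweets.map(s => Option(s.getLang).getOrElse("und")).foreachRDD { rdd =>
      // countByValue brings a small Map[String, Long] back to the driver.
      val top = topLanguages(rdd.countByValue().toMap, 5)
      println(top.map { case (lang, c) => s"$lang=$c" }.mkString("Top languages: ", ", ", ""))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```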

Day 6 of #100DaysOfCode DONE

If you enjoyed this, please click 👏 so that others can enjoy it as well. Follow me on Twitter @HariniLabs to get the latest updates or just to say Hi :)

PS: I curate a bi-weekly #WomenInTech newsletter for a dose of inspiration from the world of tech, and yes, men can sign up too! Get it here :)