Twitter sentiment analysis using Azure Databricks

Adilson Cesar
7 min read · Dec 15, 2018


In this tutorial, you learn how to run sentiment analysis on a stream of data using Azure Databricks in near real time. You set up a data ingestion system using Azure Event Hubs. You consume the messages from Event Hubs into Azure Databricks using the Spark Event Hubs connector. Finally, you use Microsoft Cognitive Services APIs to run sentiment analysis on the streamed data.

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

The following illustration shows the application flow:

Creating an Azure Databricks workspace

In the Azure portal, select Create a resource > Data + Analytics > Azure Databricks.

Under Azure Databricks Service, provide the values to create a Databricks workspace.

Choose between the Standard and Premium tiers. For more information on these tiers, see the Databricks pricing page.

Creating a Spark cluster in Azure Databricks

In the Azure portal, go to the Databricks workspace that you created, and then select Launch Workspace.

You are redirected to the Azure Databricks portal. From the portal, select New Cluster.

In the New cluster page, provide the values to create a cluster.

  • Enter a name for the cluster
  • I chose only one worker, but you can read more about cluster size and autoscaling here
  • Make sure you select the Terminate after __ minutes of inactivity checkbox. Provide a duration (in minutes) after which the cluster is terminated if it is not being used.

Attaching libraries to Spark cluster

You use the Twitter APIs to send tweets to Event Hubs. You also use the Apache Spark Event Hubs connector to read and write data into Azure Event Hubs. To use these APIs as part of your cluster, add them as libraries to Azure Databricks and then associate them with your Spark cluster.

In the Azure Databricks workspace, select Workspace, and then right-click Shared. From the context menu, select Create > Library.

On the New Library page, for Source, select Maven Coordinate. For Coordinate, enter the coordinate of the package you want to add. Here are the Maven coordinates for the libraries.

Spark Event Hubs connector = com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.5
Twitter API = org.twitter4j:twitter4j-core:4.0.7

Once the library is successfully associated with the cluster, the status immediately changes to Attached.

Repeat these steps for the Twitter package.

All set!

Creating an Event Hubs instance

An Event Hubs namespace provides a unique scoping container, referenced by its fully qualified domain name, in which you create one or more event hubs. To create a namespace in your resource group using the portal, do the following actions:

In the Azure portal, search for “event hub” and select the Event Hubs service.

After making sure the namespace name is available, choose the pricing tier (Basic or Standard). Also, choose an Azure subscription, resource group, and location in which to create the resource.

Create an event hub

To create an event hub within the namespace, do the following actions:

Depending on the load you expect (message frequency and retention characteristics), you can select a different number of throughput units. After that, the namespace is created, and the next step is to create an event hub within it. A namespace acts as a container for several event hubs.

Type a name for your event hub, then click Create.

Great! Now, take note of the following entries:

  • Event Hubs namespace
  • Event Hubs name
  • SAS key name (“Policy name”)
  • SAS key value (“Primary Key”)

These parameters are needed in later steps when working with the event hub. The SAS key name, its value, and the connection string can be found under the “Shared Access Policies” option on the Event Hubs namespace page.
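For reference, these four values combine into the standard Event Hubs connection-string format that the connector and client SDKs consume. A minimal sketch (the function name is mine):

```python
def event_hubs_connection_string(namespace, event_hub, sas_key_name, sas_key):
    """Assemble an Event Hubs connection string from the four values noted above."""
    return (
        f"Endpoint=sb://{namespace}.servicebus.windows.net/;"
        f"SharedAccessKeyName={sas_key_name};"
        f"SharedAccessKey={sas_key};"
        f"EntityPath={event_hub}"
    )
```

This is the same string you can copy ready-made from the “Shared Access Policies” blade; building it by hand is only useful when you store the pieces separately (for example, as Databricks secrets).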

Creating a Twitter app to access streaming data

To receive a stream of tweets, you must create an application in Twitter. Follow the steps to create a Twitter application.

On the Create an application page, provide the details for the new app, and then select Create an app.

In the application page, select the Keys and Access Tokens tab and copy the values for Consumer Key and Consumer Secret. Also, select Create my access token to generate the access tokens. Copy the values for Access Token and Access Token Secret.

Get a Cognitive Services access key

In this tutorial, you use the Microsoft Cognitive Services Text Analytics APIs to run sentiment analysis on a stream of tweets in near real time. Before you use the APIs, you must create a Microsoft Cognitive Services account on Azure and retrieve an access key to use the Text Analytics APIs.

In the Azure portal, search for “cognitive” and select Cognitive Services.

Under Azure Marketplace, select AI + Cognitive Services > Text Analytics API.

After the account is created, from the Overview tab, select Show access keys.
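With the access key noted, the shape of a sentiment call can be sketched outside the notebook, too. A rough Python illustration (the helper names are mine; the v2.0 endpoint takes a JSON list of documents and returns a score per document, with the key sent in the `Ocp-Apim-Subscription-Key` header):

```python
import json

# POST https://<region>.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment

def build_sentiment_payload(texts, language="en"):
    """Build the JSON body the sentiment endpoint expects: a list of documents,
    each with an id, a language, and the text to score."""
    docs = [
        {"id": str(i), "language": language, "text": t}
        for i, t in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": docs})

def parse_sentiment_response(response_json):
    """Return {id: score} from a sentiment response.
    Scores range from 0 (negative) to 1 (positive)."""
    data = json.loads(response_json)
    return {doc["id"]: doc["score"] for doc in data.get("documents", [])}
```

The actual HTTP call (via any HTTP client) plugs these two helpers around the request and response.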

At this point, we have completed the prerequisites and are ready to set up the Spark cluster.

Creating notebooks in Databricks

You create two notebooks in the Databricks workspace with the following names:

  • SendTweetsToEventHub — A producer notebook you use to get tweets from Twitter and stream them to Event Hubs.
  • AnalyzeTweetsFromEventHub — A consumer notebook you use to read the tweets from Event Hubs and run sentiment analysis.

In the SendTweetsToEventHub notebook, paste the following code.

Replace the placeholders with the values for your Event Hubs namespace, and with the consumer keys and access tokens from the Twitter application you created earlier.

Getting tweets with the keyword “Azure” and sending them to the event hub in real time!
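The producer logic can be sketched in a few lines. This is an illustration, not the notebook's actual code (the helper names are mine, and the streaming loop is shown only as comments because it needs live credentials):

```python
def matches_keyword(tweet_text, keyword="Azure"):
    """The producer only forwards tweets that mention the search keyword."""
    return keyword.lower() in tweet_text.lower()

def tweet_to_event(tweet_text):
    """Event Hubs event bodies are raw bytes; tweets are sent UTF-8 encoded."""
    return tweet_text.encode("utf-8")

# The streaming loop itself would look roughly like:
#   for tweet in twitter_stream(track="Azure"):        # via the Twitter streaming API
#       if matches_keyword(tweet.text):
#           producer.send(tweet_to_event(tweet.text))  # via the Event Hubs client
```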

To run the notebook, press SHIFT + ENTER. You see an output like the following snippet. Each event in the output is a tweet that is ingested into Event Hubs.

Voilà!

In the AnalyzeTweetsFromEventHub notebook, paste the following code, and replace the placeholder with the values for the Azure Event Hubs instance that you created earlier. This notebook reads the tweets that you streamed into Event Hubs earlier using the SendTweetsToEventHub notebook.
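As a rough illustration of the consumer side (again my own helper name, with the Spark part shown only as comments since it runs on the cluster):

```python
def decode_event_body(body):
    """Event Hubs delivers each tweet back as raw bytes; decode it for analysis."""
    return bytes(body).decode("utf-8")

# In the notebook itself, the stream is read with the Spark Event Hubs connector,
# roughly like this (PySpark shown for illustration):
#   df = spark.readStream.format("eventhubs").options(**eh_conf).load()
#   tweets = df.select(df.body.cast("string"))
```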


You get the following output:

Let's build an all-in-one notebook

This time, we will apply a user-defined function to the stream of tweets that calls the corresponding APIs to determine the language and sentiment of each tweet. This can be useful if we want to find the most positive or negative tweets and react to them appropriately.
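Once every tweet carries a score, picking out the extremes is straightforward. A small sketch under the assumption that scores lie in [0, 1] (the function name is mine):

```python
def most_extreme_tweets(scored_tweets):
    """Given (tweet, score) pairs with scores in [0, 1], return the most
    positive and the most negative tweet, e.g. to react to them appropriately."""
    most_positive = max(scored_tweets, key=lambda pair: pair[1])
    most_negative = min(scored_tweets, key=lambda pair: pair[1])
    return most_positive, most_negative
```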

Don’t forget to put in your own endpoint connection and Cognitive Services access key!

Let’s look at a fragment of the processing results:

Below we see that the tweet about some issues came back with a score very close to zero, and the tweet about an amazing experience got a score very close to 1.
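If you want coarse labels rather than raw scores, a simple threshold mapping works. The thresholds below are illustrative choices of mine, not part of the API:

```python
def label_sentiment(score, neg_threshold=0.3, pos_threshold=0.7):
    """Map a sentiment score in [0, 1] to a coarse label.
    Scores near 0 are negative, scores near 1 are positive."""
    if score <= neg_threshold:
        return "negative"
    if score >= pos_threshold:
        return "positive"
    return "neutral"
```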

Enjoy! \o/

The second part of this post is coming up, where we are going to insert all the data into a SQL database.


Adilson Cesar

I design, implement and support Linux Data Centers for telecommunications and finance companies.