What are people on Twitter saying about the new iPhone 12?

Apple recently released a new version of the iPhone, the iPhone 12 and iPhone 12 Pro, on October 16, 2020. Like many other prospective buyers around the world, I was waiting for the new release to upgrade my phone. Before making any purchase, it is wise to do some homework about the product. Knowing other people’s feedback helps us decide whether it is worth investing in the current version of the phone or waiting for the next upgrade.

A new phone launch by Apple always creates a buzz on social media. People often wait for the launch and then post their feedback about it. Twitter provides a platform for users to express their opinions about any event, and these opinions are raw and come directly from consumers. Therefore, I decided to analyze what people are saying about the new iPhone. For the analysis, I collected about 5000 tweets in real time using the Twitter API. Tweets with the hashtags ‘#iPhone12’, ‘#iPhone12Pro’, and their variations were targeted for the data collection. The data were collected over a six-day period from October 28, 2020 to November 2, 2020.

The first step in the analysis was to explore the tweets. All the tweets were in English, and each tweet had 17.21 words on average. The top 15 hashtags mentioned in the tweets are shown in the figure below.

Most frequently used hashtags in the tweets

As we can see, the most frequently used hashtags were related to the iPhone and Apple, which is expected because I had used these hashtags to collect the tweets. Other hashtags trending at the same time were mostly related to phone retailers such as Poorvika or to offers for the new phone. Hashtags related to AirPods and MagSafe chargers were also common, which suggests that the new phone might come with updates for these accessories.
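The exploration step described above could be reproduced along these lines. This is a minimal sketch, not the code used for the figure: the function names and the sample tweets are hypothetical, and lowercasing hashtags before counting is an assumption about how variants such as #iPhone12 and #iphone12 were merged.

```python
import re
from collections import Counter

# Matches a hashtag and captures the tag text without the leading '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=15):
    """Return the n most frequent hashtags, ignoring case (an assumption)."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)


def average_word_count(tweets):
    """Average number of whitespace-separated words per tweet."""
    return sum(len(text.split()) for text in tweets) / len(tweets)


# Hypothetical sample data, only for illustration.
sample_tweets = [
    "Unboxing the new #iPhone12 today #Apple",
    "Pre-order offers at the store #iphone12 #MagSafe",
    "Loving the camera #iPhone12Pro",
]

print(top_hashtags(sample_tweets, n=3))
print(round(average_word_count(sample_tweets), 2))
```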

The next step was a network analysis of the tweets. The relationships represented by “mentions”, “retweets”, “quotes”, and “replies” were extracted from the tweets, and an undirected graph was drawn using the NetworkX library in Python. The graph had 3804 nodes and 4236 edges. As we can see from the graph below, the network was very sparse, which is plausible given that the tweets were collected without any location restriction. The dataset therefore contains tweets from all over the world, with few connections between them.

Network graph of the users
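A graph like the one above might be built as follows. This is a sketch under stated assumptions: the `(source, target, kind)` tuple format, the function name, and the sample users are hypothetical, and skipping self-interactions is my choice rather than something the analysis states. Only `nx.Graph`, `add_edge`, and `nx.density` are real NetworkX calls.

```python
import networkx as nx


def build_interaction_graph(interactions):
    """Build an undirected user graph from (source, target, kind) tuples.

    'kind' is one of mention / retweet / quote / reply. In an undirected
    nx.Graph, repeated interactions between the same two users collapse
    into a single edge (the last 'kind' seen is kept as the attribute).
    """
    graph = nx.Graph()
    for source, target, kind in interactions:
        if source == target:
            continue  # assumption: ignore users interacting with themselves
        graph.add_edge(source, target, kind=kind)
    return graph


# Hypothetical interactions, only for illustration.
sample_interactions = [
    ("alice", "bob", "mention"),
    ("bob", "alice", "reply"),
    ("carol", "dave", "retweet"),
    ("erin", "erin", "quote"),
]

graph = build_interaction_graph(sample_interactions)
print(graph.number_of_nodes(), graph.number_of_edges())
print(nx.density(graph))  # fraction of possible edges that are present
```

Density is one way to quantify "sparse". For a simple undirected graph with the 3804 nodes and 4236 edges reported above, 2 × 4236 / (3804 × 3803) is roughly 0.0006, assuming the graph has no self-loops.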

Because I wanted to learn what users are saying about the new iPhone on Twitter, I then mined the content of the tweets. The text of each tweet was first cleaned: mentions, URLs, hashtags, punctuation, and extra whitespace were removed. The tweets were then tokenized and lemmatized, and stopwords were removed.
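The cleaning steps above can be sketched with the standard library alone. This is an illustration rather than the original pipeline: the tiny stopword set is a hypothetical stand-in for a full list, and lemmatization is left out because it needs an external resource (for example a WordNet-based lemmatizer).

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a
# full list such as the one shipped with an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "on"}


def clean_tweet(text):
    """Return a list of lowercase tokens with noise removed."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs first, before punctuation
    text = re.sub(r"[@#]\w+", " ", text)       # mentions and hashtags
    text = re.sub(r"[^\w\s]", " ", text)       # punctuation
    tokens = text.lower().split()              # split() also drops extra spaces
    return [token for token in tokens if token not in STOPWORDS]


print(clean_tweet("@Apple The new #iPhone12 is great!! https://t.co/abc"))
```

Removing URLs before punctuation matters: stripping punctuation first would break a URL into fragments that the URL pattern no longer matches.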

After the text was preprocessed, I ran a sentiment analysis of the collected tweets using VADER (Valence Aware Dictionary and sEntiment Reasoner) via the nltk library in Python. VADER is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiment expressed in social media. The result of the sentiment analysis is shown below.

Sentiment analysis of all the collected tweets
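The scoring behind a chart like this might look as follows. It is a sketch, not the original code: the helper names are hypothetical, and the ±0.05 cut-offs on the compound score are the thresholds conventionally suggested for VADER, which I am assuming here because the analysis does not state which ones were used. The nltk import is kept inside `score_tweets` so the labeling logic can be tried without downloading the lexicon.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(texts):
    """Label each text with VADER via nltk (downloads the lexicon if needed)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return [
        label_from_compound(analyzer.polarity_scores(text)["compound"])
        for text in texts
    ]


def sentiment_shares(labels):
    """Percentage of tweets per label, as plotted in a sentiment chart."""
    counts = Counter(labels)
    return {
        label: 100 * counts[label] / len(labels)
        for label in ("positive", "neutral", "negative")
    }


print(sentiment_shares(["positive", "positive", "neutral", "negative"]))
```

Calling `score_tweets(list_of_tweet_texts)` and passing the result to `sentiment_shares` would give the percentages. VADER also uses punctuation and capitalization as intensity cues, so scores computed on heavily cleaned text may differ from scores on the raw tweets.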

The result shows that most of the tweets related to the iPhone 12 were positive, closely followed by neutral. Only 7.3% of the tweets were negative. This indicates that the discussion among the users in the network was mostly in favor of the new phone. However, sentiment analysis does not tell us what topics the tweets discuss. Therefore, I applied topic modeling to extract the most prominent topics in the tweets.

The LDA (Latent Dirichlet Allocation) model was implemented using the Gensim library in Python for the topic modeling. LDA treats each tweet as a mixture of topics in certain proportions, and each topic as a mixture of keywords, again in certain proportions. Model perplexity and topic coherence were calculated to measure the quality of the topic model; they were -6.97 and 0.38, respectively. The following top five topics were extracted from the model.

Five topics extracted from the topic modeling

The extracted topics show that most of the discussion was about marketing campaigns related to the new phone release rather than user opinions about the phones. The first topic is about videos that review the phone while unboxing it. The other topics are marketing messages about the availability of the new phones and their accessories, and about giveaways. This finding is consistent with some of the most frequent hashtags extracted earlier, which were related to phone shops and giveaways. It also indicates that social media discussions right after a launch are mostly marketing messages. I wasn’t able to find a topic related to the features of the phone or to individual reactions to them. This makes sense, though, because users need time with the phone before they can review it. My data collection period began about two weeks after the product release, and in this immediate post-release window, social media posts were mostly filled with marketing buzz.

After analyzing the full set of collected tweets, I repeated the analysis on a subgraph of the network. As we can see in the network graph above, the network is largely disconnected except for the center, where the nodes are densely clustered together. The nodes in that cluster seem to have closer relationships with each other, so it is interesting to see what people in this close-knit network are talking about. I therefore selected the largest connected component of the network using the NetworkX library in Python; it is shown below.

Network graph of the largest component
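Selecting the largest component and computing basic statistics for it might look like the sketch below. The function names and the toy graph are hypothetical, and using `nx.average_clustering` for the clustering coefficient is an assumption (`nx.transitivity` is another common definition and generally gives a different number).

```python
import networkx as nx


def largest_component(graph):
    """Return the largest connected component as its own graph."""
    nodes = max(nx.connected_components(graph), key=len)
    return graph.subgraph(nodes).copy()


def component_stats(component):
    """Size, clustering, and distance statistics for a connected graph."""
    return {
        "nodes": component.number_of_nodes(),
        "edges": component.number_of_edges(),
        "average_clustering": nx.average_clustering(component),
        # Both distance measures require the graph to be connected.
        "diameter": nx.diameter(component),
        "average_shortest_path_length": nx.average_shortest_path_length(component),
    }


# Hypothetical toy graph: a triangle with a tail, plus a separate pair.
toy_graph = nx.Graph()
toy_graph.add_edges_from(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("x", "y")]
)

component = largest_component(toy_graph)
stats = component_stats(component)
print(stats)
```

The diameter and average shortest path length are only defined for a connected graph, which is why they are computed on the component rather than on the whole network.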

The component had 2301 nodes and 2977 edges. By definition it is connected, meaning every node can be reached from every other node. It had a clustering coefficient of 0.14, a diameter of 15, and an average shortest-path distance of 5.17 between any two nodes. After the largest component was selected, the tweets associated with its nodes were extracted and analyzed to learn the sentiment of the users in this network. The VADER tool in the nltk library was again used for the analysis. The result is shown below.

Sentiment analysis of the tweets from users in the largest component

We can see in the figure above that most tweets in the component were neutral. Negative tweets made up just 4.6%, which is lower than the 7.3% of negative tweets in the overall network. To further understand the tweets, I ran topic modeling again using the Gensim library. The following are the five topics extracted from the model.

From the result, we can infer that most interactions in this network were also about marketing offers and giveaways. This might be because people are more interested in learning about offers in the market before making a purchase. Right after the phone release, people are busier with the purchase itself than with writing reviews. If we want consumer reviews, we would have to wait and collect data later, once the purchasing spree has slowed down. We can also infer from the extracted topics that reviews are mostly shared as videos. However, I didn’t analyze the URLs. If we want initial reviews of the phone, we should look at those videos rather than just the text of the tweets. The result from the subgraph analysis is broadly consistent with that of the main graph, but the majority of sentiments were neutral, which suggests that most users were interacting about marketing and sales offers, which do not reflect polarized views about the phone.

Ethics and Limitations

Twitter’s protocol for data collection and analysis was followed, so I do not see an ethical issue with the data collection. However, the results have some limitations and biases. Most importantly, the analysis was based on a limited number of data points. If I had collected more data over a longer time period, the result might be different. For data collection, I only used hashtags related to the iPhone 12 because I didn’t want to mix the dataset with tweets about other iPhone versions or other Apple products. It is therefore possible that some users posted feedback about the iPhone 12 with only the hashtags #iPhone and #apple, and I missed them. Sentiment analysis tools also have difficulty recognizing sarcasm, irony, jokes, and exaggeration, all of which are common in social media posts like tweets. This might bias the conclusions drawn from the analysis.
