Could Borat Have Boosted Kazakh Tourism? Exploring Sentiments Toward Mockumentary

A curious Kazakh mining opinions from Borat reviews

Assel Kassenova
5 min read · Dec 28, 2023

Not too long ago, I came across an article by Stephen Pratt titled “The Borat Effect: Film-Induced Tourism Gone Wrong.” As a proud Kazakhstani, I found that its headline immediately seized my curiosity. In his piece, Pratt argues that despite the mockumentary’s controversial content, it had a positive effect by raising awareness of Kazakhstan as a destination. This got me thinking about the untapped potential of “Borat” to boost tourism interest in Kazakhstan.

Recently, I noticed that Kazakh Tourism has used “Very Nice” as a catchphrase in its advertisements, which made me wonder about the timing of the campaign. Is it worth it? Is it too late to capitalize on “Borat,” considering that the 2020 version didn’t achieve the same level of success as the original?

This curiosity led me to conduct a sentiment analysis using TF-IDF on “Borat” reviews from IMDb and Rotten Tomatoes to gain insight into overall opinions of the 2020 version of the mockumentary.
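For readers unfamiliar with TF-IDF, here is a minimal sketch of the idea in plain Python. It is not the project's code: the function name `tf_idf` and the three toy reviews are made up for illustration, and it uses the textbook formula (term frequency times the log of inverse document frequency). Library implementations usually add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency in this document times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews, invented for this example.
reviews = [
    "great movie great cast",
    "boring movie",
    "worst movie ever",
]
weights = tf_idf(reviews)
```

A word such as "movie" that appears in every review gets a weight of zero, while a word such as "great" that is frequent in one review and absent from the others gets a high weight. That is why TF-IDF features tend to highlight the words that distinguish one review from another.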

This project consists of several key components:

1. Data sourcing and ingestion (web scraping using Selenium)

2. Creating a database and concatenating two datasets

3. Performing basic pre-processing

4. Building a machine learning model

5. Visualizing the data: bigrams, trigrams, and the final sentiment proportions

This article, however, focuses primarily on the final results. If you wish to explore the complete code and the details of the data manipulation, click this link.
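As a rough illustration of the database step in the list above, the sketch below stores reviews from two sources in separate tables and concatenates them with `UNION ALL`. It uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and the second sample review are hypothetical, and the real project may use a different database or schema.

```python
import sqlite3

# Hypothetical rows standing in for scraped reviews: (source, review text).
imdb_rows = [("imdb", "Brilliant, must watch!!")]
rt_rows = [("rotten_tomatoes", "Not as sharp as the original.")]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE imdb_reviews (source TEXT, review TEXT)")
conn.execute("CREATE TABLE rt_reviews (source TEXT, review TEXT)")
conn.executemany("INSERT INTO imdb_reviews VALUES (?, ?)", imdb_rows)
conn.executemany("INSERT INTO rt_reviews VALUES (?, ?)", rt_rows)

# UNION ALL stacks the two tables and keeps duplicates, like a row-wise concat.
combined = conn.execute(
    "SELECT source, review FROM imdb_reviews "
    "UNION ALL "
    "SELECT source, review FROM rt_reviews"
).fetchall()
conn.close()
```

Keeping a `source` column makes it possible to compare IMDb and Rotten Tomatoes reviews later, even after the two datasets are merged.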

I trained my classifier on TF-IDF features from a dataset of 50,000 labeled movie reviews from IMDb. To assess the classifier’s performance, I plotted an ROC curve. A Receiver Operating Characteristic (ROC) curve illustrates the diagnostic ability of a binary classifier by plotting the true positive rate against the false positive rate as the decision threshold varies. The closer the curve comes to the top-left corner, the better the performance. As you can see in the plot below, the area under the ROC curve is 0.95, very close to 1, and the curve runs near the top-left corner. This suggests that the classifier performs quite well.
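To make the ROC score more concrete, here is a small sketch of how the area under an ROC curve can be computed by hand. It relies on a standard equivalence: the area equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not data from the project.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy labels (1 = positive review) and predicted scores, invented for this example.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

In this toy case eight of the nine positive/negative pairs are ranked correctly, so the score is 8/9. A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.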

To visually confirm the classifier’s performance, here are word clouds of the words the classifier weighted as negative or positive. Words like “great,” “perfect,” and “amazing” were classified as positive, while words like “waste,” “worst,” “bad,” and “boring” were classified as negative, which seems accurate.

Word clouds from the IMDb 50K training dataset

Now, with the trained model applied to the Borat reviews dataset, let’s take a closer look at the word clouds with negative and positive words based on frequency.

Word clouds of the most frequent negative (red) and positive (blue) words in the Borat reviews

While classifying single words as negative or positive is straightforward, let’s delve deeper by examining four random reviews along with their predicted probabilities of being positive or negative. They might be a bit challenging to read, since the text is entirely lemmatized and stripped of stop words. However, the general sentiment of each review is evident.

Random reviews with probability of positivity

For instance, the first review, “Brilliant, must watch!!,” has a negativity probability of 0.01, which seems very reasonable.

Random reviews with probability of negativity
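The reviews shown here look terse because of pre-processing. As a minimal sketch of that kind of cleanup, the snippet below lowercases the text, drops punctuation, and removes stop words. The tiny stop-word list and the function name `preprocess` are assumptions for illustration. Lemmatization is left out because it normally requires an NLP library such as NLTK or spaCy.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to", "was"}


def preprocess(text):
    """Lowercase, keep only alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = preprocess("Brilliant, must watch!!")
```

The cleaned tokens can then be joined back into a string and passed to the TF-IDF step.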

Text analysis is easier to digest through frequently used bigrams and trigrams, which give a more in-depth understanding of viewers’ opinions.

In this case, we can clearly see that out of 250 mentions of “Baron Cohen,” the majority, 190, are positive. Interestingly, “Kazakhstan” does not appear among the top 20 frequently used n-grams, except in the movie title. This suggests that viewers are primarily interested in Sacha Baron Cohen and the movie’s concept itself, rather than associating it closely with the country.
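Counting n-grams like these takes only a few lines. The sketch below assumes each review has already been tokenized into a list of words. The helper names `ngrams` and `top_ngrams` and the three toy reviews are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(token_lists, n, k):
    """Count n-grams across all reviews and return the k most common."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Toy tokenized reviews, invented for this example.
reviews = [
    ["sacha", "baron", "cohen", "brilliant"],
    ["baron", "cohen", "funny"],
    ["not", "funny"],
]
top_bigrams = top_ngrams(reviews, 2, 1)
```

Running the same count separately on reviews labeled positive and negative gives a per-sentiment breakdown like the one described for “Baron Cohen.”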

The distribution of sentiment in the reviews didn’t come as a surprise, with 39.6% leaning towards the negative end and 49.4% expressing positive sentiments, leaving only 13.8% with a neutral stance. This is quite typical for controversial content, as it tends to evoke strong emotions among viewers.
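One common way to turn predicted probabilities into negative, neutral, and positive shares is to treat a band around 0.5 as neutral. The sketch below shows that approach. The 0.4 and 0.6 cut-offs, the function names, and the toy probabilities are assumptions for illustration and may differ from the thresholds used in the actual analysis.

```python
from collections import Counter


def bucket(p_positive, low=0.4, high=0.6):
    """Map a probability of positivity to a sentiment label."""
    if p_positive >= high:
        return "positive"
    if p_positive <= low:
        return "negative"
    return "neutral"


def proportions(probabilities, low=0.4, high=0.6):
    """Return the percentage of reviews falling into each sentiment bucket."""
    if not probabilities:
        raise ValueError("need at least one probability")
    counts = Counter(bucket(p, low, high) for p in probabilities)
    total = len(probabilities)
    return {
        label: 100 * counts[label] / total
        for label in ("negative", "neutral", "positive")
    }


# Toy probabilities of positivity, invented for this example.
probs = [0.99, 0.75, 0.55, 0.30, 0.05]
shares = proportions(probs)
```

Because every review lands in exactly one bucket, the three shares always sum to 100%. Widening the neutral band moves reviews out of the positive and negative buckets.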

However, my curiosity led me to explore further, particularly the absence of any notable mentions of “Kazakhstan” in these reviews, whether in a positive or negative context. To gain more context, I decided to turn to Google Trends.

Surprisingly, despite the massive success of “Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan” in 2006, Google Trends data showed that searches for “Borat” accounted for just 13.9% of the combined search volume for the two terms, while “Kazakhstan” dominated with 86.1% from 2006 to 2023. It’s important to note that this data might not be entirely accurate due to potential changes in data collection methods over the years. However, it’s evident that the 2020 release of “Borat” was accompanied by a significant drop in search interest, with the share of “Borat” queries decreasing to 6.42%.
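For clarity on what a share like this means, here is a small sketch that assumes the percentages are each term's summed relative interest divided by the combined total. That is one plausible reading, not a confirmed description of how the figures above were derived. The function name `search_shares` and the interest values are hypothetical.

```python
def search_shares(interest):
    """Return each term's percentage of the combined summed interest."""
    totals = {term: sum(values) for term, values in interest.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no search interest recorded")
    return {term: 100 * total / grand_total for term, total in totals.items()}


# Hypothetical relative interest values per period, invented for this example.
interest = {
    "Borat": [10, 5, 5],
    "Kazakhstan": [30, 25, 25],
}
shares = search_shares(interest)
```

Restricting the input to a narrower date range, for example the months around the 2020 release, gives the share for that period only.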

In conclusion, you might be wondering about my personal opinion as a Kazakh regarding “Borat.” If you were expecting me to denounce the film and say, “Kazakhstan is nothing like this,” you’d be mistaken. The truth is, Kazakhstan is, for the most part, portrayed fairly accurately in the mockumentary, just like any other country depicted in a film. Unfortunately, ignorance knows no geographical bounds; it exists everywhere. Perhaps that’s the primary reason behind the film’s success.

This analysis has provided valuable insights into the sentiments surrounding the 2020 version of “Borat” and its potential impact on tourism in Kazakhstan. It’s been an engaging journey through the realms of data analysis, sentiment classification, and the intricate dynamics between a controversial film and its representation of a destination.

Thank you for reading! If you have any questions or recommendations, please feel free to connect on LinkedIn.
