Diving into Indonesian Skincare Reviews — Part 1: Sentiment Analysis

May 8, 2024

In the world of local Indonesian beauty brands, browsing reviews and testimonials is still an important way for customers to decide whether or not to buy a certain product. From a brand's point of view, reviews are also a useful source of feedback for further product development.

On May 3rd, I scraped all 4,387 reviews of the 5X Ceramide Barrier Repair Moisture Gel Moisturizer by Skintific from a beauty platform called Female Daily. So, among people who like it, what do they like about the product? And among those who dislike it, why do they dislike it?

Sentiment Analysis

To get that information, we first need to split the 4,387 reviews into positive and negative ones. Female Daily uses a 1-to-5 star rating system. We will label 4- and 5-star reviews as positive and 1- and 2-star reviews as negative. But what about 3 stars?
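The rating rule above can be written as a small mapping function. This is a minimal sketch, assuming positive is encoded as 1 and negative as 0 (that encoding is an assumption, not taken from the original notebook); the function name `label_from_rating` and the example reviews are hypothetical.

```python
from typing import Optional


def label_from_rating(stars: int) -> Optional[int]:
    """Map a 1-5 star rating to a sentiment label.

    Returns 1 for positive (4-5 stars), 0 for negative (1-2 stars),
    and None for 3 stars, which are left for the model to decide.
    """
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be 1-5, got {stars}")
    if stars >= 4:
        return 1
    if stars <= 2:
        return 0
    return None  # 3 stars: undecided for now


# Hypothetical (review, stars) pairs split into labeled and undecided sets
reviews = [("cocok banget", 5), ("bikin jerawat", 1), ("biasa aja", 3), ("lumayan", 4)]
labeled = [(text, label_from_rating(s)) for text, s in reviews if label_from_rating(s) is not None]
undecided = [text for text, s in reviews if label_from_rating(s) is None]
print(labeled)    # [('cocok banget', 1), ('bikin jerawat', 0), ('lumayan', 1)]
print(undecided)  # ['biasa aja']
```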

Total number of reviews per sentiment, based on the rating

So, I adapted the steps from this story https://medium.com/@yasirabd/sentiment-analysis-dengan-logistic-regression-50315cd2c836: I trained a model on the 1-, 2-, 4-, and 5-star reviews, whose sentiments are already decided, and then used it to decide the sentiment of the 3-star reviews.

I prepared a much smaller dataset for this training (200 positive and 177 negative reviews) because, unfortunately, the preprocessing step includes a lemmatization process that took too long to run on thousands of reviews.
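One way to draw a smaller training set with a fixed number of reviews per sentiment is plain random sampling without replacement. The sketch below is hypothetical: `sample_training_subset`, the seed, and the toy review lists are illustrative assumptions, not the author's actual sampling code.

```python
import random


def sample_training_subset(texts_by_label, sizes, seed=42):
    """Draw a fixed-size random sample per sentiment label.

    texts_by_label: dict mapping label -> list of review texts
    sizes: dict mapping label -> number of reviews to keep
    Returns a shuffled list of (text, label) pairs.
    """
    rng = random.Random(seed)  # seeded so the subset is reproducible
    subset = []
    for label, n in sizes.items():
        pool = texts_by_label[label]
        # random.sample draws without replacement; raises ValueError if n > len(pool)
        subset.extend((text, label) for text in rng.sample(pool, n))
    rng.shuffle(subset)
    return subset


# Toy data standing in for the full set of labeled reviews
pos = [f"ulasan positif {i}" for i in range(1000)]
neg = [f"ulasan negatif {i}" for i in range(500)]
train = sample_training_subset({1: pos, 0: neg}, {1: 200, 0: 177})
print(len(train))  # 377
```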

1. First, load the labeled training reviews and check how many fall into each sentiment:

# import libraries
import pandas as pd
import numpy as np
import requests
import re

# load dataset into pandas with specified encoding
data = pd.read_csv('review_skincare.csv', encoding='ISO-8859-1')
print(data.head())

# check the number of positive and negative reviews
print(data['Sentiment'].value_counts())

2. To define the preprocessing function, I followed the tutorial's steps:

Clean each review of punctuation, URLs, and extra spaces, then convert it to lowercase.
Remove stopwords.
Apply stemming and lemmatization.
Tokenize each review into a list of words.
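The preprocessing pipeline calls four helper functions whose definitions are not shown here. A minimal, stdlib-only sketch of what they might look like, assuming regex-based cleaning and a deliberately tiny, hypothetical stopword list (`STOPWORDS` is illustrative, not a real Indonesian stopword list). Stemming and lemmatization are left as a pass-through, since a real Indonesian stemmer is outside the scope of this sketch.

```python
import re

# Hypothetical, deliberately tiny Indonesian stopword list for illustration;
# a real pipeline would load a complete list.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "aku", "sih", "juga", "ke", "dengan"}


def cleaning_text(text: str) -> str:
    """Remove URLs, punctuation, and extra whitespace, then lowercase."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs first, before punctuation is stripped
    text = re.sub(r"[^\w\s]", " ", text)                  # punctuation -> space
    text = re.sub(r"\s+", " ", text).strip()              # collapse extra spaces
    return text.lower()


def remove_stopword(text: str) -> str:
    """Drop every word that appears in STOPWORDS."""
    return " ".join(word for word in text.split() if word not in STOPWORDS)


def stemming_and_lemmatization(text: str) -> str:
    """Pass-through placeholder: a real Indonesian stemmer/lemmatizer would go here."""
    return text


def tokenize(text: str) -> list:
    """Split the cleaned text into a list of words."""
    return text.split()


# Same order as the pipeline: clean -> remove stopwords -> stem -> tokenize
sample = "Teksturnya ringan dan cepat menyerap!! https://example.com"
tokens = tokenize(stemming_and_lemmatization(remove_stopword(cleaning_text(sample))))
print(tokens)  # ['teksturnya', 'ringan', 'cepat', 'menyerap']
```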

# pipeline preprocess
def preprocess(text):
    # cleaning text and lowercase
    output = cleaning_text(text)

    # remove stopwords
    output = remove_stopword(output)

    # stemming and lemmatization
    output = stemming_and_lemmatization(output)

    # tokenization
    output = tokenize(output)

    return output

Example result for one of the positive reviews.

3. The feature extraction step is where we turn all 377 tokenized reviews into features and train the model on them.
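A common feature-extraction scheme for this kind of logistic-regression sentiment model, and an assumption about what the linked tutorial does, is to count how often each word appears in positive and in negative training reviews, then represent each review as `[bias, positive count, negative count]`. The `freqs` object passed to `predict_tweet` later is presumably such a word-frequency dictionary. A sketch with hypothetical names (`build_freqs`, `extract_features`) and toy tokens:

```python
from collections import Counter


def build_freqs(token_lists, labels):
    """Count how often each (word, label) pair occurs in the training data."""
    freqs = Counter()
    for tokens, y in zip(token_lists, labels):
        for word in tokens:
            freqs[(word, y)] += 1
    return freqs


def extract_features(tokens, freqs):
    """Return [bias, summed positive counts, summed negative counts] for one review."""
    pos = sum(freqs.get((w, 1), 0) for w in tokens)
    neg = sum(freqs.get((w, 0), 0) for w in tokens)
    return [1.0, float(pos), float(neg)]


# Toy, already-tokenized reviews (1 = positive, 0 = negative)
train_tokens = [["cocok", "lembab"], ["jerawat", "minyak"], ["cocok", "kenyal"]]
train_labels = [1, 0, 1]
freqs = build_freqs(train_tokens, train_labels)
print(extract_features(["cocok", "jerawat"], freqs))  # [1.0, 2.0, 1.0]
```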

4/5. At the end of the training steps, we have a custom function that predicts the h value, the model's estimated probability that a review is positive. If h is greater than 0.5, the review leans toward positive sentiment; if it is less than 0.5, it leans toward negative sentiment.
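In logistic regression, h is the sigmoid function applied to the dot product of a review's feature vector x and the learned weights theta, i.e. h = sigmoid(x · theta). The sketch below trains theta with plain batch gradient descent and then applies the 0.5 threshold described above. It assumes three features per review, as in a `[bias, positive count, negative count]` representation; the function names, learning rate, iteration count, and toy data are hypothetical choices, not the author's settings.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)


def predict_h(features, theta):
    """h = sigmoid(x . theta): the model's estimate that a review is positive."""
    return sigmoid(sum(x * t for x, t in zip(features, theta)))


def train_logistic_regression(X, y, alpha=0.1, iterations=500):
    """Fit theta with batch gradient descent on the mean cross-entropy loss."""
    theta = [0.0] * len(X[0])
    m = len(X)
    for _ in range(iterations):
        errors = [predict_h(x, theta) - yi for x, yi in zip(X, y)]
        # Gradient of the mean cross-entropy loss: (1/m) * sum((h - y) * x_j)
        grads = [sum(err * x[j] for err, x in zip(errors, X)) / m for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta


def sentiment_from_h(h: float) -> str:
    """Apply the 0.5 threshold; an exact 0.5 is treated as negative here."""
    return "positive" if h > 0.5 else "negative"


# Toy features in the form [bias, positive-word count, negative-word count]
X = [[1.0, 3.0, 0.0], [1.0, 0.0, 3.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
y = [1, 0, 1, 0]
theta = train_logistic_regression(X, y)
for x in X:
    h = predict_h(x, theta)
    print(x, round(h, 3), sentiment_from_h(h))
```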

import warnings

# Ignore all warnings
warnings.filterwarnings("ignore")

# predict single reviews (all three examples below are complaints);
# predict_tweet, freqs, and theta come from the training steps above
# review 1: felt suitable at first, but routine use made the face oilier, duller, and bumpy
tweet1 = 'Awal coba ngerasa cocok. Tapi setelah diperhatikan, kondisi muka jadi lebih berminyak dan kusam. Memang dia ini mampu memperbaiki skin barrier ku saat rusak dengan tekstur yang ringan dan mudah menyerap aku oke bgtsih, namun untuk pemakaian rutin bisa membuat wajahku seperti raja minyak. Lama kelamaan dibiarin malah jadi beruntusan;('
# review 2: made the skin oilier; no improvement after almost a whole jar
tweet2 = 'Di kulitku gak begitu cocok, bikin kulit semakin oily, dan repair barier yang rusak juga gak secepat itu, dulu pas wajahku senstif aku beli ini tapi udah hampir habis 1 jar masih belum ada perubahan. Di aku kurang cocok. Dan aku gak pernah beli lagi'
# review 3: bought because it went viral; caused breakouts, even cystic acne
tweet3 = 'beli karena viral, tp maaf mungkin orang lain cocok tapi di aku nggak, semingguan pakai sempet berjerawat, kupikir itu purging tapi kok sampai jerawat batu. memang enak di muka dingin gitu, tapi aku nggak lanjut bahkan nggak aku habisin. sorry tapi aku repurchase sih tidak.'

print( '%s -> %f' % (tweet1, predict_tweet(preprocess(tweet1), freqs, theta)))
print( '%s -> %f' % (tweet2, predict_tweet(preprocess(tweet2), freqs, theta)))
print( '%s -> %f' % (tweet3, predict_tweet(preprocess(tweet3), freqs, theta)))

The next step is to apply that prediction function to a CSV file containing all of the 3-star reviews, then export the results.
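Applying the trained model to the 3-star reviews then comes down to a loop over the CSV rows. A minimal sketch using Python's `csv` module, assuming the review text sits in a column called `Review` (hypothetical; adjust to the real header) and that `predict_h` is any function returning h for a raw review, such as a wrapper around `preprocess` and `predict_tweet`. The demo uses in-memory files and a dummy predictor standing in for the trained model.

```python
import csv
import io


def label_reviews_csv(src, dst, predict_h, text_column="Review", threshold=0.5):
    """Read reviews from src, add h and Sentiment columns, and write them to dst.

    src/dst are open file objects; predict_h maps review text -> h in [0, 1].
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames) + ["h", "Sentiment"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        h = predict_h(row[text_column])
        row["h"] = f"{h:.4f}"
        row["Sentiment"] = "positive" if h > threshold else "negative"
        writer.writerow(row)


# Demo: in-memory CSV and a dummy predictor (NOT the trained model)
three_star = io.StringIO("Review,Rating\nteksturnya enak suka,3\nbikin bruntusan,3\n")
out = io.StringIO()
label_reviews_csv(three_star, out, lambda text: 0.7 if "suka" in text else 0.3)
print(out.getvalue())
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends, and reuse the `encoding='ISO-8859-1'` from the loading step when reading.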

Once all of the reviews are divided into two categories, positive and negative, we can look at what makes people like or dislike the 5X Ceramide Barrier Repair Moisture Gel Moisturizer by Skintific.

WordCloud

The easiest way to visualize the main talking points is with a word cloud. For positive reviews, the most common words are:

moisturizer/moist/melembabkan/pelembab (to moisturize/moisturizer)
tekstur (texture)
barrier
wangi (scent)
calming
hidrasi (hydration)

The words jerawat (acne), bruntusan (small bumps), kering (dry), and minyak (oil) appear too, but mostly in a neutral context: sometimes the product helps with those problems, sometimes it doesn't, and sometimes it makes them worse, but the overall verdict is still positive.

This type of moisturizer really works best for me! The texture is more of a gel, white-translucent in color, with no annoying scent. It doesn't absorb right away when applied; you have to wait 1–2 minutes for it to set. On my oily skin it has to be set again with powder, because the finish makes my face look really greasy. After using 1 jar, my skin has had no problems at all. It doesn't increase my comedones, no breakouts, and even when I get hormonal acne it can calm the inflamed area. It really suits me.
wow, if I could give this 9 stars I definitely would. So far it's the only moisturizer that doesn't "do nothing" on my skin: it actually moisturizes, to the point of absorbing into the skin and making it bouncy and looking a bit firmer. Its moisturizing claim is a total winner. But beyond that it has no other effect: it can't calm inflamed acne, and it doesn't really make the texture smoother either. Still, nothing beats it when it comes to moisturizing in a really comfortable-to-wear way.

For negative reviews, the most common words are:

jerawat (acne)
bruntusan (small bumps)
kering (dry)
minyak/oily (oil/oily)

A closer look at the sentences confirms that the tone is negative: in most of these reviews, the moisturizer made the problem worse.

Tempted by the ads everywhere saying it's really good, but after finishing 2 jars it had no effect at all on my dry skin; instead I got big cystic acne. Maybe my facial skin just doesn't suit the ingredients in this skincare.
tempted because lots of reviews said it suited them. So I tried the travel size first, buying it for the first time. But it didn't meet my expectations, because I kept getting small bumps. So no, I won't buy it again. Turns out it doesn't suit me. sob.
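A word cloud essentially scales each word by how often it appears, so the word lists above can be reproduced as plain frequency counts. A sketch with `collections.Counter` and toy tokenized reviews (hypothetical data); assuming the `wordcloud` package was used, its `WordCloud.generate_from_frequencies` method accepts such a word-to-count dictionary.

```python
from collections import Counter


def top_words(token_lists, n=5, stopwords=frozenset()):
    """Count words across tokenized reviews and return the n most common."""
    counts = Counter(
        word for tokens in token_lists for word in tokens if word not in stopwords
    )
    return counts.most_common(n)


# Toy tokenized negative reviews
negative_reviews = [
    ["muncul", "jerawat", "batu"],
    ["bruntusan", "jerawat", "kering"],
    ["kulit", "jadi", "oily", "jerawat"],
]
print(top_words(negative_reviews, n=1))  # [('jerawat', 3)]
```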

Topic Modelling

There is another text analysis technique that I am currently learning, and I will use it to take these reviews further. Below is an example result of topic modelling on the negative reviews using BERTopic. Will we get clearer talking points? I will write about it in Part 2.

Written by Syifa Addini