Fast Fashion: Industry Sentiments and Predicting Ratings for Dresses on SHEIN

What trends surround the Fast Fashion (FF) industry? How has it changed over time? Can we predict the ratings of popular FF products?

Bailey Campbell
Apr 30, 2022

Fast Fashion (FF) has become a huge industry over the past decade. The surge in its popularity can be explained by the quick changes in fashion trends that we see from fashion experts and influencers as well as the affordability of the products. Consumers see something they like from their favorite influencers, then it is a race for FF companies to reproduce the look.

Everything about the industry is quite literally “fast”: the changes in trends, the rapid production, the hasty consumer decisions, and the short amount of time before a garment is disposed of. For these reasons, the FF industry faces scrutiny for the social and ethical problems it creates. The poor quality of its clothes (often made from plastics), the waste it creates, and the terrible working conditions call into question the net impact of FF.

Consumer Sentiments Surrounding Fast Fashion

For young consumers interested in fashion, the FF industry has provided a cheap and quick way to buy into constantly changing trends. The biggest group of these consumers falls into the Gen Z and Millennial categories. These consumers are exposed to trends via social media and influencer content. There are tens of thousands of videos across YouTube and TikTok of people revealing their “clothing hauls” from various FF websites.

Using the tuber package in R to access YouTube’s API, I pulled the descriptions of videos that appear in a search for “Fast Fashion.” Here are the results:

a wordcloud of consumer sentiments about FF

In the wordcloud generated from these video descriptions, a few themes emerge across the most common words. The most prominent theme involves the impact of FF on sustainability. We can also see some of FF’s biggest brands, both brick-and-mortar (Zara and Primark) and online (Shein). We also note the idea of trends and social media (Instagram, TikTok, and YouTube).

Taking a deeper dive into one of the most watched videos, Fast fashion — The shady world of cheap clothing | DW Documentary, we look at sentiments across almost 1500 comments to see if there is a trend in the emotional valence of some of the most liked comments.

Using the syuzhet package in R, I evaluated the average sentiment of individual comments based on the number of likes. I had hypothesized that comments with a large number of likes would have more negative sentiments; however, there is no clear correlation between the two. (Note: comments with more than 51 likes are not pictured, as the most-liked comment had more than 800 likes and would have made the image illegible.)
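For readers who want to see the grouping step spelled out, here is a minimal Python sketch (the analysis itself was done in R, so this is not the original code). The (likes, sentiment) pairs are made up for illustration, and the cutoff of 51 likes simply mirrors the note above.

```python
from collections import defaultdict
from statistics import mean

# Made-up (likes, sentiment score) pairs, NOT the real YouTube comments.
comments = [
    (0, 0.5), (0, -1.0),
    (1, 0.25), (1, 0.75),
    (3, -0.5),
    (51, 1.5),
    (800, -2.0),
]

# Comments with more than 51 likes are left out, as in the note above.
MAX_LIKES_SHOWN = 51

# Collect the sentiment scores of all comments that share a like count.
scores_by_likes = defaultdict(list)
for likes, score in comments:
    if likes <= MAX_LIKES_SHOWN:
        scores_by_likes[likes].append(score)

# Average sentiment for each like count, ordered by number of likes.
avg_by_likes = {
    likes: mean(scores) for likes, scores in sorted(scores_by_likes.items())
}

for likes, avg in avg_by_likes.items():
    print(likes, avg)
```

Plotting these averages against the like count is what would show (or fail to show) a trend.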

It’s also important to know that the method of retrieving sentiments did not capture the emotional valence of some comments as intuitively as a person would. For example, one comment with 688 likes reads “The people making these clothes don’t have proper clothes to wear themselves at times. It’s honestly infuriating when educated people promote these brands without having any empathy for the people producing them…” and has a positive sentiment of 6.9.
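One way this can happen is that word-based scoring adds up the values of individual words and ignores context. The Python sketch below illustrates the idea only: the tiny lexicon and its weights are hypothetical and are not syuzhet's dictionary, so the number it produces will not match the 6.9 above.

```python
import re

# Hypothetical toy lexicon: word -> sentiment value.
# These weights are invented for illustration and are NOT syuzhet's.
TOY_LEXICON = {
    "proper": 0.5,
    "honestly": 0.5,
    "educated": 0.75,
    "promote": 0.5,
    "empathy": 0.75,
    "infuriating": -1.0,
}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon value of every word; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)

comment = (
    "The people making these clothes don't have proper clothes to wear "
    "themselves at times. It's honestly infuriating when educated people "
    "promote these brands without having any empathy for the people "
    "producing them"
)

score = lexicon_score(comment)
print(score)  # positive, even though the comment is clearly critical
```

Because negators such as "don't" and "without" carry no weight here, "proper" and "empathy" count as positive words even though the comment describes their absence.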

The Growing Interest in FF Over Time

To look at the growing interest in Fast Fashion over time, I turned to Google Trends. I used the gtrendsR package to pull hits for “Fast Fashion” as a topic over time.

From this, we can see that there has been a steady rise in interest in fast fashion from early 2017 to now. Just as I looked at the words commonly used in YouTube video descriptions concerning fast fashion, I also looked at the related topics that arise under the FF Google search.

Again we see themes of sustainability and specific brand names pop up under the related topics.

Using both the wordcloud and the Google Trends (GT) related topics concerning FF, I chose to take a closer look at one particular company: SHEIN. I wanted to look at an online-only FF retailer, both because of the access to many products and because of the detrimental impact such a retailer can have through its enormous inventory and the quick consumption it enables. SHEIN was the most common example of such a retailer based on the previous analysis.

Next, I will look at the GT hits for SHEIN over time.

Hits for SHEIN saw a clear spike that began at the end of March 2020 and has persisted through today. This is very likely due to COVID, which forced brick-and-mortar stores to close and pushed consumers to spend more time and money shopping online.

SHEIN Dress Data

There were a few reasons I chose to look at data for dresses specifically. The biggest reason was that I wanted the products I looked at to have a theme and for the data to be workable. Dresses are also one of the most notable products in FF. Kim Kardashian and Kylie Jenner are two influencers who started the surge in FF back in 2018, when Fashion Nova was replicating the tight, curve-oriented dresses worn by the sisters. This pattern has continued, with all FF sites recreating common influencer looks.

I also wanted to limit the item type so that I could create additional variables based on fashion trends. For example, knitted dresses are in style but knitted pants are not. When creating a variable for this trend, I would not want other clothing items to fall in the trendy category. It was also easier to research and decipher trends for one item type.

Using a combination of WebScrapper and rvest, I gathered the star rating, price, number of size options, and number of reviews for 1,200 dresses. The correlation matrix and plot can be found below.

Most prices fall below $20, which is another reason FF is accessible. Correlation is not easily evaluated by these plots.

The positive correlation between stars and reviews is the strongest. This could possibly be explained by people getting “points” for reviewing items (in my experience, people leave five stars without much thought): the more 5-star reviews a product has, the more its negative reviews are canceled out. The slightly positive correlation of size options with reviews and stars can also be explained by more people being included (more consumers to leave reviews) and more flexibility in product fit (higher ratings).

I was surprised that there was no correlation between price and stars, since a common consumer heuristic is that higher price = higher quality.
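For reference, a correlation matrix like the one discussed here is built from pairwise Pearson correlations. The Python sketch below shows the calculation on a handful of made-up rows; it is not the scraped SHEIN data, and the column names are assumptions chosen to match the four variables described above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example rows, NOT the real data set.
dresses = {
    "stars":   [4.9, 4.7, 4.2, 3.8, 4.8, 3.5],
    "price":   [12.0, 18.5, 9.0, 25.0, 15.0, 11.0],
    "sizes":   [6, 5, 4, 3, 6, 3],
    "reviews": [1000, 640, 120, 15, 820, 8],
}

# Correlation of every variable with every other variable.
names = list(dresses)
corr = {a: {b: pearson(dresses[a], dresses[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```

Each value lies between -1 and 1, and a value near 0 (as reported for price and stars) means no linear relationship was detected.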

The Regression Model

Before separating my data into a train and test set, I wanted to create variables to account for fashion trends. Based on this article by InStyle, knit dresses, dresses with layers, and dresses with ruffles are trending. To create these covariates, I selected dresses that had those words in the dress name.
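A minimal Python sketch of this kind of keyword flagging is shown below. The exact patterns are an assumption (the text only says the trend words were matched against the dress name), and the product names are invented examples.

```python
import re

# Assumed keyword patterns, one per trend; the real matching rules
# used for the article are not specified.
TREND_PATTERNS = {
    "knit": re.compile(r"\bknit(ted)?\b", re.IGNORECASE),
    "layered": re.compile(r"\blayer(s|ed)?\b", re.IGNORECASE),
    "ruffle": re.compile(r"\bruffle[sd]?\b", re.IGNORECASE),
}

def trend_flags(name):
    """Return a 0/1 indicator per trend based on words in the dress name."""
    return {
        trend: int(bool(pattern.search(name)))
        for trend, pattern in TREND_PATTERNS.items()
    }

# Invented product names for illustration.
example_names = [
    "Ribbed Knit Bodycon Dress",
    "Layered Ruffle Hem Cami Dress",
    "Solid Satin Slip Dress",
]

for name in example_names:
    print(name, trend_flags(name))
```

Each flag becomes a 0/1 column that can be added to the regression alongside price, size options, and number of reviews.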

Note: I wanted to do sentiment analysis on review content, but scraping the page would only give the top 3 reviews which were often biased.

Next, I randomly split the data into 80% training data to build the model and 20% test data to check out-of-sample fit.
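A random 80/20 split can be sketched in a few lines of Python. The seed value and the function name are assumptions added so the example is reproducible; the integers stand in for the 1,200 dress rows.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 1,200 scraped dresses: one integer id per row.
rows = list(range(1200))
train, test = train_test_split(rows)

print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, so the test set stays unseen while the model is fit.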

For this regression I wanted to predict the number of stars a dress had (which on Shein includes decimals) to see if price, size options, number of reviews, and trend covariates had an impact on the favorability of a dress. I ran a regression to predict the number of stars, which returned this model summary.

The model gave significant coefficients for number of reviews and number of size options, but no other variable was significant. What’s interesting is that the coefficients for the knitted and layered trend variables were actually negative, which could mean the quality wasn’t up to standard, or the trend was not as popular as one would think.

The in-sample fit seemed to miss a lot of the products, specifically ones with very high stars and 1-star reviews. So I also ran a new model that predicted whether a product was highly rated (>3.5 stars). The model gave these parameters:

Now only the number of reviews variable was significant, and the trend variables still returned negative coefficients. When testing for in-sample accuracy, I classified predictions >0.5 as highly rated (after testing which cutoff had the best in-sample accuracy), which gave an accuracy of 58.5%.
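The cutoff step can be sketched in Python as follows. The predicted values and star ratings are made up for illustration, and the list of candidate cutoffs is an assumption; only the 0.5 cutoff and the 3.5-star definition of "highly rated" come from the text.

```python
def classify(scores, threshold=0.5):
    """Label a prediction 1 (highly rated) when it exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]

def accuracy(predicted, actual):
    """Share of predictions that match the actual labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Made-up model outputs and star ratings, NOT the article's data.
predicted_scores = [0.62, 0.48, 0.45, 0.30, 0.71, 0.52, 0.44, 0.65]
actual_stars = [4.8, 3.2, 3.4, 2.9, 4.6, 4.1, 3.9, 3.0]

# A dress counts as highly rated when it has more than 3.5 stars.
actual_high = [1 if s > 3.5 else 0 for s in actual_stars]

# Try several cutoffs and keep the one with the best in-sample accuracy.
candidates = [0.3, 0.4, 0.5, 0.6, 0.7]
best_threshold = max(
    candidates,
    key=lambda t: accuracy(classify(predicted_scores, t), actual_high),
)

print(best_threshold)
print(accuracy(classify(predicted_scores, best_threshold), actual_high))
```

Choosing the cutoff on the training data, as here, can flatter the in-sample number, which is why the out-of-sample check matters.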

Out-of-Sample Fit

Finally, I tested both models for out-of-sample fit and generated the plot and accuracy, respectively.

From this, we see the same issues as with the in-sample fit. It is under-capturing very high rated products and 1-star products.

When testing our predictions for whether or not a product was highly rated, the model had an out-of-sample accuracy of 60.7%.

Conclusion + Caveats

Many of my expectations were overturned when exploring the data and running the models. When I reviewed the wordcloud and sentiments of the fast fashion industry, I saw that certain brands and ethical implications surrounding the sustainability of fashion were highlighted. Words like “cheap” and “cost” were a repeating pattern. This could lead one to believe that price would play an important role in the ratings of products; however, in exploring the data, this was not the case.

Additionally, fast fashion draws in young consumers due to the constant cycling of products to keep up with fashion and has only grown in popularity throughout the COVID pandemic. SHEIN, just under their “New in” tab, has over 45,000 products. Given that following trends is a major reason for shopping on an FF website, I was shocked to see that the covariates for trend were not significant in the model.

The data I most wish I had access to is sales data. If I were able to access SHEIN sales data, I would like to explore whether the trend covariates could explain sales counts, since this seems more plausible than ratings, given how many products on SHEIN have high stars.

One implication of my initial findings is that sustainable practices need to be addressed in fashion. Since sites like SHEIN are successful because of how they cycle out clothes, environmental agencies should consider finding ways to educate people about their practices as well as lobby for regulations. Additionally, highly rated products on SHEIN were partially explained by the number of sizes available. While I don’t think they should be making even MORE clothes, I suspect this finding could hold true across the fashion industry, and sustainable fashion brands should look to expand their size selection (and many have already done so).

Overall, the model’s fit was not good enough to predict the ratings of products very accurately. The base accuracy of our in-sample fit for predicting high stars (if we assumed none of our data had high stars) was 56.2%, which is only 2.3 percentage points worse than our model’s accuracy.
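The baseline comparison is simple arithmetic, sketched here in Python. The two accuracy figures are the ones reported above; the small list of labels is made up purely to show how an "assume nothing is highly rated" baseline is computed.

```python
def all_negative_accuracy(labels):
    """Accuracy of predicting 0 (not highly rated) for every product."""
    return sum(1 for label in labels if label == 0) / len(labels)

# In-sample accuracies reported in the article, in percent.
model_accuracy = 58.5
baseline_accuracy = 56.2

# The gap is measured in percentage points.
gap_points = round(model_accuracy - baseline_accuracy, 1)
print(gap_points)

# Made-up labels (1 = highly rated) to show the baseline calculation.
example_labels = [0] * 9 + [1] * 7
print(all_negative_accuracy(example_labels))
```

A model that beats this baseline by only a couple of points is learning very little beyond the overall share of highly rated products.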

