Data Analyst With SQL : Optimizing Online Sports Retail Revenue

Muhammad Fahran Januar
12 min readAug 14, 2023

Analyze product data for an online sports retail company to optimize revenue.

Source :https://www.google.com/url?sa=i&url=https%3A%2F%2Fkissfmmedan.com%2Fadidas-dan-nike-nggak-disadari-ternyata-mulai-langka%2F&psig=AOvVaw0e7UZZ33eTbS422rlvtMrX&ust=1692048726202000&source=images&cd=vfe&opi=89978449&ved=0CBEQjRxqFwoTCKio7O3K2oADFQAAAAAdAAAAABAE

Project Description

Sports clothing is a booming sector!

In this notebook, you will use your SQL skills to analyze product data for an online sports retail company.

In this project, i’ll will work with numeric, string, and timestamp data on pricing and revenue, ratings, reviews, descriptions, and website traffic.

And will use techniques such as aggregation, cleaning, labeling, Common Table Expressions, and correlation to produce recommendations on how the company can maximize revenue!

TASK LIST

  1. Count the total number of products, along with the number of non-missing values in description, listing_price, and last_visited
  2. Find out how listing_price varies between Adidas and Nike products.
  3. Create labels for products grouped by price range and brand.
  4. Calculate the average discount offered by brand.
  5. Calculate the correlation between reviews and revenue.
  6. Split description into bins in increments of one hundred characters, and calculate average rating by for each bin.
  7. Count the number of reviews per brand per month.
  8. Create the footwear CTE, then calculate the number of products and average revenue from these items.
  9. Copy the code used to create footwear then use a filter to return only products that are not in the CTE.

1. Counting missing values

Sports clothing and athleisure attire is a huge industry, worth approximately $193 billion in 2021 with a strong growth forecast over the next decade!

In this notebook, we play the role of a product analyst for an online sports clothing company. The company is specifically interested in how it can improve revenue. We will dive into product data such as pricing, reviews, descriptions, and ratings, as well as revenue and website traffic, to produce recommendations for its marketing and sales teams.

The database provided to us, sports, contains five tables, with product_id being the primary key for all of them:

%%sql
postgresql:///sports

SELECT COUNT(info) as total_rows,
COUNT(info.description) as count_description, COUNT(finance.listing_price) as count_listing_price,
COUNT(traffic.last_visited) as count_last_visited
FROM info
INNER JOIN traffic
ON traffic.product_id = info.product_id
INNEr JOIN finance
ON finance.product_id = info.product_id

- `total_rows`: The total number of rows or entries in the analyzed dataset is 3,179 rows.
- `count_description`: The total number of entries in the “description” column that have non-empty values is 3,117 entries.
- `count_listing_price`: The total number of entries in the “listing_price” column that have non-empty values is 3,120 entries.
- `count_last_visited`: The total number of entries in the “last_visited” column that have non-empty values is 2,928 entries.

2. Nike vs Adidas pricing

We can see the database contains 3,179 products in total. Of the columns we previewed, only one — last_visited — is missing more than five percent of its values. Now let's turn our attention to pricing.

How do the price points of Nike and Adidas products differ? Answering this question can help us build a picture of the company’s stock range and customer market. We will run a query to produce a distribution of the listing_price and the count for each price, grouped by brand.

%%sql
postgresql:///sports

SELECT brands.brand, CAST(finance.listing_price AS INTEGER), COUNT (finance.product_id)
FROM brands
INNER JOIN finance
ON finance.product_id = brands.product_id
WHERE finance.listing_price > 0
GROUP BY brands.brand, finance.listing_price
ORDER BY finance.listing_price DESC
| brand  | listing_price | count |
|--------|---------------|-------|
| Adidas | 9 | 1 |
| Adidas | 10 | 11 |
| Adidas | 12 | 1 |
| Adidas | 13 | 27 |
| Adidas | 15 | 27 |
| Adidas | 16 | 4 |
| Adidas | 18 | 4 |
| Adidas | 20 | 8 |
| Adidas | 23 | 1 |
| Adidas | 25 | 28 |
| Adidas | 27 | 18 |
| Adidas | 28 | 38 |
| Adidas | 30 | 37 |
| Adidas | 33 | 24 |
| Adidas | 36 | 25 |
| Adidas | 38 | 24 |
| Adidas | 40 | 81 |
| Adidas | 43 | 51 |
| Adidas | 45 | 1 |
| Adidas | 46 | 163 |
| Adidas | 48 | 42 |
| Adidas | 50 | 183 |
| Adidas | 53 | 43 |
| Adidas | 55 | 2 |
| Adidas | 56 | 174 |
| Adidas | 60 | 211 |
| Adidas | 63 | 1 |
| Adidas | 66 | 102 |
| Adidas | 70 | 87 |
| Adidas | 76 | 149 |
| Adidas | 80 | 322 |
| Adidas | 85 | 1 |
| Adidas | 86 | 7 |
| Adidas | 90 | 89 |
| Adidas | 96 | 2 |
| Adidas | 100 | 72 |
| Adidas | 110 | 91 |
| Adidas | 120 | 115 |
| Adidas | 130 | 96 |
| Adidas | 140 | 36 |
| Adidas | 150 | 41 |
| Adidas | 160 | 28 |
| Adidas | 170 | 27 |
| Adidas | 180 | 34 |
| Adidas | 190 | 7 |
| Adidas | 200 | 8 |
| Adidas | 220 | 11 |
| Adidas | 230 | 8 |
| Adidas | 240 | 5 |
| Adidas | 280 | 4 |
| Adidas | 300 | 2 |
| Nike | 30 | 2 |
| Nike | 40 | 1 |
| Nike | 45 | 3 |
| Nike | 50 | 5 |
| Nike | 60 | 2 |
| Nike | 65 | 1 |
| Nike | 70 | 4 |
| Nike | 75 | 7 |
| Nike | 79 | 1 |
| Nike | 80 | 16 |
| Nike | 85 | 5 |
| Nike | 90 | 13 |
| Nike | 95 | 1 |
| Nike | 100 | 14 |
| Nike | 110 | 17 |
| Nike | 120 |

3. Labeling price ranges

It turns out there are 77 unique prices for the products in our database, which makes the output of our last query quite difficult to analyze.

Let’s build on our previous query by assigning labels to different price ranges, grouping by brand and label. We will also include the total revenue for each price range and brand.

%%sql

SELECT b.brand, COUNT(f.*), SUM(f.revenue) as total_revenue,
CASE WHEN f.listing_price < 42 THEN 'Budget'
WHEN f.listing_price >= 42 AND f.listing_price < 74 THEN 'Average'
WHEN f.listing_price >= 74 AND f.listing_price < 129 THEN 'Expensive'
ELSE 'Elite' END AS price_category
FROM finance AS f
INNER JOIN brands AS b
ON f.product_id = b.product_id
WHERE b.brand IS NOT NULL
GROUP BY b.brand, price_category
ORDER BY total_revenue DESC;

The brand “Adidas” offers a variety of products with different price ranges. The majority of Adidas products fall into the “Average” category, totaling 1060 products. However, the brand also excels in the “Expensive” category, generating a total revenue of $4,626,980.07. Additionally, Adidas presents luxury products in the “Elite” category, although the quantity is lower (307 products), they contribute significantly to a total revenue of $3,014,316.83. Affordable products are not neglected, as the brand offers 359 products in the “Budget” category, even though the total revenue from this category is lower.

On the other hand, the brand “Nike” also exhibits price variation in its products. Products in the “Budget” category are the primary choice for customers, amounting to 357 products, representing the majority of sales. Despite their large quantity, the total revenue from this category is lower compared to the “Adidas” brand, totaling $595,341.02. On the flip side, Nike also presents luxury products in the “Elite” category, with a smaller quantity (82 products), yet generating a total revenue of $128,475.59. Notably, Nike’s products in the “Expensive” and “Average” categories also contribute significantly to the total revenue.

5. Correlation between revenue and reviews

To improve revenue further, the company could try to reduce the amount of discount offered on Adidas products, and monitor sales volume to see if it remains stable. Alternatively, it could try offering a small discount on Nike products. This would reduce average revenue for these products, but may increase revenue overall if there is an increase in the volume of Nike products sold.

%%sql

SELECT CORR(reviews.reviews, revenue) AS review_revenue_corr
FROM reviews
INNER JOIN finance
ON finance.product_id = reviews.product_id

The correlation result between revenue and reviews is approximately 0.651. This indicates a moderate positive relationship between the two variables. With a correlation value exceeding 0.5, it can be concluded that the higher the number of reviews for a product, the tendency is for an increase in the product’s revenue as well. However, it’s important to note that correlation does not imply causation, meaning it cannot be determined whether more reviews directly cause an increase in revenue or vice versa.

6. Ratings and reviews by product description length

Interestingly, there is a strong positive correlation between revenue and reviews. This means, potentially, if we can get more reviews on the company's website, it may increase sales of those items with a larger number of reviews.

Perhaps the length of a product’s description might influence a product's rating and reviews — if so, the company can produce content guidelines for listing products on their website and test if this influences revenue.

%%sql

SELECT TRUNC(LENGTH(i.description), -2) AS description_length,
ROUND(AVG(r.rating::numeric), 2) AS average_rating
FROM info AS i
INNER JOIN reviews AS r
ON i.product_id = r.product_id
WHERE i.description IS NOT NULL
GROUP BY description_length
ORDER BY description_length;

The results indicate a relationship between the length of product descriptions (description_length) and the average product rating (average_rating). It is evident that as the length of product descriptions increases, the average rating tends to rise. For instance, products with a description length of 600 characters have a higher average rating (3.65) compared to products with shorter descriptions.

7. Reviews by month and brand

As we know a correlation exists between reviews and revenue, one approach the company could take is to run experiments with different sales processes encouraging more reviews from customers about their purchases, such as by offering a small discount on future purchases.

%%sql

SELECT b.brand, DATE_PART('month', t.last_visited) AS month, COUNT(r.*) AS num_reviews
FROM brands AS b
INNER JOIN traffic AS t
ON b.product_id = t.product_id
INNER JOIN reviews AS r
ON t.product_id = r.product_id
GROUP BY b.brand, month
HAVING b.brand IS NOT NULL
AND DATE_PART('month', t.last_visited) IS NOT NULL
ORDER BY b.brand, month;
| brand  | month | num_reviews |
|--------|-------|-------------|
| Adidas | 1.0 | 253 |
| Adidas | 2.0 | 272 |
| Adidas | 3.0 | 269 |
| Adidas | 4.0 | 180 |
| Adidas | 5.0 | 172 |
| Adidas | 6.0 | 159 |
| Adidas | 7.0 | 170 |
| Adidas | 8.0 | 189 |
| Adidas | 9.0 | 181 |
| Adidas | 10.0 | 192 |
| Adidas | 11.0 | 150 |
| Adidas | 12.0 | 190 |
| Nike | 1.0 | 52 |
| Nike | 2.0 | 52 |
| Nike | 3.0 | 55 |
| Nike | 4.0 | 42 |
| Nike | 5.0 | 41 |
| Nike | 6.0 | 43 |
| Nike | 7.0 | 37 |
| Nike | 8.0 | 29 |
| Nike | 9.0 | 28 |
| Nike | 10.0 | 47 |
| Nike | 11.0 | 38 |
| Nike | 12.0 | 35 |

On the Adidas side, there’s an interesting variation in their review counts from month to month:
- Starting from January, Adidas received 253 reviews.
- The review count increased to 272 in February.
- March and April also showed high review counts with 269 and 180 reviews, respectively.
- May and June followed with 172 and 159 review counts.
- Then, July, August, and September recorded 170, 189, and 181 reviews.
- October exhibited a significant increase with 192 reviews, followed by November with 150 reviews.
- The year ended with December having 190 reviews.

On the Nike side, the review trend also captures attention:
- In January and February, Nike received an equal number of reviews, 52 each.
- March showed an increase to 55 reviews.
- Subsequent months like April and May recorded 42 and 41 reviews.
- June displayed a slight increase with 43 reviews.
- July had 37 reviews, while August and September had 29 and 28 reviews, respectively.
- October showed a drastic increase with 47 reviews.
- November and December then decreased again with 38 and 35 reviews each.

8. Footwear product performance

Looks like product reviews are highest in the first quarter of the calendar year, so there is scope to run experiments aiming to increase the volume of reviews in the other nine months!

So far, we have been primarily analyzing Adidas vs Nike products. Now, let’s switch our attention to the type of products being sold. As there are no labels for product type, we will create a Common Table Expression (CTE) that filters description for keywords, then use the results to find out how much of the company's stock consists of footwear products and the median revenue generated by these items.

%%sql

WITH footwear AS
(
SELECT i.description, f.revenue
FROM info AS i
INNER JOIN finance AS f
ON i.product_id = f.product_id
WHERE i.description ILIKE '%shoe%'
OR i.description ILIKE '%trainer%'
OR i.description ILIKE '%foot%'
AND i.description IS NOT NULL
)
SELECT COUNT(*) AS num_footwear_products,
percentile_disc(0.5) WITHIN GROUP (ORDER BY revenue) AS median_footwear_revenue
FROM footwear;

The results indicate that there are 2,700 footwear products in the analyzed dataset. The median revenue for these footwear products is $3,118.36. This suggests that half of the footwear products have revenue above $3,118.36 and the other half have revenue below that figure.

9. Clothing product performance

Recall from the first task that we found there are 3,117 products without missing values for description. Of those, 2,700 are footwear products, which accounts for around 85% of the company's stock. They also generate a median revenue of over $3000 dollars!

This is interesting, but we have no point of reference for whether footwear’s median_revenue is good or bad compared to other products. So, for our final task, let's examine how this differs to clothing products. We will re-use footwear, adding a filter afterward to count the number of products and median_revenue of products that are not in footwear.

%%sql

WITH footwear AS
(
SELECT i.description, f.revenue
FROM info AS i
INNER JOIN finance AS f
ON i.product_id = f.product_id
WHERE i.description ILIKE '%shoe%'
OR i.description ILIKE '%trainer%'
OR i.description ILIKE '%foot%'
AND i.description IS NOT NULL
)

SELECT COUNT(i.*) AS num_clothing_products,
percentile_disc(0.5) WITHIN GROUP (ORDER BY f.revenue) AS median_clothing_revenue
FROM info AS i
INNER JOIN finance AS f on i.product_id = f.product_id
WHERE i.description NOT IN (SELECT description FROM footwear);

The results indicate the presence of 417 clothing products in the analyzed dataset. The median revenue of these clothing products is $503.82. This means that half of the clothing products have revenue above $503.82, and the other half have revenue below that figure.

CONCLUSION

  1. The brand needs to explore opportunities to develop products in the “Expensive” and “Elite” categories that have higher revenue potential.
  2. Focusing on product quality, customer service, and holistic marketing strategies can help improve reviews and revenue.
  3. Analyzing factors that influence monthly review fluctuations and planning appropriate marketing strategies.
  4. Continuously monitoring product categories like footwear and clothing and making relevant price adjustments or marketing strategies.
  5. Using this data as a foundation to design more effective and customer-oriented business strategies.
  6. All of these recommendations can assist the brand in enhancing product performance, increasing revenue, and providing a better experience to customers.

--

--