Data Analysis: Improving marketing and distribution efficiency of manufacturers and merchants

Shopping History Exploratory Data Analysis

Ben Niu
Mar 16, 2019

This is a list of 10,000 women’s shoes and their product information, provided by Datafiniti’s Product Database. The dataset includes shoe name, brand, price, and more. Each shoe has one entry for each price found for it, so some shoes have multiple entries. Note that this is a sample of a larger dataset; the full dataset is available through Datafiniti. To allow more interesting analysis, I then generated a fake customer dataset of 11,000 rows with the columns id, first_name, last_name, email, gender, ip_address, Postal Code, State, and Money spend.

Part 1 Data Cleaning and Organizing

  1. Join the two CSV files, then remove unwanted columns, leaving only id, brand, prices.issale, prices.merchant, State, and Money spend.
  2. Missing values: no imputation is performed, because the specific sales situation of each product cannot be known.
  3. Abnormal values: a few records contain garbled characters. Changing the encoding of the original .csv file in an editor, from UTF-8 to ANSI, corrects some of them. Some records are still garbled after the change; because they cannot be corrected by hand one at a time, they are left unprocessed.
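The join and column selection in step 1 can be sketched with Python's standard `csv` module. This is only an illustration: the article does not say which tool was used, the join key `id` is an assumption, and the inline rows are made up. Missing values (empty strings) are carried through untouched, matching step 2.

```python
import csv
import io

# Tiny inline stand-ins for the two CSV files (hypothetical rows).
SHOES_CSV = """id,brand,prices.issale,prices.merchant,name
1,Nike,true,Walmart.com,Runner
2,,false,Amazon.com,Loafer
3,Clarks,false,Walmart.com,Sandal
"""
CUSTOMERS_CSV = """id,first_name,State,Money spend
1,Ann,CO,59.99
2,Bea,IL,120.00
4,Cy,NY,15.50
"""

# Columns the article keeps after the join.
KEEP = ["id", "brand", "prices.issale", "prices.merchant", "State", "Money spend"]


def join_and_trim(shoes_text, customers_text, keep=KEEP):
    """Inner-join two CSV texts on `id` (assumed key) and keep only `keep` columns."""
    customers = {
        row["id"]: row for row in csv.DictReader(io.StringIO(customers_text))
    }
    joined = []
    for shoe in csv.DictReader(io.StringIO(shoes_text)):
        customer = customers.get(shoe["id"])
        if customer is None:
            continue  # no matching customer row: dropped by the inner join
        merged = {**customer, **shoe}
        # Empty strings (missing values) are kept as-is, not imputed.
        joined.append({col: merged[col] for col in keep})
    return joined


rows = join_and_trim(SHOES_CSV, CUSTOMERS_CSV)
for row in rows:
    print(row)
```

With real files, `open(path, newline="", encoding=...)` would take the place of the `io.StringIO` wrappers; which encoding suits the garbled records is not something the article settles.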

Part 2 Data Analysis

We will analyze the dataset from three perspectives: customers, items, and merchants.

Customer

  1. How many items are purchased?
On sale: 6,004. Not on sale: 27,797.

  2. How many brands are there?
1,361 brands in total.
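Counts like these could be reproduced along the following lines. This is a minimal sketch on made-up rows, assuming the cleaned columns are named `brand` and `prices.issale`. Lower-casing brand names before counting is my own choice, not something the article describes, so the totals above may have been computed on raw spellings.

```python
from collections import Counter

# Hypothetical rows in the cleaned layout; real data would come from the joined CSV.
rows = [
    {"brand": "Nike", "prices.issale": "true"},
    {"brand": "nike", "prices.issale": "false"},
    {"brand": "Clarks", "prices.issale": "false"},
    {"brand": "", "prices.issale": "false"},
]


def sale_counts(rows):
    """Count rows by their sale flag, ignoring case and surrounding spaces."""
    return Counter(row["prices.issale"].strip().lower() for row in rows)


def distinct_brands(rows):
    """Return the set of non-empty brand names, normalized to lower case."""
    return {row["brand"].strip().lower() for row in rows if row["brand"].strip()}


counts = sale_counts(rows)
brands = distinct_brands(rows)
print("on sale:", counts["true"], "not on sale:", counts["false"])
print("brands:", len(brands))
```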

Items

  1. What are the top ten brands in terms of sales volume?

  2. What are the top ten items by sales?

Merchants

  1. How many sellers are there?
290 merchants in total.

  2. Who are the top ten vendors by sales?
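The top-ten questions in the Items and Merchants sections share one pattern: count rows per value of a column and keep the largest counts. Below is a small sketch on made-up rows. Treating one row as one sale is an assumption, and `top_n` is a hypothetical helper rather than anything from the article.

```python
from collections import Counter

# Hypothetical cleaned rows; each row is treated as one sale (an assumption).
rows = [
    {"brand": "Nike", "prices.merchant": "Walmart.com"},
    {"brand": "Nike", "prices.merchant": "Walmart.com"},
    {"brand": "Nike", "prices.merchant": "Amazon.com"},
    {"brand": "Clarks", "prices.merchant": "Walmart.com"},
    {"brand": "Clarks", "prices.merchant": "Amazon.com"},
    {"brand": "Toms", "prices.merchant": "Walmart.com"},
]


def top_n(rows, column, n=10):
    """Return the n most frequent non-empty values of `column` with their row counts."""
    counts = Counter(row[column] for row in rows if row[column])
    return counts.most_common(n)


top_brands = top_n(rows, "brand", n=2)
top_merchants = top_n(rows, "prices.merchant", n=1)
print(top_brands)
print(top_merchants)
```

Calling `top_n(rows, "brand")` with the default `n=10` gives the top-ten list directly; a column holding item names would work the same way.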

Overall

Map based on Longitude (generated) and Latitude (generated). Color shows the sum of Money spent. Details are shown for State.
Map based on Longitude (generated) and Latitude (generated). Color shows details about the State. The marks are labeled by the sum of the Number of sales.

Based on the visualizations above, manufacturers and merchants can decide which states deserve more marketing investment and what portion of their total items to distribute to each state's stock. This is a sample dataset based on 11,000 customers, so it may not be accurate, but it can provide data-based suggestions to manufacturers and merchants.
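The per-state totals behind the two maps, and one possible way to turn them into a stocking split, might look like the sketch below. The rows are made up, and allocating inventory in proportion to each state's number of sales is my own illustrative rule, not a method from the article.

```python
from collections import defaultdict

# Hypothetical cleaned rows with the State and Money spend columns.
rows = [
    {"State": "CO", "Money spend": "60.00"},
    {"State": "CO", "Money spend": "40.00"},
    {"State": "IL", "Money spend": "100.00"},
    {"State": "NY", "Money spend": "50.00"},
]


def summarize_by_state(rows):
    """Return {state: (total_spend, sales_count)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        entry = totals[row["State"]]
        entry[0] += float(row["Money spend"])
        entry[1] += 1
    return {state: (spend, count) for state, (spend, count) in totals.items()}


def allocation_shares(summary):
    """Split inventory across states in proportion to each state's sales count."""
    total = sum(count for _, count in summary.values())
    return {state: count / total for state, (_, count) in summary.items()}


summary = summarize_by_state(rows)
shares = allocation_shares(summary)
for state, share in sorted(shares.items()):
    print(state, summary[state], f"{share:.0%}")
```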

The plots of the count of Categories and sum of Moneyspend for Date Added Quarter. Color shows details about Categories. The view is filtered on the count of Categories, which includes values less than or equal to 611.
The plot of the count of Categories Sales Quarterly. Color shows details about Categories. The view is filtered on the count of Categories, which includes values less than or equal to 611.

The visualizations above show the types of shoes sold each quarter and compare them with the total money spent in each category. The fourth quarter of each year always has the highest sales, which is no surprise given the shopping season. The top-selling categories are boots, sandals, and athletic shoes. The relationship between a category's sales count and its total money spent is positive but not perfect; it varies with the brand and the type of shoe.
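A quarterly breakdown and a rough check of the count-versus-spend relationship can be sketched as follows. The column names `dateAdded` and `categories` are assumed from the chart captions, each row is given a single category label for simplicity, and the rows are made up. The correlation is a plain Pearson coefficient written out by hand.

```python
from collections import defaultdict
from datetime import date
from math import sqrt

# Hypothetical rows; dates are assumed to start with an ISO day (YYYY-MM-DD).
rows = [
    {"dateAdded": "2017-11-03T10:00:00Z", "categories": "boots", "Money spend": "120.0"},
    {"dateAdded": "2017-12-20T10:00:00Z", "categories": "boots", "Money spend": "80.0"},
    {"dateAdded": "2017-07-01T10:00:00Z", "categories": "sandals", "Money spend": "30.0"},
    {"dateAdded": "2017-10-05T10:00:00Z", "categories": "athletic", "Money spend": "60.0"},
]


def quarter_label(timestamp):
    """Map a timestamp such as '2017-11-03T10:00:00Z' to a label such as '2017 Q4'."""
    day = date.fromisoformat(timestamp[:10])
    return f"{day.year} Q{(day.month - 1) // 3 + 1}"


def quarterly_summary(rows):
    """Return {(quarter, category): (sales_count, total_spend)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        entry = totals[(quarter_label(row["dateAdded"]), row["categories"])]
        entry[0] += 1
        entry[1] += float(row["Money spend"])
    return {key: (count, spend) for key, (count, spend) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


summary = quarterly_summary(rows)
counts = [count for count, _ in summary.values()]
spends = [spend for _, spend in summary.values()]
r = pearson(counts, spends)
print(summary)
print(f"correlation between sales count and money spent: {r:.2f}")
```

A coefficient close to but below 1 would correspond to the "positive but not perfect" relationship described above.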

Reflection

Personal data combined with sales history can reveal a person's spending capacity and circumstances. More importantly, it shows manufacturers and merchants detailed regional shopping records and customers' purchasing power. This improves the efficiency of merchants' targeted marketing investments and lets manufacturers forecast sales in each region to avoid excess or insufficient inventory. Furthermore, the larger the company, the more cost this kind of analysis can save. From the consumer's perspective, such datasets help save society's resources and lead to better product design and shopping recommendations from merchants. However, they also expose our personal information, and even our daily activities and living details, to those companies. The benefits and drawbacks of this approach deserve our own careful consideration.

Ben Niu

DiDi Global | Information Science at CU-Boulder | Analytics at UChicago | Get in Touch https://www.linkedin.com/in/ben-niu-5314b2107/