Data Analysis: Improving the Marketing and Distribution Efficiency of Manufacturers and Merchants
This is a list of 10,000 women’s shoes and their product information, provided by Datafiniti’s Product Database. The dataset includes shoe name, brand, price, and more. Each shoe has one entry for each price found for it, so some shoes have multiple entries. Note that this is a sample of a larger dataset; the full dataset is available through Datafiniti. To make the analysis more interesting, I also generated a fake dataset of 11,000 rows with the columns id, first_name, last_name, email, gender, ip_address, Postal Code, State, and Money spend.
Part 1 Data Cleaning and Organizing
- Join the two CSV files, then remove the unwanted columns, leaving only id, brand, prices.issale, prices.merchant, State, and Money spend.
- Missing-value handling: missing values are left unprocessed, because the specific sales situation of each product cannot be known.
- Abnormal-value handling: a few entries in the data are garbled. Re-saving the original .csv file in an editor, with the encoding changed from UTF-8 to ANSI, corrects most of them. Some entries are still garbled after the encoding change; because individual entries cannot be corrected manually, they are left unprocessed.
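The join-and-trim step above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the write-up does not say which tool or join key was used, so the key ("id"), the inner-join behavior, the helper name join_and_trim, and the tiny inline sample data are all hypothetical.

```python
import csv
import io

# Hypothetical miniature versions of the two CSV files. The join key ("id")
# and the sample values are assumptions made for illustration only.
SHOES_CSV = """id,brand,prices.issale,prices.merchant,colors
1,Nike,TRUE,Walmart.com,Black
2,Clarks,FALSE,,Brown
3,Nike,FALSE,Amazon.com,White
"""
CUSTOMERS_CSV = """id,first_name,State,Money spend
1,Ana,CA,59.99
2,Li,NY,120.00
4,Sam,TX,35.50
"""

# Columns that survive the cleanup step.
KEEP = ["id", "brand", "prices.issale", "prices.merchant", "State", "Money spend"]


def join_and_trim(shoes_file, customers_file, keep=KEEP):
    """Inner-join two CSV streams on "id" and keep only the listed columns."""
    customers = {row["id"]: row for row in csv.DictReader(customers_file)}
    joined = []
    for shoe in csv.DictReader(shoes_file):
        customer = customers.get(shoe["id"])
        if customer is None:
            continue  # no matching customer row: dropped by the inner join
        merged = {**shoe, **customer}
        # Missing values (empty strings) are kept as-is, not imputed.
        joined.append({col: merged[col] for col in keep})
    return joined


rows = join_and_trim(io.StringIO(SHOES_CSV), io.StringIO(CUSTOMERS_CSV))
for row in rows:
    print(row)
```

With real files, the two streams would come from open(path, newline="", encoding=...) instead of io.StringIO; the right encoding depends on how the file was saved.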
Part 2 Data Analysis
We analyze the dataset from three perspectives: customers, items, and merchants.
Customer
1. How many items are purchased?
2. How many brands are there?
Items
1. What are the top ten brands by sales volume?
2. What are the top ten items by sales?
Merchants
1. How many sellers are there?
2. What are the top ten vendors by sales?
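Questions of the "how many" and "top ten" kind listed above can be answered with a simple frequency count. The sketch below is a hypothetical example, not the method used for the original charts: it assumes each cleaned row represents one sale, and the helper name top_n and the sample rows are made up.

```python
from collections import Counter

# Hypothetical cleaned rows; each row is treated as one sale (an assumption).
rows = [
    {"brand": "Nike", "prices.merchant": "Walmart.com"},
    {"brand": "Nike", "prices.merchant": "Amazon.com"},
    {"brand": "Clarks", "prices.merchant": "Walmart.com"},
    {"brand": "Nike", "prices.merchant": ""},
]


def top_n(rows, column, n=10):
    """Return (number of distinct values, n most frequent values) for a column.

    Blank values are skipped, since missing data is not imputed.
    """
    counts = Counter(row[column] for row in rows if row[column])
    return len(counts), counts.most_common(n)


n_brands, top_brands = top_n(rows, "brand")
n_merchants, top_merchants = top_n(rows, "prices.merchant")

print(f"{n_brands} brands, top: {top_brands}")
print(f"{n_merchants} merchants, top: {top_merchants}")
```

The same helper works for any column, so the top ten items could be computed by passing the item-name column instead.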
Overall
Based on the above visualization, manufacturers and merchants can decide which states deserve more marketing investment and what portion of the total items to distribute to stock in each state. This is a sample dataset based on 11,000 customers. It may not be accurate, but it can provide data-based suggestions to manufacturers and merchants.
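One simple way to turn state-level spending into a marketing or stocking suggestion is to compute each state's share of total money spent. The sketch below assumes, purely for illustration, that allocation should be proportional to that share; the function name state_shares and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical cleaned rows; the values are made up for illustration.
rows = [
    {"State": "CA", "Money spend": "60.00"},
    {"State": "NY", "Money spend": "75.00"},
    {"State": "CA", "Money spend": "40.00"},
    {"State": "TX", "Money spend": "25.00"},
]


def state_shares(rows):
    """Return {state: share of total money spent}, largest share first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["State"]] += float(row["Money spend"])
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing was spent, so no meaningful shares exist
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {state: amount / grand_total for state, amount in ranked}


shares = state_shares(rows)
for state, share in shares.items():
    print(f"{state}: {share:.1%} of total spend")
```

A proportional split is only a starting point; a real allocation would also need to account for factors the dataset does not capture, such as shipping cost or regional seasonality.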
The visualizations above show the types of shoes sold in each quarter, compared with the total money spent on each category. The fourth quarter of each year always has the highest sales, which is no surprise given the shopping season. The top-selling categories are boots, sandals, and athletic shoes. Category sales and total money spent are positively correlated, but not perfectly; the relationship varies with the brand and the type of shoe.
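The quarterly grouping and the sales-versus-spend comparison can be reproduced in a few lines. This is a minimal sketch under assumptions: the write-up does not name a date or category column, so the (date, category, amount) tuples, the helper names quarter_key and pearson, and the sample values are all hypothetical. The Pearson coefficient is used here as one common way to quantify "positively correlated but not perfectly".

```python
from collections import defaultdict
from datetime import date
from math import sqrt

# Hypothetical sales records: (sale date, shoe category, money spent).
sales = [
    (date(2017, 11, 24), "boots", 120.0),
    (date(2017, 12, 2), "boots", 95.0),
    (date(2017, 12, 15), "sandals", 30.0),
    (date(2017, 7, 4), "sandals", 25.0),
    (date(2017, 8, 19), "athletic", 60.0),
    (date(2017, 2, 10), "boots", 110.0),
]


def quarter_key(d):
    """Map a date to (year, quarter), with quarters numbered 1 to 4."""
    return (d.year, (d.month - 1) // 3 + 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


sales_per_quarter = defaultdict(int)
units_per_category = defaultdict(int)
spend_per_category = defaultdict(float)
for sold_on, category, amount in sales:
    sales_per_quarter[quarter_key(sold_on)] += 1
    units_per_category[category] += 1
    spend_per_category[category] += amount

categories = list(units_per_category)
r = pearson(
    [units_per_category[c] for c in categories],
    [spend_per_category[c] for c in categories],
)
print(dict(sales_per_quarter))
print(f"correlation between units sold and money spent: {r:.2f}")
```

A coefficient between 0 and 1 indicates a positive but imperfect relationship; with only a handful of categories the value is indicative rather than statistically robust.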
Reflection
Personal data combined with sales history can reveal a person’s spending capacity and circumstances. More importantly, it shows manufacturers and merchants detailed regional shopping records and customers’ purchasing power. This improves the efficiency of merchants’ targeted marketing investments and lets manufacturers forecast sales in each region, avoiding both excess and insufficient inventory. Furthermore, the larger the company, the more cost this kind of analysis saves. From the consumer’s perspective, such datasets help conserve society’s resources and lead to better designs and shopping recommendations from merchants. However, they also expose our personal information, and even our daily activities and living details, to those companies. The benefits and drawbacks of this approach are something each of us has to weigh.