Haripriyagude
Published in Python’s Gurus
11 min read · Jun 15, 2024

Analyzing Consumer Behavior with SQL and Python: A Deep Dive into Amazon Sales Data

In the fast-paced world of e-commerce, understanding consumer behavior is paramount. Businesses constantly seek insights that can drive strategic decisions and optimize marketing efforts. In a recent project I explored, a detailed analysis of consumer behavior was conducted using SQL and Python, focusing on Amazon’s sales data. Let’s take a closer look at how data can be harnessed to reveal critical business insights.

Connecting to the Data Source

The project begins by establishing a connection between the Jupyter Notebook environment and a MySQL database. This step is crucial because it lets the notebook query the database directly and fetch current data for analysis. The notebook connects to the MySQL server using Python’s mysql.connector library, which shows how well Python handles database operations.
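
As a rough illustration of this step, here is a minimal sketch of pulling a query result into a Pandas DataFrame. The project itself uses mysql.connector; because no MySQL server is assumed here, an in-memory SQLite database stands in for it. Both drivers follow Python's DB-API, so the helper should work with either connection. The helper name `fetch_dataframe`, the `sales` table, and the sample rows are assumptions made for the example, not taken from the project.

```python
import sqlite3

import pandas as pd


def fetch_dataframe(conn, query):
    """Run a query on a DB-API 2.0 connection and return the rows as a DataFrame."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        columns = [col[0] for col in cur.description]  # column names from the cursor
        return pd.DataFrame(cur.fetchall(), columns=columns)
    finally:
        cur.close()


# With MySQL, the connection would instead be created along these lines
# (credentials are placeholders):
#   import mysql.connector
#   conn = mysql.connector.connect(host="localhost", user="your_user",
#                                  password="your_password", database="your_db")
# Here an in-memory SQLite database with hypothetical rows stands in for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (invoice_id TEXT, city TEXT, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("INV-001", "Yangon", 548.97), ("INV-002", "Naypyitaw", 80.22)],
)

df = fetch_dataframe(conn, "SELECT * FROM sales")
conn.close()
print(df)
```

Building the DataFrame from the cursor keeps the example independent of any particular driver.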

Tools and Techniques

The project leverages several powerful tools and techniques:

  • SQL for Data Querying: Complex SQL queries are executed to fetch tailored datasets for analysis. This demonstrates the effectiveness of SQL in handling large datasets and performing complex data retrieval operations.
  • Python for Data Analysis: Python’s versatility is showcased through the use of libraries like Pandas for data manipulation, along with Matplotlib and Seaborn for data visualization, providing a comprehensive toolkit for data scientists.

Exploratory Data Analysis

The initial phase of the project involves exploratory data analysis (EDA), where basic properties of the dataset are examined. This includes checking the dimensions of the data, understanding the schema, and previewing the first few rows. EDA is an essential step as it provides a snapshot of the dataset, helping identify patterns, missing values, and potential anomalies that could influence further analysis.
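
These first checks can be sketched in a few lines of Pandas. The sample rows below are hypothetical; in the project, `df` would hold the data loaded from the database.

```python
import pandas as pd

# Hypothetical sample rows standing in for the sales table.
df = pd.DataFrame({
    "invoice_id": ["INV-001", "INV-002", "INV-003"],
    "city": ["Yangon", "Naypyitaw", "Mandalay"],
    "quantity": [7, 5, 8],
    "total": [548.97, 80.22, 340.53],
})

n_rows, n_cols = df.shape  # dimensions of the dataset
schema = df.dtypes         # column name -> data type
preview = df.head()        # first rows (up to five)

print(f"{n_rows} rows x {n_cols} columns")
print(schema)
print(preview)
```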

In-depth Analysis

  • Product Line Analysis: This section delves into the performance of different product lines. It aims to identify which products are performing well and which are not, enabling targeted strategies that can enhance product visibility and sales performance.
  • Sales Analysis: Another crucial aspect is the sales analysis, which looks at sales trends over time, seasonal variations, and other factors that affect sales. Understanding these patterns helps in forecasting and planning for future inventory needs.
  • Customer Analysis
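
To make the product line analysis concrete, here is a sketch of the kind of aggregate query it relies on. It runs against an in-memory SQLite table with hypothetical rows; the table name `sales` is an assumption, and the same GROUP BY pattern should carry over to MySQL.

```python
import sqlite3

# Hypothetical rows; an in-memory SQLite table stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_line TEXT, quantity INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Health and beauty", 7, 100.0),
        ("Electronic accessories", 5, 250.0),
        ("Electronic accessories", 2, 50.0),
        ("Sports and travel", 8, 120.0),
    ],
)

# Transactions, units sold, and revenue per product line, highest revenue first.
query = """
    SELECT product_line,
           COUNT(*)      AS transactions,
           SUM(quantity) AS units_sold,
           SUM(total)    AS revenue
    FROM sales
    GROUP BY product_line
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```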

Checking Missing values
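
A minimal sketch of this check, using hypothetical rows with a couple of gaps:

```python
import pandas as pd

# Hypothetical rows; None marks a missing value.
df = pd.DataFrame({
    "city": ["Yangon", None, "Mandalay"],
    "total": [548.97, 80.22, None],
    "Rating": [9.1, 9.6, 7.4],
})

missing_counts = df.isnull().sum()       # missing values per column
missing_share = df.isnull().mean() * 100  # same, as a percentage of rows

print(missing_counts)
print(missing_share.round(1))
```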

Unique values in our dataset
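
Counting distinct values per column is a quick way to spot which columns are good candidates for the 'category' type. A small sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows covering a few of the categorical columns.
df = pd.DataFrame({
    "branch": ["A", "B", "C", "A"],
    "city": ["Yangon", "Mandalay", "Naypyitaw", "Yangon"],
    "payment_method": ["Cash", "Ewallet", "Credit card", "Cash"],
})

unique_counts = df.nunique()  # number of distinct values per column
unique_values = {col: sorted(df[col].unique()) for col in df.columns}

print(unique_counts)
print(unique_values)
```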

Converting Categorical Columns

  • Categorical Conversion: The code converts several columns to the ‘category’ data type, including invoice_id, branch, city, customer_type, gender, product_line, payment_method, timeofday, dayname, and monthname. The 'category' data type suits columns with a limited number of unique values, where it can reduce memory use and make operations such as grouping and sorting on these variables more efficient.
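
The conversion can be sketched as follows. The rows are hypothetical and only a few of the listed columns are included to keep the example short.

```python
import pandas as pd

# Hypothetical rows with a subset of the categorical columns.
df = pd.DataFrame({
    "branch": ["A", "B", "C", "A"],
    "city": ["Yangon", "Mandalay", "Naypyitaw", "Yangon"],
    "gender": ["Female", "Male", "Male", "Female"],
    "total": [548.97, 80.22, 340.53, 489.05],
})

categorical_cols = ["branch", "city", "gender"]

# astype with a dict converts only the named columns and leaves the rest alone.
df = df.astype({col: "category" for col in categorical_cols})

print(df.dtypes)
print(df["city"].cat.categories)
```

One caveat worth keeping in mind: a column such as invoice_id is usually unique per row, so converting it is unlikely to bring the same gains as a low-cardinality column like city.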

Converting Numerical Columns

  • Numerical Conversion: The snippet converts columns such as quantity, VAT, gross_margin_percentage, Rating, unit_price, total, cogs, and gross_income to numeric data types using pd.to_numeric(), with the errors parameter set to 'coerce'. Any non-numeric values that would otherwise raise an error during conversion are instead converted into NaNs (missing values), so they can be found and handled explicitly.
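
Here is a small sketch of that behavior, with one deliberately bad value in hypothetical data:

```python
import pandas as pd

# Hypothetical rows read in as text; "n/a" cannot be parsed as a number.
df = pd.DataFrame({
    "quantity": ["7", "5", "n/a"],
    "unit_price": ["74.69", "15.28", "46.33"],
})

numeric_cols = ["quantity", "unit_price"]
for col in numeric_cols:
    # errors="coerce" turns unparseable values into NaN instead of raising.
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df)
print(df.dtypes)
```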

Converting Temporal Columns

  • Datetime Conversion: Temporal data for the date and time columns is converted into the datetime data type using pd.to_datetime(). This is particularly important for time-series analysis or any operations that need to interpret these columns temporally (e.g., sorting by date, calculating durations, or extracting components like day of the week).
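
A sketch of the datetime conversion follows. The date and time formats are assumptions about how the raw columns are stored, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the last date is deliberately invalid.
df = pd.DataFrame({
    "date": ["2019-01-05", "2019-03-08", "not a date"],
    "time": ["13:08:00", "10:29:00", "13:23:00"],
})

# Assumed formats; unparseable values become NaT because of errors="coerce".
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
# Parsing a time-only string attaches a default date (1900-01-01).
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S", errors="coerce")

# Components extracted from the parsed dates.
df["dayname"] = df["date"].dt.day_name()
df["monthname"] = df["date"].dt.month_name()

print(df)
```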

Ensuring Data Integrity

  • Error Handling: The use of errors='coerce' in numerical and datetime conversions is a prudent choice for handling data that might not uniformly conform to expected formats, thereby avoiding runtime errors that would occur due to incompatible data types.
  • Categorical Columns: These include columns like invoice_id, branch, city, customer_type, gender, product_line, payment_method, timeofday, dayname, and monthname. These columns typically contain non-numeric data that represent types or categories.
  • Numerical Columns: Columns such as unit_price, quantity, VAT, total, cogs, gross_margin_percentage, gross_income, and Rating are classified as numerical. These columns contain data that are numeric in nature and suitable for mathematical calculations necessary for statistical analyses.
  • Temporal Columns: The date and time columns are identified as temporal, which are crucial for any analysis involving time series data or trends over time.
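
Because errors='coerce' replaces bad values silently, it can be worth counting how many values were lost and confirming that each column ended up in the expected group. A minimal sketch, using hypothetical rows and hypothetical variable names:

```python
import pandas as pd

# Hypothetical raw rows; "oops" is not a valid number.
df = pd.DataFrame({
    "city": ["Yangon", "Mandalay"],
    "total": ["548.97", "oops"],
    "date": ["2019-01-05", "2019-03-08"],
})

df["city"] = df["city"].astype("category")

# Count how many values the numeric coercion turned into NaN.
missing_before = df["total"].isna().sum()
df["total"] = pd.to_numeric(df["total"], errors="coerce")
coerced_to_nan = int(df["total"].isna().sum() - missing_before)

df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")

# Group the columns by their resulting data type.
column_groups = {
    "categorical": list(df.select_dtypes(include="category").columns),
    "numerical": list(df.select_dtypes(include="number").columns),
    "temporal": list(df.select_dtypes(include="datetime").columns),
}

print(coerced_to_nan)
print(column_groups)
```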

Let’s Derive Some Insights

Analysis of the Bar Chart
  • Mandalay: Has slightly fewer transactions than the other two cities, as indicated by the blue bar.
  • Naypyitaw: Shows a very similar count to Yangon, represented by the teal bar.
  • Yangon: Appears to have the highest number of transactions among the three, as shown by the green bar.
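
The counts behind a chart like this can be computed with value_counts. The rows below are hypothetical and do not reproduce the article's actual numbers.

```python
import pandas as pd

# Hypothetical transactions, one row per invoice.
df = pd.DataFrame({
    "city": ["Yangon", "Yangon", "Yangon", "Naypyitaw", "Naypyitaw", "Mandalay"],
})

city_counts = df["city"].value_counts()  # transactions per city, largest first
busiest_city = city_counts.idxmax()

print(city_counts)
print(busiest_city)

# With Matplotlib installed, city_counts.plot(kind="bar") would draw the bar chart.
```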

Insights and Implications

This chart provides a clear visual comparison of transaction volumes across the three cities. Such data is crucial for multiple reasons:

  1. Resource Allocation: Higher transaction volumes might indicate a need for more resources or inventory in those areas.
  2. Marketing Strategies: Knowing which cities have higher engagement can help tailor marketing campaigns to boost sales in areas with fewer transactions or optimize them in areas already performing well.
  3. Operational Decisions: Analysis like this assists in making informed decisions regarding opening new stores, optimizing logistic routes, or planning local promotions.

The bar chart titled “Mapping of Branches to Cities” visually represents the distribution of branches across three cities: Mandalay, Naypyitaw, and Yangon. From the chart, we can observe:

  • Branch A is located in Yangon.
  • Branch B is situated in Mandalay.
  • Branch C is found in Naypyitaw.
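
A mapping like this can be derived, and checked for one-to-one consistency, directly from the branch and city columns. The sample rows are hypothetical but follow the mapping described above.

```python
import pandas as pd

# Hypothetical rows that follow the branch-to-city mapping described above.
df = pd.DataFrame({
    "branch": ["A", "C", "A", "B", "C"],
    "city": ["Yangon", "Naypyitaw", "Yangon", "Mandalay", "Naypyitaw"],
})

# Distinct (branch, city) pairs, sorted by branch.
branch_city = (
    df[["branch", "city"]]
    .drop_duplicates()
    .sort_values("branch")
    .reset_index(drop=True)
)
mapping = dict(zip(branch_city["branch"], branch_city["city"]))

# Sanity check: each branch should appear in exactly one city.
cities_per_branch = df.groupby("branch")["city"].nunique()

print(mapping)
print(cities_per_branch)
```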

Strategic Implications

The distribution implies that the company has a presence in all three major cities, possibly covering a broad market across the region. This can be advantageous for a balanced business operation, ensuring stable growth and risk distribution across different market environments. Moreover, strategic decisions about opening new branches or expanding existing ones can be better informed by understanding the existing geographical footprint.

Analysis of the Bar Chart

  • Cash: Accounts for 34.4% of the transactions, represented by the blue bar.
  • Credit Card: Slightly less used, accounting for 31.1% of the transactions, shown by the orange bar.
  • E-wallet: Matches the frequency of cash usage very closely, covering 34.5% of transactions, indicated by the green bar.
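
Percentages like these come from normalized value counts. In the sketch below the rows are hypothetical, so the shares differ from the figures above.

```python
import pandas as pd

# Hypothetical transactions; the resulting shares are illustrative only.
df = pd.DataFrame({
    "payment_method": ["Cash"] * 4 + ["Ewallet"] * 3 + ["Credit card"] * 3,
})

# normalize=True returns fractions of the total, converted here to percentages.
payment_share = (df["payment_method"].value_counts(normalize=True) * 100).round(1)

print(payment_share)
```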

Insights and Implications

This distribution suggests a relatively balanced use of all three payment methods, which has several implications:

Consumer Flexibility: The balanced usage indicates that customers enjoy and utilize the flexibility of choosing from multiple payment methods. This flexibility can enhance customer satisfaction and loyalty as they can choose the payment method that best suits their needs or preferences.

Financial Strategy: For the business, understanding which payment methods are preferred can help in tailoring financial and operational strategies. For instance, negotiating better transaction fees with banks and financial service providers for the most commonly used payment methods could reduce costs.

Marketing and Promotions: Promotions can be designed to encourage the use of a less popular payment method, or to reward the most frequent method, depending on strategic goals. For example, offering cash-back for e-wallet payments could further boost its use.

Risk Management: Diversification in payment methods also helps in mitigating risks associated with the operational issues of any single method. It ensures that the business can continue smooth transactions even if one payment system faces a temporary issue.

Insights and Implications

  1. Market Trends: The high sales in Sports and Travel might suggest a trend or rising interest in outdoor activities or travel, possibly influenced by seasonal factors or recent shifts in consumer lifestyle preferences.
  2. Product Focus: Companies can use this data to focus on high-performing product lines for marketing campaigns or stock inventory adjustments to meet consumer demand effectively.
  3. Strategic Planning: Understanding which product lines are performing well can help in strategic planning, including promotional strategies, bundle offers, and potentially expanding product lines that show promise.
  4. Consumer Preferences: The distribution of sales across these categories provides insights into consumer preferences and spending habits, essential for tailoring product development and marketing strategies.

Further Analysis

  • Profit Margins: Comparing these sales figures with profit margins for each category to determine not just revenue, but profitability.
  • Seasonal Variations: Examining how sales for these categories fluctuate seasonally to optimize stock levels and marketing throughout the year.
  • Customer Segmentation: Analyzing sales data along with customer demographics to further refine marketing strategies and product offerings.

Insights and Implications

Revenue Trends: The increase in revenue from February to March might suggest a growing trend or a seasonal influence that boosts sales in March. Understanding the factors contributing to this increase could be crucial for strategic planning.

Marketing and Sales Initiatives: If specific marketing campaigns or sales initiatives were launched in March, their impact appears to be positive and could be worth repeating or maintaining. Conversely, the strategies in February might need reassessment due to the lower revenue.

Consumer Behavior: The chart may reflect consumer purchasing patterns, which could be influenced by factors such as holidays, pay cycles, or seasonal changes. Identifying these patterns can help in tailoring future sales strategies to maximize revenue during peak purchasing times.

Operational Planning: Knowing which months generate more revenue allows for better resource allocation, including staffing, inventory management, and budget planning, to ensure that the business is prepared to meet increased demand.
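
Monthly revenue and its month-over-month change can be sketched as follows. The rows and amounts are hypothetical; the explicit month order is needed because grouping on month names would otherwise sort them alphabetically.

```python
import pandas as pd

# Hypothetical transactions; amounts are illustrative only.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-01-05", "2019-01-20", "2019-02-10", "2019-03-08", "2019-03-21"]
    ),
    "total": [100.0, 50.0, 80.0, 120.0, 60.0],
})
df["monthname"] = df["date"].dt.month_name()

# Reindex into calendar order; groupby alone would sort the names alphabetically.
month_order = ["January", "February", "March"]
monthly_revenue = df.groupby("monthname")["total"].sum().reindex(month_order)

# Percentage change compared with the previous month.
mom_change = monthly_revenue.pct_change() * 100

print(monthly_revenue)
print(mom_change.round(1))
```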

Insights and Implications

Cost vs. Revenue: By comparing these costs with the previously discussed revenue figures, one can assess the profitability for each month. Although revenue increased from February to March, so did the costs, suggesting that while sales were higher, the associated expenses were also substantial.

Efficiency Evaluation: This trend in COGS might prompt a review of pricing strategies, supplier negotiations, or cost management practices to ensure that the rise in costs doesn’t disproportionately offset revenue gains.

Inventory and Procurement Strategy: High COGS in March could be reflective of either increased sales volume requiring more inventory or possibly higher prices from suppliers during this period. Analyzing these factors can help in better inventory management and cost control.

Profit Margin Analysis: Calculating the profit margins for these months will provide deeper insights into the financial health of the business. If the margins are shrinking as the months progress, it might be a sign to reevaluate the company’s cost structure or sales strategies.
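
The margin calculation can be sketched like this, assuming that `total` is the revenue figure and that margin is defined as (revenue - COGS) / revenue. The amounts are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the article's real figures are not reproduced here.
df = pd.DataFrame({
    "monthname": ["February", "February", "March", "March"],
    "total": [120.0, 80.0, 200.0, 120.0],
    "cogs": [90.0, 60.0, 170.0, 110.0],
})

month_order = ["February", "March"]
monthly = df.groupby("monthname")[["total", "cogs"]].sum().reindex(month_order)

# Assumption: margin = (revenue - COGS) / revenue, with `total` as revenue.
monthly["gross_profit"] = monthly["total"] - monthly["cogs"]
monthly["margin_pct"] = monthly["gross_profit"] / monthly["total"] * 100

print(monthly)
```

In this made-up example revenue rises from February to March while the margin falls, which is the kind of pattern the comparison is meant to surface.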

Further Analysis

  • Breakdown by Product: Analyzing the COGS by product line might reveal which items are most costly to sell and which are most profitable, guiding decisions about which products to promote or expand.
  • Historical Comparison: Comparing these figures against previous years can indicate whether the rise in COGS is part of a normal seasonal pattern or an anomaly that needs addressing.
  • Expense Management: Investigating other operating expenses and their trends alongside COGS can provide a fuller picture of the company’s overall expenditure and profitability.

Insights and Implications

Product Prioritization: The significant revenues from electronic and fashion accessories suggest these are core areas that could benefit from focused marketing and stock management efforts to maximize profitability.

Market Trends: High sales in electronic accessories could reflect a broader market trend towards technology products, which could influence future product development and procurement strategies.

Cross-Selling Opportunities: For product lines with lower revenues, such as sports and travel, there may be opportunities to increase sales through bundling with more popular products, cross-promotions, or targeted marketing campaigns.

Resource Allocation: Allocation of resources such as marketing budgets, shelf space, and inventory could be adjusted based on the revenue figures to optimize overall profitability.

Insights and Implications

Revenue Concentration: Mandalay, as depicted in the chart, appears to be the leading city in terms of revenue, which could indicate a higher concentration of business activities or a larger customer base in this area.

Market Penetration and Strategy: The variations in revenue among the cities could reflect differences in market penetration. This information is crucial for strategic planning, including where to focus marketing efforts and resource allocation.

Investment and Expansion: The data suggests that Mandalay, being a high-revenue generator, might be ripe for further investment or could be used as a benchmark for replicating successful strategies in the other cities.

Data Accuracy and Reporting: The discrepancy between the visual data and the textual information suggests a need for data verification to ensure accurate reporting and decision-making.

Insights and Implications

Tax Contribution Analysis: The varying levels of VAT collected from each product line can help the company understand which categories are most lucrative in terms of tax contributions, which often correlates with sales volume and pricing strategies.

Strategic Focus: High VAT collections from electronic and fashion accessories suggest these areas might be optimal for targeted growth strategies, marketing campaigns, and stock enhancements.

Pricing and Tax Strategy: Understanding VAT contributions can also assist in pricing strategy, ensuring that prices are optimized not just for revenue but also for tax efficiency.

Product Development: Lower VAT collections from categories like sports and travel could prompt a review of the product offerings in this line or encourage promotional activities to boost sales.

Further Analysis

  • Profit Margins vs. VAT: Comparing VAT data with profit margins for each product line could provide insights into the overall profitability of each category, helping to adjust strategies where necessary.
  • Seasonal Variations: Examining how VAT contributions change seasonally could help in planning inventory and marketing efforts to align with peak purchasing times.
  • Market Trends: Analyzing external market trends and comparing them with VAT contributions could reveal how external factors influence sales and tax collections, aiding in more responsive business strategies.
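
As a starting point for the first comparison, VAT and gross income can be aggregated side by side per product line. The rows and amounts below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; amounts are illustrative only.
df = pd.DataFrame({
    "product_line": [
        "Electronic accessories", "Electronic accessories",
        "Fashion accessories", "Sports and travel",
    ],
    "VAT": [30.0, 10.0, 25.0, 15.0],
    "gross_income": [28.0, 12.0, 20.0, 18.0],
})

# Total VAT and gross income per product line, highest VAT first.
by_line = (
    df.groupby("product_line")[["VAT", "gross_income"]]
    .sum()
    .sort_values("VAT", ascending=False)
)

# Each line's share of all VAT collected, as a percentage.
by_line["vat_share_pct"] = by_line["VAT"] / by_line["VAT"].sum() * 100

print(by_line)
```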

For further analysis, check out my GitHub repository: https://github.com/priya1100/Resume_Projects_Data_Science/tree/main/Consumer%20Behaviour%20Analysis
