Data Sourcing, Market basket analysis, and SweetViz

Manish Kumar Thota
Published in Analytics Vidhya
4 min read · Jun 15, 2020

Exploratory data analysis

Sometimes the things we dismiss as unnecessary turn out to be necessary.

There are two kinds of data: public and private.

Private data 🔐

This is organizational data. Because it carries security and privacy concerns, company approval is needed to access it. Organizations use it for internal policy-making and for building business strategies.

Private data includes telecom, banking, energy, media, and retail data, among others.

What are those strategies, and how are they related to data?

Media Industry 📰

We will start with media data and data journalism, which covers electoral results: which political party won and by how many seats, or how many seats the BJP took from the Congress and other minor parties. It is a way of enhancing reporting and news writing by finding out what strategies the parties used.

Digging deep into data: scraping, cleansing and structuring it, filtering by mining for specifics, visualizing it, and making a story

Market basket analysis

Retail Industry 🛍️

With retail data, we can perform market basket analysis. For example, someone who buys a TV is likely to also buy a streaming box (e.g. Hathway), and someone who buys a laptop is likely to buy its accessories.

Amazon famously uses an algorithm to suggest items that you might be interested in, based on your browsing history or what other people have purchased. Apple benefits from the same pattern: we all know that once you buy an Apple product, you can end up spending a lot more on accessories 😢, monthly subscriptions, insurance, etc.

Knowing which products sell together can be very useful to any business. The most obvious effect is the increase in sales that a retail store can achieve by reorganizing its products so that things that sell together are found together.

Association strength of the products being purchased
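To make "association strength" concrete, here is a minimal sketch of the three usual measures (support, confidence, and lift) for a rule such as "TV → streaming box". The transactions below are made-up toy data and `association_metrics` is a hypothetical helper written for this illustration, not something taken from a real retail data set or library.

```python
# Hypothetical toy baskets, invented purely for illustration.
transactions = [
    {"tv", "streaming box", "hdmi cable"},
    {"tv", "streaming box"},
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"tv", "hdmi cable"},
    {"laptop", "streaming box"},
]


def association_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent in b)
    count_c = sum(1 for b in baskets if consequent in b)
    count_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    # Support: share of all baskets that contain both items.
    support = count_both / n
    # Confidence: of the baskets with the antecedent, the share that also
    # contain the consequent.
    confidence = count_both / count_a if count_a else 0.0
    # Lift: confidence relative to how common the consequent is overall.
    # A value above 1 suggests the items sell together more than by chance.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


support, confidence, lift = association_metrics(transactions, "tv", "streaming box")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

On real data you would typically use a library implementation of Apriori or FP-Growth rather than counting by hand, but the quantities being reported are the same.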

Telecom Industry 📱

You might have noticed that plan prices (especially prepaid) are rarely round figures. Say you bought a plan for ₹19 and were not satisfied with it. The vendor then offers you another plan for ₹99 (which includes a Hotstar subscription, an Amazon voucher, extra data, etc.), which is an extra ₹80. Intuitively, you pick the plan with more benefits and pay the higher amount. The operator deliberately showed you both plans so that you would buy the one it wanted you to buy.

A scenario where we can understand Market basket analysis

You can even look at other operators and check out their offers (obviously apart from the ones you are using).

Public data 🌏

This is data that is made publicly available for research and learning. It can be accessed through open websites and web scraping.

You can visit data.gov.in, Kaggle, and GitHub to see how data is made available across various industry sectors. Explore them to find out.

Sweetviz

Sweetviz is an open-source Python library that boosts your exploratory data analysis.

You could also use pandas profiling, but I feel Sweetviz gives better insights.

pandas profiling output, which might not be very insightful

Here I used an IMDb data set containing 100 rows and 62 columns.

Instead of using DataFrame.describe() and pandas profiling, we could use sweetviz, for comparing, analyzing, and understanding the whole data set based on a target column.

Sweetviz is built on top of pandas, and the best part is that we can view the distribution of each column very clearly.

Let us see what our data set looks like.

Visualizing the entire data using sweetviz

It takes just two lines of code to visualize the entire data set, which gives a boost to your exploratory data analysis.
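As a rough sketch, those two lines are a call to `sweetviz.analyze()` followed by `show_html()` on the returned report. The wrapper `build_report`, its parameter names, and the default output file name are my own assumptions for illustration; only `analyze`, its `target_feat` argument, and `show_html` come from Sweetviz itself.

```python
def build_report(df, target=None, out_path="sweetviz_report.html"):
    """Write a Sweetviz HTML report for a pandas DataFrame and return its path.

    `build_report` is a hypothetical convenience wrapper, not part of Sweetviz.
    """
    # Imported inside the function so the helper can be defined even where
    # sweetviz (pip install sweetviz) has not been installed yet.
    import sweetviz as sv

    # The "two lines": analyze the DataFrame, then render the HTML report.
    # target_feat is optional; when given, every column is shown against it.
    report = sv.analyze(df, target_feat=target)
    report.show_html(out_path, open_browser=False)
    return out_path
```

For example, after loading the data with `df = pd.read_csv("imdb.csv")` (a hypothetical file name), calling `build_report(df, target="imdb_score")` would write the report to disk, where `imdb_score` stands in for whichever column you pick as the target. Sweetviz's documentation notes that the target should be a numerical or boolean column. To compare two data sets side by side, Sweetviz also offers `sv.compare()`, which takes two DataFrames instead of one.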

HTML view
Large scale association heatmap based on all the columns

Sweetviz simply gives you an extra edge in identifying the particulars of your data, and this is just a first step in exploratory data analysis.

If you enjoyed my work, do give it a clap and share it with your friends too.

Stay safe

Support my work ❤️

Keep coding!

Manish Kumar

Data Science Enthusiast
