Let’s discover the data intelligently using LUX

Shubham Patel
Published in Analytics Vidhya
7 min read · Apr 25, 2021

An automated way to quickly discover insights even when you don’t know what to look for.

Photo by Ashes Sitoula on Unsplash.

This post covers the basics of LUX, a Python API for Intelligent Visual Discovery, and is divided into the sections below.

  1. What is LUX?
  2. What problem does LUX solve?
  3. Installing LUX
  4. Basic data exploration using LUX and its core concepts.
  5. Links to LUX functionalities not covered in this post.
  6. Conclusion.

1. What is LUX?

LUX is a Python library that automates part of the data exploration process by intelligently recommending visualizations. It provides an interactive Jupyter widget that lets us explore the data in the notebook itself. To understand it better, we first need to look at the problem it solves for a data scientist.

2. What problem does LUX solve?

Imagine you are working on a huge dataset and initially have no idea where to start exploring. There are a lot of columns, and you may want to use a plotting library like seaborn to try different plots for the columns or their combinations. That means writing code for every plot just to see a relationship. Then you start wondering and decide to try a different plot, this time with ‘binning’, to look at the relationship more closely.

Eventually, you are writing a lot of code to achieve your end goal in the EDA phase, which is to gain some insights from the dataset.

But the question is: why should we put so much effort into writing code here? Is there an automated way to plot the data based on our intent? Yes, there is, and that is what LUX is all about.

This is the problem that LUX solves.

3. Installing LUX

Installation is fairly simple: you just need to run the commands below. Install LUX in a separate virtual environment to avoid dependency conflicts.

# install LUX using pip
pip install lux-api
# install and enable Jupyter nbextension
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget
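
After installing, a quick sanity check can confirm that both packages are importable from the active environment. The helper below is not part of LUX; it is a small sketch that uses only the standard library and assumes the importable module names are `lux` and `luxwidget` (the latter taken from the nbextension commands above).

```python
from importlib.util import find_spec


def check_installed(modules=("lux", "luxwidget")):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Prints one line per module, e.g. "lux: found" or "lux: missing".
    for name, ok in check_installed().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If either module is reported as missing, the virtual environment used by the notebook kernel is probably not the one where `pip install lux-api` was run.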

Now, let’s start exploring the dataset.

4. Basic Data Exploration using LUX and its core concepts.

Let’s take the simple Diamonds dataset from Kaggle for exploration and see what executing a cell containing the df object returns.

import pandas as pd
import lux  # importing lux attaches the LUX widget to pandas DataFrames
diamonds_df = pd.read_csv('./diamonds.csv', index_col=0, nrows=5000)
diamonds_df

LUX toggle button for switching into LUX visualization mode.

Notice that we now have a toggle button after installing and enabling the lux nbextension. It switches between the pandas df display mode (the default) and the LUX display mode, which contains visualizations.

The visualizations that LUX provides automatically are called ‘Recommendations’.

4.1 What is a Recommendation?

A recommendation is a set of visualizations that LUX returns to highlight interesting patterns or observations in our dataset.

Notice in the above image that it has 3 tabs:

  • Correlation
  • Distribution
  • Occurrence

These three tabs are the Analytical Actions as per the terminology used in LUX.

4.2 What are Analytical Actions?

Analytical Actions are the different analyses that can be performed on the data, such as Correlation, Distribution, and Occurrence. LUX recommends these actions automatically based on the dataset and the types of its variables. Actions are also recommended based on the user’s specified ‘Intent’ (covered later).

1. Correlation

Correlation between quantitative attributes.

Correlation shows the relationship between two quantitative attributes, as shown in the above image. By default, the most highly correlated attribute pairs appear on the left and the least correlated ones on the right. This ordering is useful because a user will usually want to see the highly correlated attributes first.
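
To make the ordering idea concrete, here is a minimal sketch that scores every pair of numeric columns with the Pearson correlation coefficient and sorts the pairs by absolute value. This is an illustration only, not LUX’s actual implementation; the `pearson` and `rank_pairs` helpers and the tiny sample values are made up for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Sort every pair of columns by |correlation|, strongest first."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


# Tiny made-up sample, not the real diamonds data.
sample = {
    "carat": [0.2, 0.3, 0.5, 0.7, 1.0],
    "price": [300, 450, 900, 1500, 2500],
    "depth": [61.0, 62.5, 60.8, 62.1, 61.4],
}

for a, b, r in rank_pairs(sample):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

With a pandas DataFrame, `DataFrame.corr()` gives the same pairwise coefficients without the hand-written helper.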

2. Distribution

Distribution of quantitative attributes.

The Distribution tab plots the distribution of quantitative attributes by automatically binning them. Here too, LUX plots the attributes with the most skewed distributions first; the rightmost plots will most likely contain the roughly symmetric, close-to-normal attributes.
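
The two ideas in this tab, binning and ordering by skew, can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not how LUX computes its ranking: the equal-width binning, the bin count of 4, and the population skewness formula are all choices made for this example.

```python
def histogram(values, bins=4):
    """Count values in equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1  # avoid zero width for constant columns
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts


def skewness(values):
    """Population skewness: third standardized moment (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / (m2 ** 1.5)


def rank_by_skew(columns):
    """Column names ordered from most to least skewed."""
    return sorted(columns, key=lambda name: abs(skewness(columns[name])), reverse=True)


# Made-up values for illustration only.
sample = {
    "depth": [60, 61, 62, 63, 64],         # symmetric
    "price": [300, 350, 400, 450, 5000],   # long right tail
}

print(rank_by_skew(sample))
print(histogram(sample["price"]))
```

In pandas, `Series.skew()` provides a comparable (sample-corrected) skewness measure.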

3. Occurrence

Occurrence plots for categorical attributes.

Occurrence plots the frequency of categorical attributes using bar charts, as shown in the above image. Again, LUX plots the attributes with the most uneven occurrence first, followed by the more even ones.

So this is how LUX automatically plots the most interesting relationships when we execute a cell containing the data frame object. After seeing the initial recommendations, we may want to know more about the dataset’s attributes, for example by exploring further based on a single attribute such as the ‘clarity’ of diamonds in this case.
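
One way to picture "uneven first" is to score each categorical column by how far its category shares are from a perfectly even split. The `unevenness` score below (Euclidean distance from the uniform distribution) is an assumption chosen for illustration and is not necessarily the measure LUX uses; the sample values are made up.

```python
from collections import Counter
from math import sqrt


def unevenness(values):
    """Distance between observed category shares and a perfectly even split."""
    counts = Counter(values)
    total = sum(counts.values())
    even_share = 1 / len(counts)
    return sqrt(sum((c / total - even_share) ** 2 for c in counts.values()))


# Made-up values for illustration only.
sample = {
    "color": ["D", "E", "F", "D", "E", "F"],                    # perfectly even
    "cut": ["Ideal", "Ideal", "Ideal", "Ideal", "Good", "Fair"],  # dominated by one value
}

ranked = sorted(sample, key=lambda name: unevenness(sample[name]), reverse=True)
print(ranked)
```

With pandas, the per-category counts come from `Series.value_counts()`.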

This can be achieved using the intent-based language which LUX follows.

4.3 What are Intents in LUX?

Previously, we saw how executing the df object generates the basic recommendations. These recommendations can be steered using what is called an ‘Intent’. Say the initial recommendations have shown us the basic relationships in the data, and we now want to zoom in on the ‘clarity’ attribute of a diamond. This is very easy in LUX and takes a single line of code:

diamonds_df.intent = ['clarity']
diamonds_df

You can see that the recommendations are now steered by the specified intent. The leftmost plot shows the occurrence of the specified attribute if it is categorical, and its distribution if it is numerical.

On the right, we can see two tabs, ‘Enhance’ and ‘Filter’. These are also analytical actions, similar to ‘Correlation’, ‘Distribution’, and ‘Occurrence’.

The ‘Enhance’ action helps you figure out the relationship between other attributes and the specified intent. Notice in the image above that the ‘clarity’ of diamonds is plotted against other attributes such as ‘Mean of carat’, ‘Mean of x’, and others. This makes it easy to spot interesting patterns involving other attributes, based on the intent.

The ‘Filter’ action, on the other hand, lets you explore your intent on a subset of the data.

Filter action based on ‘clarity’ intent

In the above image, LUX automatically highlights the relationship between clarity and the different colors of diamonds. The plots suggest that ‘SI2’ clarity diamonds are associated with color=H, whereas ‘SI1’ clarity diamonds are associated with color=D.

Now, what if we want to drill down further? Say we want to know more about how one particular color, color=H, relates to the clarity of the diamond. This is again easy to achieve in LUX with a single line of code:

diamonds_df.intent = ['clarity', 'color=H']
diamonds_df

Notice that the clarity of diamonds with color=H is now plotted against other attributes like ‘Mean of price’ and ‘Mean of carat’, i.e. it is Enhanced. Similarly, the ‘Filter’ action/tab plots the intent over other subsets of the filtered attribute.

‘Generalize’ is a new analytical action here and, as the name suggests, it generalizes the intent, i.e. it removes additional attributes or filters from the intent, as shown in the image below.
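
As a mental model, an intent like `['clarity', 'color=H']` means "look at clarity, but only for rows where color is H", and an Enhance view such as ‘Mean of price’ is an aggregate over that subset. The sketch below mimics that in plain Python. It is not LUX code: `parse_intent` and `mean_by` are hypothetical helpers, the rows are made up, and only the `attribute=value` form used in this post is handled.

```python
from collections import defaultdict


def parse_intent(intent):
    """Split intent entries into plain attributes and (attribute, value) filters."""
    attributes, filters = [], []
    for entry in intent:
        if "=" in entry:
            name, value = entry.split("=", 1)
            filters.append((name.strip(), value.strip()))
        else:
            attributes.append(entry.strip())
    return attributes, filters


def mean_by(rows, group_attr, measure, filters):
    """Mean of `measure` per `group_attr` value, over rows passing all filters."""
    groups = defaultdict(list)
    for row in rows:
        if all(row[name] == value for name, value in filters):
            groups[row[group_attr]].append(row[measure])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Made-up rows for illustration only.
rows = [
    {"clarity": "SI1", "color": "H", "price": 400},
    {"clarity": "SI1", "color": "H", "price": 600},
    {"clarity": "SI2", "color": "H", "price": 300},
    {"clarity": "SI2", "color": "D", "price": 900},
]

attributes, filters = parse_intent(["clarity", "color=H"])
result = mean_by(rows, attributes[0], "price", filters)
print(result)
```

The color=D row is excluded by the filter, so only the three color=H rows contribute to the per-clarity means.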

Generalize Action/tab for specified intent.

In short, consider the purpose of these analytical actions as:

  • Enhance: seeing the relationship of the intent with other attributes.
  • Filter: seeing the relationship within a subset of the data, like zooming in.
  • Generalize: seeing the relationship of the intent after removing additional attributes or filters, like zooming out.
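
The three actions can also be read as three ways of editing the intent list: add an attribute, add a filter, or remove an entry. The following sketch generates those candidate intents in plain Python. It is a conceptual illustration only; the function names are hypothetical and LUX’s own logic for choosing and ranking candidates is more involved.

```python
def enhance(intent, all_attributes):
    """Add one more attribute to the intent (one candidate per unused attribute)."""
    used = {entry.split("=", 1)[0] for entry in intent}
    return [intent + [attr] for attr in all_attributes if attr not in used]


def add_filter(intent, attribute, values):
    """Add one filter to the intent (one candidate per value), like zooming in."""
    return [intent + [f"{attribute}={value}"] for value in values]


def generalize(intent):
    """Remove one entry from the intent (one candidate per entry), like zooming out."""
    return [intent[:i] + intent[i + 1:] for i in range(len(intent))]


intent = ["clarity", "color=H"]
enhanced = enhance(intent, ["clarity", "color", "price", "carat"])
filtered = add_filter(["clarity"], "color", ["D", "H"])
generalized = generalize(intent)

print(enhanced)
print(filtered)
print(generalized)
```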

I know that you have spent quite some time on this article by now, but this is not the end of LUX’s capabilities. So I will end by covering one last concept: the ‘Clause’.

4.4 What is a Clause?

A clause is nothing but a more powerful way to provide your intent. So far, we have seen how to provide the intent using a string-based approach. However, not every intent can be specified that way, and in those cases we can use the LUX ‘Clause’ object.
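
As a hedged sketch, the string intent `['clarity', 'color=H']` from earlier could be written with Clause objects roughly as follows. It assumes `lux.Clause` accepts the `attribute`, `filter_op`, and `value` keyword arguments described in the LUX documentation, so check this against your installed version. The `CLAUSE_SPECS` list and the `apply_clause_intent` helper are names invented for this example.

```python
# Keyword arguments for each clause: intended to mirror ['clarity', 'color=H'].
CLAUSE_SPECS = [
    {"attribute": "clarity"},
    {"attribute": "color", "filter_op": "=", "value": "H"},
]


def apply_clause_intent(df, specs=CLAUSE_SPECS):
    """Set df.intent from lux.Clause objects built from keyword-argument dicts."""
    import lux  # imported lazily; requires `pip install lux-api`

    df.intent = [lux.Clause(**spec) for spec in specs]
    return df
```

In a notebook, calling `apply_clause_intent(diamonds_df)` and then displaying `diamonds_df` should give recommendations steered the same way as the string-based intent.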

This is just an introduction to ‘Clause’ in LUX. You can read more about it in their official documentation here.

5. Links to LUX functionalities not covered in this article.

I hope I have given an easy-to-understand introduction to LUX, but I recommend going through the official documentation once to learn about it in more depth. The documentation is concise and can be covered quickly.

I would appreciate feedback in the comment section.

6. Conclusion

LUX is a fairly simple, easy-to-use library and, most importantly, it helps you rapidly gain insights from the data. Remember that our end goal is to uncover hidden insights, not to spend time deciding which plots to use for which type of attribute and then writing multiple lines of code again and again to produce them.

Shubham Patel
Data Scientist | Full Stack Engineer | MS in Machine Learning & Artificial Intelligence | 6 years of Industry experience