Interactive Data Visualization became much easier with the help of Plotly-express.

Chamanth mvs
Published in Analytics Vidhya
8 min read · Nov 10, 2019

Plotly became easier to use with the help of Plotly-express

Interactive plotting with plotly and plotly express

Plotly is one of the awesome libraries available for data visualization in Python, and it enables interactive plotting.

Plotly describes Plotly-express as a “terse, consistent, high-level API for rapid data exploration and figure generation”.

In simple terms, we can think of it as a convenient wrapper around plotly. In this blog, I will discuss:

  1. What is Plotly and Where is it useful?
  2. Difference between Plotly and Plotly-express (in terms of plotting).

What is Plotly and Where is it useful?

What is Plotly?

Plotly is both a company and an open-source library. The company focuses on data visualization for business intelligence, such as reporting, creating dashboards, and hosting BI solutions.

Although Plotly is a company, it has released an open-source library under the same name, which focuses on interactive visualizations. Plotly has libraries for JavaScript, React, R and Python. Among these, the Python library has become especially popular.

Where is it useful?

With matplotlib or seaborn, we typically end up with static image files (jpeg, png or any other static format), i.e., plots that don’t change once they are created. Now assume we have a time-series plot for steel manufacturing, created with the help of seaborn, and we need to inspect the process at one particular time interval by zooming in and viewing it carefully. Plotly can help in such scenarios.

Plotly in Python, by itself, creates interactive plots (as .html files). Users can interact with these plots by zooming in, selecting a particular area within the plot for detailed analysis, hovering over data points, and so on.

However, these plots can’t be connected to changing data sources (plotly on its own does not update a plot live). Dash, a Python framework for building web applications on top of plotly, helps in working with live data sources.

Once the interactive plotly plot is generated, the data represented in the plot is “locked in” to the exported state of the plot. We need to re-run the .py script and regenerate the .html file to see any updates.

Difference between Plotly and Plotly-express (in terms of plotting).

Plotly

Please note: Plotly has been updated recently. Plotly and the resources around it change frequently, as it is a newer library compared with other plotting libraries in Python.

Up to Plotly 3, there were two modes of plotting with plotly (online and offline).

Plotly online

When plotting online, the plot and its data are saved to your Plotly cloud account. There are two methods to plot online: plotly.plotly.plot() returns the unique URL and optionally opens it, while plotly.plotly.iplot() is used when working in a Jupyter notebook to display the plot within the notebook. Both methods create a unique URL for the plot and save it in your Plotly account, so an internet connection is required to use Plotly online.

Plotly offline

Plotly offline allows us to create plots offline and save them locally, without any internet connection. There are two methods to plot offline: plotly.offline.plot() creates a standalone HTML file that is saved locally and opened in your web browser, while plotly.offline.iplot() is used when working offline in a Jupyter notebook to display the plot in the notebook. Before using plotly.offline.iplot(), we need to run an additional step, plotly.offline.init_notebook_mode(), at the start of each session.

From Plotly 4 (the updated and most recent version of Plotly at the time of writing)

Fig-1: plotly3_vs_4

Plotly 4 made life much easier as it is completely offline. There is no plotly online inside plotly 4; the online features moved to the separate chart-studio package.

Those who love working with plotly.offline in a Jupyter notebook can drop the connection statements from their code (the ones that connect plotly in offline mode to the notebook). They can now import plotly directly rather than importing plotly.offline.

plotly.graph_objs is a package containing the classes used to build graph objects (traces, layouts and figures). These classes share a structure that is consistent across visualizations made in Python, regardless of plot type.

From plotly 3 to plotly 4, the plotly.graph_objs package has been aliased as plotly.graph_objects “because the latter is much easier to communicate verbally”, according to the official documentation.

Plotly Express

Fig-2: Importing plotly express (careful with versions)

Plotly Express was originally installed separately through the plotly_express package, but it is now part of plotly. Plotly should be updated to plotly 4 before using it, or you will encounter the error shown in Fig-2.

Comparing Scatter plot with plotly and plotly express

A scatter plot allows the comparison of two variables for a set of data. Depending on the trend of the scatter plot, we can judge whether the two variables are correlated.

With Plotly:

Fig-3: plotting between sepal_length and sepal_width

Plotly follows a particular syntax, as seen in Fig-3. First, a variable is created to hold the plot (note: the plot should be given inside a list, as shown in the first line of Fig-3). Here it is named “data” (the most common convention), but the name can be of your choice. This “data” variable contains a plot type call. go.Scatter is one among many graph objects, and each plot type has its own graph object. These objects typically accept a few parameters. For instance, a scatter graph object needs two of them to be useful: the values for the x-axis and the y-axis. go.Layout, which is also a graph object, is used to define the layout of the plot. Finally, a figure object is created from the data and layout variables in order to plot.

The dataset I have used is the iris dataset, perhaps the best-known dataset in the pattern recognition literature. It contains 3 classes (Setosa, Versicolour, Virginica) of 50 instances each, where each class refers to a type of iris plant. In Fig-3, we have plotted all classes to find the relation between sepal_length and sepal_width. But as all the data points are shown in the same color, we are unable to draw any conclusions from the plot, because a plotly graph object has no direct equivalent of the hue parameter in seaborn. The alternative is either to plot the data using a group-by, or to create an individual trace for each class.

Grouping by data to plot with variation in each class

Fig-4: groupby code and its respective plot

The code that produces the figure on the right of Fig-4 is shown on its left. Briefly, the code does the following: it groups the data by the target variable (the 3 types of flowers in the iris dataset), defines the layout with the necessary parameters and assigns it to the figure object. In the main part, we iterate over the groups of the target variable and create three different scatter traces, one per class. Finally, all the scatter traces are drawn on one single plot with the help of the fig object.

With Plotly Express:

Fig-5: Importing Iris data

Plotly Express has built-in datasets, which we can use just by importing plotly express. Fig-5 shows how a dataset is imported. The plotly graph objects API has no such option, so we would need scikit-learn or another public library that bundles datasets.

Fig-6: Scatter plot using Plotly-express

The same plot that is on the right of Fig-4 is plotted in two lines of code using Plotly Express. If you are familiar with seaborn, where we would assign the target feature to hue, the similar parameter here is color (it colors the points according to the given feature in the data). Even the axis labels are set automatically in Plotly Express, from the feature names in the dataset. The best part of Plotly Express is that the code is self-explanatory, and its documentation is very detailed and easy to understand.

Comparing Line plot with plotly and plotly express

A line chart displays a series of data points (markers) connected by line segments. It is similar to a scatter plot, except that the measurement points are typically ordered by their x-axis value and joined with straight line segments. Line charts are often used to visualize a trend in data over intervals of time (a time series). What follows is just an example; no specific data is considered or explained.

With Plotly:

Fig-7: code to plot Fig-8

The dataset used for this plot is a toy dataset created for the purpose. The task is to draw a line chart that shows seven days' worth of temperature data in one graph.

Fig-8: output of code Fig-7

To plot this figure (Fig-8), we created an individual trace for each day using a loop (which can be seen in Fig-7).

With Plotly Express:

Fig-9: Line plot with plotly express

Comparing Fig-7 and Fig-8 with Fig-9, we can see that Plotly Express creates the plot in just one line of code and makes interactive plotting easier.

Conclusion

Plotly has a complex syntax compared to Seaborn or Matplotlib, but Plotly Express has made interactive plotting effortless through its simple functions. It gets rid of the complex plotly syntax and greatly reduces the number of lines required to plot using plotly. Statistical plotting became simpler when seaborn was introduced (it is built on top of matplotlib). Similarly, Plotly Express made plotly simple to use. I believe that Plotly Express is built to make interactive plotting easier and handier.

Thank you for reading so far. I am committed to improving my style and presentation methods. So, if you have any suggestions or have something to share, feel free to comment or contact me through LinkedIn here.
