Forget Me Not

Leveraging analysis history for visualization recommendation.

Sejal Singh

Published in VisUMD · 4 min read · Nov 3, 2022

“We are drowning in information, but starving for knowledge.” — John Naisbitt

It is no secret that the age of Big Data is upon us, and data drives just about everything today. But in this endless ocean of data, where and how do we even begin to look for the insights that matter to us? Enter data visualization: an analyst's weapon of choice for transforming massive amounts of raw data into easily digestible insights. Data visualizations make data accessible to everyone and help people promptly identify issues within an organization, forecast trading volumes and sales prices, and make the right business decisions.

While many data visualization tools exist on the market today, they suffer from several limitations, largely because generating insights from data that spans multiple datasets is not easy. Exploratory data analysis, for instance, requires analysts to iteratively try different ways of cleaning, aggregating, and filtering their data to make sense of it. Existing tools help users understand their analysis by recommending visualizations, but they base those recommendations on a single snapshot of the data, which fails to capture the dynamic nature of analysis.

Meet Solas

Solas is a visualization recommendation tool that leverages analysis history to offer more insightful visualization recommendations, while also tapping into the user's probable interests and next steps in the analysis process. At a high level, Solas works in three steps:

Track the history of the user's data analysis,
Model their interest in each column of the dataframe, and
Use this information to provide visualization recommendations.

Solas tracks the history of a user’s analysis to provide improved in situ visualization recommendations.

What's more, Solas does all of this within the user's native analytical environment, so users do not have to give up their preferred tool stack for another. So how does Solas actually work?

First, Solas tracks the data analysis history

Solas maintains a separate history of operations for each dataframe or series object. This not only ensures that the user sees relevant visualizations for each dataframe, but also removes ambiguity when two dataframes share the same column names but contain different data.
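To make per-dataframe history concrete, here is a minimal sketch, assuming a toy column-oriented table (a dict of lists) rather than a real pandas DataFrame. The names TrackedFrame and Operation are hypothetical and are not Solas' API; the sketch only shows the idea that each derived frame carries its own copy of the history plus a link to its parent, so two frames with identical column names never share a record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Operation:
    """One recorded step: the operation's name and the columns it touched."""
    name: str
    columns: list[str]


@dataclass
class TrackedFrame:
    """Hypothetical stand-in for a dataframe that carries its own history."""
    data: dict[str, list]
    history: list[Operation] = field(default_factory=list)
    parent: Optional[TrackedFrame] = None

    def filter(self, column: str, predicate: Callable[[object], bool]) -> TrackedFrame:
        # Keep only the rows whose value in `column` satisfies the predicate.
        keep = [i for i, v in enumerate(self.data[column]) if predicate(v)]
        new_data = {c: [vals[i] for i in keep] for c, vals in self.data.items()}
        # `+` builds a new list, so the child's history never mutates the parent's.
        child_history = self.history + [Operation("filter", [column])]
        return TrackedFrame(new_data, child_history, parent=self)


sales = TrackedFrame({"region": ["N", "S", "N"], "revenue": [10, 20, 30]})
north = sales.filter("region", lambda r: r == "N")
```

After the filter, north records one "filter" step on region while sales still has an empty history, and north.parent points back at sales, which is the link later steps can use to recover earlier data.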

Then, it models the user’s column interest

As analysts explore their dataset, their interests shift. Solas reflects this by treating recent interactions with the data as more important than older ones. At the start of an analysis, no history exists, so Solas considers all columns equally important. As the analyst keeps exploring, Solas updates its model of their interest and uses it to offer recommendations relevant to their recent operations.
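One simple way to weight recent interactions more heavily is exponential decay. The sketch below assumes that technique and a hypothetical decay parameter; it illustrates the behaviour described above (uniform interest with no history, recent operations counting most) and is not necessarily the exact formula Solas uses.

```python
from __future__ import annotations


def column_interest(history: list[list[str]], all_columns: list[str],
                    decay: float = 0.5) -> dict[str, float]:
    """Score each column by recency-weighted use, normalized to sum to 1.

    `history` lists the columns touched by each operation, oldest first.
    `decay` is a hypothetical parameter in (0, 1]: an operation k steps
    older than the latest one contributes decay**k to each column it touched.
    """
    uniform = {c: 1 / len(all_columns) for c in all_columns}
    if not history:
        # No history yet, so every column is equally interesting.
        return uniform
    scores = {c: 0.0 for c in all_columns}
    latest = len(history) - 1
    for i, cols in enumerate(history):
        weight = decay ** (latest - i)  # the most recent operation gets weight 1
        for c in cols:
            if c in scores:
                scores[c] += weight
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else uniform
```

With history [["a"], ["b"]] and the default decay of 0.5, column b scores 2/3 and a scores 1/3, because the most recent operation counts twice as much; an untouched column c scores 0.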

Finally, it recommends visualizations from analysis history

Solas enhances its visualization recommendations in three ways:

First, it creates task-specific visualizations using data provenance, which it gathers from the analysis history. Solas treats the most recent operation applied to the data as the strongest indicator of the user's current interest, and combines it with the rest of the analysis history to produce meaningful visualizations. Solas' task-specific visualizations fall into two broad categories:

Those that detect pre-aggregated data. Take value counts, an analysis function that returns the count of each unique value in a column of a dataframe. Solas uses data provenance to recognize that the result holds category counts and encodes them as a bar chart, whereas existing recommendation systems treat these counts as raw values.
Those that use data from earlier in the analysis history. For operations such as filters, data that is no longer in the dataframe still provides useful context, and Solas uses data provenance to bring that context back to the user.
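The pre-aggregated case can be sketched by tagging a result with the operation that produced it and letting the recommender branch on that tag. This is a minimal illustration under stated assumptions: the table layout, the "provenance" key, and the chart-spec dictionaries are all hypothetical, and the fallback branch only mimics a provenance-unaware recommender.

```python
from collections import Counter


def value_counts(values: list) -> dict:
    """Count each unique value, most frequent first, and record the
    operation that produced the result (its provenance)."""
    pairs = Counter(values).most_common()
    return {
        "columns": {"value": [v for v, _ in pairs], "count": [n for _, n in pairs]},
        "provenance": "value_counts",
    }


def recommend_encoding(table: dict) -> dict:
    """Pick a chart spec; the spec format is hypothetical."""
    if table.get("provenance") == "value_counts":
        # Already aggregated: draw each category against its count directly.
        return {"mark": "bar", "x": ("value", "nominal"), "y": ("count", "quantitative")}
    # Without provenance, the counts look like ordinary numbers, so a naive
    # recommender treats them as raw values and bins them into a histogram.
    return {"mark": "histogram", "x": ("count", "quantitative")}
```

For value_counts(["cat", "dog", "cat"]) the result holds categories ["cat", "dog"] with counts [2, 1] and is recommended as a bar chart; the same count column without its provenance tag falls back to a histogram.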

Viewership initially represents the number of viewers in units of 10 million. Because the column has low cardinality, it is visualized as a nominal variable. However, once we rescale the column by multiplying it by 10 million, Solas infers that Viewership must be a quantitative column, since it supports multiplication, and visualizes it accordingly.

When filtering, Solas shows the background distribution of each column from the parent data relative to the filtered data. Users can toggle the background distribution on and off.
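A background distribution amounts to pairing each category's count after the filter with its count in the parent data. The sketch below assumes categorical columns passed as plain lists; the function name and the show_background flag (standing in for the on/off toggle) are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def with_background(parent_col: list, filtered_col: list,
                    show_background: bool = True) -> list[dict]:
    """Pair each category's count in the filtered data with its count in
    the parent (pre-filter) data, so both can be drawn on one chart."""
    parent_counts = Counter(parent_col)
    filtered_counts = Counter(filtered_col)
    rows = []
    # Iterate over the parent's categories, which include every filtered one;
    # categories removed by the filter appear with a filtered count of 0.
    for category in parent_counts:
        row = {"category": category, "filtered": filtered_counts.get(category, 0)}
        if show_background:
            row["background"] = parent_counts[category]
        rows.append(row)
    return rows
```

Keeping the removed categories with a count of zero is what supplies the extra context: the chart still shows what the filter took away.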

Second, Solas uses the column-interest model built earlier to enhance the visualization of the most recent operation and to sort the other recommendation tabs. Beyond offering visualizations that compare the columns of highest interest, it orders the remaining recommendations by this interest model, so the charts most relevant to the user's perceived interest appear first and less time is spent scrolling to find them.
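Sorting by interest can be sketched as scoring each candidate chart from the interest of the columns it encodes. Using the mean column interest as the score is an assumption made for illustration, and the chart dictionaries are hypothetical; the interest values could come from any column-interest model.

```python
from __future__ import annotations


def rank_recommendations(charts: list[dict], interest: dict[str, float]) -> list[dict]:
    """Order candidate charts by the mean interest of the columns they encode
    (the scoring rule is an assumption, not Solas' documented formula)."""
    def score(chart: dict) -> float:
        cols = chart["columns"]
        if not cols:
            return 0.0
        return sum(interest.get(c, 0.0) for c in cols) / len(cols)

    # sorted() stays stable with reverse=True, so ties keep their original order.
    return sorted(charts, key=score, reverse=True)
```

With interest {"a": 0.1, "b": 0.6, "c": 0.3}, a chart of b alone (0.6) ranks ahead of one comparing b and c (0.45), which ranks ahead of a chart of a (0.1).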

Lastly, Solas uses the operations a user applies to each column to infer that column's measurement type. By inferring whether the data in a column is nominal, ordinal, interval, or ratio, Solas can encode it in more appropriate, meaningful visualizations.
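The idea that an operation reveals a column's type, as in the Viewership example, can be sketched with a few rules: start from a guess based on the values, then raise it to the lowest level at which each applied operation makes sense. The rules below (the IMPLIED_LEVEL table and the cardinality threshold) are hypothetical, loosely following the nominal/ordinal/interval/ratio ordering, and are not Solas' actual inference logic.

```python
from __future__ import annotations

LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Hypothetical rules: the lowest measurement level at which each operation
# is meaningful (ordering needs ordinal data, differences need interval, etc.).
IMPLIED_LEVEL = {
    "sort": "ordinal",
    "add": "interval",
    "subtract": "interval",
    "multiply": "ratio",
    "divide": "ratio",
}


def infer_measurement_type(values: list, operations: list[str],
                           cardinality_threshold: int = 5) -> str:
    """Guess a column's measurement type from its values and the
    operations applied to it (illustrative rules only)."""
    numeric = all(isinstance(v, (int, float)) for v in values)
    if numeric and len(set(values)) > cardinality_threshold:
        level = "ratio"  # many distinct numbers: treat as quantitative
    else:
        level = "nominal"  # few distinct values: looks categorical by default
    for op in operations:
        implied = IMPLIED_LEVEL.get(op)
        # An operation can only raise the inferred level, never lower it.
        if implied and LEVELS.index(implied) > LEVELS.index(level):
            level = implied
    return level
```

A low-cardinality column such as [1, 2, 2, 3] starts out nominal, and once a "multiply" operation is recorded on it the sketch upgrades it to ratio, mirroring the Viewership example.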

What’s Next?

Solas' scope does not end here. In the future, data aggregated from the steps analysts take across many analyses could let the tool recommend not just visualizations but also possible next steps in an analysis. Aggregated analysis histories could also help the tool understand how analysts typically interact with a particular data source. This could then be used to guide new users with useful recommendations, and even to develop analysis templates based on common practices.

References

Read the full paper:

Will Epperson, Doris Jung-Lin Lee, Leijie Wang, Kunal Agarwal, Aditya Parameswaran, Dominik Moritz, and Adam Perer. Leveraging Analysis History for Improved In Situ Visualization Recommendation. Computer Graphics Forum, 41(3):145–155, 2022. doi:10.1111/cgf.14529