UpSet.js — The UpSet.js Ecosystem

an interactive JavaScript re-implementation of UpSet(R)

Samuel Gratzl
May 3, 2020

This article is part of a three-part series:

  1. UpSet.js — The UpSet.js Ecosystem
  2. UpSet.js — JavaScript Tutorial
  3. UpSet.js — Behind the (technical) Scenes

Background and Motivation

Venn diagrams are a common way to show intersections between sets. However, with more than three sets, things get messy. For example, the following picture is taken from a recent Nature method article.

© Nature: https://www.nature.com/articles/d41586-020-00154-w

It shows how frequently COVID-19 symptoms appear together. In 2014, Lex et al. published an InfoVis paper about UpSet [1], a visualization technique for showing set intersections with more than three sets. UpSet addresses the inherent issues of Venn diagrams for more than three sets by using a completely different approach to visualize the sets and set intersections. The following figure shows the same data as an UpSet plot with UpSet.js.

UpSet.js plot of the Symptoms table, data from: https://github.com/hms-dbmi/upset-altair-notebook

An UpSet plot consists of three areas:

  • The bottom left area shows the sets as a vertically arranged list of horizontal bars. The length of each bar corresponds to the cardinality of the set, i.e., the number of elements in this set.
  • The top right area shows the set intersections as a horizontally arranged list of vertical bars. Again, the length of each bar corresponds to the cardinality, this time of the set intersection.
  • The bottom right area shows which intersection consists of which sets. A dark dot indicates that the set is part of this set intersection. The line connecting the dots visually groups them.
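
The data behind these three areas can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: the type and function names are hypothetical and not part of UpSet.js, the data is a toy example, and intersections are computed inclusively (an element counts for an intersection if it is in all participating sets, regardless of other sets).

```typescript
// Hypothetical data model for illustration; these names are not part of UpSet.js.
type NamedSets = Record<string, readonly string[]>;

interface IntersectionBar {
  sets: string[]; // participating sets (the dark dots in the bottom right area)
  elems: string[]; // elements contained in every participating set
  cardinality: number; // bar length in the top right area
}

// Bottom left area: one bar per set, its length is the number of distinct elements.
function setCardinalities(data: NamedSets): Record<string, number> {
  const sizes: Record<string, number> = {};
  for (const setName of Object.keys(data)) {
    sizes[setName] = data[setName].filter((e, i, all) => all.indexOf(e) === i).length;
  }
  return sizes;
}

// Top right and bottom right areas: every non-empty intersection of two or more sets.
function setIntersections(data: NamedSets): IntersectionBar[] {
  const setNames = Object.keys(data);
  const bars: IntersectionBar[] = [];
  // Each bitmask between 1 and 2^n - 1 selects one combination of the n sets.
  for (let mask = 1; mask < 1 << setNames.length; mask++) {
    const members = setNames.filter((_, i) => (mask & (1 << i)) !== 0);
    if (members.length < 2) {
      continue;
    }
    const elems = data[members[0]].filter(
      (e, i, all) => all.indexOf(e) === i && members.every((m) => data[m].indexOf(e) >= 0)
    );
    if (elems.length > 0) {
      bars.push({ sets: members, elems, cardinality: elems.length });
    }
  }
  // Largest intersections first (one possible ordering).
  return bars.sort((a, b) => b.cardinality - a.cardinality);
}

// Toy data, not the dataset used in this article.
const toySets: NamedSets = {
  Stark: ["Ned", "Arya", "Sansa", "Robb"],
  Female: ["Arya", "Sansa", "Cersei"],
  Lannister: ["Cersei", "Jaime"],
};
const toySizes = setCardinalities(toySets);
const toyBars = setIntersections(toySets);
```

With this toy data, toySizes holds the three set bars and toyBars the two non-empty intersections (Stark ∩ Female and Female ∩ Lannister); the sets field of each entry says which dots are dark in the bottom right area.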

UpSet was well received and later gained momentum when UpSetR [2] was published. UpSetR is an R package for creating static UpSet plots. In contrast to the original publication, the authors of UpSetR flipped the UpSet plot such that set intersections are shown horizontally instead of vertically.

Lex et al. published an UpSet prototype, a web application that allows users to explore pre-loaded UpSet datasets. Moreover, users can load their own data via a public Dropbox file. However, there is no way to share the results or embed the UpSet plot into another website. The authors of the original paper recently published UpSet 2 [3], a follow-up version of the application that allows users to upload data to a server back-end managed by the authors. Moreover, a reduced version of the application that references a pre-loaded dataset can be embedded into other websites via iframes. However, it is not possible to use the application as a library that can be parametrized, customized, or used without a server back-end.

UpSetR is a generic R package for generating UpSet plots with three different input formats. UpSetR became popular because of its ease of use and the expressiveness of UpSet plots for comparing set intersections. However, the plots are static, which removes interactive visual exploration, a powerful aspect of the technique. Moreover, the library is limited to R.

I believe that UpSet plots are a sophisticated and well-designed visualization technique for exploring set intersections with more than three sets. However, the existing prototypes and libraries dampen its potential by limiting either its broad usage or its interactivity. Thus, the idea for UpSet.js was born. UpSet.js should be an easy-to-use, interactive JavaScript library that can be integrated into various data science tools and web frameworks. To sum up, UpSet.js should meet the following design goals:

  • easy to use JavaScript library with few dependencies
  • full interactivity within the UpSet plot including highlights, tooltips, and linked selections
  • a vast number of customization options, especially regarding coloring
  • a pure functional library without any internal state or side effects
  • integrated export options to formats such as PNG, SVG, or Vega-Lite
  • integrations in data science tools such as R, Jupyter Python Notebooks, PowerBI, or Tableau

In the remainder of this article, examples use a Game of Thrones character dataset based on https://github.com/jeffreylancaster/game-of-thrones.

Features

Based on this set of design goals, UpSet.js was developed. In the following, a high-level overview of its features is given. A detailed introduction to using these features is the main topic of the second part of this article series, the JavaScript tutorial.

Interaction

One of the design goals of UpSet.js was interactivity. While static plots are nice, interactive ones are better. When hovering over a bar in UpSet.js all related elements are highlighted in orange.

For example, when hovering over the third vertical bar (was killed ∩ male), UpSet.js will highlight the common elements in all other sets in orange. In this example, 7 out of 9 Lannister characters (first horizontal bar on the left) are part of this subset. This can also be confirmed by looking at the explicit set intersection (was killed ∩ male ∩ Lannister), which has a cardinality of 7.

Another design goal of UpSet.js has a major influence on how interactions are handled internally: UpSet.js should be stateless. This means that when a user hovers over a bar in the UpSet plot, related sets are not automatically highlighted; instead, the hovered set is just propagated to the caller of UpSet.js. The caller, i.e., the user of the library, then has to decide how to interpret and react to the event. A common way is to use the hovered set as the selection input when updating the UpSet plot. Thus, it looks as if hovering over a bar temporarily selects it in the plot. A similar event is triggered when the user clicks on a bar, to which the caller can react accordingly. For example, clicking on a bar could confirm a selection and make it persistent.
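
A minimal sketch of this division of labor, assuming a caller that keeps a hovered and a persisted selection. All type and function names here are hypothetical and chosen for illustration; the actual UpSet.js API is the topic of the second part of this series.

```typescript
// Hypothetical names for illustration; not the actual UpSet.js API.
interface PlotSet {
  label: string;
  elems: readonly string[];
}

// All interaction state lives in the caller, not in the plot.
interface CallerState {
  hovered: PlotSet | null; // temporary selection while the mouse is over a bar
  persisted: PlotSet | null; // selection confirmed by a click
}

// The plot only reports events; these pure functions are the caller's reaction.
function afterHover(state: CallerState, target: PlotSet | null): CallerState {
  return { hovered: target, persisted: state.persisted };
}

function afterClick(state: CallerState, target: PlotSet | null): CallerState {
  return { hovered: state.hovered, persisted: target };
}

// The selection passed back into the plot on the next update:
// a hovered bar temporarily overrides the persisted selection.
function selectionFor(state: CallerState): PlotSet | null {
  return state.hovered !== null ? state.hovered : state.persisted;
}

// Toy sets, not the dataset used in this article.
const starkSet: PlotSet = { label: "Stark", elems: ["Ned", "Arya", "Sansa"] };
const femaleSet: PlotSet = { label: "Female", elems: ["Arya", "Sansa", "Cersei"] };

let callerState: CallerState = { hovered: null, persisted: null };
callerState = afterClick(callerState, starkSet); // click: Stark becomes persistent
callerState = afterHover(callerState, femaleSet); // hover: Female is shown temporarily
const whileHovering = selectionFor(callerState);
callerState = afterHover(callerState, null); // mouse leaves: back to Stark
const afterLeaving = selectionFor(callerState);
```

Because the reaction functions are pure, a different caller can swap in another policy (for example, ignoring hover events entirely) without any change to the plot component.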

The stateless nature of the library, which shifts the interaction logic from the library to the caller, allows numerous different user interactions without changing the library itself. For example, another interaction variant simplifies locating a set intersection in the plot: when the user clicks on a set, it is selected and persisted. Moreover, when hovering over a set, it is temporarily selected. However, when hovering over another set while a persisted selection is stored and the Control (CTRL) modifier key is pressed, the interaction mode changes. In this mode, it is not the hovered set that is highlighted but the set intersection built by combining the persisted selection and the hovered set. For example, the user selects Stark by clicking on it. Then, she hovers over Female while holding the Control modifier. As a result, the set intersection (Stark ∩ Female) is highlighted in the plot, which makes it possible to quickly identify its location among the set intersections. This logic is implemented in the UpSet.js App described later in the article.
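
The Control-key variant described above could be expressed by the caller roughly as follows. This is a sketch under assumptions, not the code of the UpSet.js App: the names are hypothetical and sets are reduced to a label and a list of elements.

```typescript
// Hypothetical names for illustration; not the code of the UpSet.js App.
interface HighlightSet {
  label: string;
  elems: readonly string[];
}

// Decide what to highlight from the persisted selection, the hovered set,
// and the state of the Control modifier key.
function resolveHighlight(
  persisted: HighlightSet | null,
  hovered: HighlightSet | null,
  ctrlKey: boolean
): HighlightSet | null {
  if (hovered === null) {
    return persisted; // nothing hovered: show the persisted selection
  }
  if (ctrlKey && persisted !== null && persisted.label !== hovered.label) {
    // Control pressed: highlight the intersection of both sets instead.
    const hoveredElems = hovered.elems;
    return {
      label: persisted.label + " ∩ " + hovered.label,
      elems: persisted.elems.filter((e) => hoveredElems.indexOf(e) >= 0),
    };
  }
  return hovered; // plain hover: temporary selection
}

// Toy sets, not the dataset used in this article.
const persistedStark: HighlightSet = { label: "Stark", elems: ["Ned", "Arya", "Sansa"] };
const hoveredFemale: HighlightSet = { label: "Female", elems: ["Arya", "Sansa", "Cersei"] };

const withCtrl = resolveHighlight(persistedStark, hoveredFemale, true);
const withoutCtrl = resolveHighlight(persistedStark, hoveredFemale, false);
```

Here withCtrl describes the combined highlight (Stark ∩ Female) and withoutCtrl is simply the hovered set.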

Customization

Customization was another design goal of UpSet.js. For example, out of the box a light and a dark theme are supported. The following example shows the dark theme.

Besides this general theme, several smaller settings can be customized, including font sizes, colors, the scale (linear or log), and the axis labels. UpSet.js estimates the padding needed to show all labels based on the font size and the current values, so users do not have to fine-tune it for a specific dataset.

Queries

Another feature of the original UpSet and UpSetR is highlighting user-defined queries. The first, primary query is visualized similarly to a selection, whereas all other, secondary queries are just indicated using small marks. UpSet.js supports a similar feature. However, the style of secondary queries was adapted by adding a line going through the bar. This simplifies the comparison task and makes queries more salient.
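
To make the idea concrete, the following sketch computes, for a single bar, how many of its elements each query matches; that count is what a highlight or mark on the bar would encode. It assumes that a query is simply a list of elements, and all names and the toy data are hypothetical rather than taken from UpSet.js.

```typescript
// Hypothetical names for illustration; not the actual UpSet.js API.
interface ElemQuery {
  label: string;
  color: string;
  elems: readonly string[];
}

interface QueryMark {
  label: string;
  color: string;
  primary: boolean; // the first query is drawn like a selection
  overlap: number; // number of bar elements matched by the query
}

// For one bar (a set or set intersection), compute the overlap with every query.
function queryMarks(barElems: readonly string[], queries: readonly ElemQuery[]): QueryMark[] {
  return queries.map((q, index) => ({
    label: q.label,
    color: q.color,
    primary: index === 0,
    overlap: barElems.filter((e) => q.elems.indexOf(e) >= 0).length,
  }));
}

// Toy data, not the dataset used in this article.
const starkBar = ["Ned", "Arya", "Sansa", "Robb"];
const toyQueries: ElemQuery[] = [
  { label: "female", color: "orange", elems: ["Arya", "Sansa", "Cersei"] },
  { label: "was killed", color: "green", elems: ["Ned", "Cersei"] },
];
const starkMarks = queryMarks(starkBar, toyQueries);
```

In this toy case the primary query covers two of the four elements of the bar and the secondary query covers one.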

Numerical Attributes

Both UpSet and UpSetR have integrated support for aggregating numerical attributes on set intersections. For example, in the COVID-19 example, each patient could have an associated age. In the Game of Thrones case, it could be the number of words the character spoke throughout the show. In UpSet.js, each numerical attribute is summarized using a box plot. Thus, one can easily detect interesting patterns by looking at the distribution in a set or set intersection. In addition, since interactivity was a major design criterion, the box plots also highlight the selected subset and queries by overlaying a box plot of just the common elements. This makes it possible to quickly see any interesting outliers in the data.
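
The numbers behind such a box plot and its overlay can be sketched as below. The names and toy values are hypothetical, quartiles use linear interpolation between closest ranks (one of several common conventions, not necessarily the one UpSet.js uses), and whiskers are simplified to the minimum and maximum.

```typescript
// Hypothetical names for illustration; not the actual UpSet.js add-on code.
interface BoxStats {
  min: number;
  q1: number;
  median: number;
  q3: number;
  max: number;
}

// Quantile of an ascending array by linear interpolation between closest ranks.
function quantile(sorted: readonly number[], p: number): number {
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Five-number summary; returns null when there is nothing to summarize.
function boxStats(values: readonly number[]): BoxStats | null {
  if (values.length === 0) {
    return null;
  }
  const sorted = values.slice().sort((a, b) => a - b);
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}

// Toy attribute values (hypothetical), keyed by element.
const toyAttr: Record<string, number> = { a: 10, b: 20, c: 30, d: 40, e: 50 };
const barElems = ["a", "b", "c", "d", "e"]; // elements of one set intersection
const selectedElems = ["b", "d", "x"]; // current selection; "x" is not in this bar

// Box plot of the whole bar.
const barBox = boxStats(barElems.map((e) => toyAttr[e]));
// Overlay: only the elements the bar has in common with the selection.
const overlayBox = boxStats(
  barElems.filter((e) => selectedElems.indexOf(e) >= 0).map((e) => toyAttr[e])
);
```

Drawing overlayBox on top of barBox shows at a glance whether the selected elements sit in the middle of the distribution or at its edges.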

However, rendering box plots is not part of the core library but extracted into its own component. Such components are then integrated into the UpSet plot using an add-on mechanism. On the one hand, this reduces the overall complexity of the library. On the other hand, it allows the development and integration of further add-ons, for example to summarize categorical attributes.

Implementation

UpSet.js is written in TypeScript using React and hosted on GitHub at https://github.com/upsetjs. A detailed look behind the (technical) scenes is given in the last part of this article series.

Integrations and Applications

UpSet.js is designed as a library. Thus, it is not a standalone application but requires a “host” application that uses the library.

UpSet.js App

The UpSet.js App is a single-page application that allows users to import, explore, and export UpSet.js data. It is a client-only application, meaning that all uploaded data are stored only in the browser and no server is involved. This was an intentional design decision to ensure that users can explore their datasets without any data privacy concerns. The application allows users to explore UpSet.js datasets, configure them, and, foremost, export and download them. Among other options, users can export their UpSet plot to CodePen or CodeSandbox, both popular web services for quickly creating examples and snippets. When exporting to an external service, the service's API is used to ship both the data and the UpSet.js code that renders the data. Thus, users can continue to customize the UpSet plot at will by editing the exported code.

UpSet.js App with the Game of Thrones dataset

Another major feature of the app is downloading the plot. Besides static images (SVG, PNG), a Vega-Lite specification, a CSV file, and a JSON dump format are supported. Moreover, users can generate an R script or Jupyter Notebook for the integrations described in a later section. Last but not least, an embeddable link can be generated that has all the data encoded in the URL in a compressed format. The link is therefore self-contained, needs no server, and can be shared or stored.
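
The idea of a self-contained link can be illustrated with a small round trip. This is a sketch under assumptions rather than the app's actual format: the payload shape and function names are hypothetical, the data is placed in the URL fragment, and the compression step mentioned above is omitted.

```typescript
// Hypothetical payload shape; the app's real format is not described in this article.
interface SharedPlot {
  sets: Record<string, string[]>;
  theme: "light" | "dark";
}

// Serialize the plot state into the URL fragment so the link needs no server.
// A real implementation would compress the JSON first; that step is omitted here.
function toShareUrl(baseUrl: string, plot: SharedPlot): string {
  return baseUrl + "#" + encodeURIComponent(JSON.stringify(plot));
}

// Restore the plot state from such a link, or null if it carries no payload.
function fromShareUrl(url: string): SharedPlot | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex < 0 || hashIndex === url.length - 1) {
    return null;
  }
  return JSON.parse(decodeURIComponent(url.slice(hashIndex + 1))) as SharedPlot;
}

// Toy data and a placeholder base URL.
const toyPlot: SharedPlot = {
  sets: { Stark: ["Ned", "Arya"], Female: ["Arya", "Cersei"] },
  theme: "dark",
};
const shareUrl = toShareUrl("https://example.org/embed", toyPlot);
const restoredPlot = fromShareUrl(shareUrl);
```

Since everything needed to redraw the plot travels inside the link itself, whoever opens it can restore the same state entirely in the browser.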

R/RShiny/RMarkDown

R is a popular language for statisticians and data scientists. It is an easy-to-use language with powerful packages, especially for the biomedical domain. UpSet.js for R is an R package (currently under CRAN review) that uses the bundled version of UpSet.js to create interactive UpSet plots in R. The glue between the two worlds is based on the HTMLWidgets package. UpSet.js for R supports data input formats similar to those of UpSetR but uses a builder pattern to build the visualization. An introduction to the package is given in this RMarkDown file.

UpSet.js for R Test Data

The library can be used for standalone plots, in RMarkDown files, or in R Shiny environments. In the latter case, it sends custom events when the user hovers over or clicks on the UpSet plot, which makes it possible to link different charts together. A ready-to-use environment can be found at Binder.

Jupyter Notebooks

Project Jupyter is the go-to web-based computing platform, mostly known for creating Python notebooks. UpSet.js for Jupyter is a JupyterLab and notebook extension for showing interactive UpSet.js plots in Jupyter notebooks. A highlight of the integration is that it works together with the Jupyter widget interact command, again allowing interactive linked updates.

UpSet.js for Jupyter

Similar to the UpSet.js for R package, the integration supports two kinds of input formats: first, a dictionary where the key is the set name and the value is the list of elements in the set; second, a binary Pandas data frame where the index holds the element names and the columns are the sets. An introduction to the package is given in this Jupyter Notebook.
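
The two input shapes carry the same information, which a small conversion makes explicit. The sketch below is written in TypeScript to stay consistent with the other examples in this article; it only models the data shapes and is not the Python API of the Jupyter integration. All names are hypothetical.

```typescript
// Hypothetical types modeling the two input shapes; not the Jupyter integration's API.
type SetDict = Record<string, string[]>; // set name -> elements
type MembershipTable = Record<string, Record<string, boolean>>; // element -> set name -> member?

function hasKey(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Dictionary of sets -> binary table with one row per element and one column per set.
function toMembershipTable(dict: SetDict): MembershipTable {
  const setNames = Object.keys(dict);
  const table: MembershipTable = {};
  for (const setName of setNames) {
    for (const elem of dict[setName]) {
      if (!hasKey(table, elem)) {
        // New row: the element starts as a non-member of every set.
        const row: Record<string, boolean> = {};
        for (const other of setNames) {
          row[other] = false;
        }
        table[elem] = row;
      }
      table[elem][setName] = true;
    }
  }
  return table;
}

// Binary table -> dictionary of sets.
function toSetDict(table: MembershipTable): SetDict {
  const dict: SetDict = {};
  for (const elem of Object.keys(table)) {
    for (const setName of Object.keys(table[elem])) {
      if (!hasKey(dict, setName)) {
        dict[setName] = [];
      }
      if (table[elem][setName]) {
        dict[setName].push(elem);
      }
    }
  }
  return dict;
}

// Toy data, not the dataset used in this article.
const toyDict: SetDict = { Stark: ["Ned", "Arya"], Female: ["Arya", "Cersei"] };
const toyTable = toMembershipTable(toyDict);
const roundTripDict = toSetDict(toyTable);
```

In toyTable the row for Arya is true in both columns, while the rows for Ned and Cersei are true in one column each; converting back yields the original dictionary.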

PowerBI

PowerBI is Microsoft’s data visualization tool. It is web-based, which makes it easier to create custom extensions. UpSet.js PowerBI Custom Visual is a PowerBI custom visual extension with proper data and synchronized selection support. Users can drag the data elements to one of the defined slots and an UpSet plot will be generated accordingly. Thanks to the well-designed API, selections are supported bidirectionally by clicking on a bar in the chart.

UpSet.js PowerBI Custom Visual

Tableau

Tableau is another well-known data visualization tool. There is a desktop version and a server version of the application. Tableau uses its own approach for integrating third-party web components by embedding them as websites using iframes in its application. UpSet.js for Tableau is a Tableau Dashboard extension currently under submission to the extension gallery. One can use it to add an UpSet.js plot to a dashboard, as shown in the following figure.

UpSet.js Tableau Dashboard Extension

Due to limitations of Tableau’s extension API, it is required that the extension in a dashboard is linked to a sheet in the same dashboard. After linking and specifying input data, it is again fully interactive with synchronized selections.

Discussion

Support Charts

Both UpSet and UpSetR let users create support charts, for example a histogram of a numerical attribute or a scatterplot illustrating an interesting correlation within the elements. In the UpSet prototype, these support plots could also be used as input for creating complex queries that are highlighted in the chart, for example a query that highlights all Game of Thrones characters that are both male and younger than 35. UpSet.js does not support support charts out of the box. As discussed in the Interaction section, UpSet.js is a stateless component, which means that all interactions are directly reported back to the calling component. Thus, it is easy to create support views in an application that uses UpSet.js, because the application already knows which elements are available and which elements to highlight. The same argument holds for all the integrations, since they use the provided mechanism to notify the developer of any updates. Moreover, using and depending on a charting library would increase the overall complexity of the library without a major benefit.

Integrations

The architecture of UpSet.js makes it easy to integrate it into other frameworks and data science tools. Reasons include its stateless nature, its few dependencies, and its focus on the UpSet plot itself. As discussed before, integrations for the most popular data science tools exist. However, there is one exception: Spotfire. Spotfire focuses on data visualization in large enterprise environments. It is a standalone fat client application. Moreover, there is no free public trial, which hampers the development and testing of custom extensions. Finally, its integration possibilities for JavaScript code are limited. There is JSViz, which could be used for this purpose, but without a running instance there is no way to test it.

Licensing

UpSet.js and all of its modules and integrations are released under a dual license on GitHub: on the one hand as open source under AGPL-3, and on the other hand under commercial licenses. While the library is free to use for personal and academic purposes, companies should contact me for details in commercial scenarios. At some point, UpSet.js will be licensed under a completely free license such as MPL or the Apache License, depending on sponsor opportunities that ensure the further development of the library.

Summary

This article introduces UpSet.js, a JavaScript re-implementation of UpSet(R). Its key features include its ease of use, the UpSet.js App for exploring sets, various integrations into popular data science tools, and, foremost, the integrated interactivity features. The second part of this article series gives a technical introduction to using UpSet.js. The last part of the series looks behind the scenes of UpSet.js and discusses its architecture and project structure.

Citation

If you want to cite UpSet, please cite the following paper by Lex et al.:

Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister
UpSet: Visualization of Intersecting Sets
IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14),
20(12): 1983–1992, doi:10.1109/TVCG.2014.2346248, 2014.

References

[1] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister
UpSet: Visualization of Intersecting Sets
IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), 20(12): 1983–1992, https://doi.org/10.1109/TVCG.2014.2346248, 2014.
[2] Jake R Conway, Alexander Lex, Nils Gehlenborg
UpSetR: An R Package for the Visualization of Intersecting Sets and their Properties
https://doi.org/10.1093/bioinformatics/btx364
[3] Kiran Gadhave, Hendrik Strobelt, Nils Gehlenborg, Alexander Lex
UpSet 2: From Prototype to Tool
Proceedings of the IEEE Information Visualization Conference — Posters (InfoVis ’19), 2019.

Samuel Gratzl

Research Software Engineer with a focus on Data Visualization. Author of LineUp.js (https://lineup.js.org) and UpSet.js (https://upset.js.org).