Introducing Semiotic for Data Visualization

Semiotic is a React-based data visualization framework. Interactive documentation and examples are available online. It grew out of a need for reusable data visualization: a framework that lets us make simple charts quickly without committing ourselves to a static set of chart types. Semiotic also incorporates the design and functionality of more complex data visualization methods, so it can respond to the conversations those simple charts begin.

I’ve been doing data visualization for a long time, so long that I’ve just finished the 2nd edition of my book, D3.js in Action. Before coming to Netflix, I created data visualization in one of the typical ways: bespoke, innovative and not reusable. Since coming to Netflix, I’ve gained an increased appreciation for reusability and maintainability, which led me to React. And while I haven’t been working with React quite as long, I’ve worked with it long enough to watch it grow into the major framework it is today.

What I’ve seen is that, in industry, we need data visualization tools that allow us to approach problems with the same flexibility we have in purely custom data visualization, but with the maintainability and reusability necessary for a team where not every member is a visualization expert. That’s not what we see today. Right now we have two kinds of libraries:

Custom Viz: Libraries that allow you to do anything with visual display of information, such as ggplot2, P5 and D3.

E-Z Bake Charts: Libraries that let you deploy a bunch of componentized charts, like Recharts and Highcharts.

Libraries that do try to optimize for composability tend to do so at the level of chart components, like axes and legends, rather than at the level of the visual display of information. They let you construct a chart from pieces but, ironically, add work to enable features that should be available on almost every chart. Others follow a declarative pattern and try to integrate code via custom templating languages or eval statements.

I’m a D3 expert, so I love the full control and flexibility of D3, but not everyone can or wants to be that kind of expert. And even as an expert, I’m not going to spend the time and care needed to produce the kind of custom work that catches eyes in the freelance world. In industry, data visualization developers are not expected to be auteurs who disappear into a cave and return with a beautiful finished product. Instead, they’re expected to be part of an ongoing conversation, continuously developing an analytical application until it best reveals the underlying patterns.

Combinatorial charts and custom effects like sketchy rendering are the kinds of approaches that earn retweets and awards, and they fall well outside E-Z Bake Chart territory.

So I created Semiotic, a framework designed to deploy bar charts and line charts quickly without locking us into those charts through configuration and data-processing requirements. It doesn’t limit our data visualization methods as the price of convenience, but it also doesn’t force us to write new custom D3 code for every minor iteration. It embodies the principle that data visualization methods should not be limited by how we initially format data or by our initial guess at how that data should be displayed. It draws its name from semiotics, the study of symbols and meaning-making, topics I feel are integral to data visualization, which is fundamentally a communication and design problem, not an engineering problem.

This is why Semiotic is organized around the concept of Frames. Each frame embodies a metaphor for how your information is modeled:

Some of the XYFrame charts you can create with Semiotic, which compare two numerical values. XY data is most commonly displayed as line charts, area charts and scatterplots.

Correlation between two numerical values (XYFrame)

  • Scatterplots
  • Area visualizations like contour plots
  • Time series line charts
  • Bump charts and stacked area charts

A few of the ORFrame charts you can make with Semiotic, which compare categorical data numerically. Categorical data isn’t limited to bar charts and pie charts; it also includes parallel coordinates, waterfall charts and more exotic methods like Marimekko charts.

Categorical data compared numerically (ORFrame)

  • Pie charts, bar charts and Nightingale diagrams
  • Distribution visualizations such as violin plots and boxplots

Network diagrams are made with Semiotic’s NetworkFrame. The common force-directed kind, like the less common but clearly related Sankey and chord diagrams, communicates data relationships through connections. Word or tag clouds are less clearly topological but follow the same rules, so they belong in this category too, as do treemaps and circle packs, though Semiotic doesn’t currently support these.

Topological data where relationships are more important than exact positions (NetworkFrame)

  • Network diagrams where two data points may be right next to each other even though they’re unrelated
  • Diagrams that encode relationships using enclosure or other symbology, like treemaps and circle packing.
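To make the three metaphors concrete, here is a minimal TypeScript sketch, not Semiotic’s API, that models one hypothetical list of viewing records in each of the three ways the frames describe: as XY points (two numbers per mark), as ordinal pieces (a category paired with a numerical value) and as a network (nodes joined by weighted edges). The record shape, the field names and the functions toXY, toOrdinal and toNetwork are all assumptions made for illustration.

```typescript
// Hypothetical record shape, assumed for illustration only (not part of Semiotic).
interface ViewingRecord {
  show: string;
  genre: string;
  week: number;
  hours: number;
}

// XY model: every mark is positioned by two numerical values.
interface XYPoint {
  x: number;
  y: number;
}

// Ordinal model: a category ("o") compared by a numerical value ("r").
interface OrdinalPiece {
  o: string;
  r: number;
}

// Network model: relationships matter more than exact positions.
interface Edge {
  source: string;
  target: string;
  weight: number;
}
interface NetworkModel {
  nodes: string[];
  edges: Edge[];
}

// Correlate two numbers: week on x, hours viewed on y.
function toXY(records: ViewingRecord[]): XYPoint[] {
  return records.map((d) => ({ x: d.week, y: d.hours }));
}

// Compare categories numerically: total hours per genre (insertion order kept).
function toOrdinal(records: ViewingRecord[]): OrdinalPiece[] {
  const totals = new Map<string, number>();
  for (const d of records) {
    totals.set(d.genre, (totals.get(d.genre) ?? 0) + d.hours);
  }
  return Array.from(totals, ([o, r]) => ({ o, r }));
}

// Connect shows to genres; each edge accumulates the hours for its pair.
function toNetwork(records: ViewingRecord[]): NetworkModel {
  const nodes = new Set<string>();
  const edges = new Map<string, Edge>();
  for (const d of records) {
    nodes.add(d.show);
    nodes.add(d.genre);
    const key = `${d.show}|${d.genre}`;
    const existing = edges.get(key);
    if (existing) {
      existing.weight += d.hours;
    } else {
      edges.set(key, { source: d.show, target: d.genre, weight: d.hours });
    }
  }
  return { nodes: Array.from(nodes), edges: Array.from(edges.values()) };
}
```

The records themselves never change here; only the model built from them does. That mirrors the principle that the initial data format should not dictate how the data ends up being displayed.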

We need to reevaluate entirely what data visualization means and what we’re doing to enable it. Right now it seems like there’s one class of “flashy” data visualization that’s done by impresarios who know the ins and outs of D3 or P5, and then there’s “commodity” data visualization that’s focused on providing every possible variation of a chart as a self-contained widget.

The world of commodity data visualization seems convinced that it can enable data visualization by releasing ever more widgets. Here’s a bar chart widget, and here’s a pie chart widget, and here’s a violin plot widget and here’s a sankey widget and a joy plot widget and so on and so forth.

This approach addresses a problem that no longer exists. Data visualization is not about the question “How do I deploy as many charts as possible as quickly as possible?” Nor is it about the question “How can I encode as many channels as possible with data?” Rather, it’s about “How do I, in collaboration with fellow developers and stakeholders, create an analytical view into a dataset that best enables everyone to understand and navigate the domain area?” That’s not done by enabling more and more charts; it’s done by facilitating information design.

This is especially important for analytical applications intended to be deployed with rich interactivity. The way we model information helps determine the expectations for interacting with that information. For example, brushes show up in many different applications, but brushing data in a frame optimized for purely numerical correlation is distinct from brushing data whose comparison is split by category. So Semiotic abstracts that away, along with the other reusable pieces: axes, minimaps and Voronoi overlays for improved interactivity. These things are only technically difficult, and they obscure the real difficulty of creating meaningful visual displays of information. As long as we think of data visualization as the thing we do in the last 5% of a project, for which we need something that quickly and conveniently drops a bunch of charts into an application without our thinking about it, we will continue to produce poorly optimized, overly simplistic charts. Instead, we should embrace the difficulty of data visualization and focus on enabling the exploration of data visualization modes among developers and stakeholders.
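The difference between the two kinds of brushing can be sketched in a few lines of TypeScript. This is a hypothetical illustration, not Semiotic’s brush implementation: an XY brush selects points inside a two-dimensional rectangle, while a categorical brush selects a range of numerical values within a single category column. The function names brushXY and brushColumn, the Extent type and the data shapes are assumptions.

```typescript
// [start, end] of a brushed range; the user may drag in either direction.
type Extent = [number, number];

interface XYPoint {
  x: number;
  y: number;
}

interface OrdinalPiece {
  o: string; // category (column)
  r: number; // numerical value within the category
}

// True when v lies inside the extent, regardless of which end was dragged first.
function within(v: number, [a, b]: Extent): boolean {
  return v >= Math.min(a, b) && v <= Math.max(a, b);
}

// Numerical correlation: the brush is a rectangle constraining both axes.
function brushXY(points: XYPoint[], xExtent: Extent, yExtent: Extent): XYPoint[] {
  return points.filter((p) => within(p.x, xExtent) && within(p.y, yExtent));
}

// Categorical comparison: the brush is a 1D range inside one column.
function brushColumn(pieces: OrdinalPiece[], column: string, rExtent: Extent): OrdinalPiece[] {
  return pieces.filter((p) => p.o === column && within(p.r, rExtent));
}
```

The same drag gesture therefore means different things depending on how the information is modeled: across an XY chart it constrains two numbers at once, while along a bar or violin it constrains one number inside one category.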

That’s an intentional distinction with Semiotic. Semiotic is not meant to abstract everything away; rather, it is intended to highlight the interconnectivity and relatedness of data visualization modes, allowing easy ideation and prototyping in tandem with designers and audiences. Semiotic has a strong focus on layering meaning onto blank charts, most prominently through its integration of Susie Lu’s d3-annotation library, but also through the prominence of multiple chart layers, only one of which holds the traditional marks that make up what we commonly refer to as “data visualization”.

The layers that make up a Frame

Semiotic lets you easily ideate on a dataset and create iterations that show the same data in similar ways until you settle on the best one, with built-in interactivity for when you want to deploy those charts. As long as the question is “How do I quickly drop a chart into my app?”, we close ourselves off from the design process, which is bad for any application but particularly terrible for data visualization.

To get started, take a look at the interactive documentation to see how charts are made and changed. After that, you can install Semiotic locally with npm i semiotic. The API is documented on the wiki on the Semiotic GitHub page, along with a few tutorials and deeper explanations of some of the library’s principles. Of course, if you find a bug or would like a new feature, file an issue on GitHub.

Many thanks to Susie Lu, Mara Averick, Matt Herman & Jason Reid for feedback on this piece.

Also double thanks to Susie because Semiotic is the only data visualization framework with an amazing poster.