Designing for Machine Learning — Part 2
Cluster Analysis in Kira — A WIP Case Study
In Part 1 I explained clustering and how it helps Kira’s app users get their jobs done; check it out, it’s a good primer! In Part 2 I’m going to cover some exploration and ideation my colleagues and I have done around clustering.
The designs in Part 1 only consider what would happen if a user wanted information on one field (e.g., clusters in Change of Control). What if we want to cluster all the fields for more insight? Or cluster all documents? Or see all the field clusters in one document type?
If we allow users to choose what data they want to view, we can create charts to answer the questions above. I have been a huge Google Analytics fan since my former life in marketing and communications. Its dashboard builder is an inspiration: you create custom widgets by selecting a chart type and then the dimensions and metrics to be measured.
The following is an exploration of some configuration options that let users construct their own cluster charts. Essentially, I am designing for the possibility of users layering clusters upon clusters.
Exploring all the chart types
For users to be able to create their own charts, we need a ready suite of chart types beyond the traditional bar graph above. Ideally, we would recommend a chart type, or make only certain types available, based on the base dimension the user has chosen.
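To make that idea concrete, here is a minimal sketch of how a chart configuration could drive which chart types are offered. Everything in it is an assumption for illustration: the `ChartConfig` shape, the `availableChartTypes` function and the rules themselves are hypothetical and are not taken from Kira's codebase.

```typescript
// Hypothetical names throughout; none of this comes from Kira's codebase.
type ChartType = "bar" | "treemap" | "heatmap" | "sunburst";

interface ChartConfig {
  baseDimension: string;       // e.g. "Change of Control"
  nestedDimensions: string[];  // extra cluster layers inside the base dimension
  crossedDimension?: string;   // a second dataset to cross the base dimension with
}

// Illustrative rules only: which chart types to offer for a given configuration.
function availableChartTypes(config: ChartConfig): ChartType[] {
  if (config.crossedDimension !== undefined) {
    return ["heatmap"]; // two datasets crossing each other
  }
  if (config.nestedDimensions.length > 0) {
    return ["sunburst", "treemap"]; // clusters within clusters
  }
  return ["bar", "treemap"]; // a single layer of clusters
}

// Three example configurations (made up for this sketch).
const singleField: ChartConfig = {
  baseDimension: "Change of Control",
  nestedDimensions: [],
};
const layered: ChartConfig = {
  baseDimension: "Document type",
  nestedDimensions: ["Change of Control"],
};
const crossed: ChartConfig = {
  baseDimension: "Document",
  nestedDimensions: [],
  crossedDimension: "Field",
};

console.log(availableChartTypes(singleField));
console.log(availableChartTypes(layered));
console.log(availableChartTypes(crossed));
```

The first entry in each returned list could be treated as the recommendation, with the rest offered as alternatives. The real rules would come out of user testing rather than a hard-coded table like this one.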
We chose D3 for these charts for a few reasons:
- It is agile and fast, and it takes care of all the DOM manipulation and transitions.
- We have already used it elsewhere in our app (which is written in Clojure), so we knew it could integrate well.
- D3 has an extensive library of examples that we could look through to see what fit well with our data.
Below I outline three chart types and how well we think each works with our data.
Treemaps are a great way to show distinct groups of data as a representation of a whole, while also giving insight into the sub-groups of that data.
Pros of treemaps:
This would be an ideal way to show clustering data, especially if we added a second layer of clustering to this chart, for example, a view that showed all the clusters of a Change of Control field within each of the blocks above.
Cons of treemaps:
It is not always easy to spot which block is the largest, especially when the blocks are arranged in creative ways to save space.
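One way to soften this drawback is to compute each block's share of the whole and show it as a label or in a sorted legend. The sketch below assumes a hypothetical `ClusterNode` shape and made-up sample data. It illustrates the roll-up of nested clusters, not how Kira stores them.

```typescript
// Hypothetical data shape: a cluster is either a leaf with a document count
// or a group of sub-clusters (the second layer of clustering).
interface ClusterNode {
  name: string;
  count?: number;
  children?: ClusterNode[];
}

// Total size of a node: the sum of its children if it has any, else its own count.
function totalCount(node: ClusterNode): number {
  const kids = node.children ?? [];
  if (kids.length > 0) {
    return kids.reduce((sum, child) => sum + totalCount(child), 0);
  }
  return node.count ?? 0;
}

interface RankedBlock {
  name: string;
  total: number;
  percent: number; // share of the whole, 0 to 100
}

// Top-level blocks, largest first, each with its share of the whole.
function rankedBlocks(root: ClusterNode): RankedBlock[] {
  const whole = totalCount(root);
  return (root.children ?? [])
    .map((child) => {
      const total = totalCount(child);
      return { name: child.name, total, percent: whole === 0 ? 0 : (100 * total) / whole };
    })
    .sort((a, b) => b.total - a.total);
}

// Made-up sample data for illustration.
const project: ClusterNode = {
  name: "All documents",
  children: [
    {
      name: "Leases",
      children: [
        { name: "Cluster A", count: 10 },
        { name: "Cluster B", count: 30 },
      ],
    },
    { name: "Supply agreements", count: 50 },
    { name: "NDAs", count: 10 },
  ],
};

for (const block of rankedBlocks(project)) {
  console.log(`${block.name}: ${block.total} (${block.percent.toFixed(0)}%)`);
}
```

With the percentages available, the treemap can keep its space-saving layout while a label on each block answers the "which is largest?" question directly.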
Heatmaps show you the intensity of an event when two datasets cross.
Pros of heatmaps:
They allow you to quickly compare data across an entire project and identify trends. For example, you can quickly spot if one document has many outliers, or if one field has no extractions across many documents.
Cons of heatmaps:
There is no easy way to get all the information on one dataset. For example, if you wanted to see all the outliers in the entire project grouped together, that is not possible on a heatmap because of the limits of the table layout.
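The grid itself is just a count per document and field pair, as the sketch below shows. The `Extraction` record shape, the function name and the sample rows are all assumptions made up for illustration. The last line shows that the grouped view a heatmap cannot give you is a simple filter over the same records, so it could live in a companion list next to the chart.

```typescript
// Hypothetical record shape: one row per extraction found in a document.
interface Extraction {
  document: string;
  field: string;
  isOutlier: boolean;
}

interface Heatmap {
  documents: string[]; // row labels
  fields: string[];    // column labels
  cells: number[][];   // cells[row][column] = number of outliers
}

// Cross documents with fields and count the outliers in each cell.
function buildOutlierHeatmap(records: Extraction[]): Heatmap {
  const documents = Array.from(new Set(records.map((r) => r.document))).sort();
  const fields = Array.from(new Set(records.map((r) => r.field))).sort();
  const cells = documents.map((doc) =>
    fields.map(
      (field) =>
        records.filter((r) => r.isOutlier && r.document === doc && r.field === field).length
    )
  );
  return { documents, fields, cells };
}

// Made-up sample data for illustration.
const records: Extraction[] = [
  { document: "Doc 1", field: "Change of Control", isOutlier: true },
  { document: "Doc 1", field: "Assignment", isOutlier: true },
  { document: "Doc 1", field: "Change of Control", isOutlier: false },
  { document: "Doc 2", field: "Change of Control", isOutlier: false },
  { document: "Doc 3", field: "Assignment", isOutlier: true },
];

const heatmap = buildOutlierHeatmap(records);
console.log(heatmap.documents, heatmap.fields, heatmap.cells);

// The grouped view the grid cannot show: every outlier in the project, in one list.
const allOutliers = records.filter((r) => r.isOutlier);
console.log(allOutliers.length);
```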
Sunburst graphs are our absolute favourite, because they are fun to interact with and make it easy to nest clusters in more than one dimension.
Our graph is a combination of the two — zooming in when a user selects a section, while also retaining the view of the whole slice, and a lighter version of the entire graph in the background for context.
Pros of sunburst graphs:
You can segment information while showing it as part of the greater dataset, and infinitely nest clusters within clusters! They also have a lot of interactive possibilities.
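The nesting works because every cluster simply receives a slice of its parent's angle, one ring further out. Here is a minimal sketch of that layout step, with a hypothetical `SunburstNode` shape and made-up sample values. D3's hierarchy module ships a partition layout for this kind of job. The version below avoids the dependency so the arithmetic stays visible, and it is not the code behind our prototype.

```typescript
// Hypothetical data shape; the sample values further down are made up.
interface SunburstNode {
  name: string;
  value?: number;            // leaf size
  children?: SunburstNode[]; // nested clusters
}

interface Arc {
  name: string;
  depth: number;      // ring index, 0 = centre
  startAngle: number; // radians
  endAngle: number;   // radians
}

// Size of a node: the sum of its children if it has any, else its own value.
function nodeValue(node: SunburstNode): number {
  const kids = node.children ?? [];
  if (kids.length > 0) {
    return kids.reduce((sum, child) => sum + nodeValue(child), 0);
  }
  return node.value ?? 0;
}

// Give each child a slice of its parent's angle, proportional to its size,
// and recurse so that clusters nest to any depth.
function layoutArcs(
  node: SunburstNode,
  startAngle: number = 0,
  endAngle: number = 2 * Math.PI,
  depth: number = 0
): Arc[] {
  const arcs: Arc[] = [{ name: node.name, depth, startAngle, endAngle }];
  const total = nodeValue(node);
  let cursor = startAngle;
  for (const child of node.children ?? []) {
    const span = total === 0 ? 0 : ((endAngle - startAngle) * nodeValue(child)) / total;
    arcs.push(...layoutArcs(child, cursor, cursor + span, depth + 1));
    cursor += span;
  }
  return arcs;
}

// Made-up sample data for illustration.
const sunburstRoot: SunburstNode = {
  name: "Project",
  children: [
    {
      name: "Change of Control",
      children: [
        { name: "Cluster 1", value: 3 },
        { name: "Cluster 2", value: 1 },
      ],
    },
    { name: "Assignment", value: 4 },
  ],
};

const arcs = layoutArcs(sunburstRoot);
for (const arc of arcs) {
  console.log(arc.depth, arc.name, arc.startAngle.toFixed(2), arc.endAngle.toFixed(2));
}
```

Zooming into a section amounts to calling the same function on the selected node with the full circle as its range.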
Cons of sunburst graphs:
As with treemaps, it is difficult to see at a glance how large a slice is as part of the whole. Bar and line graphs are really the only way to present information so that it can be compared visually, so perhaps it is best to look at these charts as bringing a different kind of insight to our data.
Since this is a work in progress, I will continue to update this article as I explore other graphs. In the meantime, stay tuned for Part 3, where I will discuss prototyping and user testing!