Three Tools for Human-in-the-loop Data Analytics: Public Software Releases
We’re starting this year on a very good note: not one, but three of our key interactive (or “human-in-the-loop”) data analytics tools are moving from private betas, shared with a few interested parties, to full-fledged open-source releases available to the world at large. We hope this will catalyze further development, adoption, and research.
The Three Tools: Links to Software
I will now briefly describe these three tools, and provide links to the software releases, as well as further information.
DataSpread, a spreadsheet-database hybrid, with a spreadsheet front-end and a database back-end.
This release of DataSpread can scale to billions of cells while remaining interactive. It does so by coupling a database with clever data representation schemes, positional indexing schemes, and cache management algorithms, and by prioritizing work based on user attention.
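To give a flavor of what positional indexing is for, here is a minimal sketch in Python. It is not DataSpread's actual implementation: the class name `PositionalIndex`, its methods, and the block size are all hypothetical, and a flat list of blocks stands in for whatever counted tree structure a real system would use. The point it illustrates is that rows are addressed by position, yet inserting a row in the middle does not require renumbering every row after it.

```python
class PositionalIndex:
    """Hypothetical sketch: maps a 0-based row position to a stable row key.

    Keys are kept in small blocks, so an insert only shifts entries inside
    one block instead of renumbering every later row.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = [[]]

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def _locate(self, pos):
        # Walk the block sizes to find which block holds position `pos`.
        if pos < 0:
            raise IndexError("row position out of range")
        for i, block in enumerate(self.blocks):
            if pos < len(block):
                return i, pos
            pos -= len(block)
        raise IndexError("row position out of range")

    def get(self, pos):
        i, offset = self._locate(pos)
        return self.blocks[i][offset]

    def insert(self, pos, key):
        total = len(self)
        if pos < 0 or pos > total:
            raise IndexError("row position out of range")
        if pos == total:
            # Appending: place the key at the end of the last block.
            i, offset = len(self.blocks) - 1, len(self.blocks[-1])
        else:
            i, offset = self._locate(pos)
        block = self.blocks[i]
        block.insert(offset, key)
        if len(block) > self.block_size:
            # Split an over-full block in two to keep blocks small.
            mid = len(block) // 2
            self.blocks[i:i + 1] = [block[:mid], block[mid:]]


index = PositionalIndex(block_size=4)
for k in range(10):
    index.insert(len(index), f"r{k}")   # load ten rows in order
index.insert(3, "new")                  # insert a row in the middle
```

After the middle insert, position 3 resolves to the new row and the old row at position 3 is found at position 4, while the stored keys themselves never change. Lookup in this sketch is linear in the number of blocks; replacing the flat list with a tree that stores subtree counts would make it logarithmic.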
Zenvisage, an “effortless” visual data exploration platform, aimed at allowing users to fast-forward to desired patterns, trends, and insights.
This release of Zenvisage includes an interactive exploration interface with drag-and-drop and sketching capabilities, along with a powerful ZQL (Zenvisage Query Language) interface for more complex, multi-step requests. The Zenvisage backend translates and optimizes these queries using the SmartFuse query optimizer, so that results come back at interactive speeds.
OrpheusDB, a versioned database system, with versioning “bolted-on” to a regular relational database, in this case, PostgreSQL.
This release of OrpheusDB incorporates data representation schemes that optimize the storage and retrieval of large volumes of versioned datasets, coupled with powerful analytics capabilities (via versioned SQL queries) across one or more dataset versions.
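One way to picture versioning "bolted on" to a relational database is the sketch below. It is an illustration under stated assumptions, not OrpheusDB's schema or API: the table names `records` and `version_members`, the helper functions `commit` and `checkout`, and the sample data are all hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example is self-contained. Each record is stored once, and a version is just the set of record ids it contains, so unchanged records are shared across versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Every record is stored exactly once, whatever versions it belongs to.
    CREATE TABLE records (rid INTEGER PRIMARY KEY, name TEXT, score REAL);
    -- A version is the set of record ids it contains.
    CREATE TABLE version_members (vid INTEGER, rid INTEGER, PRIMARY KEY (vid, rid));
""")


def commit(conn, vid, parent_vid=None, added=(), removed_rids=()):
    """Hypothetical helper: create version `vid`, optionally from a parent."""
    if parent_vid is not None:
        # Start from the parent's membership; no record data is copied.
        conn.execute(
            "INSERT INTO version_members "
            "SELECT ?, rid FROM version_members WHERE vid = ?",
            (vid, parent_vid),
        )
    for rid in removed_rids:
        conn.execute(
            "DELETE FROM version_members WHERE vid = ? AND rid = ?", (vid, rid)
        )
    for name, score in added:
        cur = conn.execute(
            "INSERT INTO records (name, score) VALUES (?, ?)", (name, score)
        )
        conn.execute("INSERT INTO version_members VALUES (?, ?)", (vid, cur.lastrowid))


def checkout(conn, vid):
    """Hypothetical helper: materialize the rows that belong to version `vid`."""
    return conn.execute(
        "SELECT r.name, r.score FROM records r "
        "JOIN version_members m ON r.rid = m.rid "
        "WHERE m.vid = ? ORDER BY r.rid",
        (vid,),
    ).fetchall()


# Version 1 holds two records; version 2 replaces the second one.
commit(conn, 1, added=[("a", 0.1), ("b", 0.5)])
commit(conn, 2, parent_vid=1, added=[("b", 0.9)], removed_rids=[2])

# A query that spans versions: how many records does each version contain?
sizes = conn.execute(
    "SELECT vid, COUNT(*) FROM version_members GROUP BY vid ORDER BY vid"
).fetchall()
```

Here an update is modeled as removing the old record from the new version and adding a fresh one, so the record for "a" is stored once and shared by both versions. Because membership lives in an ordinary table, queries across versions are plain SQL joins and aggregates.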
These three tools fit into a “Maslow’s Hierarchy” for human-in-the-loop data analytics that we’ve been developing at Illinois.
The hierarchy is organized by increasing sophistication of data analytics needs, from bottom to top. At the bottom, to open, examine, and touch large datasets, we have DataSpread; in the middle, to view interesting insights and play with datasets, we have Zenvisage; and at the top, to record the results of your analysis, share them with others, and retrieve them on demand, we have OrpheusDB.
We believe these three tools help span the spectrum from simple to complex data analytics needs, and cater to a broad audience of users.
We’d love to hear from you — do tell us if you’d like to deploy our tools for your use cases, or contribute to their development!