Introducing VizHub Alpha

See also Introducing VizHub Beta

I’d like to tell you a bit about VizHub, the next generation of Datavis.tech. Datavis.tech is a data visualization platform I worked on for about a year, and building it taught me how I want to develop VizHub.

VizHub is still an early work in progress (alpha software), but the beta release should be ready by September, at which time I plan to use it as the platform for teaching (creating example code) and learning (students doing homework assignments) data visualization with D3.js and SVG in an online course this Fall at @WPI! Many students are remote and transfer credit from WPI to other universities. If you’re a graduate student in Computer Science anywhere, you can register (see enrollment details). Here’s a taste of what my students made last year.

Video intro tutorial.

Past Experiences — Building Datavis.tech

Being a huge fan and avid user of bl.ocks.org, Blockbuilder, and GitHub Pages, I began to feel limited by those tools as means to create, evolve, and share data visualizations. Creating visualizations is not just about writing code. There’s a host of contextual factors, social factors, and lifecycle stages for visualizations that would benefit from support by the software tools we use to visualize data.

Collaboration

Collaboration is one big thing the Datavis.tech platform intended to address. I was working on a visualization project for a client once, and we were using GitHub Pages to host the project. Week after week, the client suggested different colors to use in our color palette. During the call we’d say “Sure, we can change that right after the call”. Indeed we could change it, but it required 5 minutes of dedicated attention to open up the file, commit the change, wait for the deploy, and send an email. The next week the client would say they didn’t like that color after seeing it in action, and suggested a new one.

I had a nagging feeling that we could do better than this. It should not take a week for a feedback cycle on a color change. The tools we use can support this better. I thought of Google Docs, how you can add collaborators to documents and get changes instantly. Anyone with access can change the document. This is the sort of collaboration tool I desired, but nothing like this existed for visualization as far as I could see.

The Datavis.tech editor interface.

Therefore the first major feature developed in Datavis.tech was real-time collaboration. Similarly to bl.ocks.org, a visualization consists of HTML and JavaScript code. Similarly to Blockbuilder, you can edit that code and get instant feedback within your browser. Similarly to Google Docs, you can add and remove collaborators on your visualization, and they can edit the code too. Whenever any collaborator makes a change, the visualization is immediately updated in the browsers of everyone viewing the visualization.
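To make the "everyone sees every change" behavior concrete, here is a minimal sketch in TypeScript. It is not the Datavis.tech implementation: `SharedViz` is a hypothetical in-memory stand-in for the real-time backend, and it ignores the hard parts (networking, concurrent edits, conflict resolution).

```typescript
// Hypothetical in-memory stand-in for a real-time backend (an assumption,
// not the actual Datavis.tech code).
type Listener = (code: string) => void;

class SharedViz {
  private code: string;
  private listeners: Listener[] = [];

  constructor(initialCode: string) {
    this.code = initialCode;
  }

  // A viewer subscribes and immediately receives the current code.
  // The returned function removes the subscription.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    listener(this.code);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // An edit by any collaborator is pushed to every subscribed viewer.
  edit(newCode: string): void {
    this.code = newCode;
    this.listeners.forEach((l) => l(newCode));
  }
}

// Two viewers watch the same visualization while its color is changed.
const sharedViz = new SharedViz('<circle fill="red" />');
const seenByAlice: string[] = [];
const seenByBob: string[] = [];

sharedViz.subscribe((code) => { seenByAlice.push(code); });
const unsubscribeBob = sharedViz.subscribe((code) => { seenByBob.push(code); });

sharedViz.edit('<circle fill="steelblue" />'); // both viewers receive this
unsubscribeBob();
sharedViz.edit('<circle fill="teal" />'); // only Alice receives this
```

With this shape, the client's color change from the story above would show up for everyone on the call as soon as someone types it, with no commit or deploy step in between.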

The UI for adding collaborators, and references to datasets.

Data Separate from Visualization

One of the things that never quite felt right about bl.ocks.org is how datasets are published alongside the visualizations. If a visualization is forked in bl.ocks.org, the dataset is forked along with it. So if the original source data updates, one would need to update each visualization separately. This usually doesn’t happen, so we end up with tons of old and rotting works with outdated data. Also, data files are rarely given any credit in bl.ocks.org. It’s just “data.csv” and you have no idea where it came from, who created it, or whether it’s reliable or even legal to use.

An example dataset page.

In Datavis.tech, datasets were tracked independently of visualizations. Each dataset had its own page, with its own description, and could be referenced by many visualizations. With this structure, each visualization page could automatically show a list of links back to the datasets that it referenced, and each dataset page could automatically show a list of its visualizations. For example the Old Faithful Dataset links to visualizations that demonstrate various ways of visualizing density: semi-transparent circles, contour plot, and binned aggregation, and each of those visualizations links back to the dataset.

Whenever a dataset was updated, all visualizations that referenced it would automatically update (instantly!). This gave visualizations a fresh quality, making them more like up-to-the-minute news briefings than framed pieces of art sitting in a museum. It also enabled different people to be responsible for data maintenance and visualizations. Imagine that an authoritative dataset, updated weekly, is maintained by one person, and 20 people create different visualizations of it. Those 20 visualizations would always be showing up-to-date data.
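The reference structure described above can be sketched roughly as follows. The type names, field names, and IDs are assumptions made for illustration (they are not the actual Datavis.tech schema), and the data rows are placeholders.

```typescript
// Hypothetical document shapes; the real Datavis.tech schema may differ.
interface Dataset {
  id: string;
  title: string;
  text: string; // raw CSV text
}

interface Visualization {
  id: string;
  title: string;
  datasetIds: string[]; // references to datasets, not copies of them
}

const oldFaithful: Dataset = {
  id: "old-faithful",
  title: "Old Faithful Dataset",
  text: "eruptions,waiting\n3.6,79", // placeholder rows
};
const datasets: Dataset[] = [oldFaithful];

const contourPlot: Visualization = {
  id: "contour-plot",
  title: "Contour plot",
  datasetIds: ["old-faithful"],
};
const visualizations: Visualization[] = [
  { id: "circles", title: "Semi-transparent circles", datasetIds: ["old-faithful"] },
  contourPlot,
  { id: "bins", title: "Binned aggregation", datasetIds: ["old-faithful"] },
];

// Back-links for a dataset page: every visualization that references it.
function visualizationsOf(datasetId: string): Visualization[] {
  return visualizations.filter((v) => v.datasetIds.indexOf(datasetId) !== -1);
}

// Data is looked up by reference at render time, so one dataset update
// reaches every visualization without anyone copying files around.
function dataFor(viz: Visualization): string[] {
  return viz.datasetIds.map((id) => {
    const found = datasets.filter((d) => d.id === id)[0];
    return found ? found.text : "";
  });
}

// The data maintainer appends a row; all three visualizations now see it.
oldFaithful.text = "eruptions,waiting\n3.6,79\n1.8,54";
```

Because a fork would copy only the `datasetIds` list, forked visualizations keep pointing at the same dataset, which is what avoids the stale `data.csv` problem.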

Longevity

One of the goals of Datavis.tech was to be a visualization publishing platform with longevity. If a visualization was created one day and embedded in a news article page, that page should load and run the same exact visualization if it’s accessed again in a year, 5 years, 10 years, or 20 years. Achieving this requires that version history be stored for each and every document in the platform. Also, it must be possible to access visualizations at any point in their history.
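As a rough illustration of what "access at any point in history" could mean, here is a sketch of an append-only version store. `VersionedDocument` and its methods are hypothetical names; nothing like this was implemented in Datavis.tech, and a real system would need persistent storage rather than an in-memory array.

```typescript
interface Version {
  timestamp: number; // milliseconds since the Unix epoch
  content: string;
}

// Hypothetical append-only history for a single document.
class VersionedDocument {
  private versions: Version[] = [];

  // Earlier versions are never modified or deleted, only added to.
  commit(content: string, timestamp: number): void {
    const last = this.versions[this.versions.length - 1];
    if (last && timestamp < last.timestamp) {
      throw new Error("Versions must be committed in chronological order");
    }
    this.versions.push({ timestamp: timestamp, content: content });
  }

  // Returns the content as it was at the given time, or undefined if the
  // document did not exist yet.
  at(timestamp: number): string | undefined {
    const earlier = this.versions.filter((v) => v.timestamp <= timestamp);
    const latest = earlier.pop();
    return latest ? latest.content : undefined;
  }
}

// A news article embeds the visualization as it was at the end of 2018.
const article = new VersionedDocument();
article.commit("bar chart, v1", Date.UTC(2018, 6, 1));
article.commit("bar chart, v2", Date.UTC(2019, 2, 15));
const pinned = article.at(Date.UTC(2018, 11, 31));
```

An embed that stores the pinned timestamp alongside the document ID would keep resolving to the same content no matter how many later versions are committed.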

As of now (July 2018) these features have not yet been implemented. Also, we’re abandoning Datavis.tech in favor of its next iteration, VizHub.com, so the longevity bit didn’t really work out… this time. But this is again one of the goals for VizHub.

Adoption

The biggest problem with Datavis.tech was that no one was using it. The core features were there, and you could make visualizations with it, but the UI/UX had a whole host of problems that made it look and feel more like someone’s half-baked senior project from a university CS class than a serious and polished product.

The vision, communicated.

The main issues that plagued usability in Datavis.tech were:

  • The code editing environment was a form, which sucks. People need a richer code editing environment, more like an IDE, to feel comfortable.
  • The collaboration feature was missing presence cursors, so you wouldn’t know who is editing the document or where they are editing. To work around this, we added line numbers, so you can tell your coding buddy “So I’m editing line 57 now” in your voice call, but this is not ideal. It’s pathetic in fact for a product that “offers” real-time collaboration.
  • ES6 modules were not supported in visualization code. The user was forced to edit just one honking HTML file, with inline script tags, OR create totally separate “technology” documents and reference them from the visualization. This idea of a “technology” document as a reusable piece of code perhaps has some merit, but the implementation and UX for this was … regrettable.
  • Datasets were not displayed as tables. The raw text was shown.
  • Importing datasets using “References” was not intuitive at all, and was cumbersome to use.
  • There was no Search feature, so things were not very discoverable.
  • There was no comments section in the visualizations, which was in fact one of the biggest original goals for the project. Wouldn’t it be great if you could comment on a visualization, like GitHub Issue discussions or YouTube comments, to give feedback or present ideas? Yes it would, but unfortunately we got bogged down in technical debt and saw that developing this feature within Datavis.tech would be nearly impossible.

Future Plans — Building VizHub

Having learned quite a lot from the success/failure of Datavis.tech as a platform, we are embarking on building the next generation of this system.

Being an educator, I have struggled in the past to find a way to teach data visualization design and visualization construction using D3.js in a 10-week online course. I’ve got students coming in with little programming background, and I want to focus on coding visualizations. I found the biggest distraction from that process was that the students would get bogged down in the details and process of setting up their own development environment, learning Git, and publishing their work online (so I could grade it).

Perhaps one of the biggest problems with Datavis.tech is that it was not well scoped. It was trying to do too many things at once, without clear focus. Given that I’m teaching this online course in the Fall, and that teaching was one of the many original use cases for Datavis.tech, I figured this is a great opportunity to clearly define the scope of its next incarnation: A Platform for Teaching and Learning Data Visualization with D3.js and SVG. The litmus test is whether my students and I can actually use it as “the platform” for the course.

Now that we have a clear scope and goal, the two other things that need addressing are technical debt and user adoption.

Addressing Technical Debt

Having developed Datavis.tech with real-time features from the start, and without any real architectural plan (or knowledge of software architecture at the time), one of our main problems after a full year of development was technical debt. Features that should have taken hours to implement, such as presence cursors and comments, took weeks or felt impossible. The root cause was that the UI code was tightly coupled directly to the real-time database (ShareDB). This coupling also blocked more fundamental features, like server-side rendering.

Clean Architecture and open source UI for vizhub.com.

To solve the problem of technical debt once and for all with this re-write, we evaluated various architectural patterns and decided to try to faithfully implement The Clean Architecture. The main purpose of this architecture is to ensure that the codebase stays maintainable, and that we do not get bogged down with technical debt ever again. This will (in theory) make it so developing new features becomes easier, not harder, and our development velocity increases, not decreases, as time marches on.

Although the Clean Architecture is based on request-and-response style interaction modeling, it has been successfully adapted to Reactive Clean Architecture in the Android world. This gives us hope that if we build the MVP without real-time features, we can adapt/extend the architecture later on to add real-time where desirable.
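To show the kind of decoupling this is aiming for, here is a small sketch of the dependency rule in TypeScript. It is not code from VizHub: `Viz`, `VizGateway`, `RenameViz`, and `InMemoryVizGateway` are hypothetical names, and the gateway is synchronous only to keep the example short.

```typescript
// Entity: plain data with no framework or database types.
interface Viz {
  id: string;
  title: string;
  code: string;
}

// Gateway interface, owned by the use-case layer. The database layer
// implements it, so the dependency points inward.
interface VizGateway {
  getViz(id: string): Viz | undefined;
  saveViz(viz: Viz): void;
}

// Use case: application logic that knows nothing about the database or UI.
class RenameViz {
  private gateway: VizGateway;

  constructor(gateway: VizGateway) {
    this.gateway = gateway;
  }

  execute(id: string, newTitle: string): Viz {
    const viz = this.gateway.getViz(id);
    if (!viz) {
      throw new Error("Viz not found: " + id);
    }
    const renamed: Viz = { id: viz.id, title: newTitle.trim(), code: viz.code };
    this.gateway.saveViz(renamed);
    return renamed;
  }
}

// One possible implementation. A database-backed or real-time gateway could
// implement the same interface without any change to RenameViz.
class InMemoryVizGateway implements VizGateway {
  private store: Record<string, Viz> = {};

  getViz(id: string): Viz | undefined {
    return this.store[id];
  }

  saveViz(viz: Viz): void {
    this.store[viz.id] = viz;
  }
}

const gateway = new InMemoryVizGateway();
gateway.saveViz({ id: "abc", title: "Untitled", code: "<svg></svg>" });
const renamedViz = new RenameViz(gateway).execute("abc", "  Bar Chart ");
```

The point of the shape is that the use case and the UI talk only to `VizGateway`. Swapping the storage, or later adding a real-time implementation, would then be a change at the edge rather than a rewrite of everything that touches data.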

Addressing User Adoption

I’ve been reading this amazing book Scrum. In so doing I realized that we waited waaaayyyyyyyy too long to get the product in front of people who might actually want to use it. A year was spent developing, without any feedback from users. Features were developed based on mere imaginings of what might be super cool for someone some time, rather than addressing the usability pain points that turned people away. No wonder it failed.

This time, it’s Scrum 100%. We released an Alpha at vizhub.com. Try it! Maybe it’s even too early to release, but f*** it, we need feedback and interaction with potential users now, before it’s too late and we’ve painted ourselves into a corner.

Working software over comprehensive documentation.

While working on Datavis.tech, we were never quite certain what to work on next or how to prioritize our backlog (which hovered around 100 issues). This time around, we want the prioritization to be driven by the needs of real users. Therefore, we set up a public backlog under github.com/datavis-tech/vizhub-ui, which is a list of open issues prioritized by the number of 👍reactions. This way, we’ll be working on things that produce value. All the time.
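The prioritization rule itself is simple enough to sketch. In the snippet below, the `Issue` shape, the issue numbers, and the reaction counts are made up for illustration; in practice the counts would come from GitHub, which is not shown here. The tie-breaking rule (oldest issue first) is also an assumption.

```typescript
// Hypothetical shape for a backlog item; counts would come from GitHub.
interface Issue {
  number: number;
  title: string;
  thumbsUp: number;
}

// Most 👍 first. Ties fall back to the lower (older) issue number.
function prioritize(issues: Issue[]): Issue[] {
  // Copy before sorting so the caller's array is not mutated.
  return issues
    .slice()
    .sort((a, b) => b.thumbsUp - a.thumbsUp || a.number - b.number);
}

// Made-up sample data based on the missing features listed earlier.
const backlog: Issue[] = [
  { number: 1, title: "Search", thumbsUp: 2 },
  { number: 2, title: "Presence cursors", thumbsUp: 5 },
  { number: 3, title: "Comments section", thumbsUp: 5 },
  { number: 4, title: "Display datasets as tables", thumbsUp: 0 },
];

const prioritized = prioritize(backlog).map((issue) => issue.title);
```

With this sample data, "Presence cursors" and "Comments section" tie at the top, and the older issue wins the tie.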

A corollary to this (which is not set up yet but will be soon) is the idea that we include UI elements for non-implemented features, and when users try to interact with these UI elements, we tell them that the feature is not implemented yet, and direct them to give a 👍to the corresponding GitHub issue if they want this feature. If you like this idea, please give a 👍to its GitHub issue “Link unimplemented UI to GitHub issues”.

CodeSandbox.io has many Open Source contributors and an enthusiastic community, yet they can charge a monthly fee for “patrons”, who get additional features implemented in proprietary code. How can they do both of these things? This is the Open Core Model in action. Taking inspiration from their great success with this model, the “core” UI for VizHub is made Open Source.

Conclusion

If you read this far, we’d like to invite you to get involved. Please star the repo, follow us on Twitter, and give a 👍to the issues that you’d like us to work on next!