Useful terms for software architecture and user experience we commonly use at Tag.bio
You should be able to read this straight through, even though terms are presented in alphabetical order. Alternatively, you can jump around to specific terms of interest.
Terms in bold (←except this one) are all defined in this glossary. I’m going to figure out how to turn them into anchor links later.
What if a user’s Data Experience in software were primarily driven by server-defined functionality — instead of being driven by front-end functionality?
This would turn a front-end application into a simple browser of server content — which seems feature-weak — that is, if you only have one type of data application server to connect to.
But what if there were a whole universe of diverse data application servers out there, findable, accessible, and interoperable from the same front-end?
In our API-driven design, a user’s data experience in our front-end portal is driven by the Smart API of each Data Node — one of the myriad data application servers deployed into our decentralized Data Mesh.
This design pattern mirrors one of the most innovative and successful software architecture patterns in history — the World Wide Web. In the WWW model, web servers (e.g. Apache) define the content and function of their websites in order to provide a domain-native experience to users in the browser.
The World Wide Web has proven that API-driven design can deliver exceptional specificity to end users and scalability as new use cases emerge.
In addition, our API-driven design minimizes Technical Debt, enables universal-yet-bespoke Q-to-A Workflows, and generates interoperable Useful Data ArtifacTs (UDATs) for reproducibility, re-use, and collaboration.
Data experience is a high-level concept covering all the ways users, developers and software components interact with domain data for various scientific and business purposes. A worthwhile data experience can be facilitated by superior user experience — but also via useful developer tools, agile processes, and modular, scalable software architecture.
The data mesh is our decentralized, silo-busting data/application architecture, composed of disparate and independent Data Nodes.
The overall architecture of our data mesh is straightforward: a registry of FAIR data nodes, plus a universal communication interface that lets the specific functionality of each data node be accessed in a generic manner. It's the data nodes within the mesh that do the heavy lifting, from domain-specific queries to complex algorithms.
A data node is an independent application server deployed into a Data Mesh. Data nodes are configured with functionality pertaining to one or more domain-specific data sources — anything from a complete data lake or a single relational database to flat files (e.g. CSV), or even just one table.
A data node is FAIR — i.e. Findable, Accessible, Interoperable and Reusable.
The simplest implementation of a data node is “dumb” — meaning it can only perform generic queries on that data source (e.g. SQL queries).
In our API-Driven Design, however, data nodes offer much greater functionality — Q-to-A Workflows, backed by powerful algorithms, specifically tailored to the nature and purposes of each data source.
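To make the “dumb” vs. smart contrast concrete, here is a minimal Python sketch. All class, method, and workflow names are invented for illustration; they are not Tag.bio’s actual API.

```python
class DumbDataNode:
    """Only exposes generic queries against its data source."""
    def __init__(self, rows):
        self.rows = rows  # e.g. rows loaded from a CSV or SQL table

    def query(self, predicate):
        # Generic filtering -- the caller must know the schema.
        return [r for r in self.rows if predicate(r)]


class SmartDataNode(DumbDataNode):
    """Additionally exposes named, domain-specific Q-to-A workflows."""
    def __init__(self, rows):
        super().__init__(rows)
        self.workflows = {"patients_over_age": self._patients_over_age}

    def _patients_over_age(self, cutoff):
        # A domain-tailored question: the caller needs no schema knowledge.
        return self.query(lambda r: r["age"] > cutoff)

    def run(self, workflow_name, **params):
        return self.workflows[workflow_name](**params)


rows = [{"id": 1, "age": 54}, {"id": 2, "age": 71}]
node = SmartDataNode(rows)
print(node.run("patients_over_age", cutoff=60))  # [{'id': 2, 'age': 71}]
```

The point of the sketch: a dumb node pushes all domain knowledge onto the caller, while a smart node encodes it once, server-side, behind a named workflow.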
Back to the user interface for a bit.
Consider a screen in the front-end portal containing functionality for the user to perform. Here’s how we lay it out: navigation is located at the upper left, the content for the user to consume is vertically scrollable in the center, and the actions to perform on that content are located at the lower right. This provides an axis of familiarity to the user, letting them know how to get around, how to consume content, and what to do next.
Data Nodes in our Data Mesh are FAIR: Findable, Accessible, Interoperable, and Reusable.
FOFU — the Fear of F-ing Up
FOFU is what happens when the user is presented with a Data Experience that doesn’t match what they need to do — especially in scientific and complex business cases, where doing the wrong thing ranges from worthless to disastrous.
FOFU is all-too-common in the Healthcare and Life Sciences industry, where 80% of all data analysis is performed by services, not software. There are a number of software platforms — and coding languages like R and Python — that can do 100% of the analysis that the user needs to do, but they’re far too difficult to learn and use.
Unfortunately for most scientists and doctors, waiting for help from other people who can use R and Python takes too long — there’s a Last Mile Problem.
FOFU is solved by straightforward and domain-centric data experience. Applications must speak the language of the end user — enabling them to ask questions and understand the answers.
We use the term kitchen sink analysis to describe a Q-to-A Workflow which asks an exploratory question over complex, multi-modal data. A single Q-to-A workflow driven from a Data Node can be configured to analyze millions of variables in the data source at once, returning the most significant results to the user for consumption.
Analyzing most/all of the variables at the same time = kitchen sink analysis.
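A minimal sketch of the kitchen-sink idea: score every variable against an outcome at once, then return only the top hits. A real workflow would use proper statistical tests and multiple-testing correction; plain Pearson correlation stands in here, and all data below is made up.

```python
def pearson(x, y):
    # Plain Pearson correlation, implemented with the stdlib only.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def kitchen_sink(variables, outcome, top_n=2):
    """Analyze every variable at once; keep only the strongest signals."""
    scores = {name: abs(pearson(values, outcome))
              for name, values in variables.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

variables = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],  # tracks the outcome exactly
    "gene_b": [4.0, 1.0, 3.0, 2.0],
    "gene_c": [2.0, 5.0, 2.0, 5.0],
}
outcome = [1.0, 2.0, 3.0, 4.0]
print(kitchen_sink(variables, outcome))  # ['gene_a', 'gene_c']
```

Scale the `variables` dict up to millions of entries and the shape of the workflow is unchanged: the user still receives only the few most significant results.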
Most of our users don’t use our software from their smartphone — over 90% of usage is from a tablet or desktop/laptop.
So why would we use mobile-first design? Because the process of designing mobile-first makes for a simple, easy-to-use desktop experience. And where other complicated, dashboard-like software platforms (e.g. Tableau, Qlik, Spotfire, Array Studio) produce FOFU, the simple Q-to-A Workflows created through our mobile-first, API-Driven Design process give our users a clear understanding of how to do complex data analysis, without confusion about what to do next.
One major advancement of mobile-first design has been a focus on buttons over menus. Old-school desktop software buries most functionality in File/Edit/View/etc. menus and complex right-click pop-up menus. In mobile-first design, the actions you can take on a page are rendered as buttons right there on the page in front of you — not buried in a menu.
Q-to-A (Question-to-Answer) Workflows
When a biologist or doctor approaches a data source, they come with a large amount of supplemental domain knowledge — i.e. there is information encoded in their brains which is not encoded in the data source, and vice-versa.
A domain expert has certain assumptions, hypotheses, and some very specific questions to ask the data. These questions change and develop significantly over time. Software tools built on top of data sources must also change and adapt over time in order to evolve with the user.
Tag.bio provides adaptive, domain-centric functionality via Data Nodes, equipped with Smart APIs. High-level API calls are designed with the user’s questions in mind and give them simple forms to fill out in order to specify their question.
For example, this form below allows the user to configure a highly domain-specific question in order to find genes that have a relationship with early mortality in a specific cohort of breast cancer patients. This question has been designed to be constrained and simple — the only thing the user needs to select is a cutoff for the maximum period of mortality.
When the answers are returned to the user, they are contextualized with visualizations and dynamic text that directly explain how each result pertains to the question asked — like a dynamic figure one might find in a journal article.
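As a sketch, the question and its contextualized answer can be thought of as structured payloads like these. Every field name and value here is invented for illustration; the actual Smart API schema may differ.

```python
# Hypothetical request payload for the mortality question described above.
question = {
    "workflow": "genes_vs_early_mortality",
    # The single choice the user makes on the form: a mortality cutoff.
    "params": {"max_mortality_months": 24},
}

# Hypothetical answer payload, contextualized for the question asked.
answer = {
    "workflow": "genes_vs_early_mortality",
    "results": [
        {
            "gene": "GENE_X",
            "p_value": 0.0004,
            # Dynamic text explaining how this result pertains to the question.
            "narrative": "GENE_X expression is associated with mortality "
                         "within 24 months in this cohort.",
        }
    ],
}
```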
Furthermore, after running this Q-to-A workflow, the system has recorded the analysis as a Useful Data ArtifacT (UDAT) in the user’s history, making it re-runnable, reconfigurable, and sharable for collaboration with colleagues.
In contrast to the simple Q-to-A Workflow above, this second Q-to-A workflow below, driven from the Smart API on the same breast cancer Data Node, gives the user a much wider selection of options for configuring their question about mortality — the user can change the cohort of patients analyzed, the variable types analyzed, and algorithm parameters.
At Tag.bio, we refer to a workflow that enables broader selection to the user as a salad bar, where the user can pick and choose from a wide variety of parameters to ask their question and produce answers.
Some users benefit greatly from having a salad bar of choices, while other users benefit more from having a simpler Q-to-A workflow with fewer choices. Limiting and targeting user choice is a good way to prevent FOFU.
Borrowing again from the World Wide Web design pattern above, where web browsers and web servers communicate via HTTP, our smart API design provides a universal communication layer across all deployed Data Nodes. This enables our front-end portal or other clients to easily communicate with every data node in order to learn information about its backing data source and the domain-specific functionality the data node will provide in the form of Q-to-A workflows.
In addition, configuring each Q-to-A workflow, and parsing query/algorithm results from each workflow are also facilitated by the same universal communication layer — despite the fact that each data node is performing domain-centric calculations and is passing information to and from the user with domain-centric language.
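The universal-layer idea can be sketched in a few lines of Python: the client knows only two generic calls (describe and run), while everything domain-specific lives inside the node. All names here are invented for illustration, not the actual Smart API.

```python
class GenericClient:
    """Talks to ANY data node through the same two generic calls."""
    def __init__(self, node):
        self.node = node

    def list_workflows(self):
        return self.node.describe()["workflows"]

    def ask(self, workflow, **params):
        return self.node.run(workflow, params)


class BreastCancerNode:
    """One domain-specific node; a proteomics or EMR node would plug
    into the same client unchanged."""
    def describe(self):
        return {"domain": "breast cancer",
                "workflows": ["genes_vs_early_mortality"]}

    def run(self, workflow, params):
        # Domain-specific computation stubbed out for the sketch.
        return {"workflow": workflow, "params": params, "top_gene": "GENE_X"}


client = GenericClient(BreastCancerNode())
print(client.list_workflows())  # ['genes_vs_early_mortality']
result = client.ask("genes_vs_early_mortality", max_mortality_months=24)
print(result["top_gene"])  # GENE_X
```

Swapping in a different node class changes the domain entirely, but the client code above stays byte-for-byte identical — that is the universal communication layer at work.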
Smart, eh? It’s also FAIR.
Technical debt is a great example of something everyone should care about, but very few people actually do. It’s what happens when software is no longer capable of working or expanding to new functionality without significant additional investment cost.
All software applications — even programming languages — contain technical debt. In order to make something useful, you have to make certain design decisions which limit what you can do in the future without additional cost.
Organizational culture and development practices like Agile and refactoring are useful for reducing the technical debt of a project, as they enable rapid evolution of technical specifications throughout the software development cycle.
Software architecture is even more critically important for reducing technical debt. Design patterns that segregate and modularize functionality by domain knowledge and technical expertise — e.g. Data Nodes and Q-to-A Workflows within Tag.bio — are incredibly useful for keeping development costs low as data sources change and new user needs emerge.
In Tag.bio, developers who specialize in a domain-specific data source (e.g. electronic medical records) can work to improve that data node, in their language, without impacting or requiring updates to other data nodes they may not understand.
Furthermore, the Tag.bio developer doesn’t need to implement any front-end code, dev-ops code, security code, or API-layer code — they only need to focus on how their data node project connects to source data and implements algorithms for end-users.
The JSON templating layer of each data node also makes the Tag.bio developer experience consistent across data nodes, so contributors can leave and enter projects with minimal onboarding time. Each data node project looks and feels like other projects — because the templates and coding design patterns are the same — even if the source data and Q-to-A workflows are different.
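A hypothetical illustration of that consistency: two very different data node projects share the same top-level template keys, so a developer moving between them sees the same structure. All keys and values below are invented; Tag.bio’s real JSON templates may look different.

```python
# The shared "shape" every node project is expected to follow (hypothetical).
TEMPLATE_KEYS = {"name", "data_source", "workflows"}

emr_node = {
    "name": "emr-node",
    "data_source": {"type": "sql", "table": "encounters"},
    "workflows": [{"id": "readmission_risk", "params": ["window_days"]}],
}

rnaseq_node = {
    "name": "rnaseq-node",
    "data_source": {"type": "csv", "path": "expression.csv"},
    "workflows": [{"id": "genes_vs_early_mortality",
                   "params": ["max_mortality_months"]}],
}

# Same shape, different domains -- that's what keeps onboarding time low.
assert set(emr_node) == set(rnaseq_node) == TEMPLATE_KEYS
```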
In contrast to Tag.bio, bespoke applications designed in-house — either as a massive data lake with dashboards or as individual R/Shiny or Jupyter Notebook applications — contain massive amounts of technical debt.
For example, try updating a monolithic data lake system to include an additional, complex source of data with new end user functionality in less than one person-year. Or try to improve — or integrate — functionality/deployment of disparate R/Shiny applications after the original R developers have left your organization!
UDATs — Useful Data ArtifacTs
Useful data artifacts are an emerging topic within reproducible, collaborative data science. Quoting from my previous article:
…a Useful Data Artifact is an actual digital thing. It is not an idea, a thought, a realization, or an insight. It’s not in your brain — it’s a structured data object, created when you or an algorithm do something with data.
More technically — a Useful Data Artifact is a nonrandom subset or derivative digital product of a data source, created by an intelligent agent (human or software) after performing a function on the data source.
In other words, UDATs — which represent discrete discoveries, insights and trade secrets — are directly measurable and attributable outputs from the source data which you have already invested a fortune in collecting.
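A minimal sketch of a UDAT as a structured data object: a re-runnable record of who asked what, with which parameters, and what came back. Field names and the StubNode are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class UDAT:
    workflow: str     # which Q-to-A workflow produced this artifact
    params: dict      # how the question was configured
    results: Any      # the derivative digital product itself
    created_by: str   # attribution: which agent performed the function

    def rerun(self, node):
        # Replaying the recorded question against the node reproduces
        # the analysis -- reproducibility comes for free.
        return node.run(self.workflow, self.params)

class StubNode:
    def run(self, workflow, params):
        return ["GENE_X"]  # stand-in for a real recomputation

udat = UDAT("genes_vs_early_mortality",
            {"max_mortality_months": 24}, ["GENE_X"], "alice")
print(udat.rerun(StubNode()) == udat.results)  # True
```

Because the record is a plain structured object rather than an insight in someone’s head, it can be stored in a user’s history, reconfigured, and shared with colleagues.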