The modern data experience: Code-first or no-code?

VFisa
7 min read · Mar 17, 2022


In recent years, no-code tooling has surged in popularity and driven the democratization of software development, letting users with no coding experience create applications and solutions easily. While in many cases it has done just that, data engineers have frequently been frustrated by the drawbacks: no real version control, vendor lock-in, poor scalability, and limited options.

But what if there was a way to have the best of both worlds? To empower everyday data analysts with easy-to-use tools so they can deliver results fast, while giving technical users the option to manage everything as code with code-first tools such as dbt? Let’s see how you can do just that.

Data democratization

In her article No Code is the Future, Sarah Krasnik pictures a world divided into no-code and fully code-driven data tooling. Among the main disadvantages of no-code tools she mentions are the lack of versioning, templating, and the ability to make quick modifications in bulk, which are completely valid points from the perspective of hardcore engineers and practitioners.

I believe, though, that the data world has finally come much closer to the “ordinary analysts and domain experts”, who, as a result, are now even better equipped to get real value out of the data.

It started with the analytics engineering movement, which blended the roles of hard-core data engineer and data analyst through tools like dbt, dataform, and others. Although this is a positive push, an analyst is still required to learn git (and all the craft of using it) and CI/CD concepts, become friends with the command prompt, and so on. That is why I believe this is not the full data democratization we have been promised.

First, we are still leaving out the business analysts and domain experts who remain entrenched in Excel and are more or less passive consumers of BI reports. This maintains the status quo of everlasting loops of requests to the data team, which is still seen as a bottleneck for innovation within companies. Teams constantly submit new requests for prepped datasets, new reports, or BI adjustments, which is why the real purpose of the data team and the best way to handle such requests are common session topics at data-related conferences.

Second, even though there is a push to build up data knowledge within data prep tools, through user-generated metadata, model descriptions, tags, classification, and sometimes even user discussions, I think it is the subject-matter and domain experts who are best suited for this.

Moreover, we see businesses taking data democratization further down the road of the data ecosystem (with data discovery and catalog tools, for instance) or actively including domain experts in the data prep process. That means, though, that we cannot require them to have deeper technical skills such as git, even though such skills are commonly touted as advantages of “Modern Data Stack” tools.

In reality, those users are seeking a no-code/low-code tool experience: they are very often the people with the best ideas for how data can serve the business, but they are defeated by the complexity of the tooling (tools by engineers, for engineers). While the article correctly points out the disadvantages of no-code tools from the engineering perspective (one heavily promoted by MDS practitioners), it is important to see the other angle: the unserved audience. In other words,

Modern Data Stack does not mean a modern data experience.

The community should strive to build products that offer both: a delightful UI for everyday analysts that focuses on UX but is powered by code and APIs underneath, allowing more technical data teams to improve data processes, build upon internal POCs, automate, and save time. In short, to augment the rest of the company:

The real purpose of a data team is to bring value to the business through data and to augment the rest of the business to achieve that.

Unified point of view — Keboola story

While I highly recommend reading the article and its examples, I would like to question this black-and-white view of tooling and bring some examples from our platform, Keboola.

Keboola is an all-in-one platform built on a strong API foundation (the UI, for instance, is just a thin layer on top of the API) and the notion of “componentization” of everything. In other words, the platform provides a simpler, no-code experience while utilizing code-driven configuration control and operations underneath.
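To make the “thin UI layer” point concrete: anything the UI shows can also be fetched from the Storage API directly. Here is a minimal, unofficial sketch in Python; the connection.keboola.com stack URL and the KBC_STORAGE_TOKEN environment variable are assumptions for illustration:

```python
# A minimal sketch: listing Storage buckets over the same API the UI sits on.
# The stack URL (it differs per region) and the KBC_STORAGE_TOKEN environment
# variable are assumptions for illustration.
import os

import requests

API_URL = "https://connection.keboola.com/v2/storage"
headers = {"X-StorageApi-Token": os.environ["KBC_STORAGE_TOKEN"]}

resp = requests.get(f"{API_URL}/buckets", headers=headers)
resp.raise_for_status()

for bucket in resp.json():
    print(bucket["id"], bucket.get("description") or "")
```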

Additionally, the platform puts a layer on top of the basic building blocks of cloud infrastructure to make them more approachable. For instance, users work with components, not with the EC2 instances and Docker images that power them; with storage, not with a database that dictates everything, including data access rights.

This abstraction of common cloud primitives helps even less technical people use the platform and execute internal POCs and projects faster, and it enables a true data mesh concept. And since a full API and strong componentization are part of the platform’s DNA, it can serve both audiences.

Let’s look at some examples… (Warning: the next section is screenshot-heavy.)

Example 1: Infrastructure as code

Infrastructure as code means letting data practitioners clearly define and version-control the whole data setup, ideally across the whole data process (extraction, transformations, reverse ETL, ML ops, etc.).

At the moment, some pieces of infrastructure can be defined this way (through Terraform, for example), but this mostly takes the form of deploying configurations for SaaS tools, which provide a layer on top of the real cloud infrastructure.

Keboola, as a SaaS product, is similar, though since it covers the whole data stack, it lets you define the infrastructure across the extraction phase, the transformation phase (SQL, R, Python, and other backends), reverse-ETL components (aka writers), and other clever apps that interact with other SaaS tools and control other infrastructure.

This is an example of the common data pipeline from the UI perspective:

Platform screenshot: Simple data pipeline (my personal project)

And here is the detailed configuration of an extractor within the UI; notice how simple it is to add a new one from the library.

Platform screenshot: An easily accessible, built-in component library

From a common user perspective, each component has a simple UI for configuration:

Platform screenshot: Simple component UI

Users can add descriptions to the configuration, which live with the code underneath and help the team work together.

From the data engineering angle, though, all of that is available through Keboola as code: configurations can be created in code and materialized through a CLI push or CI/CD git pipelines.

VSCode screenshot: File browser of the Keboola project
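Since the CLI is itself just a client of the platform’s APIs, “creation of the configurations from code” can also be scripted directly against the Storage API. Below is a hedged sketch; the component ID, parameters, and configuration body are my illustrative assumptions, not a recipe:

```python
# Unofficial sketch: creating a component configuration from code via the
# Storage API. The component ID and configuration body are illustrative
# assumptions; the CLI and CI/CD pipelines achieve the same from local files.
import json
import os

import requests

API_URL = "https://connection.keboola.com/v2/storage"
headers = {"X-StorageApi-Token": os.environ["KBC_STORAGE_TOKEN"]}

component_id = "keboola.ex-generic-v2"  # hypothetical extractor component
payload = {
    "name": "Orders extractor",
    "description": "Created from code, not from the UI",
    "configuration": json.dumps({"parameters": {"baseUrl": "https://api.example.com"}}),
}

resp = requests.post(
    f"{API_URL}/components/{component_id}/configs",
    headers=headers,
    data=payload,
)
resp.raise_for_status()
print("New configuration id:", resp.json()["id"])
```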

Example 2: Configuration as code

The setup of data pipelines and infrastructure can almost always be represented as code configuration. Some tools, such as dbt, are even built solely code-first.

While this is a very comfortable state of affairs for developer minds, it is surely not an optimal user experience for less technical ones. Sarah also points out that this sometimes poses challenges even for developers, such as the unfriendly debugging of misconfigured YAML; I would add YAML itself (I will not go into the holy war of YAML vs. JSON, and now vs. TOML as well).

Additionally, embedding a scripting language within configurations, such as dbt’s use of Jinja, is super powerful, but uncompiled, Jinja-heavy scripts can be very hard to debug.
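A small practical aside: one way to take the pain out of this is to render the template locally and inspect the compiled output before the tool ever runs it. A minimal sketch with the jinja2 library; the template and variables are made up, and anything using dbt’s own macros such as ref() would need dbt’s compile step instead:

```python
# Render a Jinja template locally to see the SQL it actually produces.
# The template and variables are invented for illustration.
from jinja2 import Template

template_src = """
select *
from {{ schema }}.orders
{% if only_recent %}
where created_at >= dateadd(day, -{{ days }}, current_date)
{% endif %}
"""

compiled = Template(template_src).render(schema="analytics", only_recent=True, days=30)
print(compiled)  # the plain SQL, with all the Jinja logic resolved
```

(dbt itself helps here with `dbt compile`, which writes the compiled SQL into the `target/` directory.)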

At Keboola, we believe in the principle of providing utility rather than technology itself. For instance, the platform provides out-of-the-box versioning functionality (with diff/restore/fork built into the UI), rather than requiring the user to start with a repository first:

Platform screenshot: Getting git features without command prompt

However, as the platform serves both personas, versioning is also available if you use the Keboola as code approach:

VSCode screenshot: Component configuration file in VSCode

So a data person can change configurations and transformation code, but also descriptions and other metadata.
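To connect this back to the versioning point: metadata changes can be made from code too, and each update shows up as a new configuration version that the UI’s diff and restore features work with. A hedged sketch of updating a description via the Storage API; the IDs and the changeDescription field are my assumptions for illustration:

```python
# Unofficial sketch: updating a configuration's description from code.
# The IDs are placeholders; treat the changeDescription field as an
# assumption about the Storage API rather than documented fact.
import os

import requests

API_URL = "https://connection.keboola.com/v2/storage"
headers = {"X-StorageApi-Token": os.environ["KBC_STORAGE_TOKEN"]}

component_id = "keboola.ex-generic-v2"  # illustrative component
config_id = "123456"                    # illustrative configuration id

resp = requests.put(
    f"{API_URL}/components/{component_id}/configs/{config_id}",
    headers=headers,
    data={
        "description": "Pulls orders from the partner API, refreshed daily.",
        "changeDescription": "Documented the data source for the team",
    },
)
resp.raise_for_status()
print("Configuration is now at version:", resp.json()["version"])
```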

The middle ground

While I agree with Sarah’s critique of no-code tools, I believe we should approach this from the middle ground and use tools and platforms that are able to serve both audiences. Additionally, we should put more effort into addressing the disconnect between tech people and the rest of the company.

Keboola brings a unique approach that enables all personas to collaborate, design new data use cases fast, and bring them to production: the majority of users work in the platform through the rich UI, while an API layer enables machine-to-machine interactions (common in enterprise setups) and Keboola as code (the CLI) empowers the code-first, engineering persona.

Image: Different personas interacting with the Keboola platform

Where I see the next development, though, is business app frameworks, which allow companies to develop their own internal or customer-facing apps fast and with less code, all possibly powered by the database itself as the center of gravity.

Where I see the next battlefront is business app frameworks.

The Snowflake acquisition of Streamlit is a good example of the focus shifting towards stack completeness and addressing all users. Luckily, there are already quite a few other tools (Retool, Budibase, Appsmith, and possibly notebook-based tools like Hex and Deepnote), and I bet we can expect even more in the future…


VFisa

An online presence of Martin Fiser. Sorting out data challenges @Keboola, Vancouver, BC.