The Architecture Decisions Behind DKAN in D8

Kim Davidson
Published in DKAN Blog
Sep 23, 2020

We learned a lot from building and maintaining DKAN 1.x sites, and when we started rebuilding DKAN from scratch in Drupal 8, we had the opportunity to take what we learned and create something more performant, sustainable, and flexible. That meant we had a lot of decisions to make, and we wanted to share how we made the biggest architectural decisions behind DKAN 2.x: moving from a profile to a more modular architecture; storing metadata according to the schema, rather than breaking it up into separate fields; and a decoupled front end. In future blog posts, we’ll dive into these in more detail, but here’s a high-level overview of what these changes accomplish.

The big architectural changes

First, the switch from Drupal profile to a collection of modules and libraries.

DKAN 1.x, as a profile, locks down the entire build into a very specific configuration glued together with features. That gave us some advantages, but it also caused a few problems we wanted to address:

  • Adding new or custom functionality to a profile requires feature overrides or patches, which increases not only the complexity of the build, but also the burden of maintaining compatibility, which can cause issues with upgrades down the line.
  • Development also comes with the complexity of standing up a full site and understanding an entire system and what hooks are firing and when. That makes debugging difficult, due to the numerous intersections where things can go wrong, and you find yourself digging through logs to figure out why something unrelated to your new code is breaking the site.

DKAN 2.x is built as a collection of discrete services and libraries. What are the benefits?

  • By building simple, specific things to do specific tasks, rather than taking a contrib module that does almost what we need (migrate, feeds, workbench) and rewiring it into a hard-to-maintain fork of the original, we avoid overly complicated code.
  • By splitting each feature off into a separate service, we can develop and test those features independently of the entire project. This separation means less complexity and increased efficiency, speed, reliability and ease of development, deployment and maintenance.
  • Tests on these independent services can provide detailed error reporting that will give you the exact point of failure.
  • The queueing system can now handle large operations without bogging down the web server.
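To make the queueing point concrete, here is a minimal sketch of the general idea: a large operation is split into small jobs, and a worker processes a bounded number of them per run. This is not DKAN's actual queue code; the `ImportQueue` class, its method names, and the in-memory storage are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class ImportQueue:
    """Hypothetical in-memory queue; a real site would use a persistent one."""

    def __init__(self) -> None:
        self._jobs: Deque[List[dict]] = deque()

    def enqueue_in_chunks(self, rows: Iterable[dict], chunk_size: int) -> int:
        """Split a large row set into chunk-sized jobs; return jobs created."""
        chunk: List[dict] = []
        created = 0
        for row in rows:
            chunk.append(row)
            if len(chunk) == chunk_size:
                self._jobs.append(chunk)
                created += 1
                chunk = []
        if chunk:  # leftover rows form a final, smaller job
            self._jobs.append(chunk)
            created += 1
        return created

    def run(self, handler: Callable[[List[dict]], None], max_jobs: int) -> int:
        """Process at most max_jobs jobs per call, e.g. from a cron or CLI run."""
        done = 0
        while self._jobs and done < max_jobs:
            handler(self._jobs.popleft())
            done += 1
        return done

    def pending(self) -> int:
        """Number of jobs still waiting to be processed."""
        return len(self._jobs)
```

Because each call to `run` does a bounded amount of work, a web request only has to enqueue the jobs; the heavy lifting can happen later in a background worker.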

Second, avoid saving metadata into separate fields and focus on the schema.

  • In 1.x, keeping the metadata in separate fields made it hard to deviate from the default schema, slowed performance, and made database operations “expensive”.
  • Adding or removing fields was complicated and prone to errors.
  • In 2.x we focus on the schema to provide transparency and accountability. All inputs and outputs are beholden to the defined schema.
  • Storing the JSON object in a single field makes it easy to support custom schemas.
  • Validation against the schema provides even greater transparency around the data quality.
  • Standardizing column-level metadata also will allow more powerful queries across multiple datasets where the datastore columns share common values.
  • We have improved revision history because we can compare the previous JSON values directly rather than comparing against Drupal fields.
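As a rough illustration of the points above, the sketch below validates metadata against a schema, stores it as one JSON value, and compares two revisions of that value directly. It is a simplified stand-in, not DKAN's implementation: the `SCHEMA` dictionary, the property names, and the hand-rolled `validate` check are assumptions, and a real build would use a full schema validator.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical, heavily simplified schema: required keys and expected types.
SCHEMA: Dict[str, type] = {"title": str, "description": str, "keyword": list}


def validate(metadata: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of human-readable schema violations (empty if valid)."""
    errors = []
    for key, expected in schema.items():
        if key not in metadata:
            errors.append(f"{key}: missing required property")
        elif not isinstance(metadata[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors


def to_field(metadata: Dict[str, Any], schema: Dict[str, type]) -> str:
    """Serialize validated metadata into the single stored JSON value."""
    errors = validate(metadata, schema)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps(metadata, sort_keys=True)


def diff_revisions(old_json: str, new_json: str) -> Dict[str, Tuple[Any, Any]]:
    """Compare two stored JSON values; map changed keys to (old, new)."""
    old, new = json.loads(old_json), json.loads(new_json)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

Supporting a custom schema in this sketch means passing a different dictionary to `to_field`; the storage stays a single JSON value, so no fields need to be added or removed.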

Third, a decoupled front end.

Again, simplicity was the goal: a library of data components, hooks and services that can be arranged in any number of ways to suit the needs of each project, plus a library of templates that represents the default implementation.

  • Data-catalog-components: the building blocks.
  • Data-catalog-templates: default layouts for pages and complex component groupings.
  • Data-catalog-app: the boilerplate for starting a new front end; it holds site-specific assets such as images, favicons, and CSS.

Where’s the Drupal?

At first look, it might seem like the content management tools that Drupal provides have been ignored, but what we’ve actually done is provide the opportunity to use more Drupal options.

By confining DKAN to a set of modules, we have freed up your build to define its own Drupal path. We’ve found that every project has needed something different so we are leaning hard into flexibility. DKAN will not force you to use a specific configuration; you are free to build out any content types, roles, permissions and workflows you require.

Want a bit more structure? Eventually, we will have pre-built options for anyone who just wants to stand up a “default” data catalog, along with a Drupal theme that embeds the dataset search app.

Where are the user interfaces?

For our first release, we determined that command-line administration and a bare-bones default Drupal admin user interface would do for the MVP. There is an embedded React app for editing the JSON metadata, which will soon be replaced by the Drupal Form API. In the meantime, our UX team has been gathering information about what users need to share schema-compliant metadata, and we’re currently building user interfaces for all the features that data publishers need.

What’s next?

As we iterate on this release over the next year, you’ll find a new set of exciting features, which you can check out on our roadmap. Don’t see something you need? Tell us about it! We welcome collaboration, pull requests, and feedback. We hear you!
