DKAN 1.16 released with revamped datastore, other improvements

Last month, we released DKAN version 1.16. Although we tested it extensively first, it contains major changes to key components, so we have been treating it as more of a “soft launch.” We still recommend approaching this upgrade with caution, but we have now upgraded most of the sites we manage directly and feel ready to announce these exciting new features publicly!

Note that we recommend upgrading directly to the most recent release, which is 1.16.1 as of this writing. See the releases page for more information.

The big news: a complete refactor of the DKAN Datastore

The datastore is an optional piece of DKAN. Some data portals are entirely file-based and have no need to store anything other than a dataset’s metadata in the database. For others, though, the datastore — which imports CSV and similar files as tabular data into the database — is mission-critical. The datastore lets users preview extremely large datasets easily, and creates API endpoints that third-party applications (like custom data visualizations) can query directly.

Previous versions of the DKAN Datastore used the Feeds module to bring files into the database. Feeds has served us well over the years as a community-supported framework for importing CSV and similar files into database tables, but it has also added a lot of overhead and bloat to what should be a simple system. The Datastore module has now been completely rewritten to make it faster, more stable, and more modular. Its object-oriented, decoupled architecture allows its various classes to be extended.

Let’s take a quick tour. The “Manage Datastore” tab will still appear on resources that contain compatible tabular data.

Clicking on it, though, reveals a different form in place of the Feeds-based UI previously available.

Note the “Importer” status set to “Simple Import.” The new modular architecture makes it possible to create additional datastore integrations in DKAN. The “simple import” is the only option available by default, and uses a lightweight CSV parser to read a file into the database. The simple importer will import as much of the file as it can quickly, and the status will then show as “done.”

With the resource in the datastore, the “Data API” tab is now available.

If the resource was too big to import quickly, it will be delegated to finish in the background on the next cron run.
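To make the "import what you can now, finish on cron" idea concrete, here is a minimal Python sketch. It is not DKAN's code (DKAN is written in PHP); the function name `import_pass`, the SQLite target, and the `time_limit` parameter are all assumptions made for illustration. Each pass imports rows until a time budget runs out, then reports where it stopped so a later pass can resume.

```python
import csv
import io
import sqlite3
import time


def import_pass(conn, csv_text, table, start_row=0, time_limit=1.0):
    """Import rows from start_row onward until time_limit seconds elapse.

    Hypothetical helper, not part of DKAN. The table name is assumed to be
    trusted. Returns (next_row, done) so a later pass can pick up the rest.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join('"%s"' % name.replace('"', '""') for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, columns))

    started = time.monotonic()
    next_row = start_row
    done = True
    for index, row in enumerate(reader):
        if index < start_row:
            continue  # already imported by an earlier pass
        # Always import at least one row per pass so progress is guaranteed.
        if next_row > start_row and time.monotonic() - started >= time_limit:
            done = False
            break
        conn.execute('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), row)
        next_row = index + 1
    conn.commit()
    return next_row, done


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    data = "id,name\n1,alpha\n2,beta\n3,gamma\n"
    position, done = import_pass(conn, data, "demo", time_limit=0.5)
    while not done:  # stands in for subsequent cron runs
        position, done = import_pass(conn, data, "demo", position, time_limit=0.5)
    print(position, done)
```

Storing only the next row number keeps the resume state tiny, which is what makes it practical to hand the remainder of a large file to a background job.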

The datastore does ship with one additional importer, “fast import,” which takes advantage of MySQL’s very fast LOAD DATA statement if it is available on the server.
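For readers unfamiliar with LOAD DATA, the sketch below builds one such statement in Python. The helper name `build_load_data_sql`, the use of `LOCAL`, and the specific field and line options are assumptions chosen for a typical comma-separated file with a header row; they are not taken from DKAN's fast importer.

```python
def build_load_data_sql(csv_path, table, columns, delimiter=","):
    """Return a MySQL LOAD DATA statement for a CSV file with a header row.

    Hypothetical helper for illustration; table and column names are
    assumed to be trusted identifiers.
    """
    column_list = ", ".join("`%s`" % c.replace("`", "``") for c in columns)
    # Escape backslashes and single quotes so the path is a valid SQL string.
    escaped_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE `%s` "
        "FIELDS TERMINATED BY '%s' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        "IGNORE 1 LINES (%s)"
    ) % (escaped_path, table, delimiter, column_list)


if __name__ == "__main__":
    print(build_load_data_sql("/tmp/data.csv", "datastore_demo", ["id", "name"]))
```

Because the server reads and parses the whole file in a single statement, this approach avoids the per-row overhead of issuing individual INSERTs from application code.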

Currently available datastore modules

We plan to support additional options for datastore infrastructure, such as a second, dedicated MySQL database, a cloud-based solution like Amazon RDS, or even a third-party service like Carto. Developers interested in creating their own extensions can follow the model of the existing importers.

Datastore behaviors can also now be controlled through the Dataset REST API, so that you can automate actions like importing resources to the datastore and dropping datastore tables.

Note: While the system for getting data into the datastore has changed significantly, the API for querying the datastore remains the same.
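As a reminder of what querying looks like from a client, here is a small Python sketch that builds a datastore search URL. The endpoint path `/api/action/datastore/search.json` and the `resource_id`, `limit`, and `offset` parameters reflect our understanding of the DKAN 1.x Data API and should be treated as assumptions; check the “Data API” tab on your own resource for the exact URL. The base URL and resource ID are placeholders.

```python
from urllib.parse import urlencode


def datastore_search_url(base_url, resource_id, limit=10, offset=0):
    """Build a datastore search URL (endpoint path is an assumption)."""
    query = urlencode(
        {"resource_id": resource_id, "limit": limit, "offset": offset}
    )
    return "%s/api/action/datastore/search.json?%s" % (base_url.rstrip("/"), query)


if __name__ == "__main__":
    # Placeholder site and resource ID, for illustration only.
    print(datastore_search_url("https://example.com", "abc-123", limit=5))
```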

Other improvements in 1.16 and 1.16.1

Improvements to DKAN command-line tooling

While not a change to the core DKAN codebase, this release coincides with the launch of DKAN Tools, a new command-line tool for working with DKAN. It will make it easier for anyone to stand up DKAN locally, manage Docker containers, and use different CI pipelines.

Look for a new blog post in the coming days about DKAN Tools and migrating to its new workflow. DKAN Tools solves a number of problems with DKAN Starter, and support for DKAN Starter will be phased out.

Support for non-date values in the dataset “modified” field

The “Modified Date” field has been converted to a text field to accommodate ISO 8601 repeating interval values such as R/P1D and R/P2W. Previously, this field had been a date field, which was incompatible with certain values that would be allowed in Project Open Data's modified field.
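To show the kind of values the field now has to accept, here is a rough Python sketch of a validator that allows either a calendar date or an ISO 8601 repeating interval. It is an illustration only, not DKAN's validation logic; the function name `is_valid_modified` and the regular expression are assumptions, and the pattern covers only the `R[n]/P...` duration form.

```python
import re
from datetime import date

# R, optional repeat count, "/", then a duration such as P1D, P2W, or P1Y6M.
# Lookaheads reject an empty duration ("P") and an empty time part ("PT").
REPEATING_INTERVAL = re.compile(
    r"R\d*/P(?!$)"
    r"(?:\d+W|(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?)"
)


def is_valid_modified(value):
    """Return True for an ISO date string or a repeating interval string."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        pass
    return REPEATING_INTERVAL.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["2018-06-01", "R/P1D", "R/P2W", "daily"]:
        print(candidate, is_valid_modified(candidate))
```

A strict date column cannot hold a string like R/P1D, which is why a text field paired with validation along these lines is the more flexible choice.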

Support for PHP 7

PHP 7 brought many significant performance improvements. DKAN has been updated to be compatible with PHP 7.1 so that users can take advantage of them. Our CI and testing infrastructure has also been updated to use PHP 7.1.

We’re excited about these updates to DKAN and we hope you are too. We’re always happy to hear from you, so feel free to let us know in the comments how this is working for you, or join the conversation at dkan.slack.com.