The ODC Retrospective for 2018

Alex Leith
Published in OpenDataCube
Mar 4, 2019

Over the past year, a community of individuals working in diverse organisations around the world have all contributed to the Open Data Cube (ODC) project in a variety of ways, from improving documentation to continuing to develop the foundational technology. This article will discuss the work that has occurred over the last year, and outline some of the strategic direction for the coming year.

This article is structured into six key areas: Applications, Documentation, Deployment Models, Technology, Governance and The Future, and will chart a course through the project touching on each area in turn. A large project like the ODC, which has a lot of players working on it in their own way, can sometimes feel like it is not moving very fast or far. But when you look back, it’s easy to see a lot of changes and exciting work happening.

Applications

Potential applications of the ODC are many and varied, but there have been a few notable applications that are worth mentioning as they illustrate the scope and types of applications currently being addressed.

One of the more impactful is the African Regional Data Cube (ARDC), a project that deployed the ODC in five African countries: Kenya, Senegal, Sierra Leone, Ghana, and Tanzania, including training and documentation. Over 300 people across the five countries have been trained in the use of the ODC to explore issues such as water quality, agriculture and deforestation. The ARDC uses an ODC-backed user interface, called the Datacube UI, as well as a Jupyter notebook data science environment, to provide access to a suite of data products and analyses to technical and non-technical users.

In Australia, Digital Earth Australia have put together a suite of public data products as well as an ODC-backed OGC Web Services system, which means that data can be visualised in real time in applications such as the National Map. Visualisations of raw data and on-the-fly computed indices are available across Australia, demonstrating that the ODC can work at scale on modern cloud infrastructure and not just in the traditional High Performance Computing (HPC) environments that it has been deployed on in the past.
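To make "on-the-fly computed indices" a little more concrete, here is a minimal sketch of one common kind of index, a normalised difference between two bands, calculated pixel by pixel at request time. This is an illustration only: the article does not say which indices the Digital Earth Australia services compute, the function names are hypothetical, and plain Python lists stand in for the raster arrays a real ODC deployment would work with.

```python
from typing import List, Optional

Grid = List[List[float]]


def normalised_difference(a: float, b: float) -> Optional[float]:
    """Return (a - b) / (a + b), or None where the sum is zero."""
    total = a + b
    if total == 0:
        return None  # no valid value for this pixel
    return (a - b) / total


def index_grid(band_a: Grid, band_b: Grid) -> List[List[Optional[float]]]:
    """Apply the normalised difference to two equally sized grids of pixels."""
    return [
        [normalised_difference(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 reflectance values for two bands (e.g. near infrared and red).
    nir = [[0.6, 0.5], [0.4, 0.0]]
    red = [[0.2, 0.1], [0.4, 0.0]]
    for row in index_grid(nir, red):
        print(row)
```

Because the index is derived from the stored bands only when it is asked for, nothing extra needs to be stored, which is the appeal of computing it on the fly.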

Landsat 8 2018 surface reflectance geomedian at 25 m resolution showing a false colour visualisation of South Australia.

There are a number of smaller software applications in the ODC family, such as the Datacube Explorer, which shows a view of what data has been indexed into an instance of the ODC. And there are countless applications and case-studies that use the ODC for analysis.

Work by partner organisations has also made significant use of the ODC, including the UK Satellite Applications Catapult’s Common Sensing project, carried out as a part of the UK Space Agency’s International Partnership Programme (IPP) and aimed at building climate resilience with small Pacific Island nations. The CEOS Systems Engineering Office (SEO, backed by NASA) has promoted data cubes across a wide range of countries via the ODC-driven CEOS Data Cube. In conjunction with CSIRO, the CEOS SEO is supporting an initiative by the 2019 CEOS Chair, the Vietnam Academy of Science and Technology (VAST), to expand the success of the Vietnam Data Cube (an ODC instance) to cover the broader Mekong River region to address challenges around water, climate, and agriculture.

The ODC community has also started a library of exemplar applications that run on the ODC. The Data Cube Applications Library (DCAL) is a growing catalogue of code, documentation and training for algorithms across a variety of topics. This library is expected to grow as the ODC evolves, through new algorithms for interoperable data that are in development and through community contributions.

Documentation

Documentation is challenging because it’s easy to do a good job, but hard to do an excellent job! It’s very important for technical users who are trying to learn the system and also for decision makers who would like to know if the ODC is suitable for their organisation’s purposes. One year ago we had documentation in a few different places and unfortunately, we pointed people to those places inconsistently. There have been a number of initiatives to consolidate and refresh documentation, including improving the website and making incremental improvements to the core documentation. While there will always be room to improve, we’re in a better place than we were twelve months ago, and we’ll be better again in another twelve.

One key area of progress is the recently completed high-level architecture documentation. This is an important area that has needed clearer communication because the ODC requires a few pieces of tech to get it working (as this article outlines), and rather than being a solution in itself, it’s a tool that enables one to build a solution. So articulating how the ODC works from an architecture standpoint is important.

A high-level overview of the application environment around the ODC.

Deployment Models

In addition to documentation, there have been a number of reference deployments put together. One of these is known as the Cube in a Box, which is a Docker-based system that you can use to run the ODC on a local machine, for example, a high-spec laptop, or on a single instance in a cloud environment. The Cube in a Box contains enough information to demonstrate what is required to run your own instance of the ODC, and can serve as a quick start guide.

Another deployment model that has been developed is a Kubernetes-based JupyterHub environment that supports the ODC. This work is still in development, but demonstrates how a piece of serious infrastructure can be defined in code and deployed reasonably easily. Another example of this kind of infrastructure is the repository of Helm charts maintained by Geoscience Australia, which provides a way to deploy a number of components of the ODC core technology, like indexing systems, the explorer and the OWS service.

A diagram showing different kinds of deployment models for the ODC.

Technology

At the core of the ODC project, though, is the fundamental technology. The most recent release of the ODC is version 1.6.2, and it can be installed from the standard Python Package Index (PyPI) or with Anaconda. There are automatically built Docker images too. So there are a lot of ways to get the tool into your project. The ODC is an open source project, and you can participate and help by raising issues, providing bug fixes, contributing core code or improving documentation at the ODC’s GitHub repository.
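As a small sketch of what a first step might look like, the snippet below checks whether the core library is present in the current Python environment and reports its version. It relies only on the standard library; the one ODC-specific detail is the distribution name `datacube`, which is the name the core library is published under on PyPI. The helper name `installed_version` is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("datacube")
    if version is None:
        print("datacube is not installed; try `pip install datacube`")
    else:
        print(f"datacube {version} is installed")
```

Note that installing the package is only part of the picture: a working ODC instance also needs a database and some indexed data, which is what the Cube in a Box described earlier bundles together.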

In addition to the stable core project, there are a number of initiatives that are more experimental. CSIRO are working in a new space that they’ve created called ODC-Labs, which has two key projects. The first is the S3 Array IO (S3AIO) driver, an ingested data optimisation system for reading and writing data on AWS S3. The objective of the S3AIO work is to investigate alternate data arrangements and assess their impact on ODC performance in the AWS S3 storage and compute environment. The second is a tool called the Execution Engine, which aims to achieve performant distributed execution of functions over data of any size, with results mapped back to distributed storage in the AWS cloud.

Governance

The ODC has a governance model that is formed around two committees. The Partners Forum is made up of Institutional Partners, or leaders of organisations that support the project financially. The Steering Council is made up of significant contributors to the project, who are its decision makers.

One of the major confluence points for both of the committees of the ODC is the ODC Conference where members of the ODC community come together to discuss the state of the project, to make plans for the future of the project, and to do collaborative work. This year’s conference was focussed on using the ODC, rather than building it, and there were people attending from organisations including NASA, USGS, CEOS, Geoscience Australia, CSIRO and the Satellite Applications Catapult. While some groups worked on core technology, looking at getting Dask and the ODC working together, others worked on platform integration and development of applications related to the UN SDG indicators. We hope to write up the outcomes of the ODC Conference hackathon soon.

The Future

So what’s next for the ODC project? There are a number of areas that are being worked on actively at the application level that leverage the core ODC project to get work done, and that core API is stable and strong. But there are some architectural decisions that have been made in the past that may need to be remade, and so it is perhaps time for a ‘breaking changes’ release over the next year. Some changes to database structure and the core API may be required. This is not taken lightly, and will require careful planning.

For new users and folks who are interested in learning the ODC, we’re hoping to have an announcement soon around an open sandbox environment, which will serve as a demonstrator of the ODC’s capabilities, as well as serving as a training ground. The sandbox aligns with some of the ideas around improving the developer experience of the project. In fact, a lot of the goals over the last year have been leading towards improving developer experience! And in addition to bringing in new users, work will continue on breaking new ground in the ODC-Labs and DEA-Proto spaces.

And looking outward, there are a number of opportunities for alignment with other projects, such as STAC, and continuing to improve the performance of the ODC when reading Cloud Optimised GeoTIFFs.

One of the most exciting projects that is kicking off this year is Digital Earth Africa, which will be a continental scale implementation of the ODC, similar to Digital Earth Australia. Digital Earth Africa is a collaboration between GEO, Geoscience Australia, AfriGEOSS, and the World Economic Forum. Digital Earth Africa will aim to improve progress towards the Sustainable Development Goals in Africa, as well as provide insights into the changing African environment. A data cube in Africa will be able to translate the wealth of Earth observation data covering the continent into products that can inform responses to issues such as agriculture, water availability and land use.

In Closing…

To summarise, there has been a lot of great work achieved over the last year in the areas of documentation, reference architectures and in building great systems that work on top of the core ODC project. We continue to align with emerging paradigms, such as STAC and COG, while also supporting the current implementations and global users. The ODC project will continue to improve and evolve, and strive to ensure that we continue to work towards our objectives of increasing the impact of Earth observation data and supporting our community of users and developers.

Please get in touch with us if you’d like to find out more, or join us on Slack to chat.
