Growing the Humanitarian Exchange Language (HXL) ecosystem
I’m not sure if I am laying out a plan for the future of the Humanitarian Exchange Language (HXL) ecosystem, throwing out some ideas to gauge responses, or just sharing what I plan to promote personally, but here is a speculative piece on how to develop the HXL ecosystem to make it more useful for users. Hopefully in the future I can return and write more details about particular points. If you are new to HXL, it might be worth reading this 30-second tutorial before continuing.
1 Year — Planting the seeds
Humanitarian data tools are being created to encourage the creation of HXLated (to HXLate, verb: to add HXL tags to a dataset) open datasets, increasing the number of HXLated datasets in the ecosystem.
At the moment, there aren’t many real advantages to adding HXL tags to your dataset, but change is around the corner.
Currently the biggest reason to HXLate is to leverage the HXL proxy. I use it to power many of my data visualisations and workflows, and wrote up some use cases here (now somewhat dated). However, it is mostly used in fairly technical workflows alongside other technical tools, so not everyone will find it useful.
Going forward, the Humanitarian Data Exchange (HDX) team is working on a project called Quick Charts. This tool builds quick online dashboards of your data with very few clicks and little work needed. The trick is that it looks at which HXL tags are present in your data and then makes a best guess at sensible options to chart. It currently only works with data on HDX, but it is worth finding a dataset and having a play.
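To give a feel for the idea of charting purely from tags, here is a minimal sketch. It assumes the data is CSV with one hashtag row (the row where every non-empty cell starts with `#`), and the chart rules below (`LOCATION_TAGS`, `NUMERIC_TAGS`, `suggest_charts`) are hypothetical, made up for illustration; they are not how Quick Charts actually decides what to show.

```python
import csv
import io

# Hypothetical rules for illustration only; these are NOT the real Quick Charts rules.
LOCATION_TAGS = {"#country", "#adm1", "#adm2", "#loc"}
NUMERIC_TAGS = {"#affected", "#inneed", "#targeted", "#reached", "#population"}


def read_hxl(text):
    """Split CSV text into (hashtags, data rows) by locating the HXL hashtag row."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        cells = [c.strip() for c in row if c.strip()]
        # The hashtag row is the first row whose non-empty cells all start with '#'.
        if cells and all(c.startswith("#") for c in cells):
            return [c.strip() for c in row], rows[i + 1:]
    raise ValueError("no HXL hashtag row found")


def base_tag(tag):
    """Drop attributes: '#adm1+name' -> '#adm1'."""
    return tag.split("+")[0].lower()


def suggest_charts(tags):
    """Guess sensible charts purely from which hashtags are present."""
    places = [t for t in tags if base_tag(t) in LOCATION_TAGS]
    numbers = [t for t in tags if base_tag(t) in NUMERIC_TAGS]
    suggestions = [f"bar: {n} by {p}" for p in places for n in numbers]
    if any(base_tag(t) == "#date" for t in tags):
        suggestions += [f"line: {n} over #date" for n in numbers]
    return suggestions


sample = """Region,Date,People affected
#adm1+name,#date,#affected
North,2016-10-20,120
"""
tags, rows = read_hxl(sample)
print(suggest_charts(tags))
# ['bar: #affected by #adm1+name', 'line: #affected over #date']
```

Note that the column headers ("Region", "People affected") are never read; only the hashtags drive the suggestions, which is why the same logic works across datasets with very different header wording.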
Quick charts used on a dataset of Ebola treatment Centres
Quick charts used on a dataset of damaged houses in the Philippines after Typhoon Haima
HXL check is another example of a tool that could be created. I don’t want to go into too much detail as it is still being thought out, but the general concept is a tool that you run your HXLated data through and it suggests changes and corrections to make. Think of it as a spell check for data.
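One small piece of such a "spell check" could be flagging hashtags that are not in the standard and suggesting the nearest real one. This is only a sketch under assumptions: `KNOWN_HASHTAGS` is a tiny hand-picked subset rather than the full HXL dictionary, `check_tags` is a hypothetical name, and the similarity measure is Python's `difflib`, not anything HXL check is confirmed to use.

```python
import difflib

# A small, hypothetical subset of HXL core hashtags; a real checker would load
# the full hashtag dictionary from the HXL standard instead.
KNOWN_HASHTAGS = ["#adm1", "#adm2", "#affected", "#country", "#date",
                  "#inneed", "#loc", "#org", "#population", "#sector"]


def check_tags(tags):
    """Return {tag: suggestion} for every hashtag not in KNOWN_HASHTAGS.

    The suggestion keeps the original attributes and swaps in the closest known
    hashtag, or is None when nothing is similar enough.
    """
    problems = {}
    for tag in tags:
        base, _, attrs = tag.strip().lower().partition("+")
        if base in KNOWN_HASHTAGS:
            continue
        close = difflib.get_close_matches(base, KNOWN_HASHTAGS, n=1, cutoff=0.6)
        if close:
            problems[tag] = close[0] + ("+" + attrs if attrs else "")
        else:
            problems[tag] = None
    return problems


print(check_tags(["#adm1+name", "#afected", "#popultion+total", "#xyz"]))
# {'#afected': '#affected', '#popultion+total': '#population+total', '#xyz': None}
```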
Tools will also be created to help information managers add HXL tags to their datasets.
2–3 years — Growing the ecosystem
Tools will be adapted to work with non-HDX online data and offline data. An algorithm and process are derived to share certain assessment data in a safe, easy-to-use way that protects affected people. A P-coded geo service compatible with HXL services is launched. The first tools provide automated data interoperability through shared spatial qualities. Data collection tools output HXL formats.
The tools described above only work with online data, and sometimes only with data on HDX. This will change as they are adapted to work with any HXLated source that is online. Further down the line, the tools will be adapted for local offline use as well, to encourage the use of HXL tags at all levels of privacy. Organisations will then start to HXLate their internally facing data.
A wealth of assessment data sits siloed in organisations because it contains sensitive information and so cannot be shared. People are comfortable sharing conclusions in a written report, but not sharing the same conclusions as data. An algorithm and process will be created to help share data derived from these surveys in a way that is useful for analysis but still protects those affected. I will write up more about this method in a future post for comments; early concepts have already been trialled, for example with the Kenyan Red Cross.
A web geo service will be launched so that humanitarian data tools can leverage mapping. At the moment there is no centralised resource where consistently formatted and updated administrative boundaries can be queried. For example, every time the 3W tool is used on HDX, creating the geo file in the right form and format, and checking that it matches the data being joined, is a manual effort. This is a barrier to scaling.
One of the key goals of a data standard is to increase interoperability. By this stage, with more datasets using HXL tags, information managers will have an easier time understanding datasets and using tools to merge and join different datasets. With the new geo service, though, this interoperability process can be sped up further by using shared spatial qualities.
We could now take a dataset and automatically find the corresponding geometry files. This means multiple datasets could be mapped on top of one another to provide visual interoperability. For example, given a dataset of Ebola cases and one of local medical centres, a tool could map the Ebola cases onto the right geometry file and overlay the medical centres with little effort, updating automatically when the underlying data changes. Here we can see this product made with the traditional method.
An early concept of this is being trialled on HDX with the map explorer. Once fully realised, a user could drop a HXL dataset in and it would automatically be mapped on top.
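The core of "automatically find the corresponding geometry" is matching a p-code column against the codes a boundary service knows about, and reporting the rows that do not match (the manual check mentioned for the 3W tool). The sketch below assumes the p-codes sit in an `#adm1+code` column and uses a plain dictionary, `BOUNDARIES`, with made-up codes as a stand-in for the hypothetical geo service; a real service would return geometries rather than names.

```python
# Stand-in for a hypothetical p-coded boundary service: made-up codes mapped to
# boundary names. A real service would return geometries (e.g. GeoJSON).
BOUNDARIES = {"XX01": "Area one", "XX02": "Area two", "XX03": "Area three"}


def find_column(tags, wanted):
    """Index of the column whose hashtag equals `wanted`, ignoring case and attribute order."""
    def norm(tag):
        base, *attrs = tag.strip().lower().split("+")
        return base, frozenset(attrs)
    target = norm(wanted)
    for i, tag in enumerate(tags):
        if norm(tag) == target:
            return i
    raise KeyError(wanted)


def attach_to_boundaries(tags, rows, boundaries, code_tag="#adm1+code"):
    """Group rows by boundary code; report codes the boundary service does not know."""
    col = find_column(tags, code_tag)
    layer, unmatched = {}, []
    for row in rows:
        code = row[col].strip().upper()
        if code in boundaries:
            layer.setdefault(code, []).append(row)
        else:
            unmatched.append(code)
    return layer, unmatched


# Two datasets that share only the #adm1+code hashtag can be laid over the same boundaries.
cases_tags, cases_rows = ["#adm1+code", "#affected"], [["XX01", "40"], ["xx02", "12"], ["XX09", "3"]]
clinic_tags, clinic_rows = ["#loc+name", "#adm1+code"], [["Clinic A", "XX01"]]
cases_layer, cases_missing = attach_to_boundaries(cases_tags, cases_rows, BOUNDARIES)
clinic_layer, _ = attach_to_boundaries(clinic_tags, clinic_rows, BOUNDARIES)
print(sorted(cases_layer), cases_missing, sorted(clinic_layer))
# ['XX01', 'XX02'] ['XX09'] ['XX01']
```

The two datasets never reference each other; they line up only because both carry the same hashtag and the same boundary codes, which is the "shared spatial qualities" idea in miniature.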
As HXL becomes more prevalent and the tools easier to leverage, data collection tools will start offering HXL as an output format and integrating the HXL tools into their products. There are already some front runners in this field; ONA, for example, already supports HXL.
Year 4–5 — Pollinating the ecosystem
Continued adaptation of tools for internal and private data environments. An algorithm builds a network of HXLated data to create implicitly linked data. Units are standardised via attributes.
Tools will continue to be developed so they can be used offline. The latest iteration of the HXL proxy will provide complex, repeatable data transformations as a desktop app; recipes can be shared between people, and processes become standardised, so they depend less on individuals.
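Part of what makes a recipe shareable is that it is just data: a list of steps that anyone can rerun against a new version of the dataset and get the same transformation. The sketch below uses a hypothetical JSON recipe format and a hypothetical `apply_recipe` function, with made-up rows; it is not the HXL proxy’s own recipe format, only an illustration of the idea.

```python
import json

# A hypothetical recipe format for illustration; it is NOT the HXL proxy's own format.
RECIPE = json.loads("""[
  {"step": "filter", "tag": "#country+name", "equals": "Sierra Leone"},
  {"step": "keep", "tags": ["#adm1+name", "#affected"]}
]""")


def apply_recipe(tags, rows, recipe):
    """Apply each recipe step in order and return the transformed (tags, rows)."""
    for step in recipe:
        if step["step"] == "filter":
            # Keep only rows whose value in the given column matches exactly.
            col = tags.index(step["tag"])
            rows = [r for r in rows if r[col] == step["equals"]]
        elif step["step"] == "keep":
            # Keep only the listed columns, in the listed order.
            cols = [tags.index(t) for t in step["tags"]]
            tags = [tags[c] for c in cols]
            rows = [[r[c] for c in cols] for r in rows]
        else:
            raise ValueError(f"unknown step: {step['step']}")
    return tags, rows


tags = ["#country+name", "#adm1+name", "#affected"]
rows = [["Sierra Leone", "Eastern", "50"], ["Guinea", "Conakry", "7"]]
print(apply_recipe(tags, rows, RECIPE))
# (['#adm1+name', '#affected'], [['Eastern', '50']])
```

Because the steps refer to hashtags rather than column positions or header text, the same recipe keeps working if a colleague’s spreadsheet orders or names its columns differently.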
The next part is a big piece of the picture: creating implicitly linked data by building a large index of all existing datasets, their context and how they relate to each other. The original version of HXL was based on linked data, which was one of its huge advantages. While the second iteration is getting more traction, it lacks that technical advantage. This index is an attempt to make up for that, and it requires some heavy lifting.
Every HXL dataset is compared to every other HXL dataset, and an index is built of how well they join and merge.
For example, if one dataset contains Ebola cases by district in Sierra Leone and another contains population data by sub-region in Sierra Leone, then these two datasets match horizontally (they cover the same places, so their columns can be joined side by side) and are linked in the index. If there were another dataset with Ebola cases by sub-region in Guinea, it could match the Sierra Leone district dataset vertically (same kind of data for different places, so the rows can be stacked). This relation is then stored in the index.
Private datasets could also be included in the index by making use of just metadata plus unique IDs. HDX Connect, now available, is a good precursor to this. In the end, every dataset is compared to every other to understand the overlaps and similarities between them.
A network of datasets and how they relate is then created.
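A very simplified version of building the index could look like the sketch below. It assumes each dataset is summarised only by its hashtags and the admin codes it covers (all names and codes are made up), and it reduces "how well they join and merge" to a yes/no horizontal or vertical label; a real index would need a graded score and much fuzzier matching. `relation` and `build_index` are hypothetical names.

```python
import itertools

# Each dataset is summarised only by its hashtags and the admin codes it covers.
# Names and codes are made up for illustration.
DATASETS = [
    {"name": "ebola_sl", "tags": {"#adm1+code", "#affected"}, "keys": {"SL-A", "SL-B"}},
    {"name": "population_sl", "tags": {"#adm1+code", "#population"}, "keys": {"SL-A", "SL-B"}},
    {"name": "ebola_gn", "tags": {"#adm1+code", "#affected"}, "keys": {"GN-A"}},
]


def relation(a, b):
    """Classify how two datasets could be combined, or return None."""
    shared_keys = a["keys"] & b["keys"]
    if a["tags"] == b["tags"] and not shared_keys:
        return "vertical"    # same columns, different places: stack the rows
    if shared_keys and a["tags"] != b["tags"]:
        return "horizontal"  # same places, different columns: join side by side
    return None


def build_index(datasets):
    """Compare every dataset with every other and record the pairs that relate."""
    index = {}
    for a, b in itertools.combinations(datasets, 2):
        kind = relation(a, b)
        if kind:
            index[(a["name"], b["name"])] = kind
    return index


print(build_index(DATASETS))
# {('ebola_sl', 'population_sl'): 'horizontal', ('ebola_sl', 'ebola_gn'): 'vertical'}
```

The sketch never looks at the data rows themselves, only at hashtags and key codes, which fits the idea of including private datasets through metadata plus unique IDs. Comparing every pair is quadratic in the number of datasets, which is part of the heavy lifting mentioned above.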
Why is this useful?
Tools can then be built to leverage this. Going back to our Ebola cases example, a tool could suggest ‘Do you want to see per capita rates?’ and automatically pull through the population data and do the maths. Or, already knowing of other Ebola case datasets, it could offer to pull through the rest of the data. Or it could highlight that another dataset exists with conflicting numbers that are worth checking.
One problem with this approach is that you could be comparing apples and oranges. One dataset could contain suspected cases while another contains confirmed cases only. I don’t think HXL will ever completely solve this problem, and the user will always have to refer to the metadata, but HXL could define attributes for units to help.
The network could also build a picture of what is missing. By comparing countries deemed complete with countries that have lower coverage, the gaps can be spotted. The network could then be used to generate templates for the missing data. It could also suggest improvements to individual datasets, such as: ‘This dataset is missing two expected rows for these regions. Can these be added?’
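The "do the maths" step for per capita rates is simple once the index has told the tool which population dataset joins to the cases dataset. This sketch assumes both have already been reduced to `{admin code: number}` dictionaries (made-up codes and figures) and reports a rate per 100,000 people; `per_capita` is a hypothetical helper.

```python
def per_capita(cases, population, per=100_000):
    """Rate per `per` people for each admin code present in both datasets.

    Returns (rates, missing), where `missing` lists case codes with no usable population.
    """
    rates, missing = {}, []
    for code, count in cases.items():
        pop = population.get(code)
        if pop:  # skip codes with no population figure (or a zero one)
            rates[code] = count * per / pop
        else:
            missing.append(code)
    return rates, missing


cases = {"XX01": 50, "XX02": 12, "XX03": 7}      # e.g. from a #affected column
population = {"XX01": 250_000, "XX02": 60_000}   # e.g. from a #population column
print(per_capita(cases, population))
# ({'XX01': 20.0, 'XX02': 20.0}, ['XX03'])
```

Returning the unmatched codes rather than silently dropping them matters here: a gap in the population data is exactly the kind of thing the tool should surface to the user.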
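If suspected and confirmed counts carried different attributes, a tool could refuse to add them together, or at least warn, before combining columns. In this sketch `+confirmed` and `+suspected` are used as hypothetical attributes (whether HXL should define these, or unit attributes more generally, is the open question above), and `mismatch` is a hypothetical helper that compares hashtags regardless of attribute order.

```python
def tag_parts(tag):
    """'#affected+confirmed+f' -> {'#affected', '+confirmed', '+f'} (order-insensitive)."""
    base, *attrs = tag.strip().lower().split("+")
    return {base} | {"+" + a for a in attrs}


def mismatch(tag_a, tag_b):
    """Sorted hashtag/attribute parts found in only one of the two columns.

    An empty list means the columns are tagged as the same kind of value.
    """
    return sorted(tag_parts(tag_a) ^ tag_parts(tag_b))


print(mismatch("#affected+confirmed", "#affected+suspected"))
# ['+confirmed', '+suspected']
print(mismatch("#affected+f+confirmed", "#affected+confirmed+f"))
# []
```

An empty result only says the tags agree; as noted above, the user would still need the metadata to be sure the underlying definitions match.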
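Spotting missing rows and generating a template could be as simple as comparing a dataset’s admin codes with a reference list taken from a complete dataset or a boundary service. The sketch assumes codes live in an `#adm1+code` column; the codes, `missing_codes` and `template_rows` are hypothetical and only illustrate the idea.

```python
def missing_codes(present, expected):
    """Codes expected (e.g. from a complete boundary list) but absent from the dataset."""
    return sorted(set(expected) - set(present))


def template_rows(tags, code_tag, codes):
    """Blank rows with only the code column filled in, ready for someone to complete."""
    col = tags.index(code_tag)
    return [[code if i == col else "" for i in range(len(tags))] for code in codes]


tags = ["#adm1+name", "#adm1+code", "#affected"]
present = ["XX01", "XX03"]
expected = ["XX01", "XX02", "XX03", "XX04"]   # hypothetical reference list
gaps = missing_codes(present, expected)
print(f"This dataset is missing {len(gaps)} expected rows: {gaps}. Can these be added?")
print(template_rows(tags, "#adm1+code", gaps))
# [['', 'XX02', ''], ['', 'XX04', '']]
```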
5 Years plus — Bearing the fruits
Beyond 5 years, the basic infrastructure is in place for a sustainable ecosystem that shortens the path from raw data to decision making.
Is this possible? I believe so. Is it doable? I hope so, but it very much depends on the community’s adoption of HXL. The later stages, and the full value, can only be realised once there is a dense body of HXLated data. Let’s see how it goes; maybe next year I will write a completely different plan once reality sets in.
Thanks to Shruti Grover for the top image and David Megginson and David Pyatt for the suggestions.