The Covid-19 SQL knowledge graph: making the world's Covid-19 data accessible and easy to explore

Patrick Healy
The Startup
Published Jun 26, 2020

Introduction to timbr and the Covid-19 SQL knowledge graph

The calamity of Covid-19 has caused each of us across the globe to momentarily localise our compassion to minimise the impact of the pandemic in our own countries. As some countries begin to roll back restrictions, focus turns towards:

1) Preventing a second wave of the virus.

2) Reopening the economy.

3) Sharing resources, expertise and knowledge with countries that have not yet brought the virus under control.

All of these goals are made possible by researchers, engineers and medical professionals who have chosen to make their data open source. Without this data, our understanding of the pandemic, and our decision making in the face of it, would be much worse.

Making use of this open source data is a challenging technical problem. Integrating heterogeneous data sources from across the globe for further research is a difficult and time-consuming task. Every day researchers spend processing data is another day their findings aren't in the hands of decision makers.

Private technology companies across the world are using their resources to help in the fight against the pandemic. One such company is the Israeli tech firm timbr.ai, which specialises in quickly turning databases into knowledge graphs that can be queried through SQL, Python, R and Apache Spark. This removes the need to write complex joins and unions in SQL.

Working with the Israeli Health Ministry, the team at timbr.ai have been integrating the best open source Covid-19 data from across the world into a single connected knowledge graph.

“Our main goal was to build an ontology which focuses on healthcare and government research questions and needs,” said Amit, timbr.ai’s CEO. “Once the data was mapped to concepts in the ontology, the researcher can explore and query it, see the connections and relationships between the data points on a graph, or write an SQL query and visualise the results as a bar chart, graph and more.”

Knowledge graphs provide a way to organise disconnected data sources by linking them through the concepts and properties by which the datasets can be grouped.

The team at timbr.ai have created 83 concepts for the Covid-19 knowledge graph, mapping data across 329 tables from more than 200 sources of information, including:

  • Data and results from medical studies.
  • Hospital and patient records at country and regional level.
  • Covid-19 testing results and patient demographics.
  • The type of restrictions put in place by each country and the date they were started/stopped.
  • Weather reports.

Based on the needs of researchers and government policy makers, timbr.ai built the Covid-19 SQL knowledge graph within a few days, organising these datasets into a single database. Changes to the underlying data sources are added daily, so researchers can access the most up-to-date information without making any changes to their queries.
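The point about daily updates is easier to see with a toy example. The sketch below is not timbr's pipeline; it assumes a hypothetical `daily_cases` table that a refresh job appends to, with made-up numbers, and shows that a saved query picks up the new rows without being edited. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical table of daily case counts; the name, columns and values
# are illustrative only, not timbr's actual schema or data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_cases (country TEXT, day TEXT, new_cases INTEGER)")

# The researcher's query is written once and never edited.
LATEST_QUERY = """
    SELECT country, MAX(day) AS latest_day, SUM(new_cases) AS total_cases
    FROM daily_cases
    GROUP BY country
    ORDER BY country
"""

def load(rows):
    """Stand-in for the daily ingestion job that appends new source data."""
    conn.executemany("INSERT INTO daily_cases VALUES (?, ?, ?)", rows)
    conn.commit()

load([("Israel", "2020-06-01", 10), ("Israel", "2020-06-02", 12)])
before = conn.execute(LATEST_QUERY).fetchall()

# Next day's refresh: new rows arrive, the query text stays the same.
load([("Israel", "2020-06-03", 15)])
after = conn.execute(LATEST_QUERY).fetchall()

print(before)  # [('Israel', '2020-06-02', 22)]
print(after)   # [('Israel', '2020-06-03', 37)]
```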

For a researcher who wants to gather and pre-process data to answer a research question about Covid-19, the SQL knowledge graph saves a lot of time. Unless the exact data they need has already been organised into a tidy, machine-readable structure, they can expect, depending on the breadth of sources and the quality of the data, to spend around 90% of their first few days or weeks wrangling data.

If a researcher wants to use Covid-19 datasets from across the world, then before they can explore the data for relationships they have to go through the time-consuming process of:

  • Sourcing relevant data and determining its usability and quality.
  • Juggling several different technologies to extract this data from a range of heterogeneous sources such as text, PDFs, APIs, kernels and knowledge bases such as Wikipedia.
  • Building an automated pipeline to continually gather updates to these data sources.
  • Resolving the relatedness of variables and entities between datasets for joins and unions, which will likely require some machine classification.
  • Building complex joins in SQL, R or Python.
  • Building a database structure.
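To make the step of resolving the relatedness of variables and entities concrete, here is a minimal sketch that harmonises two hypothetical sources, a CSV file and a JSON API response, which describe the same kind of facts with different field names and different spellings of a country. The mappings `COLUMN_MAP` and `COUNTRY_ALIASES` and all values are hand-written assumptions for illustration; at the scale of hundreds of tables, building such mappings is where machine classification would come in.

```python
import csv
import io
import json

# Two toy sources (hypothetical) with different field names and country spellings.
CSV_SOURCE = 'Country/Region,Date,Confirmed\n"Korea, South",2020-06-01,100\nIsrael,2020-06-01,50\n'
JSON_SOURCE = '[{"location": "South Korea", "report_date": "2020-06-02", "cases": 120}]'

# Hand-written mappings to one shared schema; in practice these are the hard part.
COLUMN_MAP = {
    "Country/Region": "country", "location": "country",
    "Date": "date", "report_date": "date",
    "Confirmed": "confirmed", "cases": "confirmed",
}
COUNTRY_ALIASES = {"Korea, South": "KOR", "South Korea": "KOR", "Israel": "ISR"}

def harmonise(record):
    """Rename fields to the shared schema and resolve country names to one code."""
    row = {COLUMN_MAP[key]: value for key, value in record.items()}
    row["country"] = COUNTRY_ALIASES[row["country"]]
    row["confirmed"] = int(row["confirmed"])  # CSV values arrive as strings
    return row

rows = [harmonise(r) for r in csv.DictReader(io.StringIO(CSV_SOURCE))]
rows += [harmonise(r) for r in json.loads(JSON_SOURCE)]

for row in sorted(rows, key=lambda r: (r["country"], r["date"])):
    print(row)
```

Every new source needs its own entries in both mappings, which is the repetitive work an ontology-mapping layer aims to centralise.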

Depending on the size of the team and the data needed, this process can take days or even weeks. timbr.ai have automated it with their platform and a backend data warehousing pipeline built on Python crawlers and Apache Airflow, which makes it easy to add new data and concepts to the platform.

The timbr platform and integrated BI tool make for a short learning curve. Anyone with some basic SQL knowledge can begin creating queries and visualisations to explore the data for relationships and begin conducting analysis in R, Python, SQL, Power BI or Apache Spark. In addition to the integrated BI tool, any BI tool that people are familiar with can be connected to the platform.

What are knowledge graphs and why is the timbr SQL knowledge graph useful?

A knowledge graph is a way to group large volumes of data into an easy-to-understand format that represents the conventional way information is organised by experts across different disciplines. This is known as ontology mapping. Using concepts drawn from the various data sources made available by experts and researchers, timbr have created an ontology for Covid-19 data.

Ontologies are a useful way of defining exactly what type of information exists in a data set and are used for mapping out hierarchical relationships between datasets. The timbr platform delivers the powerful features of a knowledge graph combined with the simplicity of SQL.

timbr Covid-19 Ontology Explorer

To create ontologies for Covid-19 data, timbr have been working closely with researchers to define which queryable concepts will help them with their research.

For example, by selecting a node of the Covid-19 knowledge graph, it is possible to get a list of all the properties of hospitals, which can then be queried in the SQL database. An additional benefit of the class hierarchy is that even if Korean hospital data shares few or no properties with data from other countries, the data can still be aggregated because it shares the same parent concept.

This allows you to query a concept such as ‘hospitals data’ as if it were all one virtual table.
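The idea of querying a parent concept as one virtual table can be approximated in plain SQL with a view over a `UNION ALL`. The sketch below uses Python's `sqlite3` module and two hypothetical hospital tables with different column names and made-up values; it illustrates the concept only and is not how timbr implements its class hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical country-level tables that name their columns differently.
conn.executescript("""
    CREATE TABLE korea_hospitals (name TEXT, province TEXT, beds INTEGER);
    CREATE TABLE israel_hospitals (hospital_name TEXT, district TEXT,
                                   total_beds INTEGER, icu_beds INTEGER);
    INSERT INTO korea_hospitals VALUES ('Hospital A', 'Seoul', 500),
                                       ('Hospital B', 'Busan', 300);
    INSERT INTO israel_hospitals VALUES ('Hospital C', 'Center', 700, 40);

    -- The parent concept: only the properties both children share,
    -- renamed to one vocabulary, plus the country each row came from.
    CREATE VIEW hospitals AS
        SELECT 'KOR' AS country, name, province AS region, beds
        FROM korea_hospitals
        UNION ALL
        SELECT 'ISR', hospital_name, district, total_beds
        FROM israel_hospitals;
""")

# Researchers query the parent concept as if it were one table.
totals = conn.execute(
    "SELECT country, COUNT(*), SUM(beds) FROM hospitals GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('ISR', 1, 700), ('KOR', 2, 800)]
```

The Israeli `icu_beds` column is left out of the view because the Korean table has no equivalent: only properties defined on the parent concept can be aggregated across its children.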

Knowledge graphs are a visual way of storing data that provides a unified view of a range of otherwise unconnected data. Once ontologies have been mapped to various datasets, a knowledge graph of all of these datasets can be built, with nodes representing concepts and links representing the relationships between different concepts and their properties.

After mapping concepts to all of these datasets, timbr used their expertise to combine them into a single knowledge graph platform through which it is possible to query all the data about Covid-19 that timbr has collated from across the world.

“Time was of the essence. We were able to leverage timbr's ontology mapping capabilities to map hundreds of datasets in a matter of days and have the knowledge graph connected to a variety of tools.” - Amit, timbr.ai’s CEO.

These datasets are grouped into concepts represented by nodes. Each node lets you explore all the different datasets mapped to that concept and shows the relationships between datasets, so you can see exactly what data can be queried.
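A concept hierarchy with mapped datasets can be modelled as a small graph. The sketch below is a hypothetical ontology fragment in plain Python: the concept names, table names and the `located_in` relationship are invented for illustration. Each concept is a node, parent links form the hierarchy, asking for a parent concept gathers the tables mapped to all of its descendants, and relationships declared on a parent are inherited by its children.

```python
from collections import defaultdict

# Hypothetical ontology fragment: concept -> parent concept.
PARENT = {
    "hospital": "healthcare_facility",
    "korean_hospital": "hospital",
    "israeli_hospital": "hospital",
}
# Hypothetical mapping of source tables onto concepts (the nodes).
MAPPED_TABLES = {
    "korean_hospital": ["korea_hospitals"],
    "israeli_hospital": ["israel_hospitals", "israel_hospital_capacity"],
}
# Hypothetical relationships between concepts (the edges).
RELATIONSHIPS = [("hospital", "located_in", "country")]

CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)

def tables_for(concept):
    """Collect the tables mapped to a concept or to any of its descendants."""
    tables = list(MAPPED_TABLES.get(concept, []))
    for child in CHILDREN[concept]:
        tables.extend(tables_for(child))
    return tables

def relationships_of(concept):
    """Relationships declared on a concept or inherited from its ancestors."""
    found = []
    while concept is not None:
        found += [(rel, target) for source, rel, target in RELATIONSHIPS if source == concept]
        concept = PARENT.get(concept)
    return found

print(tables_for("hospital"))
print(relationships_of("korean_hospital"))
```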

timbr ontology explorer: an interactive way to look at every concept.
Visualise the relationships between concepts to help build your query.

Explore the data to find trends and relationships.

The timbr SQL KG platform incorporates both aggregate and time series data to cater to a range of research interests.

From the knowledge graph you can fetch the data from any one of the nodes. For example, if a researcher wants to look at the success of the restrictions implemented in each country, they can use the embedded BI tool to quickly create visualisations from the data and look for relationships in the Covid-19 patient data.

The embedded BI tool also allows these visualisations to be quickly and easily combined into a dashboard that communicates a range of information, trends and relationships. From any of these visualisations you can also select and edit the corresponding SQL query generated by the BI tool to build on the analysis.

For example, we can look at which restrictions were put in place and removed in different countries and how effective they were. The figures below show the impact of restrictions on patient numbers in Israel (Figures 1–2) and Italy (Figures 3–4).

Figure 1: Israel country restrictions

The restrictions data covers all the actions that countries took during the Covid-19 crisis; Figure 1 shows the actions taken in Israel.

Figure 2: Israel new confirmed Covid-19 cases by date.

Looking at Figures 1 and 2, we can see a correlation between Israel's policies and the number of new confirmed Covid-19 cases: in the period after those actions were taken, the number of new cases fell.

Figure 3: Italy country restrictions
Figure 4: Italy intensive care patients by date

Similarly, in Figures 3 and 4 we can compare Italian restrictions with the number of intensive care patients to assess whether this might be useful information for research.

Without timbr, a data scientist who wanted to process the relevant data from different sources and formats into a BI tool such as Power BI to create similar visualisations would need a large, messy SQL query. Querying information such as patient status, country-level test information, and latitude and longitude for plotting on a map requires pre-processing different data types and writing two large left joins: one for the country test information and one for the latitude and longitude.
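For comparison, here is a toy version of the kind of query described above, using Python's `sqlite3` module and three hypothetical tables with made-up values. The table and column names are assumptions, chosen so that each source names the country column differently, which is exactly what forces the analyst to know the join keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three hypothetical source tables with toy values.
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER, country_name TEXT, status TEXT);
    CREATE TABLE country_tests (iso_code TEXT, country TEXT, total_tests INTEGER);
    CREATE TABLE country_coords (name TEXT, latitude REAL, longitude REAL);
    INSERT INTO patients VALUES (1, 'Israel', 'recovered'),
                                (2, 'Israel', 'hospitalised'),
                                (3, 'Italy', 'recovered');
    INSERT INTO country_tests VALUES ('ISR', 'Israel', 1000);
    INSERT INTO country_coords VALUES ('Israel', 31.0, 35.0),
                                      ('Italy', 42.8, 12.8);
""")

# Every consumer has to repeat these joins and know which columns match.
QUERY = """
    SELECT p.country_name, p.status, COUNT(*) AS patients,
           t.total_tests, c.latitude, c.longitude
    FROM patients AS p
    LEFT JOIN country_tests AS t ON t.country = p.country_name
    LEFT JOIN country_coords AS c ON c.name = p.country_name
    GROUP BY p.country_name, p.status, t.total_tests, c.latitude, c.longitude
    ORDER BY p.country_name, p.status
"""
for row in conn.execute(QUERY):
    print(row)
```

The `LEFT JOIN`s keep the Italian patient even though there is no Italian row in `country_tests`, so `total_tests` comes back as `None` instead of the row disappearing.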

SQL query needed without timbr

The timbr platform saves researchers time through making complex queries like the one above shorter and clearer.

The query with timbr

Instead of performing joins, they can use the relationships between tables to generate the necessary columns from the concepts they are interested in.

In addition, they can get any desired column from the shared properties of those tables (tests, longitude, latitude, etc.) by using the knowledge graph relationships.
This is equivalent to the query above.
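timbr resolves these relationships through its ontology; a rough stand-in in ordinary SQL is a view that encodes the joins once, so the researcher's query only names the properties it needs. The sketch below uses Python's `sqlite3` module, hypothetical tables with made-up values, and a hypothetical `patient_concept` view, and checks that the short query returns the same rows as the hand-written joins. It is an analogy, not timbr's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER, country_name TEXT, status TEXT);
    CREATE TABLE country_tests (country TEXT, total_tests INTEGER);
    CREATE TABLE country_coords (name TEXT, latitude REAL, longitude REAL);
    INSERT INTO patients VALUES (1, 'Israel', 'recovered'), (2, 'Italy', 'recovered');
    INSERT INTO country_tests VALUES ('Israel', 1000);
    INSERT INTO country_coords VALUES ('Israel', 31.0, 35.0), ('Italy', 42.8, 12.8);

    -- Defined once by whoever maintains the model, not by each researcher.
    -- Stand-in for ontology relationships: patient -> country -> tests/coords.
    CREATE VIEW patient_concept AS
        SELECT p.patient_id, p.status, p.country_name AS country,
               t.total_tests AS country_tests,
               c.latitude AS country_latitude, c.longitude AS country_longitude
        FROM patients AS p
        LEFT JOIN country_tests AS t ON t.country = p.country_name
        LEFT JOIN country_coords AS c ON c.name = p.country_name;
""")

# The researcher's query: no joins, just the properties of interest.
short = conn.execute(
    "SELECT country, status, country_tests, country_latitude, country_longitude "
    "FROM patient_concept ORDER BY patient_id"
).fetchall()

# The equivalent hand-written query with explicit joins.
explicit = conn.execute("""
    SELECT p.country_name, p.status, t.total_tests, c.latitude, c.longitude
    FROM patients AS p
    LEFT JOIN country_tests AS t ON t.country = p.country_name
    LEFT JOIN country_coords AS c ON c.name = p.country_name
    ORDER BY p.patient_id
""").fetchall()

assert short == explicit
print(short)
```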

“We challenged researchers to write those UNIONs in SQL and see how long it took them.” - Amit, timbr.ai CEO

Some examples of visualisations made with timbr.

Figure 5: Israel location activity at different hours of the day.
This information can be used to help keep citizens informed about how busy places are before they decide to make a trip.

Decisions about which cities and areas in Israel to isolate, and which countries Israel can start travelling to and receiving tourists from, relied on information such as that shown in Figure 5 from timbr.ai’s Covid-19 knowledge graph.

Figure 6: Average age of patients by country. Using the demographics of individual patients it is possible to
Figure 7: Number of intensive care patients by country. Get the most up to date data without having to change your query.

Conclusions

As countries begin to relax restrictions, it is especially crucial to provide researchers with rapid access to the data they need. Whether you’re a policy maker, researcher or data scientist, timbr.ai hopes that the convenience of their knowledge graph can help you by cutting out the data collection and cleaning process. The platform is updated every day, and the ontologies will be expanded as more researchers and governments work with timbr. Contact the team if you want to hear more.

Acknowledgments

I would like to thank Amit Weitzner, the CEO of timbr.ai, and Jonathan Weitzner & Michal Kovarsky for giving me access to the platform and talking me through its features. The company is doing some fantastic work. Any organisation looking to build an understanding of the concepts and properties in its data should have a look at timbr.ai.

Passionate about social science, economics and data science. Always happy to connect on LinkedIn https://www.linkedin.com/in/pat-healy/