Today’s Biomedical Research Requires New Technologies
Understanding the complex relationships that underlie the functions of a living cell is fundamental to helping us cure diseases. Although cells were first discovered over 350 years ago, there is still a significant amount that we do not understand about how they operate. Recently, data describing such mechanisms has proliferated massively and is available to scientists all over the world. Not only is this data very large, held in multiple formats, and stored in countless disparate locations, it also exhibits immense variety, variability, and volatility.
Traditionally, such data has been stored in relational databases such as Oracle and MySQL. However, when such complex data is stored in tables comprising thousands of rows and columns, finding insights with multi-join queries that connect separate tables quickly becomes unwieldy and computationally expensive.
Promises and Limitations of Graph Databases
More recently, graph databases have been suggested as an alternative to relational data models. Graph data structures enable users to be more expressive than relational databases, both in modelling and by making it simpler to traverse across highly connected data.
However, graph databases also fall short, as they make it much more difficult to model highly interconnected data in a correct and meaningful way. Anyone who has given RDF or semantic web graph models a shot has also faced a high barrier to entry, given the sheer scope of the technical landscape that must be understood to be effective.
“Schema-less” or “schema-on-read” are terms we hear frequently from graph database providers, but the reality is that without a schema, every fact and every relationship is treated like a unique object in the database. This becomes very difficult to manage efficiently when you could have thousands, if not millions, of relationships in your data.
When performing large-scale analytics on complex biomedical data, the order and structure that a schema imposes on the data provide many benefits. For example, because they lack a schema, graph databases cannot guarantee data consistency through schema constraints, and they require many more resources and much more development work to maintain. We go into detail on the limitations in this talk.
Creating Abstraction Layers with Grakn and Graql
Nonetheless, graphs have tremendous potential, which is why Grakn leverages hypergraphs, a generalisation of graphs, to offer a higher-level and more powerful query language. Effectively, Grakn goes a step further than graph databases by building an abstraction layer over a hypergraph database, providing a fifth-generation language in Graql.
Graql abstracts away the low-level implementation complexities of the graph and makes it simpler to model complex domains. Additionally, Grakn offers a native reasoning engine to perform automated reasoning and infer new relations, new facts, in real time. The querying, schema, and reasoning all happen through Graql.
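As a rough illustration of what such an inference can look like, the sketch below defines a Graql rule that infers a gene-disease association whenever a gene encodes a protein that is itself associated with a disease. Every type, role, and rule name here (protein, gene-protein-encoding, protein-disease-association, and so on) is a hypothetical example chosen for this sketch, not something taken from a published Grakn schema.

```graql
define

# Hypothetical entity types and the roles they play (assumptions for this sketch).
gene sub entity,
    plays gene-protein-encoding:encoding-gene,
    plays gene-disease-association:associated-gene;
protein sub entity,
    plays gene-protein-encoding:encoded-protein,
    plays protein-disease-association:associated-protein;
disease sub entity,
    plays protein-disease-association:associated-disease,
    plays gene-disease-association:associated-disease;

# Hypothetical relation types connecting the entities above.
gene-protein-encoding sub relation,
    relates encoding-gene,
    relates encoded-protein;
protein-disease-association sub relation,
    relates associated-protein,
    relates associated-disease;
gene-disease-association sub relation,
    relates associated-gene,
    relates associated-disease;

# If a gene encodes a protein and that protein is associated with a disease,
# infer an association between the gene and the disease.
rule gene-disease-via-encoded-protein:
when {
    (encoding-gene: $gene, encoded-protein: $protein) isa gene-protein-encoding;
    (associated-protein: $protein, associated-disease: $disease) isa protein-disease-association;
} then {
    (associated-gene: $gene, associated-disease: $disease) isa gene-disease-association;
};
```

With a rule along these lines, a query matching gene-disease-association would be expected to return inferred associations alongside stored ones, provided inference is enabled for that query.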
The power of Grakn/Graql is in the simplicity and elegance that it offers when modelling vast and complex domains. Moreover, it also enables us to discover hidden patterns in our data, generating novel insights, useful to data scientists, domain experts, research professionals and doctors. Grakn works at a higher level than other technologies and is easier to learn, reducing the barrier to entry and enabling millions of developers to have access to graph technologies which were previously inaccessible.
Grakn gives a powerful platform to represent, store, and retrieve today’s complex data, which is why we are seeing so many medical researchers across the life sciences value chain using this technology. Grakn also offers high availability and horizontal scalability, enabling researchers and teams to collaborate and scale their work effectively.
Grakn In Action
Grakn is quickly becoming an industry standard to represent heterogeneous biological networks. For example, companies like Bayer, GSK and AstraZeneca are using Grakn to accelerate their drug discovery pipelines.
Bayer and AstraZeneca also combine Grakn with machine learning and natural language processing to predict and produce new therapeutics. In companies such as these, Grakn enables scientists to connect different types of data, making it much simpler to retrieve data from many sources and formats, while reducing time to value and making researchers much more productive.
You can hear directly from Bayer and GlaxoSmithKline on how they are taking a data-driven approach to drug discovery and drug repurposing at Grakn Orbit, April 21–22. Secure your spot here.
Grakn’s language is particularly loved by users for its type system, simplicity, and elegance. Graql’s ease-of-use enables domain experts, researchers, scientists, as well as those without any coding experience, to quickly understand how to write valuable software for their use case. For example, modelling a gene and all its various identifiers is as simple as writing:
gene sub entity, owns entrez-id, owns ensembl-id, owns gene-name, owns gene-symbol, owns gene-id, owns kegg-id;
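For a definition like this to load into a database, the owned attribute types need to be defined as well. The sketch below shows one way the complete schema fragment might look. The string value type for every identifier is an assumption made for illustration; the original does not state the value types.

```graql
define

# Attribute types owned by gene. The string value type is an assumption.
entrez-id sub attribute, value string;
ensembl-id sub attribute, value string;
gene-name sub attribute, value string;
gene-symbol sub attribute, value string;
gene-id sub attribute, value string;
kegg-id sub attribute, value string;

# The gene entity and the identifiers it can own.
gene sub entity,
    owns entrez-id,
    owns ensembl-id,
    owns gene-name,
    owns gene-symbol,
    owns gene-id,
    owns kegg-id;
```

Modelling each identifier as its own attribute type means any of them can later be used directly in a match pattern, for example `$g isa gene, has gene-symbol "ACE2";`.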
In addition, Grakn’s type system allows you to create very expressive models, for example using UMLS as a reference:
physical-object sub entity;
organism sub physical-object;
eukaryote sub organism;
animal sub eukaryote;
vertebrate sub animal;
reptile sub vertebrate;
mammal sub vertebrate;
human sub mammal;
anatomical-structure sub physical-object;
fully-formed-anatomical-structure sub anatomical-structure;
tissue sub fully-formed-anatomical-structure;
gene-or-genome sub fully-formed-anatomical-structure;
If we then wanted to write a broad query for all physical-objects in the database, we would simply ask:
match $physical-object isa physical-object;
This avoids the need to query for each specific sub-type. On the other hand, if we wanted to ask a narrower question and retrieve all tissue types, we would ask:
match $tissue isa tissue;
In BioGrakn Covid, we can ask questions like: which genes are associated with the disease “SARS”?
match
$disease isa disease, has name "SARS";
$gene isa gene;
(associated-disease: $disease, associated-gene: $gene) isa gene-disease-association;
With Grakn, organisations can eliminate data silos, consolidating data into a single database that acts as a source of truth for your organisation. This leads to the democratisation of access to data, empowering business analysts, product owners, marketers, and others to draw insights for their business units. This also reduces the time spent on managing infrastructure and enables greater data access through enhanced data governance, while maintaining complete data provenance.
The Future of Biomedical Research Made Possible by Grakn
Medical research is extremely difficult and involves a lot of unknowns. Grakn is able to accelerate much of this process by making it easier to uncover and infer hidden relationships in our data.
Through its strong type system, type hierarchies and hyper relations, Grakn’s schema language offers the level of expressivity required to model biomedical data for any application. Because this language (Graql) is flexible and expressive, costly and complicated schema re-designs are no longer needed. This ability to extend your schema quickly and efficiently is paramount to accelerating biomedical research.
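To make the point about extending a schema concrete, here is a minimal sketch of adding one more identifier to the gene type defined earlier. It assumes that define statements are additive, so the existing gene definition and its data stay as they are. The hgnc-id attribute and its string value type are illustrative assumptions, not part of the original schema.

```graql
define

# Hypothetical new identifier. The string value type is an assumption.
hgnc-id sub attribute, value string;

# Re-stating gene adds the new ownership alongside its existing ones.
gene sub entity,
    owns hgnc-id;
```

Existing queries against gene would continue to work unchanged, since nothing that was already defined has been altered or removed.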
The incredible proliferation of biomedical data across the life sciences has created a need for new ways to leverage such data to accelerate drug discovery operations. Our technology has opened the pathway to the creation of a true knowledge representation system. This knowledge representation system, Grakn’s type system, abstracts away many of the complexities innate to working with complex data. Coupled with a native automated reasoning engine, Grakn makes it possible to uncover critical insights that have remained out of reach until now.
Stay tuned for the forthcoming release of Grakn 2.0 Cluster — our commercial product that enables massive scalability, high availability, and orchestration mechanisms which biotech and pharmaceutical companies require.