How to Succeed with Your First TypeDB Project

Published in Vaticle · 9 min read · Feb 12, 2021

“The future is already here — it’s just not very evenly distributed”. — William Gibson

TypeDB is a distributed knowledge graph: a logical database to organise large and complex networks of data as one body of knowledge. Rooted in Knowledge Representation and Automated Reasoning, TypeDB provides the knowledge foundation for cognitive and intelligent (e.g. Artificial Intelligence) systems, by providing an intuitive language (TypeQL) for modelling, transactions, and analytics.

TypeDB represents the future of data management and analysis, and it is our privilege and duty to help distribute this future more evenly. With this in mind, we have created this guide to help you plan your exploration. It is intended for CTOs, enterprise architects, and developers at organisations committed to developing innovative applications that use groundbreaking technologies to deliver value.

Using TypeDB in Your Application

If you’re starting to think of using TypeDB for your application, you’ll find that the TypeDB community offers a wide variety of open-source resources to help you get up to speed. After you have identified a use case for TypeDB, you can start thinking about how TypeDB fits within your application’s larger architecture. Does it connect to your NLP and Machine Learning pipelines? What ETL/ELT pipelines will you need?

This will help you to create a pilot to demonstrate the value of TypeDB, whether it be small or large in scope. If this is successful, you can move into the development of your application and scale up your engineering resources on TypeDB.

You should then start looking at deployment options, e.g. on-premises, cloud, or hybrid. Once everything is in place, you can move your application into production and reap the benefits of all your hard work! An additional step may involve scaling your deployment, if necessary, by adding more CPUs or nodes to your TypeDB Cluster.

This whole process can be broken down into seven distinct steps. Below we share detailed advice on the best practices we have observed for a successful initial deployment of TypeDB.

Evaluation Phase
Architecture Design
Pilot Project
Development
Deployment
Production
Scaling

Evaluation Phase

As you begin your journey with TypeDB, make sure to engage with the TypeDB Wonderland, our spirited and supportive open-source community. If you are the do-it-yourself type, join our Discord server and see for yourself. In the screenshot below, you’ll notice our core engineering team diving into community questions and helping community members, all while continuing to build future capabilities and releases of TypeDB:

Screenshot of TypeDB’s Discord Wonderland — so many helpful engineers here!

We encourage you to take advantage of the myriad resources our Community Leader, Daniel Crowe, has graciously put together on our “Inspiration Hub”. You’ll find videos and presentations on TypeDB’s Purpose and History, Comparing SQL to TypeQL, Modelling Rules for Logical Reasoning in TypeDB, How Can We Complete a Knowledge Graph, and much more.

If you didn’t approach TypeDB with a specific application in mind, now is the time to think of potential use cases that are directly tied to a measurable or critical business value (a KPI, metric, key initiative, etc.) and of what success would look like if you were able to develop a solution. Who needs to be involved? Identify your stakeholders and get them involved early.

Architecture Design: Consider Your Current Data Infrastructure and Where TypeDB Fits

Your infrastructure blueprint may be heavy on the OLAP side (“Modern Business Intelligence”), sit at the other end with OLTP and emerging components of “AI and ML” stacks, or combine both with the “Multimodal Data Processing” approach. In any of these cases, if generating new insights from data is a goal for your business, TypeDB can fulfil that requirement.

I have seen great success when TypeDB functions as the foundational database that organises vast and complex systems, data, and applications, becoming the centralised knowledge base that ingests data and feeds all downstream applications. In this way, TypeDB becomes the unified representation of knowledge in a system.

The screenshot below shows how this might work relative to other application components. With TypeDB working as the central knowledge base, you can leverage it as the unified representation for your NLP, knowledge management/acquisition, or machine learning agents.

Reference TypeDB Application Architecture

We have also seen great success within organisations whose primary business value is derived from analysing and generating new insights from existing data, whether it be internal data, streams of data, external/public data, or commissioned data. TypeDB can be connected to these feeds through any of our native client drivers (Java, Node.js, and Python), or through powerful open-source tools like the one built by one of our customers (Bayer), who graciously made it available to the community: https://github.com/bayer-science-for-a-better-life/grami.

While you identify your initial use case and where you will position TypeDB within your architecture, take into consideration which version best suits your needs. TypeDB offers both an open-source product, available under AGPL v3, and a commercial product, TypeDB Cluster. Ask yourself: does my organisation require security, high availability, or guaranteed support with an SLA in place? If the answer to any of these is yes, then you should evaluate our commercial license for TypeDB Cluster.

Pilot Project — Finding Proof of Value

Now is the time for you and your team to narrow down your scope to a specific use case. Here, you should endeavour to clearly define a use case and determine how TypeDB will fit within the larger architecture, and the workflow that will lead to your desired outcome. What will success look like for your project? Some of these success criteria will include certain insights that you want TypeDB to generate.

Once these factors are determined, you should then work backwards to structure your pilot. Let’s paint a quick picture to better understand what we mean by “work backwards”:

You are the CTO of a pharmaceutical or biotech company.

Business Goal: develop landmark medicines faster and with higher efficacy than your competitors.

Strategy: accelerate the discovery of new therapeutics by 30% (from 10 years to 7 years) which will positively impact your top/bottom lines and value to stakeholders/investors.

Considerations:

What are the different ways in which AI and data could help accelerate this process?
What are the obstacles to accelerating the adoption of novel technologies?

Potential Challenges:

Data management bottlenecks
Legacy information systems
Isolated architectures
Heterogeneous data formats

These challenges have the potential to make it very difficult to contextualise new insights generated in the organisation. There are a few more questions you will want to answer which will further scope your initial pilot with TypeDB:

What data sources are in play?
Where are they coming from, and where are they going?
What types of questions, analyses, and models are being implemented to find and develop new therapeutics?

These questions will help to inform the types of insights that you intend to generate with TypeDB, and the data it needs to represent. We can then work backwards to help us define what success looks like.

Focus on listing five to ten insights that are important to the business.

In the context of drug discovery, a question you could consider asking:

Give me all the drugs that are related to the protein SIRT1, expressed in mice, and all the papers in which these are mentioned?

These questions not only help to define success; they also begin to identify the entities, relations, and attributes, as well as the datasets, that will be needed in the model. This will be key to writing a well-modelled schema.
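As a minimal sketch of this “working backwards” step, the Python snippet below records which entities, relations, attributes, and datasets each insight question appears to need, then merges them into a draft inventory for the schema. Every name in it (`Insight`, `schema_inventory`, and labels such as `drug-protein-interaction` or the dataset names) is a hypothetical placeholder chosen for illustration; none of it is part of TypeDB or TypeQL.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Insight:
    """One business question plus the model elements it appears to need."""
    question: str
    entities: Set[str] = field(default_factory=set)
    relations: Set[str] = field(default_factory=set)
    attributes: Set[str] = field(default_factory=set)
    datasets: Set[str] = field(default_factory=set)


def schema_inventory(insights: Iterable[Insight]) -> Dict[str, List[str]]:
    """Merge the elements needed by all insights into one sorted inventory."""
    merged: Dict[str, Set[str]] = {
        "entities": set(),
        "relations": set(),
        "attributes": set(),
        "datasets": set(),
    }
    for insight in insights:
        for kind in merged:
            merged[kind] |= getattr(insight, kind)
    return {kind: sorted(names) for kind, names in merged.items()}


# Hypothetical breakdown of the drug discovery question from the text.
sirt1 = Insight(
    question=(
        "Which drugs are related to the protein SIRT1, expressed in mice, "
        "and which papers mention them?"
    ),
    entities={"drug", "protein", "organism", "paper"},
    relations={"drug-protein-interaction", "expression", "mention"},
    attributes={"name", "title"},
    datasets={"drug-target dataset", "publication dataset"},  # placeholders
)

inventory = schema_inventory([sirt1])
for kind, names in inventory.items():
    print(f"{kind}: {', '.join(names)}")
```

Repeating this for each of your five to ten insights gives a first list of the types your schema will have to cover and the datasets you will need to load.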

Finally, as part of your pilot, you may wish to benchmark TypeDB’s performance. While we’ll be publishing our own benchmarks soon, ultimately any team should do its own benchmarking to be confident in the results. In doing so, make sure to take into account whether your application will be read- or write-intensive, the different types of relations your model will have, and the extent to which TypeDB’s automated reasoner is being leveraged.
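One way to structure such a benchmark is a small timing harness that runs read and write workloads separately and reports the spread of their durations. The Python sketch below assumes each workload is a plain callable; `time_workload`, `write_workload`, and `read_workload` are hypothetical names, and the in-memory dictionary is only a stand-in for real calls made through your TypeDB client driver.

```python
import statistics
import time
from typing import Callable, Dict


def time_workload(run_once: Callable[[], None], repeats: int = 5) -> Dict[str, float]:
    """Run a workload several times and summarise its wall-clock duration."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


# Placeholder workloads: replace the bodies with your own insert and
# match queries. The dictionary below is NOT a model of TypeDB.
store: Dict[int, int] = {}


def write_workload() -> None:
    for i in range(1000):
        store[i] = i * 2


def read_workload() -> None:
    for i in range(1000):
        store.get(i)


results = {
    "write": time_workload(write_workload),
    "read": time_workload(read_workload),
}
for name, stats in results.items():
    print(f"{name}: median {stats['median_s']:.6f}s over {stats['repeats']} runs")
```

Keeping reads and writes as separate workloads makes it easier to see which side dominates for your application. The same harness can be reused with the reasoner switched on and off, to measure its effect.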

SQL query on the left vs. TypeQL query on the right — same question.

Development Phase

During your development phase, you will find that TypeDB’s flexible data model allows you to iterate quickly on your model/schema. Take time to iterate and think big picture: what does your “world”, your domain, look like, and how should it be modelled? If anyone on your engineering team needs help, the TypeDB open-source community is always there to troubleshoot. And for teams that have budget available to accelerate their development, we also offer Development Support packages to ensure you’re not missing insights or approaches to TypeDB that you have yet to discover.

Development Support packages include our TypeDB Academy (a 3-day intensive workshop with one of our Principal Engineers), dedicated private communication channels (both on Discord and through our private ticketing system), and monthly Knowledge Engineering Review (KER) sessions. These KERs are somewhat flexible in quantity and duration but will, at minimum, provide 2–3 hours per month of dedicated technical consultancy and guidance, including hands-on help from our team.

Once all has gone as planned and you have proven your concept, you need to begin planning for production and your go-live date. Depending on your scenario, you should consider bringing in additional development support and making as many business owners and division heads as possible aware of your application development, to ensure buy-in from the organisation. This will make the process of moving your application into production as frictionless as possible internally. Resources are limited in organisations of any size. Making sure as many of your colleagues as possible understand the value of your TypeDB solution will ease any concerns about resource allocation. Benchmarking results will help your case as well.

Deployment: Hardware/Cloud Environment Needed to Deploy TypeDB Successfully

As you move your application into production, you’ll start to deploy to a staging environment. TypeDB is cloud agnostic and can run on any public cloud platform, for example AWS, Azure, or GCP. The optimum machine choice is one with a balance between CPU and memory, which will ultimately depend on your application’s needs.

TypeDB does not have a strict minimum hardware requirement, but we highly recommend at least 8 CPUs (virtual or physical) and 8 GB RAM. For storage, we suggest SSD persistent disks for performance. It is possible to use HDD disks, but this is not recommended.
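If you provision machines through scripts, a small pre-flight check can flag hosts that fall below the recommendation above. The Python sketch below is an illustration only: `check_host` and the two constants are hypothetical names, and the RAM figure is passed in by hand because the standard library has no portable way to read it.

```python
import os
from typing import List

# Recommended values taken from the guidance above.
RECOMMENDED_CPUS = 8
RECOMMENDED_RAM_GB = 8


def check_host(cpus: int, ram_gb: float) -> List[str]:
    """Return a warning for each resource below the recommended value."""
    warnings = []
    if cpus < RECOMMENDED_CPUS:
        warnings.append(f"{cpus} CPUs found, at least {RECOMMENDED_CPUS} recommended")
    if ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(f"{ram_gb} GB RAM found, at least {RECOMMENDED_RAM_GB} GB recommended")
    return warnings


# os.cpu_count() may return None, so fall back to 0 in that case.
# The ram_gb value here is an example figure, not a measurement.
for warning in check_host(cpus=os.cpu_count() or 0, ram_gb=16):
    print("WARNING:", warning)
```

An empty list means the host meets both recommendations. These are recommendations rather than hard limits, so the check warns instead of failing.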

Production Phase — Go-Live!

You are now ready to go live! As you move into production, you will probably be looking to deploy TypeDB Cluster. Depending on the type of support you’ve purchased with the license, we’ll ensure that the deployment goes as smoothly as possible and is ready to support the scale and growth of your TypeDB environment.

Scaling Your TypeDB Application

Once you’ve deployed your application in production successfully, you’ll be getting more users and generating more value, and you’ll need to grow your TypeDB Cluster with more machines, more CPUs, and up to 25 GB RAM (any more than this is not expected to yield additional performance improvements).

TypeDB is, at its core, a distributed database designed to scale over a network of computers, allowing you to easily scale cluster size up or down with built-in tools that automate the orchestration of your cluster. Elastic throughput allows you to scale linearly as new machines are added to your TypeDB Cluster, without any downtime.

You should also continue sending additional development staff to TypeDB Academy trainings and consider creating a Center of Excellence focused on Knowledge Engineering and Symbolic Artificial Intelligence to cultivate your own in-house team of TypeDB experts.

— —

Following the steps above will ensure your project will get off the ground and be ready for the next use case or complex problem to tackle. From robots that are navigating potentially dangerous environments, saving lives (and performing incredible choreography), to knowledge graphs that are predicting novel disease targets — we continue to be in awe of the solutions our community and customers build with TypeQL and TypeDB every day.

Our community, colleagues, partners, and customers continue to teach us and push us to improve. We are here to support you, to share best practices that help you succeed in your endeavours, and to provide opportunities to share your story across the community.

The future is here — and we at TypeDB Labs cannot wait to help make your vision of the future a reality.