How to Succeed with Your First Grakn Project

Mason Powers
Feb 12 · 9 min read

“The future is already here — it’s just not very evenly distributed”. — William Gibson

Grakn is a distributed knowledge graph: a logical database to organise large and complex networks of data as one body of knowledge. Rooted in Knowledge Representation and Automated Reasoning, Grakn provides the knowledge foundation for cognitive and intelligent (e.g. Artificial Intelligence) systems, by providing an intuitive language (Graql) for modelling, transactions, and analytics.

Grakn represents the future of data management and analysis, and it is our privilege and duty to help distribute this future more evenly. With this in mind, we have created this guide to help you plan your exploration. It is intended for CTOs, enterprise architects, and developers at organisations that are committed to developing innovative applications that use groundbreaking technologies to deliver value.

Using Grakn in Your Application

If you’re starting to think of using Grakn for your application, you’ll find that the Grakn community offers a wide variety of open-source resources to help you get up to speed. After you have identified a use case for Grakn, you can start thinking about how Grakn fits within your application’s larger architecture. Does it connect to your NLP and Machine Learning pipelines? What ETL/ELT pipelines will you need?

This will help you to create a pilot to demonstrate the value of Grakn, whether it be small or large in scope. If this is successful, you can move into the development of your application and scale up your engineering resources on Grakn.

You should then start looking at deployment options: on-premise, cloud, or hybrid? Once everything is in place, you can move your application into production and reap the benefits of all your hard work! An additional step may involve scaling your deployment, if necessary, by adding more CPUs or nodes to your Grakn cluster.

This whole process can be broken down into seven distinct steps. Below we share detailed advice on the best practices we have observed for a successful initial deployment of Grakn.

  1. Evaluation Phase
  2. Architecture Design
  3. Pilot Project
  4. Development
  5. Deployment
  6. Production
  7. Scaling

Evaluation Phase

As you begin your journey with Grakn, make sure to engage with the Grakn Wonderland, our spirited and supportive open-source community. If you are the do-it-yourself type, join our Discord server and see it for yourself. In the screenshot below, you’ll notice our core engineering team diving into community questions and helping community members, all while continuing to build future capabilities and releases of Grakn:

Screenshot of Grakn’s Discord Wonderland — so many helpful engineers here!

We encourage you to take advantage of the myriad resources our Community Leader, Daniel Crowe, has graciously put together on our “Inspiration Hub”. You’ll find videos and presentations on Grakn’s Purpose and History, Comparing SQL to Graql, Modelling Rules for Logical Reasoning in Grakn, How Can We Complete a Knowledge Graph, and much more.

If you didn’t approach Grakn with a specific application in mind, now is the time to think of potential use cases that are directly tied to a measurable or critical business value (a KPI, metric, key initiative, etc.), and of what success would look like if you were able to develop a solution. Who needs to be involved? Identify those stakeholders and get them involved early.

Architecture Design: Consider Your Current Data Infrastructure and Where Grakn Fits

Whatever your infrastructure blueprint looks like, whether it is heavy on the OLAP side (“Modern Business Intelligence”), sits at the other end with OLTP and emerging components of “AI and ML” stacks, or combines both in the “Multimodal Data Processing” approach, Grakn can fulfil the requirement if generating new insights from data is a goal for your business.

I have seen great success when Grakn functions as the foundational database that organises vast and complex systems, data, and applications, becoming the centralised knowledge base that ingests data and feeds all downstream applications. In this way, Grakn becomes the unified representation of knowledge in a system.

The diagram below shows how this might work relative to other application components. With Grakn working as the central knowledge base, you can use it as the unified representation for your NLP, knowledge management/acquisition, or machine learning agents.

Reference Grakn Application Architecture

We have also seen great success within organisations whose primary business value is derived from analysing and generating new insights from existing data, whether it be internal data, streams of data, external/public data, or commissioned data. Grakn can be connected to these feeds through any of our client drivers (native Java, Node.js, and Python), or through powerful open-source tools such as the one built by one of our customers, Bayer, who graciously made it available to the community: https://github.com/bayer-science-for-a-better-life/grami.

While you identify your initial use case and where you will position Grakn within your architecture, take into consideration which version best suits your needs. Grakn offers both an open-source product, available under AGPL v3, and a commercial product, Grakn Cluster. Ask yourself: does my organisation require security? Or high availability? Or guaranteed support with an SLA in place? If the answer to any of these is yes, then you need to evaluate our commercial license for Grakn Cluster.

Pilot Project — Finding Proof of Value

Now is the time for you and your team to narrow down your scope to a specific use case. Here, you should endeavour to clearly define a use case and determine how Grakn will fit within the larger architecture, and the workflow that will lead to your desired outcome. What will success look like for your project? Some of these success criteria will include certain insights that you want Grakn to generate.

Once these factors are determined, you should then work backwards to structure your pilot. Let’s paint a quick picture to better understand what we mean by “work backwards”:

You are the CTO of a pharmaceutical or biotech company.

Business Goal: develop landmark medicines faster and with higher efficacy than your competitors.

Strategy: accelerate the discovery of new therapeutics by 30% (from 10 years to 7 years) which will positively impact your top/bottom lines and value to stakeholders/investors.

Considerations:

  • What are the different ways in which AI and data could help accelerate this process?
  • What are the obstacles to accelerating the adoption of novel technologies?

Potential Challenges:

  • Data management bottlenecks
  • Legacy information systems
  • Isolated architectures
  • Heterogeneous data formats

These challenges have the potential to make it very difficult to contextualise new insights generated in the organisation. There are a few more questions you will want to answer which will further scope your initial pilot with Grakn:

  • What data sources are in play?
  • Where is the data coming from, and where is it going?
  • What types of questions, analyses, and models are being implemented to find and develop new therapeutics?

These questions will help to inform the types of insights that you intend to generate with Grakn, and the data it needs to represent. We can then work backwards to help us define what success looks like.

Focus on listing five to ten insights that are important to the business.

In the context of drug discovery, one question you could consider asking is:

Give me all the drugs that are related to the protein SIRT1, expressed in mice, and all the papers in which these are mentioned?

These questions not only help to define success; they also begin to identify the entities, relations, and attributes, as well as the datasets, that will be needed in the model. This will be key to writing a well-modelled schema.
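To make the step from question to model concrete, here is a minimal Python sketch of how the SIRT1 question breaks down into entities, relations, and attributes. It is not Graql and does not use any Grakn client. The type names ("targets", "expressed-in", "mentions") and all drug and paper names are hypothetical, invented purely for illustration; a real schema would come from your own domain and datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # entity type: "drug", "protein", "organism" or "paper"
    name: str  # a single "name" attribute


@dataclass(frozen=True)
class Relation:
    kind: str  # hypothetical relation type: "targets", "expressed-in" or "mentions"
    source: Entity
    target: Entity


# Hypothetical toy data, invented purely for illustration.
sirt1 = Entity("protein", "SIRT1")
other = Entity("protein", "other-protein")
mouse = Entity("organism", "mouse")
drug_a = Entity("drug", "drug-A")
drug_b = Entity("drug", "drug-B")
paper_1 = Entity("paper", "paper-1")
paper_2 = Entity("paper", "paper-2")

relations = [
    Relation("expressed-in", sirt1, mouse),
    Relation("targets", drug_a, sirt1),
    Relation("targets", drug_b, other),
    Relation("mentions", paper_2, drug_a),
    Relation("mentions", paper_1, drug_a),
    Relation("mentions", paper_1, drug_b),
]


def drugs_and_papers(protein_name, organism_name, rels):
    """Map each drug related to the protein to the papers that mention it.

    Returns an empty dict unless the protein is expressed in the organism.
    """
    expressed = any(
        r.kind == "expressed-in"
        and r.source.name == protein_name
        and r.target.name == organism_name
        for r in rels
    )
    if not expressed:
        return {}
    drugs = [
        r.source for r in rels
        if r.kind == "targets" and r.target.name == protein_name
    ]
    return {
        drug.name: sorted(
            r.source.name for r in rels
            if r.kind == "mentions" and r.target == drug
        )
        for drug in drugs
    }


answer = drugs_and_papers("SIRT1", "mouse", relations)
print(answer)  # {'drug-A': ['paper-1', 'paper-2']}
```

Writing the question down this way surfaces four entity types, three relation types, and one attribute, which is roughly the inventory you would want before drafting a schema for the pilot.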

Finally, as part of your pilot, you may wish to benchmark Grakn’s performance. We’ll be publishing our own benchmarks soon, but ultimately any team should do its own benchmarking to build confidence in the results. In doing so, make sure to take into account whether your application will be read or write intensive, the different types of relations your model will have, and the extent to which Grakn’s automated reasoner is being used.
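As a starting point for your own benchmarking, the following Python sketch times a read-style and a write-style workload separately. It is only a harness under stated assumptions: `write_workload` and `read_workload` are hypothetical stand-ins that operate on an in-memory dictionary, and you would replace them with functions that run your actual insert and match queries through whichever client driver you use.

```python
import statistics
import time


def benchmark(operation, repetitions=50):
    """Time a zero-argument callable; return latency statistics in milliseconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": len(samples),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Hypothetical stand-ins for real database calls.
store = {}


def write_workload():
    # Stand-in for a write transaction: adds one new record per call.
    store[len(store)] = "value"


def read_workload():
    # Stand-in for a read transaction: looks up one existing record.
    store.get(0)


write_stats = benchmark(write_workload)
read_stats = benchmark(read_workload)
print("write:", write_stats)
print("read: ", read_stats)
```

Keeping reads and writes in separate runs makes it easier to see which side dominates for your application. The same harness can be run once with reasoning-heavy queries and once without, to measure how much the reasoner contributes.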

SQL query on the left vs. Graql query on the right — same question.

Development Phase

During your development phase, you will find that Grakn’s flexible data model allows you to iterate quickly on your schema. Take time to iterate and think about the big picture: what does your “world”, your domain, look like, and how should it be modelled? If anyone on your engineering team needs help, the Grakn open-source community is always there to troubleshoot. And for teams that have budget available to accelerate their development, we also offer Development Support packages to ensure you’re not missing insights or approaches to Grakn that you have yet to discover.

Development Support packages include our Grakn Academy (a three-day intensive workshop with one of our Principal Engineers), dedicated private communication channels (both on Discord and through our private ticketing system), and monthly Knowledge Engineering Review (KER) sessions. These KERs are somewhat flexible in quantity and duration but, at minimum, will provide 2–3 hours per month of dedicated technical consultancy and guidance, including hands-on help from our team.

Once all has gone as planned and you have proven your concept, you need to begin planning for production and your go-live date. Depending on your scenario, you should consider bringing in additional development support and making as many business owners and division heads as possible aware of your application development, to ensure buy-in from the organisation. This will make the process of moving your application into production as frictionless as possible internally. Resources are limited in organisations of any size, so making sure as many of your colleagues as possible understand the value of your Grakn solution will ease concerns about resource allocation. Benchmarking results will help your case as well.

Deployment: Hardware/Cloud Environment Needed to Deploy Grakn Successfully

As you move your application towards production, you’ll start by deploying to a staging environment. Grakn is cloud agnostic and can run on any public cloud platform, for example AWS, Azure, or GCP. The optimum machine is one that balances CPU and memory; the right balance ultimately depends on your application’s needs.

Grakn has no strict minimum hardware requirement, but we highly recommend at least 8 CPUs (virtual or physical) and 8 GB of RAM. For storage, we suggest SSD persistent disks for performance. It is possible to use HDDs, but this is not recommended.
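If you provision staging machines by script, the recommendations above can be encoded as a simple pre-flight check. The Python sketch below is an illustration only: `check_machine` is a hypothetical helper, not part of any Grakn tooling, and the RAM size and disk type are passed in by hand because the standard library offers no portable way to detect them.

```python
import os

# Recommended values taken from the guidance above.
RECOMMENDED_CPUS = 8
RECOMMENDED_RAM_GB = 8


def check_machine(cpus, ram_gb, uses_ssd):
    """Return a list of warnings for a machine that falls short of the recommendations."""
    warnings = []
    if cpus < RECOMMENDED_CPUS:
        warnings.append(
            f"{cpus} CPUs found; at least {RECOMMENDED_CPUS} are recommended"
        )
    if ram_gb < RECOMMENDED_RAM_GB:
        warnings.append(
            f"{ram_gb} GB RAM found; at least {RECOMMENDED_RAM_GB} GB is recommended"
        )
    if not uses_ssd:
        warnings.append("HDD storage works, but SSD persistent disks are recommended")
    return warnings


# os.cpu_count() may return None, so fall back to 0 in that case.
local_cpus = os.cpu_count() or 0
for warning in check_machine(local_cpus, ram_gb=8, uses_ssd=True):
    print("warning:", warning)
```

Because these are recommendations rather than hard minimums, the check only warns and never blocks a deployment.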

Production Phase — Go-Live!

You are now ready to go live! As you move into production, you will probably be looking to deploy Grakn Cluster. Depending on the type of support you’ve purchased with the license, we’ll help ensure that the rollout goes as smoothly as possible and that your Grakn environment is ready to support scale and growth.

Scaling Your Grakn Application

Once you’ve deployed your application in production successfully, you’ll be getting more users and generating more value, and you may need to grow your Grakn Cluster by adding more machines, more CPUs, and up to 25 GB of RAM (any more than this is not expected to yield additional performance improvements).

Grakn is, at its core, a distributed database designed to scale over a network of computers, allowing you to easily scale cluster size up or down with built-in tools that automate the orchestration of your cluster. Elastic throughput allows you to scale linearly as new machines are added to your Grakn cluster, without any downtime.

You should also continue sending additional development staff to Grakn Academy trainings, and consider creating a Center of Excellence focused on Knowledge Engineering and Symbolic Artificial Intelligence to cultivate your own team of in-house Grakn experts.

— —

Following the steps above will help your project get off the ground and be ready for the next use case or complex problem to tackle. From robots that navigate potentially dangerous environments, saving lives (and performing incredible choreography), to knowledge graphs that predict novel disease targets, we continue to be in awe of the solutions our community and customers build with Graql and Grakn every day.

Our community, colleagues, partners, and customers continue to teach us and push us to improve. We are here to support you, to share the best practices that will help you succeed in your endeavours, and to provide opportunities for you to share your story across the community.

The future is here, and we at Grakn Labs cannot wait to help make your vision of the future a reality.

Find me on Discord: mason.powers#1912, on Twitter: @MasonMontPowers, or email: mason@grakn.ai.

Vaticle

Creators of TypeDB and TypeQL