> Just as electricity transformed every major industry starting about 100 years ago, AI is now poised to do the same. Several large tech companies have built AI divisions, and started transforming themselves with AI. But in the next few years, companies of all sizes and across all industries will realize that they too must be part of this AI-powered future.
So claims Andrew Ng, the polymath who has brought deep learning to the masses. Ng has served as Baidu’s chief scientist, head of Google Brain, professor at Stanford and founder of Coursera, where he now publicly distributes coursework that helps people worldwide learn AI. Electricity was harnessed by a niche few, but once people began to realize its enormous potential, it very quickly became indispensable in all walks of life.
Grakn is, of course, the database for AI, and so we too have been working hard to integrate our product with the world of deep learning.
Artificial neural networks — the ones that run on anything from our MacBook Pros to the most powerful GPU clusters around — are among the most widely applicable learning techniques out there. Thanks to the universal approximation theorem, a sufficiently large network can, in theory, approximate any function to arbitrary precision, which is pretty remarkable if you think about it.
Neural networks, like any machine learning algorithm, are trained on a set of training data and evaluated on a separate set of test data. Training adjusts the weights and biases of the network: the parameters that determine how each neuron transforms its inputs into outputs. We start off with randomly initialized weights and biases, and with enough data and time arrive at a set of parameters that can accurately map complex inputs to the correct outputs.
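To make that concrete, here is a minimal sketch of training weights and a bias by gradient descent. It uses toy data and a single linear layer, and is purely illustrative, not part of the project's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the "true" function is y = 2*x1 - 3*x2 + 1, plus a little noise.
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 1.0 + 0.01 * rng.normal(size=200)

# Start from randomly generated weights and a bias, as described above.
w = rng.normal(size=2)
b = 0.0

lr = 0.1  # learning rate
for _ in range(500):
    pred = X @ w + b          # forward pass
    err = pred - y
    # Gradient-descent update on the mean squared error.
    w -= lr * (X.T @ err) / len(y)
    b -= lr * err.mean()

# w and b should now have converged close to [2, -3] and 1.
```

A real network adds nonlinearities and more layers, but the loop is the same: forward pass, compute error, nudge the parameters downhill.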
To get a better sense of how we can utilize Grakn in tandem with neural nets, we should first briefly review some of Grakn’s design patterns as well as explain the project that forms the basis of this blog post.
Code for this project can be found below:
The particular graphs used in Grakn are often referred to as “knowledge bases”, which you can read more about here. A knowledge base is a graphical representation of known information: relationships, hierarchies, and so on. It can be queried to see whether an edge exists between two given nodes, either explicitly in the graph itself or implicitly, through inference rules. Expanding the graph expands the knowledge base it represents, and thus the ground truth. One type of relational system that Grakn handles particularly well is a hierarchy, in which relationships are uni-directional and indicate some underlying vertical ordering of concepts.
In the world of artificial intelligence, learning improvements are the holy grail. If you can make a model learn more accurately, more quickly, or more precisely, you can open up the domain you’re working in to a whole new set of use cases. However, every dataset will always have incomplete information, and this can produce flawed insights. Thus, if you can improve the quality of data in your database, you will improve any machine learning that is built on top of it. This can be achieved through “knowledge base completion” (KBC): expanding existing knowledge bases with new or improved information through some established rules of inference (and by adding more data, of course). Previously, KBC was attempted by making reasoned human inductions about the types of relationships that should hold. Of course, human inductions can only go so far.
Using deep learning to facilitate knowledge base completion is, then, a natural approach. With the existing knowledge base as a training set, you can program the neural net as a binary classifier to find likely relationships and then insert them back into the graph. It turns out that Grakn can do all of the legwork as a knowledge base!
The paper this project is based on, Reasoning With Neural Tensor Networks for Knowledge Base Completion (Socher et al., 2013), presents a neural network built to accommodate two separate databases: WordNet and Freebase. For the purposes of this project, we will only concern ourselves with WordNet.
WordNet, a text corpus produced by Princeton University, is a lexical database of English and contains entities (words with unique identifiers) as well as relations (ways of describing the lexical relationship between two related entities). The goal of the neural network is to be able to predict whether a given entity e1 is related to an entity e2 by way of relation r. For instance, if
```
e_1 = __atlantic_1
e_2 = __north_sea_1
r   = _has_part
```
then we are asking whether the Atlantic → has part → North Sea, which is true. As another example, we could ask regression curve → part of → shell bean — clearly false.
There are 11 relations in the dataset and nearly 40,000 different entities. The neural tensor network is like most neural nets in that it trains a set of weights and a bias; one major distinction, however, is an extra tensor layer that multiplicatively relates entities and relations.
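For reference, the scoring function of the neural tensor network has this shape (reproduced here from my reading of the paper, so consult the original for the authoritative form):

```latex
g(e_1, R, e_2) = u_R^{\top} \, f\!\left( e_1^{\top} W_R^{[1:k]} e_2
    + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R \right)
```

Here $e_1, e_2$ are the entity embedding vectors, $f$ is an element-wise nonlinearity such as $\tanh$, and $W_R^{[1:k]}$ is the relation-specific tensor whose $k$ bilinear slices multiplicatively relate the two entities; $V_R$, $b_R$ and $u_R$ are the usual feed-forward weights, bias and output layer.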
If the math behind this isn’t clear, don’t worry. All we need to know is that the neural tensor network is able to handle multiple types of entity pairs per instantiation of entity-relation-entity (up to k slices, as you can see in the equation). You can refer to the paper I linked above for more specifics.
As I mentioned above, one aspect of this data that makes it particularly amenable to Grakn manipulation is its hierarchical nature. The relations indicate increasing or decreasing levels of specificity, such that if A → r → B and B → r → C, then A → r → C. Grakn models hierarchies like this very well.
Once the network has been trained, we can do two things with the results:
- Classify an input e1-r-e2 triplet as correct or incorrect based on tuned thresholds.
- Determine the likeliest second entity for a given e1-r (first entity — relation) combination.
Of course, the classification of #1 requires only a single forward pass through the neural net, whereas #2 requires as many passes as there are entities, since we are calculating a likelihood for each one.
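The asymmetry between the two uses can be sketched as follows. The embeddings and the bilinear `score` function here are toy stand-ins for the trained network's forward pass, not the project's actual API:

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_ENTITIES, NUM_RELATIONS, DIM = 1000, 11, 16

# Stand-in embeddings and a toy bilinear scorer; in the real project the
# score would come from the trained neural tensor network.
E = rng.normal(size=(NUM_ENTITIES, DIM))
W = rng.normal(size=(NUM_RELATIONS, DIM, DIM))

def score(e1, r, e2):
    return float(E[e1] @ W[r] @ E[e2])

threshold = 0.0  # in practice, per-relation thresholds are tuned on dev data

# Use #1: classify a single e1-r-e2 triplet -- one forward pass.
is_true = score(3, 5, 7) > threshold

# Use #2: find the likeliest e2 for a given (e1, r) -- one pass per entity.
scores = [score(3, 5, e2) for e2 in range(NUM_ENTITIES)]
best_e2 = int(np.argmax(scores))
```

With ~40,000 entities, use #2 is four orders of magnitude more expensive than use #1, which is exactly the obstacle discussed next.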
This presents a big challenge — verifying a triplet is simple enough, but improving a knowledge base by adding the most relevant relationships is a lot more computationally expensive. I will explain below how I dealt with that.
This project was implemented in two parts. The first was the actual neural tensor network, as explained above, which produces the weights for each embedding and provides the initial set of predictions for the test data. The second part was the Grakn base, which stores all the entity-entity relationships as well as the rules of inference, and which can be used to check for a graph connection between two entities.
The neural network is a pure NumPy/SciPy implementation of the knowledge base completion paper. The base of the neural network code, not written by me, can be found here. Because this implementation does not use a deep learning library like TensorFlow, and runs on CPU rather than GPU, training is comparatively slow. I have optimized it in my implementation, but it still takes several hours to complete; luckily, the neural net only needs to be run once before we “plug it in” to Grakn. Of course, in a production environment, you could use TensorFlow to improve runtime.
On the Grakn side, the flow goes something like this:
- Build the ontology and rule set, and insert the relation data from the training set into Grakn.
- After the neural net has been run (and an accuracy has been calculated), loop through the test set again, this time checking Grakn for relationships; use any inferred relationships to come up with a modified accuracy.
- Return to the output of the neural net, choose a subset of x entity-relation-entity triplets that the neural network gives a high score to, and insert these triplets into Grakn. Note, of course, that some of these triplets might be false!
- Repeat steps 2 & 3 a set number of times, each time calculating an updated accuracy.
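The four steps can be sketched as follows. Everything here is a stand-in: a plain Python set plays the role of the Grakn knowledge base, and trivial functions replace the real neural-net calls (`train_net`, `net_classify`, `net_top_triplets`) and Grakn's rule engine (`infer`):

```python
def train_net(train_triplets):
    # Stand-in "model": just memorize the training triplets.
    return set(train_triplets)

def net_classify(model, e1, r, e2):
    return (e1, r, e2) in model

def net_top_triplets(model, top_x):
    # The real version would score candidate triplets and keep the best.
    return list(model)[:top_x]

def infer(kb, e1, r, e2):
    # Stand-in for Grakn inference: direct lookup plus one transitive hop.
    if (e1, r, e2) in kb:
        return True
    return any((e1, r, m) in kb and (m, r, e2) in kb
               for m in {t[2] for t in kb})

def run_flow(train_triplets, test_triplets, iterations=3, top_x=100):
    kb = set(train_triplets)            # Step 1: load training relations
    model = train_net(train_triplets)   # run the neural net once
    accuracies = []
    for _ in range(iterations):
        # Step 2: rescue false negatives using knowledge-base inference.
        correct = 0
        for (e1, r, e2, label) in test_triplets:
            pred = net_classify(model, e1, r, e2) or infer(kb, e1, r, e2)
            correct += (pred == label)
        accuracies.append(correct / len(test_triplets))
        # Step 3: re-insert the net's top-scoring triplets into the KB.
        kb |= set(net_top_triplets(model, top_x))
    return accuracies                   # Step 4: repeat, tracking accuracy
```

The shape of the loop is the point: the net runs once, while the knowledge base is queried and grown on every iteration.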
Our goals are two-fold:
- Maintain Grakn as a versatile and robust knowledge base even as additional (possibly false) relationships are added to it.
- See if the accuracy of the neural net classifier is improved with Grakn inferences!
What exactly is happening here? In Step 1 of the flow, by building a schema and a set of inference rules specific to the project, we can augment the power of the neural network by letting Grakn check for relationships that the network might not have caught. Having a good number of reliable inference rules is critical to making the most of Grakn; I explain in the results section what happens when a priori inferences are few and far between.
To improve accuracy, we have to improve either our Type I or our Type II error rate. In an incomplete knowledge base, it is difficult to check for false positives (Type I) — a graph claiming that a relationship does not exist when the neural network claims otherwise might just be down to a lack of information in the knowledge base itself. Therefore, in step 2, we look for Type II errors — false negatives — by looping through every item in the test set that the neural network classified as false, and trying to find a relationship that would reject that classification.
The augmenting happens in Step 3, which is where we return to the obstacle I mentioned in the paragraphs above the Implementation header. We want to add the most likely WordNet triplets in every iteration of the algorithm without explicitly checking every e1-r-e2 combination (this would be 40,000 × 40,000 × 11 checks, since there are 40,000 entities and 11 relations in WordNet). Instead, we can choose x entities at random for each e1 entity, calculate triplet scores across each of the 11 relations for these x entities, and choose the most confident triplet.
As long as x is large enough that we can be fairly sure that each addition we make has a high likelihood of being true, we can make additions to the graph in a reasonable amount of time without sacrificing too much accuracy. This is an important tradeoff to consider though; if we make additions to the knowledge base that we are not completely positive are true, we risk contaminating the knowledge base with incorrect information. One question we will want to answer is how the ratio of correct/incorrect Grakn inferences changes over time.
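A sketch of that sampling strategy, with a random stand-in for the network's confidence score (the names `score` and `best_candidate` are illustrative, not from the project):

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_ENTITIES, NUM_RELATIONS = 40_000, 11

def score(e1, r, e2):
    # Stand-in for the trained network's confidence; ignores its arguments.
    return rng.random()

def best_candidate(e1, x=500, threshold=0.99):
    """Sample x random entities for e1, score them across all 11 relations,
    and return the single most confident (e1, r, e2) if it clears the bar."""
    candidates = rng.choice(NUM_ENTITIES, size=x, replace=False)
    s, r, e2 = max((score(e1, r, e2), r, int(e2))
                   for r in range(NUM_RELATIONS)
                   for e2 in candidates)
    return (e1, r, e2) if s >= threshold else None

candidate = best_candidate(0)
```

This reduces the per-entity cost from 440,000 scores to 11·x, at the price of possibly missing the true best triplet, which is exactly the tradeoff described above.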
There were 2 main takeaways from running this program:
- Actual accuracy changes were minimal, since the inference rules simply weren’t finding many graph connections.
- However, the ratio of correct to incorrect inferences stayed between 2/1 and 4/1 throughout, indicating that the knowledge base was absorbing correct information faster than incorrect information.
The first bullet point speaks to something I discussed earlier, which is that Grakn is at its most useful when there are many inference rules that can quickly produce new inferred edges from any entity that we insert. With WordNet, there were few rules that I was able to objectively verify. For example, you might have A → subordinate instance of → B, and C → member meronym → B. These two relations indicate similar (but slightly different) relationships, but you cannot say with certainty what sort of relation might hold between A and C. You might think that A → similar to → C or A → part of → C are true statements, but this is not necessarily the case, and counterexamples can be found in the test set. The English language is complicated.
The inference rules I did use were pure transitive and symmetric relations, the sort that say:
- if A → domain region → B and B → domain region → C, then A → domain region → C
- if A → similar to → B, then B → similar to → A
I found these rules to be limited in scope, however. Taken alone, they did not connect different relations with each other and only hierarchically extended existing relationships. Even so, they are a good starting point for lexical inferences.
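In plain Python, those two rule shapes amount to a fixed-point closure over the triplet set (a stand-in for what Grakn's declarative rules do at query time; the relation names are illustrative defaults):

```python
def apply_rules(triplets,
                transitive=("_domain_region",),
                symmetric=("_similar_to",)):
    """Expand a set of (e1, r, e2) triplets with the transitive and
    symmetric inference rules described above, until a fixed point."""
    kb = set(triplets)
    while True:
        new = set()
        for (a, r, b) in kb:
            # Symmetric rule: A -> r -> B implies B -> r -> A.
            if r in symmetric and (b, r, a) not in kb:
                new.add((b, r, a))
            # Transitive rule: A -> r -> B and B -> r -> C imply A -> r -> C.
            if r in transitive:
                for (b2, r2, c) in kb:
                    if r2 == r and b2 == b and (a, r, c) not in kb:
                        new.add((a, r, c))
        if not new:
            return kb
        kb |= new
```

Grakn evaluates its rules lazily at query time rather than materializing the closure like this, but the inferred edges are the same.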
Despite this, I found that the ability of the graph to improve the accuracy of the system was consistent. Depending on various random factors, every time I ran the program I got anywhere between a 2/1 and a 4/1 ratio of correct to incorrect inferences from neural network-inserted relations. Although some graph insertions produced Type I errors, most reversed what had originally been Type II errors. This is very promising, because it tells us that false additions to a Grakn knowledge base don’t wildly propagate through the inference chain.
Below are some screencaps from a) before any re-inserts (just the inference rules), b) after 1 re-insert, and c) after 20 re-inserts.
The net accuracy gain was approximately 0.21%, meaning that we got rid of about 1% of the total error.
With a more inter-connected graph, you could see a much bigger improvement than that. The point is that it is doable. Consistently, over many runs, the number of correct inferences dwarfed the number of incorrect inferences.
As we’ve seen above, Grakn is a natural structure for modeling hierarchical knowledge bases. Training data can be fed into the database to create relationships between entities. The graph can be expanded by re-inserting likely triplets, as determined by the neural tensor network, into the Grakn database. Moreover, Grakn gives a user the ability to scope out potential Type II errors in test data that has been fed through a neural net. It is even possible for Grakn to improve the program’s prediction accuracy in such situations.
So this is one application of deep learning principles to Grakn — hierarchical relationship matching. Train a neural net, and then use Grakn inferences on the test data to iteratively improve the accuracy of your predictions. But there are of course other ways that Grakn can improve the efficiency and operation of neural networks.
For instance, in n-ary classification or regression settings, you could use Grakn analytics to intelligently initialize the neural network’s embedding vectors, rather than simply using random initializations. Grakn relationships and inferences could give you advance information about ground truths that could help your network learn more quickly.
Another idea, which we will be tackling, is to use the output of the neural net to build inference rules for your graph. Using only neural pathways with a very high confidence, you could add not only entities and relationships to a Grakn graph, but even the rules themselves! This would be especially useful in situations where explicit inference rules are scarce, such as in the project you’ve just read about.
The potential for applying Grakn to deep learning principles is limitless, but of course you have to be smart about what you do, and you have to understand the purpose of the technologies you are using. I hope this post gave you some inspiration, and don’t hesitate to reach out with questions!
If you enjoyed this article, please hit the clap button below, so others can find it, too. Please get in touch if you have any questions or comments, either below, via our Community Slack channel, or via our discussion forum.
Find out more from https://grakn.ai.