Strongly Typed Data for Machine Learning

Here we’ll dig into how strong typing can benefit machine learning.

James Fletcher
Sep 24 · 9 min read

What is strong typing?

ML problems solvable with strong typing

The flow of data from tables to model. We’d like to avoid flat feature data.

Only ND-arrays fit into ML pipelines

Data lacks context and consistency

How does strong typing help?

Inter-related tabular data is really a graph

Transforming tabular data with inter-related columns into a graph with an ETL script.
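
As a rough illustration of the kind of ETL step the figure describes, here is a minimal sketch in plain Python. The two tables, their column names (`person_id`, `owner_id`, `vehicle_id`), the `owns` edge label, and the `tables_to_graph` helper are all hypothetical, chosen only to show a foreign-key column turning into an explicit edge. It is not the script used in the article.

```python
# Hypothetical input: two tables whose columns reference each other by id.
people = [
    {"person_id": 1, "name": "Ana"},
    {"person_id": 2, "name": "Ben"},
]
vehicles = [
    {"vehicle_id": 10, "kind": "motorbike", "owner_id": 1},
    {"vehicle_id": 11, "kind": "car", "owner_id": 2},
]


def tables_to_graph(people_rows, vehicle_rows):
    """Turn inter-related rows into a node dictionary and an edge list."""
    nodes = {}  # (node_type, id) -> properties
    edges = []  # (source_key, label, target_key)

    for row in people_rows:
        nodes[("person", row["person_id"])] = {"name": row["name"]}

    for row in vehicle_rows:
        vehicle_key = ("vehicle", row["vehicle_id"])
        nodes[vehicle_key] = {"kind": row["kind"]}
        # The foreign-key column stops being a flat feature and
        # becomes an explicit edge between two typed nodes.
        edges.append((("person", row["owner_id"]), "owns", vehicle_key))

    return nodes, edges


nodes, edges = tables_to_graph(people, vehicles)
```

Keying each node by `(node_type, id)` keeps the type attached to every node, so ids from different tables cannot collide.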

Strong typing adds context throughout

Graph ML

Creating a TypeDB model

Nodes and edges are modelled as entities (rectangles) and relations (diamonds). This may not be a simple mapping: as we can see here, each edge on the left is actually only a role in a relation (diamond) on the right.
Properties of a node are modelled as attributes. One-to-many properties are trivial to model.

From DB queries to ML input

Graph Embedding

Type Embedding

In this example, the motorbike type has index 3. That index points to a particular entry of the embedding space, which yields the embedding for the motorbike type.
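
The lookup in the caption can be sketched as a simple table lookup. This is a minimal illustration in plain Python, not the article's implementation. The type list (apart from motorbike sitting at index 3), the embedding size, the random initial values, and the `embed_type` helper are all assumptions. In a real model the table would be a trainable parameter.

```python
import random

# Hypothetical type vocabulary; only "motorbike" at index 3 comes from the example.
TYPE_NAMES = ["person", "company", "car", "motorbike", "ownership"]
TYPE_TO_INDEX = {name: index for index, name in enumerate(TYPE_NAMES)}

EMBEDDING_DIM = 4  # assumed size, chosen for readability

# One row per type. Random values stand in for weights that would be learned.
rng = random.Random(0)
type_embeddings = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in TYPE_NAMES
]


def embed_type(type_name):
    """Map a type name to its index, then return that row of the table."""
    return type_embeddings[TYPE_TO_INDEX[type_name]]


motorbike_vector = embed_type("motorbike")
```

Because the embedding is keyed by type rather than by individual node, every motorbike instance shares the same vector.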

Future Work for Type Embedding

The Pipeline



Creators of TypeDB and TypeQL