A deep dive into deep tech — for execs

Mihai Raulea
Neo4j Developer Blog
5 min read · Mar 1, 2019

Graph databases are powerful because they allow for the exploration and analysis of the connections in the data.

With SQL, answering a query that needs 5–7 JOINs in real time is often impractical on large datasets. Document-based databases (think Mongo and Couch) lock the connections between the data in place; once you establish a relationship between objects (nesting), you would have to go through two entire datasets to uncover a new relationship between two objects.

For graphs, following an edge (relationship) comes at almost zero cost, which opens up a world of possibilities.

Deep tech is hard to explain, and graph databases are no exception.

Everything in tech has a language of its own and for good reason. Have you tried having a conversation in a foreign language, with a native, when all you knew were 200 words?

Explaining deep tech is harder than that. It’s not only that you can’t use the one word that stands in for tens or hundreds of others; the implications and ramifications of a single concept are also huge. After all, every programmer uses 20 or so basic language constructs at most (declaring variables, executing an instruction several times or based on a condition, etc.), but look at how diverse the tech world is.

Most mainstream ways of storing data use rows and columns, something quite similar to Excel spreadsheets. A paradigm exists that makes the relationships between data as important as the data itself, and the implications are huge.

Without graphs, we wouldn’t have Google; its search algorithm runs on top of a graph. Facebook uses a graph internally to store its users and the relationships between them, which is of paramount importance when suggesting new friends and events. But what is a graph?

There you go! That’s a graph. Can you draw that?

It labels the nodes, from 1 to 6. It tells us that 1 is connected to 2 and 5. That 4 is connected to 6. That’s it. Thank you!
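For readers who prefer code to pictures, here is a minimal sketch of the same idea in Python. It records only the connections spelled out above (1 to 2, 1 to 5, 4 to 6); any other edges in the original figure are not reproduced, and the `build_graph` helper is a name made up for this illustration.

```python
# Only the connections named in the text; other edges in the figure are omitted.
edges = [(1, 2), (1, 5), (4, 6)]


def build_graph(nodes, edges):
    """Return an undirected graph as {node: set of neighbours}."""
    graph = {node: set() for node in nodes}
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


graph = build_graph(range(1, 7), edges)
print(sorted(graph[1]))  # [2, 5]
print(sorted(graph[4]))  # [6]
```

Nodes are the dictionary keys and relationships are the entries in each set. That really is all a graph is.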

I don’t know the vertical you’re in, but let’s assume you know and use social media (Facebook, LinkedIn, etc.).

Let’s have a look at how you would model a network of people based on who knows who.

User 4 knows User 5. User 7 knows User 6. Ignore the arrow direction for now.

Here’s how one would model a network of friends who can author posts and like each other’s posts. I bet you can now read it on your own. Users are not labeled in the diagram. Take the User in the center: how many friends does he have? How many posts has he written, and how many has he liked?
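The same questions can be asked in code. The sketch below is a rough Python stand-in for such a model, not a Neo4j query: the user names, post names, relationship types (`FRIEND`, `WROTE`, `LIKED`) and helper functions are all invented for illustration and do not come from the diagram.

```python
# Hypothetical sample data: each relationship is (start, type, end).
relationships = [
    ("alice", "FRIEND", "bob"),
    ("alice", "FRIEND", "carol"),
    ("alice", "WROTE", "post1"),
    ("bob", "WROTE", "post2"),
    ("carol", "WROTE", "post3"),
    ("alice", "LIKED", "post2"),
    ("alice", "LIKED", "post3"),
    ("bob", "LIKED", "post1"),
]


def friends_of(user, rels):
    """Friendship is treated as undirected, so check both ends."""
    found = set()
    for start, kind, end in rels:
        if kind != "FRIEND":
            continue
        if start == user:
            found.add(end)
        elif end == user:
            found.add(start)
    return found


def targets(user, kind, rels):
    """Everything the user points at through relationships of one type."""
    return {end for start, k, end in rels if start == user and k == kind}


print(len(friends_of("alice", relationships)))        # 2
print(len(targets("alice", "WROTE", relationships)))  # 1
print(len(targets("alice", "LIKED", relationships)))  # 2
```

Each question is answered by following one type of relationship out of one node, which is exactly how you read the diagram by eye.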

Now, on to some serious business. Looking at the network, who would you contact to post something about your business? Which of the users is most connected to other users?
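One simple way to answer "who is most connected?" is to count each user's direct connections (their degree). The sketch below does that on a made-up friendship list; the names are hypothetical, and degree is only one of several possible measures of influence.

```python
from collections import Counter

# Hypothetical friendships; each pair is an undirected "knows" relationship.
friendships = [
    ("ana", "ben"),
    ("ana", "cat"),
    ("ana", "dan"),
    ("ben", "cat"),
    ("dan", "eve"),
]


def degrees(pairs):
    """Count how many friendships each user takes part in."""
    counts = Counter()
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


counts = degrees(friendships)
most_connected, connections = counts.most_common(1)[0]
print(most_connected, connections)  # ana 3
```

In this toy network, "ana" would be the person to ask to share your post.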

If only friends can see each other’s posts, and some users meet in person and show posts from their network to each other, what’s the probability of two unconnected users seeing each other’s posts? Just joking.

As stated earlier, the implications of this easy-to-understand tech are remarkable. I won’t bore you with my work on the core of Neo4j or its satellite projects, but here are some high-level examples of how I brought ROI as a consultant and even made the use cases possible.

Case study 1 — Adventia Prevention (bioinformatics)

Adventia is a personalized medicine tool that suggests treatments. Doctors use it to make better and more informed decisions for their patients.

Each doctor has a number of patients, and each patient has blood work data, a history of previous diseases, and physical attributes (such as age, height, and weight). Among other things, I built an interpreter and execution engine for a flavor of programming language that (even) doctors can write, and implemented Bayes probabilities on top of the graph. I also made sure their queries were possible (SQL systems wouldn’t support them because of the huge amount of data that needed linking together) and enabled them to ship on time (writing SQL queries for our types of questions would have been too specific and would have required several individual queries, where a graph could do it with one more abstract query).

All in all, it took 3 months to reshape the future of Adventia. In the end, I made some of their much-needed use cases possible and trained their team to use graph databases; after all this, their team velocity increased. ROI for the win!

Case study 2 — Atlas, or how to put the world map in a graph

Have you ever wondered what a graph of all the roads in the world would look like? What kind of amazing insights could one uncover if such a graph were put in a Neo4j database? Extracting the most connected cities in Europe, for example, could unveil amazing information, and it would definitely come in handy for a vacation planning platform.

I have published a dedicated blog post about this with the leading graph database provider, Neo4j. Find it here.

Case study 3 — Blue Air, and finding (definite) similarities in web pages

Have you ever looked at HTML source code? It’s a hierarchical system. My client was employing AI to determine if two web pages are alike by looking at the source code. The problem? AI works with a degree of uncertainty (heuristics). Graphs can yield deterministic answers.
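To give a feel for what "deterministic" means here, below is a small Python sketch that turns each page into its tag hierarchy and checks whether the two hierarchies are exactly equal. This is not the system built for the client, whose details are not described here; it is one simple approach using Python's standard `html.parser`, and the `TagTreeBuilder`, `tag_tree` and `same_structure` names are invented for this example. It ignores text and attributes and compares structure only.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagTreeBuilder(HTMLParser):
    """Builds a nested [tag, children] structure, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.root = ["#root", []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def same_structure(html_a, html_b):
    """True only when both pages have exactly the same tag hierarchy."""
    return tag_tree(html_a) == tag_tree(html_b)


page_a = "<html><body><div><p>Hello</p><img src='a.png'></div></body></html>"
page_b = "<html><body><div><p>Other text</p><img src='b.png'></div></body></html>"
page_c = "<html><body><div><p>Hello</p></div><p>Extra</p></body></html>"

print(same_structure(page_a, page_b))  # True
print(same_structure(page_a, page_c))  # False
```

The answer is a plain yes or no with no confidence score attached: the same two pages always produce the same result.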

Not only did my client save on computing power, but the system ran smoother after the upgrade, with zero false positives.

It seems this is a trend to follow; in his book Connected, Dr. James Fowler argues that studying relationships is more important than studying the data contained in the nodes.

Who else is using graphs?

Conclusion

I believe there is a huge opportunity in the graph space, and there is a lot of value to be created by educating yourself about it and using graph databases. If you have read this far and want to know more, I would be happy to discuss how graphs could be used in your company. Book a 15-minute call now!
