The ego graph for Tensorflow — radius 22

The Google ‘vs’ Trick

How ego graphs can help you learn about AI, play chess, eat healthier, buy a dog and find love.

David Foster
Jun 20, 2020


Have you ever found yourself Googling something followed by ‘ vs ’ to see the autocomplete suggestions that are a bit like the thing you wanted to find out about?

Yeah, me too.

Turns out it’s a thing.

There are 3 reasons why this trick works really well if you want to understand a technology, product or concept better:

  1. The best way to learn something new is to find out how similar or different it is to things you already know — you might recognise things on the returned list and think ‘ah so it’s like THAT’.
  2. It’s simple. It literally takes a few seconds.
  3. The ‘ vs ’ suffix is a strong indication to Google that you want a direct comparison. You can also use ‘ or ’ but it doesn’t quite have the same strength of conviction as ‘ vs ’, so Google returns a range of options that are more likely to be off the mark — see below.
‘bert or ’ gives you Sesame Street. ‘bert vs ’ gives you Google BERT

This got me thinking — if you passed the autocompleted terms from a Google ‘vs’ search back into another ‘vs’ search and kept going, you’d end up with a pretty graph network of terms that are linked.

Something a bit like this…

The ego graph for BERT — radius 25

This is a really useful technique for creating a mental ‘map’ of technologies, products or ideas and how they relate to each other.

Here’s how to build your own…

Automating the Google ‘vs’ trick

There’s a URL you can use to return the autocomplete suggestions in XML format. It doesn’t look very official so probably not a good idea to spam it with tons of requests.

http://suggestqueries.google.com/complete/search?&output=toolbar&gl=us&hl=en&q=<search_term>

Here, output=toolbar returns the response as XML, gl=us gives the country, hl=en provides the language and q=<search_term> is what you want to autocomplete.

Standard two-letter country codes and language codes can be used as values for gl and hl respectively.
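As a minimal sketch, the URL can be assembled with Python's standard library rather than typed by hand. The helper name build_suggest_url and its default arguments are my own illustration, not part of any official API; the query parameters are the ones described above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://suggestqueries.google.com/complete/search"


def build_suggest_url(search_term, country="us", language="en"):
    """Build the autocomplete URL for a search term (hypothetical helper)."""
    params = {
        "output": "toolbar",  # ask for the XML response
        "gl": country,        # two-letter country code
        "hl": language,       # two-letter language code
        "q": search_term,     # the text to autocomplete
    }
    # quote_via=quote encodes spaces as %20 instead of +
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_suggest_url("tensorflow vs "))
```

Passing the raw term and letting urlencode handle the escaping avoids having to write %20 yourself.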

So let’s pick a starting term — say, tensorflow .

The first step is to hit the URL with q=tensorflow%20vs%20 .

http://suggestqueries.google.com/complete/search?&output=toolbar&gl=us&hl=en&q=tensorflow%20vs%20
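Once the response comes back, the suggestions need to be pulled out of the XML. The sketch below assumes each suggestion arrives as a suggestion element carrying a data attribute. That shape, the hand-written sample string and the function name parse_suggestions are all assumptions for illustration, so compare them with a real response before relying on this.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed response shape, not a real API response.
SAMPLE_XML = (
    "<toplevel>"
    '<CompleteSuggestion><suggestion data="tensorflow vs pytorch"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="tensorflow vs keras"/></CompleteSuggestion>'
    "</toplevel>"
)


def parse_suggestions(xml_text):
    """Return the suggestion strings in the order they appear in the XML."""
    root = ET.fromstring(xml_text)
    return [
        element.get("data")
        for element in root.iter("suggestion")
        if element.get("data")  # skip elements without a data attribute
    ]


print(parse_suggestions(SAMPLE_XML))
```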

OK I see XML — now what?

Now we apply a series of criteria to each autocomplete suggestion in turn, to decide whether it is kept or discarded. The criteria I use are:

  1. The term shouldn’t contain the search term (i.e. tensorflow).
  2. The term shouldn’t contain any previously accepted terms (e.g. pytorch).
  3. The term shouldn’t contain multiple ‘vs’.
  4. Once you’ve found 5 suitable terms, discard the rest.
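The criteria above can be sketched as a small filter. This is one possible reading rather than the exact code behind the graphs: the function name filter_suggestions is hypothetical, and it assumes each suggestion is a plain string such as "tensorflow vs pytorch" and that "contains" means a simple substring match.

```python
def filter_suggestions(search_term, suggestions, max_terms=5):
    """Return up to max_terms accepted terms, in suggestion order."""
    accepted = []
    for suggestion in suggestions:
        parts = suggestion.lower().split(" vs ")
        # Exactly one ' vs ' is allowed, so there must be exactly two parts.
        if len(parts) != 2:
            continue
        candidate = parts[1].strip()
        if not candidate:
            continue
        # Reject terms that contain the search term itself.
        if search_term in candidate:
            continue
        # Reject terms that contain a previously accepted term.
        if any(previous in candidate for previous in accepted):
            continue
        accepted.append(candidate)
        # Once enough terms are found, discard the rest.
        if len(accepted) == max_terms:
            break
    return accepted


example = [
    "tensorflow vs pytorch",
    "tensorflow vs pytorch 2020",
    "tensorflow vs keras",
    "tensorflow vs tensorflow lite",
    "tensorflow vs keras vs pytorch",
    "tensorflow vs caffe",
]
print(filter_suggestions("tensorflow", example))
```

The example list is made up to exercise each rule; real suggestions will differ.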

This is just one way of ‘cleaning’ the returned list of suggestions. I also sometimes find it useful to only include terms with exactly one word, but it depends on the use case.

So using this set of criteria we obtain the following five weighted connections:

Then we keep going!

We then feed these five terms into the autosuggestion API followed by ‘ vs ’ and again store the top 5 filtered connections.

And so on and so on — expanding the words in the target column that haven’t yet been explored.

Do this enough times and you end up with a big table of weighted edges — perfect for visualising in a graph…
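The expansion loop can be sketched as a breadth-first crawl. To keep the sketch runnable without hitting the network, the lookup is passed in as get_targets, a hypothetical callable that returns the filtered, ranked terms for a search term. The depth limit and the weighting (5 for the top term down to 1 for the fifth) are assumptions of this sketch.

```python
from collections import deque


def crawl(seed, get_targets, max_depth=2):
    """Expand terms breadth-first and return (source, target, weight) edges."""
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        source, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand terms beyond the depth limit
        targets = get_targets(source)[:5]
        for rank, target in enumerate(targets, start=1):
            edges.append((source, target, 6 - rank))  # rank 1 -> weight 5
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


# Made-up lookup table standing in for the real autocomplete calls.
fake_results = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["a"]}
print(crawl("a", lambda term: fake_results.get(term, []), max_depth=2))
```

In a real run, get_targets would wrap the request, parsing and filtering steps, ideally with a pause between calls so the endpoint isn't flooded.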

Ego Graphs

The graph network at the top of this article is what’s known as an ego graph for tensorflow. An ego graph is the graph of all nodes that lie within a certain distance of a chosen centre node, in this case the tensorflow node.

But wait — how do we define distance?

Let’s first see the graph again:

The ego graph for Tensorflow — radius 22

We already know the weight of the edge from term A to term B: it comes from the term’s ranking in the autosuggestion list, from 5 for the top suggestion down to 1 for the fifth. To make the graph undirected, we can just sum the weights in both directions (i.e. A➡B and B➡A, if it exists), to get an edge weight between 1 and 10.

The distance of each edge is then simply 11-weight. We choose 11 because the maximum weight of an edge is 10 (if both terms appear at the top of each other’s autosuggestion list), so this definition gives a minimum distance of 1 between terms.
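The weight and distance arithmetic can be sketched in plain Python. The function name and the (source, target, weight) tuple layout are assumptions of this sketch; the 11-weight rule is the one described above.

```python
from collections import defaultdict


def to_undirected_distances(directed_edges):
    """Sum weights in both directions, then convert each to a distance."""
    weights = defaultdict(int)
    for source, target, weight in directed_edges:
        # frozenset ignores direction, so A->B and B->A share one key.
        weights[frozenset((source, target))] += weight
    return {pair: 11 - weight for pair, weight in weights.items()}


edges = [
    ("tensorflow", "pytorch", 5),
    ("pytorch", "tensorflow", 5),
    ("tensorflow", "keras", 4),
]
distances = to_undirected_distances(edges)
print(distances[frozenset(("tensorflow", "pytorch"))])
print(distances[frozenset(("tensorflow", "keras"))])
```

Here the two terms that top each other's lists end up at the minimum distance of 1, while the one-way weight of 4 gives a distance of 7.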

The size and colour of each node is determined by the count of edges where it is the target (that is, the number of times it appears as the autosuggestion). So, the larger the node, the more important the concept.
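Counting how often each term appears as a target is a one-liner with the standard library. This sketch assumes the edges are stored as (source, target, weight) tuples, which is my own choice of layout; the example edges are made up.

```python
from collections import Counter


def target_counts(directed_edges):
    """Count how many edges point at each term."""
    return Counter(target for _source, target, _weight in directed_edges)


example_edges = [
    ("tensorflow", "pytorch", 5),
    ("keras", "pytorch", 5),
    ("pytorch", "tensorflow", 5),
]
print(target_counts(example_edges))
```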

The ego graph above has a radius of 22. That means that you can reach each of the terms within a distance of 22, starting from the tensorflow node. Let’s see what happens if we increase the radius of the graph to 50:

The ego graph for Tensorflow — radius 50

Pretty cool — this graph contains most of the established technologies that an AI engineer would need to know about, clustered in a logical way.

All from one keyword.

How do you draw the pretty graphs?

I used a neat online tool called Flourish.

It lets you build network graphs and other charts really easily in a simple-to-use GUI — definitely worth checking out.

How do you build an ego graph with given radius?

You use the networkx Python package, which has a handy function called ego_graph. You specify the radius as an input parameter to the function.
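Here is a minimal sketch of that call on a toy graph. The edge attribute name distance and the example distances are made up for illustration; ego_graph and its radius and distance parameters come from networkx.

```python
import networkx as nx

# Toy undirected graph with a per-edge 'distance' attribute (made-up values).
G = nx.Graph()
G.add_edge("tensorflow", "pytorch", distance=1)
G.add_edge("pytorch", "keras", distance=5)
G.add_edge("keras", "theano", distance=20)

# Keep every node whose shortest-path distance from the centre is within the radius.
ego = nx.ego_graph(G, "tensorflow", radius=10, distance="distance")
print(sorted(ego.nodes))
```

Without the distance argument, ego_graph counts hops rather than summing the edge attribute, so it needs to be passed explicitly for the radius to mean what it does in this article.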

I also use a second function — k_edge_subgraphs to remove some of the terms that go off on a tangent.

For example, storm is an open source, distributed realtime computation system. storm is also a character in the Marvel universe — if you type ‘storm vs ’ into Google, guess which one dominates?

The k_edge_subgraphs function finds groups of nodes that cannot be separated by fewer than k cuts. k=2 or k=3 tends to work well. Only keeping the subgraph that tensorflow belongs to ensures that we stay close to home and don’t literally wander off into the realms of fantasy.
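A small sketch of that pruning step, on a made-up graph where a tech cluster and a superhero cluster are joined by a single edge. The helper name keep_component_of is hypothetical; k_edge_subgraphs is the networkx function, which yields sets of nodes.

```python
import networkx as nx


def keep_component_of(G, node, k=2):
    """Return the k-edge-connected subgraph of G that contains node."""
    for nodes in nx.k_edge_subgraphs(G, k):
        if node in nodes:
            return G.subgraph(nodes).copy()
    return G.subgraph([node]).copy()  # fallback: the node on its own


G = nx.Graph()
# A tightly linked tech cluster...
G.add_edges_from([("tensorflow", "pytorch"), ("pytorch", "keras"), ("keras", "tensorflow")])
# ...joined by one edge to a cluster that has wandered off topic.
G.add_edge("keras", "storm")
G.add_edges_from([("storm", "thor"), ("thor", "hulk"), ("hulk", "storm")])

pruned = keep_component_of(G, "tensorflow", k=2)
print(sorted(pruned.nodes))
```

With k=2, cutting the single keras to storm edge is enough to split the two clusters, so only the tech cluster survives.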

Using ego graphs to organise your life

Let’s move on from the Tensorflow example and take a look at another ego-chart — this time, for my other love … the Ruy Lopez chess opening.

Learn chess openings!

It’s pretty neat that this trick can very quickly get you a graph of most of the common opening ideas, to help you organise your research.

OK, now for some fun stuff.

Eat healthier!

Mmmm kale. Everyone loves kale.

But what if you fancy a change from delicious, delicious kale? Kale ego graph to the rescue!

The ego graph for ‘kale’, radius 25

Buy a dog!

So many dogs, so little time. Need one. But which one?

The ego graph for ‘poodle’ — radius 18

Find love!

Dog / kale didn’t do the trick? Need a partner instead? There’s a small, very self-contained ego graph for that.

The ego graph for ‘coffee meets bagel’ — radius 18.

Dating apps didn’t work?

Watch a series with a tub of kale (or newly discovered rucola) flavoured ice cream instead. If you like The Office (UK obviously), you might also like…

The ego graph for ‘The Office’ (UK) — radius 25

Summary

That’s the end of my adventures into the world of ego graphs and the Google ‘vs’ trick for now.

I hope it helps you in some small way to find love, a labrador and lettuce.

If you’ve enjoyed this post, please do leave some claps 👏👏👏👏 — I’d really appreciate it :)

Also, if you have any ideas for how this could be extended, I’d love to hear about them. Leave your thoughts in the comments below!

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!


David Foster

Author of the Generative Deep Learning book :: Founding Partner of Applied Data Science Partners