Is Google Really Unrivaled When It Comes to Search?
Google Knowledge Graph is one of the features I like most when Googling. Google announced it a few years ago, and it gave the company a great popularity boost. It seemed to prove that Google has unrivaled data-driven power when it comes to search.
Let’s analyze this feature from another point of view.
When you query “Los Angeles Lakers”, you get a descriptive pane on the right-hand side about the entity in your query.
How about introducing an additional term, say “history”, into the query “Los Angeles Lakers”?
What’s that?! The entity “Los Angeles Lakers” embedded in the second query could not be extracted by the Google Knowledge Graph! Could Google’s power in search be questionable? Seemingly, yes :)
I had a chance to ask a Googler who specializes in Google search about this during the Q&A session of the webinar below.
(You can find the question and the answer at 35' 39" in the recording.)
In brief, the speaker answered that Google might have assumed that the user’s information need was no longer “Los Angeles Lakers” itself but “historical information” about the Los Angeles Lakers. Personally, I consider the information about “Los Angeles Lakers” itself useful even after adding the term “history” to the query. Hence, I would claim that Google fails to capture my information need because of a wrong assumption.
The correct technical answer actually lies in an underlying algorithm that is frequently used in information retrieval, namely “intersecting the posting lists of the query terms”.
The majority of search engines in the industry, including Google, return documents that contain “all” of the query terms. We can see this when we jump to the advanced search page after typing our query, as shown on the left-hand side.
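To make the “all of the query terms” behavior concrete, here is a minimal sketch of posting list intersection in Python. It is only an illustration of the textbook technique, not Google’s actual implementation; the tiny document collection and the names build_index, intersect, and search_all are my own assumptions.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to a sorted posting list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search_all(index, query):
    """Return ids of documents that contain ALL terms of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the shortest posting list to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

# Hypothetical collection: document 1 plays the role of the entity record.
docs = {
    1: "Los Angeles Lakers",
    2: "Los Angeles Lakers history and championships",
    3: "history of Los Angeles",
}
index = build_index(docs)

print(search_all(index, "Los Angeles Lakers"))          # [1, 2]
print(search_all(index, "Los Angeles Lakers history"))  # [2]
```

In this toy collection, document 1 stands in for the entity record. It is returned for the first query but drops out of the intersection as soon as “history” is added, because it does not contain that term.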
There are some well-known alternative practices, such as returning documents that contain “any” or “none” of the query terms. However, none of them meets the need for “extracting entities out of the query terms”, which is exactly what is required to show the entity pane of “Los Angeles Lakers” for the query “Los Angeles Lakers history”.
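The difference can be sketched in a few lines of Python. Instead of asking which documents contain all of the query terms, the code below asks which known entity names are contained in the query. This is a simplified illustration under my own assumptions (an exact, case-insensitive match against a small hypothetical entity list, longest match first), not a description of how any production engine works.

```python
def extract_entities(query, entity_names):
    """Return known entity names that occur as contiguous token runs in the query.

    At each position the longest matching run wins, so "Los Angeles Lakers"
    is preferred over "Los Angeles" when both are known entities.
    """
    tokens = query.lower().split()
    known = {name.lower(): name for name in entity_names}
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest window starting at position i first.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in known:
                found.append(known[candidate])
                match_len = j - i
                break
        i += match_len if match_len else 1
    return found

# Hypothetical entity list for illustration.
ENTITIES = ["Los Angeles Lakers", "Los Angeles", "Saddam Hussein"]

print(extract_entities("Los Angeles Lakers history", ENTITIES))  # ['Los Angeles Lakers']
print(extract_entities("history of Los Angeles", ENTITIES))      # ['Los Angeles']
```

The extra term “history” is simply skipped here, so the entity survives no matter what else the user types.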
I have been developing a special search engine that addresses this particular need. A concrete use case for this type of search engine is extracting terrorism-related entities from unstructured free text, which may be a news article, a Twitter feed, or the financial messages underlying an international money transfer.
Here is a sample result page for the sample query “Saddam Hossein bomb”:
Below is the corresponding result set from Google, showing that it fails to present the entity “Saddam Hussein” from its Knowledge Graph because of the additional term “bomb”.
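Note that the sample query spells the name as “Hossein” while the entity is “Saddam Hussein”, so extracting it requires some tolerance for spelling variants in addition to ignoring the extra term. As one possible approach, and not necessarily the one my engine uses, here is a minimal sketch based on Python’s standard difflib module. The entity list, the function name extract_entities_fuzzy, and the 0.85 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

def extract_entities_fuzzy(query, entity_names, cutoff=0.85):
    """Return known entities that approximately match a token run in the query."""
    tokens = query.lower().split()
    known = {name.lower(): name for name in entity_names}
    found = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest window starting at position i first.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            close = difflib.get_close_matches(candidate, list(known), n=1, cutoff=cutoff)
            if close:
                found.append(known[close[0]])
                match_len = j - i
                break
        i += match_len if match_len else 1
    return found

# Hypothetical entity list for illustration.
ENTITIES = ["Los Angeles Lakers", "Saddam Hussein"]

print(extract_entities_fuzzy("Saddam Hossein bomb", ENTITIES))  # ['Saddam Hussein']
```

The cutoff trades recall for precision: a lower value tolerates more spelling variation but also produces more false matches, which matters a great deal in a screening scenario.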
Anyone willing to join me in developing such a search engine is more than welcome :)