Artificial Intelligence and the Future of Search
Where are search engines heading in the future?
The algorithms used by current search engines analyze many factors to rank websites; however, the results are far from perfect. What they produce is more in line with an "expert system" than with an intelligent machine, and at best this is weak AI. That may be changing. As Paul Bruemmer states, "Such a system will display more precise results faster, as it's based on semantic technology focused on user intent rather than on search terms."
Google was founded with the mission to "organize the world's information and make it universally accessible and useful." Google has stayed true to its core business and is today's dominant search engine, with annual advertising revenue of over $50 billion. Its ability to translate user search queries into usable website links makes it an internet powerhouse, processing over 1 billion search requests each day. Founded in 1998, Google has become so integrated into our daily culture that its name has become a verb, as people "google" information online. [1]
According to comScore's October 2013 rankings, Google controls 66.9% of the internet search market, with 12.9 billion explicit core searches. [2] Google's algorithm for organizing the world's data and providing the most relevant links for any query has been challenged over the years by Microsoft, Yahoo! and others, yet Google remains the number one search engine. The rise of social media has brought other players into the information game, with companies such as Facebook and Yelp focusing on friends' recommendations and reviews. Bing, Microsoft's entry in the search engine game, has gained market share (mostly at the expense of Yahoo!), handling 18.1% of search queries in October 2013.
While different search engines and platforms return different results, what ultimately matters is the relevance of those results to the end user, or what Google calls "search quality," with the most useful results appearing at the top of the page. This happens through a variety of mechanisms that rate the results to determine what matters most to the end user.
Returning Google search results is a multi-step process. First, Google indexes the web, analyzing billions of web pages and breaking them down into a word index. When a user types a word into the search box, Google queries the index for relevant pages, returning a list of thousands, if not millions, of potential websites that match the criteria. But scrolling through millions of possible websites trying to find the right, or most relevant, webpage could be a full-time job. This is where ranking comes into play.
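To make the indexing step concrete, here is a minimal Python sketch of an inverted index: each word maps to the set of pages containing it, and a query intersects those sets. The page data and function names are illustrative assumptions, not Google's actual implementation, which is vastly more sophisticated.

```python
from collections import defaultdict

# Toy corpus: page identifiers mapped to their text (hypothetical data).
pages = {
    "page1": "artificial intelligence and the future of search",
    "page2": "search engines rank pages by relevance",
    "page3": "quantum computing and artificial intelligence",
}

# Build the inverted index: word -> set of pages containing that word.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def search(query):
    """Return the pages that contain every word in the query."""
    results = set(pages)
    for word in query.lower().split():
        results &= index.get(word, set())
    return sorted(results)

print(search("artificial intelligence"))  # ['page1', 'page3']
```

Querying the index is a set intersection rather than a scan of every page, which is why the lookup stays fast even as the corpus grows.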
Page ranking is where Google decides how important a webpage will be to the person doing the search. The freshness of a page, its location relative to the user, keywords (how and where they are used), page loading speed, links (outbound, inbound, affiliate and broken) and many other factors all help determine the order of a webpage in a search query. Additionally, a user's search history, along with the search history of other users and which pages they then clicked through, is considered when ranking results.
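As an illustration only, one can picture ranking as a weighted combination of such signals. The signal names and weights below are hypothetical; Google's real algorithm uses hundreds of signals, and neither the signals nor their weights are public.

```python
# Hypothetical ranking signals and weights; purely illustrative,
# not Google's actual (and non-public) algorithm.
WEIGHTS = {
    "keyword_match": 0.4,
    "inbound_links": 0.25,
    "freshness": 0.15,
    "load_speed": 0.1,
    "proximity_to_user": 0.1,
}

def rank_score(signals):
    """Combine normalized signal values (0.0 to 1.0) into one score."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

candidates = {
    "example.com/fresh-guide": {"keyword_match": 0.9, "inbound_links": 0.4,
                                "freshness": 0.95, "load_speed": 0.8},
    "example.com/old-archive": {"keyword_match": 0.9, "inbound_links": 0.7,
                                "freshness": 0.1, "load_speed": 0.5},
}

# Sort candidate pages by descending score, i.e. most relevant first.
for url, signals in sorted(candidates.items(), key=lambda kv: -rank_score(kv[1])):
    print(f"{rank_score(signals):.2f}  {url}")
```

In this toy model the fresher page wins despite having fewer inbound links, which mirrors how a change in weighting can reorder results overnight.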
Individualized results are returned based on a person's web browsing behavior. Websites with the most relevant, high-quality links get higher rankings, as do sites deemed more authoritative. Google constantly updates its index, providing real-time results based on the most current information available.
Constantly refining and updating the process, Google is always attempting to understand the contextual meaning behind the words. Are the search terms a name, a location or a question? Google is not just matching words; it is matching the context of what those words mean. Google is constantly testing, tweaking and improving, and every search result is a lab test to determine what is working and what can be improved. As the amount of available data grows, what that improvement will look like is still unclear, but to get an idea of where Google, and internet search, is going, we need look no further than one of television's most successful game shows, Jeopardy.
In 2011, IBM's Watson, an artificial intelligence computer, was loaded with four terabytes of information, including all of Wikipedia, a dictionary, the Bible and the Internet Movie Database. Going head to head against two of Jeopardy's all-time champions, the standalone computer finished the game with a victory.
Watson is a cognitive system that understands natural language and provides answers based on the probability of correctness. A human contestant can buzz in while still searching their mind for the answer, whereas Watson must parse each question to determine how the words relate to each other before searching for the correct answer. Of course, Watson's ability to scan millions of pages in seconds makes it quicker than a human who might still be thinking of the answer before buzzing in. Yet even with access to all of that information, Watson answered the final Jeopardy question incorrectly.
Watson derives its answers through a multi-step process that begins with loading the data onto the system. Once a question is asked, Watson considers its various possible meanings and determines the Lexical Answer Type (LAT), the mechanism that looks for key words to establish the context of the question. For example, if a question about Dallas included the word "television", Watson would know that the answer should relate to the TV show and not the city.
Watson may come up with as many as 250 potential answers, which are then pared down through a filtering process that checks how likely each answer is to match the LAT. Additional scoring and rating passes then produce a final set of results, each assigned a confidence level based on its likelihood of being correct, and the answer with the highest ranking is given.
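The generate-filter-score shape of that pipeline can be sketched in a few lines of Python. The candidate answers, the LAT check, and the two-part scoring function below are stand-ins of my own; Watson's real DeepQA pipeline merges evidence from over 100 techniques.

```python
# Simplified sketch of Watson's generate -> filter -> score pipeline.
# Candidates, types, and scores are invented for illustration.

def matches_lat(candidate, lat):
    """Keep only candidates of the expected answer type (e.g. a TV show)."""
    return candidate["type"] == lat

def score(candidate, evidence_weight=0.7, popularity_weight=0.3):
    """Merge multiple evidence scores into one confidence value in [0, 1]."""
    return (evidence_weight * candidate["evidence"]
            + popularity_weight * candidate["popularity"])

candidates = [
    {"answer": "Dallas (TV series)", "type": "tv_show", "evidence": 0.9, "popularity": 0.6},
    {"answer": "Dallas, Texas",      "type": "city",    "evidence": 0.8, "popularity": 0.9},
    {"answer": "Dallas Cowboys",     "type": "team",    "evidence": 0.3, "popularity": 0.8},
]

lat = "tv_show"  # derived from the clue's wording, e.g. the word "television"
filtered = [c for c in candidates if matches_lat(c, lat)]
best = max(filtered, key=score)
print(best["answer"], f"confidence={score(best):.2f}")  # Dallas (TV series) confidence=0.81
```

Note how the city, despite scoring well on raw evidence, is eliminated before scoring even happens; the LAT filter is what keeps a question about the TV show from returning facts about Texas.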
The next step for Watson is not in game shows but in areas such as internet search, finance and healthcare. Designed as a natural language processing system, the future of computers like Watson is to answer questions asked in everyday language. According to IBM, "The goal is to have computers start to interact in natural human terms across a range of applications and processes, understanding the questions that humans ask and providing answers that humans can understand and justify."[3]
In developing the ability to answer natural language queries, "…more than 100 different techniques are used to analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses."[4] Instead of programming every possible variable into a system, artificial intelligence is designed to allow a system to train itself to provide better answers.
Watson reads and understands language on human terms. Along with providing an answer, Watson generates a confidence level for that answer. With each interaction Watson learns, and with each experience Watson becomes smarter and faster. Based on the outcome of each answer, Watson can analyze patterns, and by analyzing patterns over time, learn and make predictions of correctness based on probabilities. Much like the human brain (though not exactly; indeed, we don't entirely understand how the brain works), Watson is designed to learn from prior experience and teach itself new tasks.
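One toy way to picture that feedback loop: track how often each scoring technique agrees with answers later confirmed correct, and trust the reliable techniques more over time. The class, technique names, and update rule below are my own illustrative model, not IBM's actual training method.

```python
# Toy model of learning from feedback: reliable scoring techniques
# earn more trust, so their votes count more in future answers.
class SelfTuningScorer:
    def __init__(self, techniques):
        # Start with equal trust in every technique.
        self.trust = {name: 1.0 for name in techniques}

    def combined_score(self, technique_scores):
        """Trust-weighted average of the per-technique scores."""
        total = sum(self.trust.values())
        return sum(self.trust[n] * s for n, s in technique_scores.items()) / total

    def feedback(self, technique_scores, was_correct, rate=0.1):
        # Techniques that scored a correct answer highly gain trust;
        # those that scored a wrong answer highly lose trust.
        direction = 1 if was_correct else -1
        for name, s in technique_scores.items():
            self.trust[name] = max(0.01, self.trust[name] + rate * direction * s)

scorer = SelfTuningScorer(["keyword_overlap", "source_reliability"])
scores = {"keyword_overlap": 0.9, "source_reliability": 0.4}
print(round(scorer.combined_score(scores), 2))  # 0.65
scorer.feedback(scores, was_correct=True)       # reward both, keyword_overlap more
print(round(scorer.combined_score(scores), 2))  # nudged toward keyword_overlap
```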
IBM has now taken Watson to the next level, entering the world of healthcare by deploying the system at Memorial Sloan-Kettering Cancer Center. The goal is for Watson to provide a single, correct answer to any medical question. Millions of pages of medical texts and patient records have been fed into the system, along with tens of thousands of hours of staff training, to teach Watson to give more accurate answers. The result is Interactive Care Insights for Oncology, a cloud-based system focused on providing state-of-the-art treatment for cancer patients. This is just one example of how a system such as Watson can become an expert system with decision-making ability in a specific field, while continually learning and improving over time.
For now, Watson can be considered a great support system for specialists, providing information about possible solutions and the probability that each is correct. Because Watson provides a history of how its final results were determined, it is a valuable tool that allows people to draw their own conclusions from its findings.
Matching specific patient data against a huge database of information will allow the most current medical advances and techniques to become part of a patient's treatment. Doctors cannot keep up with all the new information released every week, but with this system they will have access to the most up-to-date information available. As new information arrives, Watson will update its results, giving a confidence level for each recommendation. WellPoint, the largest for-profit managed health care company, also has its own Watson system, doing for the financial side of healthcare what Memorial Sloan-Kettering is doing for the treatment side: determining which course of treatment makes the most financial sense.
The ability to answer natural language queries is where search engines are headed. How exactly that will work is unknown, but to get an idea of the possibilities we can look to the world of science fiction. After all, the communicators and Personal Access Display Devices of yesterday's Star Trek are the cell phones and iPads of today.
In the 2014 movie RoboCop, search questions are asked in natural language and the results instantly displayed, with the most relevant information visible in the foreground. As more information becomes available, the results are instantly updated.
While that is a movie, it was reported in early 2014 that Google would buy the artificial intelligence company DeepMind. Very little information is available about the company, but its website carries a mission statement: "To build general-purpose learning algorithms. We combine the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms." Facebook has also entered the artificial intelligence game and, along with Google, is working on image recognition software that can instantly identify what is happening in photos. The technology and brain trust at DeepMind, including its patents on image search technology, are now all part of Google.
In May 2013 I attended the Google I/O conference, where Google's Amit Singhal, talking about the future of search, stated that "a search engine's three primary functions will need to evolve and that search will need to: 1. Answer, 2. Converse, and 3. Anticipate."[5]
Robotics and quantum computing are also part of the AI endgame, with the robotics company Boston Dynamics being acquired by Google. Google is also teaming up with NASA to work on D-Wave's quantum computer. Where this will lead is not yet known, as Google has remained quiet about its specific plans, but the next step is to develop a computer that can solve optimization problems. For example, suppose you want to take a three-month trip through Europe, have a list of ten cities you want to visit, and are looking for the best price. The number of possible itineraries explodes combinatorially, and building a system that can solve such problems at scale will require the speed of a quantum computer and the processing ability of artificial intelligence. While the results so far are not impressive, the potential benefits are almost limitless.
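To see why the trip example is hard, consider brute force: visiting ten cities in the best order means checking 10! = 3,628,800 itineraries, and each added city multiplies the count again. The sketch below exhausts a tiny made-up fare table with only three stops after a fixed start; the cities and prices are invented for illustration.

```python
from itertools import permutations

# Hypothetical one-way fares between four cities (symmetric for simplicity).
prices = {
    ("Paris", "Rome"): 90, ("Paris", "Berlin"): 80, ("Paris", "Madrid"): 70,
    ("Rome", "Berlin"): 110, ("Rome", "Madrid"): 100, ("Berlin", "Madrid"): 120,
}

def fare(a, b):
    """Look up a fare regardless of direction of travel."""
    return prices.get((a, b)) or prices[(b, a)]

def trip_cost(route):
    """Total cost of flying each consecutive leg of the route."""
    return sum(fare(a, b) for a, b in zip(route, route[1:]))

# Brute force: try every ordering of the remaining cities after Paris.
cities = ["Rome", "Berlin", "Madrid"]
best = min((("Paris",) + p for p in permutations(cities)), key=trip_cost)
print(best, trip_cost(best))  # ('Paris', 'Madrid', 'Rome', 'Berlin') 280
```

With three stops there are only six orderings; with ten there are millions, and with thirty the count exceeds the number of atoms in a glass of water, which is the kind of search space quantum optimization hopes to tame.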
Combining the natural language and learning abilities of Watson with today's context-based algorithms of location, social media and user history, we may one day soon find ourselves with an AI-based search engine that understands our questions and provides the right answers. But beyond ultimate knowledge, will this search engine be a real artificial intelligence system comparable to the abilities humans possess? This is indeed a very philosophical question. As science learns more and more about the brain, we realize that our neural network is just a very sophisticated set of connected electrical elements which, in theory, we should be able to build and program. Unfortunately, it doesn't look like we will achieve that during our lifetime.
References:
[1] Merriam-Webster.com, "google" <http://www.merriam-webster.com/dictionary/google>
[2] comScore.com (November 13, 2013), "comScore Releases October 2013 U.S. Search Engine Rankings" <http://ir.comscore.com/releasedetail.cfm?ReleaseID=807081>
[3] Brodkin, Jon (February 10, 2010), "IBM's Jeopardy-playing machine can now beat human contestants", Network World. Retrieved February 19, 2011.
[4] "Watson – A System Designed for Answers: The Future of Workload Optimized Systems Design", IBM Systems and Technology, p. 3. Retrieved February 21, 2011.
[5] Google (May 15, 2013), "Google I/O 2013: Keynote" <http://www.youtube.com/watch?v=9pmpa_kxsam#t=1h51m10s>
About Daniel Sodkiewicz: I am currently working on many exciting projects as the leader of the RoyalDeerDesign team, a web design studio in NYC.