Jan 16, 2019 · 6 min read

NTENT CTO Dr. Ricardo Baeza-Yates Shares Insight into Machine Learning, NLP, the Rise of Voice Search and More

In the following interview, NTENT Chief Technology Officer Dr. Ricardo Baeza-Yates opens up about his extensive history in Search, the role that Machine Learning, Natural Language Processing and Voice Assistant technology stand to play in the future of Search, and why business leaders should rely on NTENT to help them reach the next level. A transcript of the interview can be found below.

NTENT: Can you tell us a bit about your extensive background in Search?

I did my Ph.D. thesis at the University of Waterloo in Canada, where I worked in the context of the New Oxford English Dictionary. At that time, the end of the ’80s, that was the largest textual file, comprising about 570 MB. Although that is nothing today, at that time it did not fit on a single CD-ROM. My main contributions were fast sequential algorithms to search not only words but also words with errors and more advanced patterns. After that, I first tried to apply the same techniques to DNA sequences, but that was too early and the biologists I talked to were not yet interested. Then, the Web started to grow and the first search engines appeared, so it was natural to switch my research to web search. I worked first in semi-structured retrieval, just before XML was defined.

The main challenge I had was data, particularly user query data. To solve this problem, in 1999, with the help of some Brazilian friends, I built a search engine for all of Chile, called TodoCL (all Chile), that ran for more than 15 years. This search engine had several million queries per month and, thanks to that anonymous data, I could do seminal research in query log mining, for example in query suggestions and answer caching. Later I started to work for Yahoo and, leading the European lab in Barcelona, I was able to expand my research to all aspects of web search, including new distributed web search architectures. Today I am working on semantic search thanks to NTENT. So, overall, I can say that I have been working on different types of search for more than 30 years.

NTENT: What are the implications for search with the rise of voice technology and smart assistants?

This is a very interesting and non-trivial question. There are many implications, but I think the main two are (1) contextual search and (2) task-oriented search. The first relates to adding the right context to the search. For example, if I ask, “What was the score of the Giants last night?” the answer might be different depending on whether I am in New York or California. However, even if I am in New York, if yesterday the only game of the Giants was in San Francisco, I am clearly talking about baseball and not football. The second refers to a sequence of queries where there is a final goal. Hence, search is not about a single query but about a task, where the context will also be important. Currently, at NTENT we are working on both problems as they are a core part of our smart assistant.
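The Giants example can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a description of NTENT's system: the `Game` record, the `resolve_giants` function and the rule "a single game played yesterday wins, otherwise prefer the team from the user's city" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Game:
    """Hypothetical schedule entry: team name, sport and the day it played."""
    team: str
    sport: str
    played_on: date


def resolve_giants(user_city: str, query_date: date,
                   games: Sequence[Game]) -> Optional[Game]:
    """Pick the game meant by "the Giants last night", or None if unclear."""
    yesterday = query_date - timedelta(days=1)
    candidates = [g for g in games
                  if g.played_on == yesterday and "Giants" in g.team]
    # Temporal context: if only one Giants team played, location is irrelevant.
    if len(candidates) == 1:
        return candidates[0]
    # Location context: otherwise prefer the team named after the user's city.
    local = [g for g in candidates if g.team.startswith(user_city)]
    return local[0] if len(local) == 1 else None


schedule = [Game("San Francisco Giants", "baseball", date(2019, 1, 15))]
answer = resolve_giants("New York", date(2019, 1, 16), schedule)
print(answer.sport if answer else "ambiguous")
```

Returning `None` when the context does not settle the question leaves room for the assistant to ask a follow-up instead of guessing.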

NTENT: How do Machine Learning and Natural Language Processing improve the types of queries search engines can answer, especially when it comes to voice?

When people use voice, they naturally use natural language. So instead of keywords, we use sentences, sometimes grammatically ill-formed. So, parsing the semantics of these sentences is naturally done using NLP. We can do shallow parsing and extract just the entities (e.g., people, locations, dates) or do full parsing if we need to track the context of a conversation. This can be done using classical parsing techniques, machine learning or, the best approach, a combination of both. In addition, ML can be used to detect multiple conversations (e.g., adding an item that we just remembered to the shopping list while preparing our holidays) with the same person or within the whole family.

NTENT: At a high level, can you describe the technologies NTENT is using, such as Machine Learning (ML) and Natural Language Processing (NLP), to deliver these solutions?

As mentioned before, we enhance classical NLP by using ML to improve entity recognition or to do entity disambiguation, as in the Giants example. Second, we use a suite of different ML techniques and word embeddings to do query intention prediction. Third, we use learning-to-rank to improve search ranking using a two-phase approach, either for generic search or for a given intention. Finally, we use deep learning to understand and generate answers when assisting through a conversation.

In a nutshell, NLP infers the semantics of what the user is trying to do while search provides the answer. Machine learning is the magic ingredient that ties it all together and obtains the best possible quality.

NTENT: How will NTENT use NLP and Machine Learning to set themselves apart in this space?

I think that the main advantage of NTENT comes from the mix of more than 20 years of experience developing semantic technology coupled with the newest ML techniques that have appeared in the last couple of years. In summary, putting together wisdom and youth.

NTENT: Can you talk about the data science and the value of the search data available through NTENT’s platform?

As many people say, data is the gold of the 21st century. The only way to improve your products is to understand your data, and that means doing data analytics to answer the questions that you have and data mining to find answers to the questions that you do not (yet) know. Both will give you insights about your users and clues on how to keep innovating. Hence, data science is essential to survive. For example, we can discover popular queries that have no advertising associated with them and improve monetization.
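As a rough illustration of that last example, the sketch below counts queries in a log and keeps the most frequent ones that have no ads attached. The function name, the lowercase normalization and the sample data are assumptions made for the example; a real query log would need far more careful normalization and anonymization.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def unmonetized_popular_queries(query_log: Iterable[str],
                                queries_with_ads: Iterable[str],
                                top_k: int = 3) -> List[Tuple[str, int]]:
    """Return the top_k most frequent queries that have no ad attached."""
    # Assumed normalization: trim whitespace and lowercase.
    counts = Counter(q.strip().lower() for q in query_log)
    with_ads = {q.strip().lower() for q in queries_with_ads}
    # most_common() yields (query, count) pairs from most to least frequent.
    ranked = [(q, n) for q, n in counts.most_common() if q not in with_ads]
    return ranked[:top_k]


# Hypothetical sample log and ad inventory.
log = ["weather", "Weather", "weather", "giants score", "giants score",
       "cheap flights", "cheap flights", "cheap flights", "cheap flights"]
ads = ["cheap flights"]
print(unmonetized_popular_queries(log, ads, top_k=2))
```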

NTENT: Why should business leaders and CEOs partner with NTENT to bring voice and search solutions to their customers?

These companies have a captive audience that they need to capitalize on. One way is through smart voice assistance on mobile phones, in cars or in virtual butlers. This is not only because it improves communication with their clients, but also because in that way they can gain insights from the data that they gather. Note that this data has meaning, as natural language may give more powerful information than, say, numerical sensor data.

The main value for consumers is to improve the user experience. The main value for businesses is to understand what people want, particularly what you do not have (e.g., asking for a new product or functionality). For example, you can discover a common behavior in most families that triggers the offering of a new periodic service.

In the future, everything will be done by voice and powered by AI, particularly customer management through many different channels. In the case of assistants, most use cases imply explicit or implicit search tasks, from a direct question like “What is the temperature?” to an implicit question when two friends decide to meet at noon, such as “Do you want me to reserve your favorite restaurant? Which day?”.

NTENT: You recently completed research on bias on the Web. Can you explain what that means?

The Web has many instances of bias, including data bias, sampling bias, algorithmic bias, interaction bias and second-order bias such as personalization and ranking bias. In other words, it is a vicious cycle of bias that pollutes the Web more each day.

NTENT: What does the industry need to do to overcome bias on the Web?

First, we need awareness of all the biases involved, and that was one of the goals of my recent research. The second is bias mitigation by designing systems that counteract these biases, such as recent research in learning-to-rank with bias. The third is to put in place and enforce codes of ethics to make sure that all systems consider the possible negative impacts of bias, particularly gender or race discrimination.

At NTENT, we are aware of the problem and we de-bias our data before using it. For example, click feedback suffers from ranking bias, and that must be considered before using it. This implies building machine learning models that weigh clicks differently, giving more importance to rare clicks than to, say, clicks on the first results.
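One common way to weigh clicks differently is inverse propensity weighting, sketched below. This is not necessarily the method used at NTENT: the position-based examination model (probability of a result being seen proportional to 1/rank^eta), the parameter `eta` and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def examination_propensity(rank: int, eta: float = 1.0) -> float:
    """Assumed probability that a user looks at the result shown at `rank`."""
    return 1.0 / (rank ** eta)


def click_weight(rank: int, eta: float = 1.0) -> float:
    """Inverse propensity: clicks at rarely examined ranks count for more."""
    return 1.0 / examination_propensity(rank, eta)


def debiased_click_scores(clicks: Iterable[Tuple[str, int]],
                          eta: float = 1.0) -> Dict[str, float]:
    """Sum the weighted clicks per document from (doc_id, rank) pairs."""
    scores: Dict[str, float] = defaultdict(float)
    for doc_id, rank in clicks:
        scores[doc_id] += click_weight(rank, eta)
    return dict(scores)


# Two clicks on the top result versus a single click at rank 4.
print(debiased_click_scores([("a", 1), ("a", 1), ("b", 4)]))
```

With `eta = 1.0`, the single click at rank 4 receives weight 4 and outweighs the two clicks at rank 1, which is the sense in which rare clicks get more importance. In practice the propensities would be estimated from data rather than fixed by a formula.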

NTENT: What makes NTENT a great place to work for computer scientists, data scientists and mobile engineers?

I think we have a great combination of technical quality and work culture, so it is a good place to learn and grow in your career. So come join us!

To learn more about NTENT visit www.ntent.com.
