Findo wanted to use the results achieved in image analysis with deep statistical models and to apply them to text analysis. The problem? Text data is extremely sparse: the more discrete the data, the more data is required to successfully train statistical models.
The solution to this obstacle was vector solutions. Artificial Intelligence generally does involve machines producing language responses to a natural (meaning human) language query. But recent advances in fields like generative variational text modeling, distributed vector space modeling of sentences and documents, and topic modeling have made the problem of sparseness more tractable. Using an unsupervised and semi-supervised approach, Findo aims to create a statistical model of text data so that a limited quantity of information need not hinder analysis.
Once the limited amount of text is entered, the machine reader “pretends” to understand the text. It proceeds to convert it into a feature-rich artificial vector representation. A representation like this does away with the need for the text to have been rich in quantity in the first place. This form of information that codes what was contained in a text can later be used for various NLP tasks: unsupervised clustering, classification and so on. The model can then match relevant user data to a contextually appropriate query topic. It can even ask questions to refine and clarify a query if needed, enabling the user to get exactly what they want.
A model like this could then be the solution to a great many problems. Findo is focusing on the following ones.
Users would be able to search by description. Meaning they would not only find the document they were looking for through keywords, but the system would also indicate the semantic space corresponding to the search query, allowing users to also find other relevant or related results.
Remember how difficult it is to manually look after structuring your own data? Tags, folders, rules for moving items? The model’s job here would be to solve the problem of auto-tagging and automatically classifying the data for the user. Findo either learns by studying new data patterns or by understanding how to use the habits and structures a user already has to keep sorting the incoming feed with tags or folders.
Controllable chat bot
The “holy grail” of models is the bot that won’t simply rely on the high probability bits of analyzed text, but speak with real cognition and insight. To this end, Findo’s model will generate text that can ask accurate questions during a conversation. The aim is to help the bot develop a sophisticated sense of communication that is nevertheless controllable. For example, when a user searches for images, the bot might ask a user what the picture showed. Currently, neural networks are already used for supervised question answering.