Teaching language to machines — one step closer to AI.

Pat Inc has just passed its first set of independent tests, verifying that we are on the path to true natural language understanding. This is great progress, but for us, it’s just the beginning.

Despite all the investment and advances we’ve seen in machine intelligence over the last 60 years, we still can’t match a three-year-old at understanding meaning in natural language. Right now, unlike a human, AI can’t understand the basics of human language, let alone its nuances. Until it does, we can’t really communicate with machines effectively, and that confines machine intelligence to narrow or specialist domains, which is hardly AI. My vision is to humanize conversation with machines, and we are making great progress.

Pat Inc has taken a completely different approach to natural language understanding (NLU), and we have just successfully completed our first round of testing to independently verify our progress. What’s unique about our approach is that we are teaching language to machines. We are building a next-generation Natural Language Understanding API for AI that delivers “Meaning-as-a-Service” by processing natural language and human conversations into structured information about their meaning. We are building the world’s most powerful natural language service for developers to build intelligent agents and applications that you can talk or text to.

100% accuracy

To verify our progress and evaluate Pat’s reading comprehension, we chose Facebook AI Research’s (FAIR) Question Answering tasks, which they call bAbI. I am pleased to report that Pat passed the first 6 of the 20 bAbI tasks we attempted with 100% accuracy. Let me tell you a little more about the tests and our results, to explain why we are so pleased and why we are already working on the next batch of tasks.

FAIR, led by Yann LeCun, is on a mission to advance machine intelligence and create new technologies that give people better ways to communicate; in short, to solve AI. Last week the team published their thoughts on the long game towards understanding dialogue. Last year, the team published 20 tasks for Natural Language Understanding (NLU) and observed that many existing learning systems cannot currently solve them. Their aim was to classify the 20 tasks into skill sets, so that researchers could identify (and then rectify) the shortcomings of their systems. Each of the 20 tasks provides 30,000 or so sets of training data and 3,000 sets for testing. Each test consists of a sequence of statements culminating in a question. To get started, we tested Pat on 6 tasks: Single Supporting Facts, Yes/No Questions, Simple Negation, Conjunction, Basic Coreference and Compound Coreference. The full list of the 20 tasks is below, with example tests from each task.

Sample statements and questions from bAbI’s 20 tasks

The tasks are publicly available at http://fb.ai/babi. Source code to generate the tasks is available at https://github.com/facebook/bAbI-tasks.
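For readers who want to look at the data themselves: as we understand the released files, each line starts with a numeric ID, an ID of 1 starts a new story, and question lines carry the question, the answer and the IDs of the supporting facts, separated by tabs. The minimal Python sketch below parses that layout. The `QAExample` and `parse_babi` names are our own illustration, not part of FAIR’s tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QAExample:
    story: List[Tuple[int, str]]  # (line ID, statement) pairs seen before the question
    question: str
    answer: str
    supporting: List[int]         # line IDs of the supporting facts


def parse_babi(text: str) -> List[QAExample]:
    """Parse bAbI-style text into question-answering examples (illustrative sketch)."""
    examples: List[QAExample] = []
    story: List[Tuple[int, str]] = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        id_str, rest = raw.split(" ", 1)
        line_id = int(id_str)
        if line_id == 1:  # an ID of 1 marks the start of a new story
            story = []
        if "\t" in rest:  # question line: question <TAB> answer <TAB> supporting IDs
            question, answer, support = rest.split("\t")
            examples.append(QAExample(
                story=list(story),
                question=question.strip(),
                answer=answer.strip(),
                supporting=[int(i) for i in support.split()],
            ))
        else:  # statement line: remember it, keyed by its ID
            story.append((line_id, rest))
    return examples
```

Each parsed example keeps the supporting-fact IDs, which only the strongly supervised baselines use during training.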

Pat’s First Report Card

So how effective is our linguistic approach to natural language understanding? In brief, the table below compares the results of our first round of bAbI tasks with FAIR’s initial baseline results for machine learning approaches. First, let me explain the table and the approaches it compares against, and then say a little more about our approach and results.

bAbI task results — Pat Inc versus FAIR machine learning baselines

When FAIR benchmarked the bAbI tasks, they compared three different approaches. For each task, FAIR used 1,000 questions for training and 1,000 for testing.

Weakly Supervised. To test weakly supervised models, which are given only question-answer pairs at training time, FAIR chose an N-gram model and an LSTM model. Strong supervision, by contrast, also provides the set of supporting facts at training time (but not at test time). The LSTM outperformed the N-gram model on most tasks and overall, so we included it in the comparison table above.
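The difference between the two supervision regimes comes down to what each training example contains. In the hypothetical sketch below (the function names are ours, not FAIR’s), a weakly supervised example carries only the story, question and answer, while a strongly supervised one also carries the supporting facts.

```python
from typing import Dict, List, Tuple

Story = List[Tuple[int, str]]  # (line ID, statement) pairs


def weak_example(story: Story, question: str, answer: str) -> Dict[str, object]:
    """Weak supervision: only the question-answer pair is labelled."""
    return {
        "story": [text for _, text in story],
        "question": question,
        "answer": answer,
    }


def strong_example(story: Story, question: str, answer: str,
                   supporting_ids: List[int]) -> Dict[str, object]:
    """Strong supervision: the supporting facts are labelled too (training only)."""
    example = weak_example(story, question, answer)
    by_id = dict(story)  # look up statements by their line ID
    example["supporting_facts"] = [by_id[i] for i in supporting_ids]
    return example
```

At test time both regimes see the same input; only the training signal differs.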

Structured SVM. FAIR also built a baseline classical cascade NLP system using a structured support vector machine (SVM), which incorporated large amounts of costly labeled data as external resources for the model. The bAbI paper describes these external resources in more detail; Pat’s approach does not rely on them.

Strongly Supervised. Finally, FAIR used Memory Networks (MemNNs), a recently proposed class of models that have been shown to perform well at QA. They work by having a “controller” neural network perform inference over stored memories, which consist of the previous statements in the story. This approach also used the supporting facts during training. FAIR explored five variants, and we compared against the strongest of them.
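To make “inference over stored memories” a little more concrete, here is a deliberately simplified Python sketch of a single memory lookup. It is not a MemNN: a trained network learns its scoring function, whereas here we assume a hand-written one (word overlap with the question plus a small bonus for more recent statements) and simply pick the highest-scoring memory.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased bag of words; a stand-in for learned embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))


def memory_read(memories: List[str], question: str, recency_weight: float = 0.01) -> str:
    """Return the stored memory that best matches the question.

    score = word overlap with the question + recency_weight * position.
    The small recency bonus (an assumption) breaks ties in favour of later
    statements; it stays below one overlapping word for fewer than 100 memories.
    """
    query = tokens(question) - {"where", "is", "the"}  # crude stop-word removal (assumption)
    best = max(
        range(len(memories)),
        key=lambda i: len(query & tokens(memories[i])) + recency_weight * i,
    )
    return memories[best]
```

With the story “Mary moved to the bathroom. John went to the hallway. Mary went back to the kitchen.”, asking “Where is Mary?” selects the last statement, because both Mary statements overlap equally and the later one wins the tie.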

Pat’s approach

There are two aspects of our approach to the tasks that I would like to highlight.

Firstly, Pat didn’t require any of the training data provided to pass with 100% accuracy. This follows from our approach, and it is what sets us apart from everyone else we know of who is trying to solve Natural Language Understanding. The challenge of Natural Language Understanding is understanding the true meaning of sentences and conversations, not just translating words or guessing the intent of a question. The traditional approach to NLU relies on big data: the 30,000 or so training samples, say, would be used to let the machine determine the patterns of co-location of the words.
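As a rough illustration of what “patterns of co-location” means, the Python sketch below counts how often pairs of words appear in the same sentence. It is a generic example of the statistical starting point, with a naive tokenizer we chose for brevity, not a reconstruction of any particular system.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable


def cooccurrence_counts(sentences: Iterable[str]) -> Counter:
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Sort the unique words so each pair is stored in one canonical order.
        words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
        counts.update(combinations(words, 2))
    return counts
```

Given “Mary went to the office.”, “John went to the office.” and “Mary picked up the apple.”, the pair (“office”, “went”) is counted twice. The counts say which words tend to appear together, but nothing about who went where.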

Pat Inc takes a linguistic approach to NLU rather than a statistical or machine learning one. After all, humans don’t learn language by reading thousands of books and memorising co-location patterns and probabilities. When FAIR developed these tests, they assumed, like everyone else, that algorithms would be “trained” on hundreds or tens of thousands of training examples before being tested. To be fair, we did some preparation before testing Pat, and we have noted it for each task in the table above, but that preparation consisted of linguistic training tasks, not brute-force machine learning. Pat Inc is not trying to learn patterns about language meaning from big data. Rather than relying on training data or annotated corpora, Pat builds knowledge of language just as a human does, by progressively learning how words are combined to describe real objects, people, processes and events in context.

Secondly, Pat responded to each test in natural language, not just with a keyword. Pat went beyond the required answers to give more complete responses, with clarity and human logic. For example, in Task 6:

bAbI: Sandra moved to the office. John went back to the garden. Is Sandra in the office?

Pat: Yes. She was in the office.

bAbI: Sandra went to the hallway. Sandra went to the kitchen. Is Sandra in the bathroom?

Pat: No. She is not there now.

bAbI: Mary went to the office. Sandra got the apple there. Is Sandra in the kitchen?

Pat: Yes. She was in the kitchen.
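To show what Task 6 actually asks of a system, here is a minimal keyword-matching sketch in Python that tracks each person’s latest location and answers “Is X in the Y?”. It is emphatically not how Pat works: the movement patterns are our own assumption, it recognises only a handful of phrasings, and it replies with a bare yes/no keyword rather than a sentence.

```python
import re
from typing import Dict, List

# Assumed phrasings for movement statements; real bAbI stories use more verbs.
MOVE = re.compile(r"^(\w+) (?:moved|went|journeyed|travelled)(?: back)? to the (\w+)\.?$")
ASK = re.compile(r"^Is (\w+) in the (\w+)\?$")


def answer_yes_no(statements: List[str], question: str) -> str:
    """Answer 'Is X in the Y?' from each person's most recent movement."""
    location: Dict[str, str] = {}
    for statement in statements:
        match = MOVE.match(statement.strip())
        if match:  # statements that don't match a movement pattern are ignored
            person, place = match.groups()
            location[person] = place
    query = ASK.match(question.strip())
    if query is None:
        raise ValueError(f"unsupported question: {question!r}")
    person, place = query.groups()
    # Task 6 only expects yes/no, so an unknown location is treated as "no" (a simplification).
    return "yes" if location.get(person) == place else "no"
```

The sketch answers the first two exchanges above, but a statement phrased outside its patterns, such as “Sandra got the apple there.”, is silently ignored.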

In this regard, the bAbI tests have a significant limitation: they only ask for a keyword answer. The path to NLU (an AI-complete problem) requires machines to emulate human levels of interaction, so tests should be performed at a human level. bAbI is a good start, and only small changes to the test sets are needed to make them human-like.

So what’s different about our approach

NLU has been AI’s ‘hard problem’ since Alan Turing’s pioneering work and John McCarthy’s AI movement in the 1950s. IBM Watson, Google, Nuance, Amazon and Apple, along with at least 1,000 other teams, have all focused on developing solutions over the past decade, emphasising big data or machine learning to fuel natural language understanding. But our approach is different.

Over the past 10 years, we’ve developed a unique capability to work with human language. Unlike other platforms, Pat focuses on solving the problem of machines actually understanding human language. It’s easy for a machine to look up the definition of a word, but the processing beyond that is probabilistic: counting words, tracking word order, parsing the syntax. Yet meaning only makes sense within the context of the sentence, which is why we used linguistics, not big data, as the foundation for Pat. For the same reason, we will apply Pat to the other linguistics-based tests that we consider relevant to NLU.

Pat’s future potential

This is great progress, but for us, it’s just the beginning. We believe we can scale Pat beyond these tests to really solve the challenge of NLU. In the process, we can also help meet the significant demand for AI apps, which IDC forecasts will be worth $40 billion across Google, IBM, Amazon and Microsoft platforms by 2020.

That’s why Pat’s further development will have a significant impact on the AI we already depend on today, as well as on the technology just around the corner. From driverless cars and wearables to home automation and networked applications, we can expect machines to provide us with more meaningful, helpful experiences and natural, human-like interaction.

Developers are now welcome to register their interest in private beta access for our API or sign up for regular updates on our progress.