What Will The Natural Language Technology Stack Look Like in 5 Years?
It’s been a year and a half since Andrej Karpathy wrote one of the best pieces on the new A.I./NLP revolution — The Unreasonable Effectiveness of Recurrent Neural Networks. I’ve thought a lot about NLP since reading it, and even started a company, Talla, that builds intelligent, conversational versions of enterprise applications.
Last year, I joined Pillar as a Venture Partner, where we’ve had much discussion about the coming changes A.I. technologies will bring to natural language. When Amazon announced Lex in November, our team had a nice internal debate about what this means for the future of conversational technology. Ultimately, our conversation left us with a lot of open questions like:
- How much NLP will future startups need to do themselves, and how much will be off-the-shelf?
- Which pieces of the “NLP Stack” will be owned by large companies, and where are the opportunities for startups?
- Will a single technology solve a variety of natural language problems like text summarization, natural language understanding, question answering, sentiment analysis, natural language generation, and more? Or will these require separate solutions?
So, we decided to dig in by inviting some of the smartest people we know in the space out to dinner to debate and discuss.
Luckily, Boston has two great universities right in our backyard, MIT and Harvard, and companies with deep NLP roots like Nuance, Basis, and Amazon’s Alexa group, in addition to tons of startups working in fields that touch on NLP. We ended up with a great mix of people from industry and academia, with different perspectives on where natural language technology is going. I’ll highlight the key ideas from the dinner below, but keep in mind that for every idea, someone at the table disagreed with what I’m writing (it was a lively dinner!).
The Natural Language Tech Stack in 2022
There are several parts to the evolving natural language stack.
- Voice Input: At the top, there is voice input. Voice to text is a relatively well solved problem for most common use cases.
- Natural Language Understanding (NLU): Below that is NLU, which consists of parsing text (from voice, or from direct input) and pulling out the key entities. At this point NLU is good, moving towards “great.” Parsing already approaches human-level accuracy.
- Inference & Reasoning: Moving further down the stack, inference and reasoning are more difficult. A.I.s with bodies (robots, for example) may be needed to help ground language in some other modality, or new graphical techniques might help, but these are all very nascent. (But keep an eye on the intersection of deep learning and graphical models.)
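To make the layering concrete, here is a minimal sketch of the three layers as a pipeline. Every name in it (`Utterance`, `voice_to_text`, `understand`, `reason`, `run_stack`) is hypothetical, and each stage is a toy stand-in: the “speech recognizer” just decodes bytes, and the “NLU” is a two-word lookup table. It only illustrates how each layer consumes the output of the one above it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Utterance:
    """State passed down the stack; each layer fills in one more field."""
    audio: bytes = b""
    text: str = ""
    entities: Dict[str, str] = field(default_factory=dict)
    answer: str = ""


def voice_to_text(u: Utterance) -> Utterance:
    # Placeholder (assumption): a real system would call a speech recognizer.
    u.text = u.audio.decode("utf-8")
    return u


def understand(u: Utterance) -> Utterance:
    # Toy NLU: tag any token found in a tiny hand-written gazetteer.
    gazetteer = {"boston": "CITY", "tomorrow": "DATE"}
    for token in u.text.lower().replace("?", "").split():
        if token in gazetteer:
            u.entities[gazetteer[token]] = token
    return u


def reason(u: Utterance) -> Utterance:
    # Toy inference step: decide what to do with the extracted entities.
    if {"CITY", "DATE"} <= set(u.entities):
        u.answer = f"Looking up {u.entities['CITY']} for {u.entities['DATE']}."
    else:
        u.answer = "Sorry, I need a city and a date."
    return u


# The stack, top to bottom, in the order described above.
STACK: List[Callable[[Utterance], Utterance]] = [voice_to_text, understand, reason]


def run_stack(audio: bytes) -> Utterance:
    u = Utterance(audio=audio)
    for layer in STACK:
        u = layer(u)
    return u
```

Calling `run_stack(b"Weather in Boston tomorrow?")` walks the request through all three layers. In practice the top two functions would likely be calls to an off-the-shelf platform, while `reason` is where custom work would go.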
When it comes to different types of natural language goals, like text summarization vs. question answering, it seems likely that a single platform will be able to solve them all in the coming years; we won’t see dramatically different technologies for each type of problem. In fact, several people felt that many natural language problems can be reframed as machine translation problems and solved with similar approaches.
There was strong consensus that many of the new conversational and semantic technologies developed recently haven’t been widely used yet, and that systems that can take advantage of them will improve tremendously in the coming years. There was also a lot of support for some of the new “approximate inference” techniques, and a sense that hardware built to support them could lead to some really good next-generation breakthroughs.
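One way to picture the “everything is translation” idea: each task becomes a pair of strings, a source the model reads and a target it should write, so a single sequence-to-sequence model can in principle be trained on all of them. The sketch below only builds and scores such pairs. The task prefixes, the `Example` type, and the sample sentences are illustrative assumptions, not something proposed at the dinner, and the “model” is a lookup table standing in for a trained system.

```python
from typing import Callable, List, NamedTuple


class Example(NamedTuple):
    source: str  # what the model reads
    target: str  # what the model should write


def as_translation(task: str, text: str, target: str) -> Example:
    # A task prefix (assumption) tells one shared model which mapping to apply.
    return Example(source=f"{task}: {text}", target=target)


# Three different NLP tasks expressed in the same source -> target format.
examples: List[Example] = [
    as_translation("summarize",
                   "The meeting ran long and ended without a decision.",
                   "No decision was reached."),
    as_translation("answer",
                   "Where is the office? context: The office is in Boston.",
                   "Boston"),
    as_translation("sentiment", "I love this product.", "positive"),
]


def exact_match(model: Callable[[str], str], data: List[Example]) -> float:
    # One metric for every task, because every task has the same shape.
    hits = sum(model(ex.source) == ex.target for ex in data)
    return hits / len(data)


# Stand-in "model": memorizes the examples. A real system would be a
# trained sequence-to-sequence network with the same str -> str interface.
memory = {ex.source: ex.target for ex in examples}


def lookup_model(source: str) -> str:
    return memory.get(source, "")
```

The point of the design is the shared `str -> str` interface: summarization, question answering, and sentiment analysis differ only in their training pairs, not in the surrounding code.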
So, to predict what the world might look like in 5 years:
I expect voice-to-text, parsing, and NLU to be largely solved problems, with platforms that execute them operated by tech giants like Google, Amazon, and Microsoft, and with plenty of vertically targeted or open source alternatives as well.
Startups probably won’t have to worry about any of these phases, and instead can focus on induction, inference, and reasoning. If you are building something in a targeted vertical, you may end up building some custom NLP to deal with the vernacular of that vertical.
Opportunities For Startups
There are a few different ways startups can play the future.
- Full-Stack Products: The best is probably to work on full-stack products that use NLP, rather than platform-able pieces of the NLP technology stack. If you start today, you will need some strong NLP chops, but over time much of your core stack can be replaced as the technology standardizes and matures. Waiting until that maturation point, though, will probably mean the largest market spaces are already closed off to startups. You have to get going before the technology is fully ready.
- Platform Play: If you want a platform play, there are likely opportunities around special language models that deal with targeted verticals with lots of unique words. And of course there will always be opportunities to sell tools, analytics, and products that take other levels of NLP tech and make them easier for someone else to use. Think about the tools that sit on top of AWS and make it easier to monitor, deploy, and track usage of your account. Those businesses aren’t ready to start yet but, as NLP platforms mature, they will be.
- Natural Language Generation: Natural language generation (e.g. robowriters) is also going to be an area with huge opportunities for startups, because the use cases are so varied that no platform player can support them all. This is a strong area to investigate if you are thinking about a company in NLP.
The past few years have seen tremendous progress in voice to text and in machine vision. Both were driven by deep learning. You can see from this graph that machine vision went from nearly a 30% error rate in 2010 (that was after about 50 years of work on machine vision) to human-level error rates (5–6%) in just 5 years. Voice recognition followed the same pattern.
Most of the NLP innovations of the last few years haven’t been deployed into production systems. There are huge opportunities at that edge: building applied versions of recently solved research problems.
Now that these techniques are being applied to natural language, we think similar improvements are just around the corner.
What’s Next (Near-Term)
We have a few theses at Pillar about near-term opportunities in NLP. If you’re focused on any of these areas, we’d love to connect:
- Tools applying approximate inference techniques to NLP, NLU, and NLG.
- Hardware specific to natural language applications (integrated parts of the stack, for example).
- Human-in-the-loop systems that automate responses based on how humans respond, and improve over time.
- Information retrieval tools that use neural techniques on natural language to provide better results for specific domains.
- Things at the intersection of robotics and natural language.
It is an incredibly exciting time to be an entrepreneur, with so much opportunity coming in the next few years in A.I., and specifically around natural language. Feel free to leave a comment if you have any views on this subject you think we should consider, or if you just want to reach out and discuss your company.