Five Hypotheses For Investing In The Future Of Computing

Jonas Sommer
DeepTech & Climate Fonds
9 min read · Nov 22, 2023
Thinking about LLMs in the context of “Memory”, “Application” and “Attention” (Rendering via DALL·E)

OpenAI and NVIDIA are currently leading the charge in AI and disrupting established markets. However, they could soon be disrupted themselves by emerging competitors. It is worth peeking beyond the current hype wave at the next waves with the potential to create many new startups with unicorn status. One lies in improving Large Language Model (LLM) memory within the foundational layer. The second is the transformative impact of hybrid quantum computing, particularly in domains where artificial intelligence struggles to scale due to insufficient training data.

1) Attention (and brain drain) is all you need

On 30 November 2022, OpenAI released ChatGPT, which set a new record by reaching 100 million monthly active users in just two months. This success is disrupting hundreds of established companies, potentially even Google: a case in point for the “Innovator’s Dilemma”, where incumbents underinvest in disruptive innovation because it threatens their established business. In the case of LLMs, many of the top AI researchers were once part of Google. This includes the authors of the 2017 “Attention Is All You Need” paper, which introduced the Transformer architecture, a key element in recent AI advancements. Transformers have achieved breakthroughs not only for LLMs but also in many other domains, such as robotics, and serve as a building block for computer vision models such as Stable Diffusion (which was developed at my alma mater, LMU Munich). Google’s brain drain means that many former Google researchers now lead startups such as OpenAI, CharacterAI, Anthropic, SakanaAI, Cohere or MistralAI, and these startups are slowly starting to chip away at revenue from, for example, Google’s search or machine translation business.

But even state-of-the-art LLMs still leave room for improvement. One area where LLMs are particularly ripe for disruption at this point is their memory in general and their “short term memory” in particular. Recent cutting-edge LLMs have become huge: GPT-4 allegedly has more than one trillion parameters, up from the 1.5 billion parameters of GPT-2, which raises memory challenges when deploying these models.

2) Revolutionizing AI Memory: Challenges and Innovations of Memory in LLMs

LLMs have a great “long term memory”; it is managing the limited “short term memory”, aka the context window, that remains a challenge. The LLM “long term memory” is almost unreasonably effective at distilling petabytes of training data (think “all of the internet”) into gigabytes of implicit knowledge. That is why anyone can download about 13 GB of model weights for a Meta Llama 2 model. A model like Llama 2, after being trained on trillions of words, can provide insights on a wide range of topics, from Shakespeare’s plays to modern scientific theories, all compressed into its 13 GB of model weights, even though these weights consist solely of numerical values.
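
As a back-of-the-envelope sanity check (a minimal sketch, assuming the roughly 7-billion-parameter Llama 2 variant stored in 16-bit precision), the download size follows directly from the parameter count:

```python
# Rough on-disk size of a 7-billion-parameter model stored in 16-bit floats.
params = 7_000_000_000      # approximate parameter count of the smallest Llama 2 variant
bytes_per_param = 2         # fp16 / bf16: two bytes per weight
size_gib = params * bytes_per_param / 2**30

print(f"~{size_gib:.0f} GiB of weights")  # ~13 GiB, matching the download size cited above
```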

At the same time, the “short term memory”, or context window, of LLMs is rather limited. This is due to the LLM’s attention mechanism, which connects each word in an input text with every other word to analyze their relationships. As a result, the computational cost scales quadratically with the context length. The maximum context size for SOTA (state-of-the-art) LLMs is limited to about 100k words, meaning that when processing text, these models can only consider roughly 100,000 words of prompt input.
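
To make the quadratic scaling concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention (the sequence length and head dimension are arbitrary toy values); the n × n score matrix is what grows quadratically with context length:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over a sequence of length n with head dimension d."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                     # (n, n): every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n, d) output

n, d = 4_096, 64                                      # toy sequence length and head dimension
Q = K = V = np.random.randn(n, d).astype(np.float32)
out = scaled_dot_product_attention(Q, K, V)

# The intermediate score matrix alone holds n * n entries, so doubling the
# context length quadruples the work and memory for this step.
print(f"{n * n:,} attention scores for {n:,} tokens")  # 16,777,216 scores for 4,096 tokens
```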

Moreover, within these 100k words, LLMs mostly recall what is at the beginning and at the end of the context, following a U-shape, and tend to neglect information between start and end. The graphic below shows this for a benchmarking study of several LLMs.

Benchmarking study of LLM recall

Recall is generally highest when the relevant knowledge is positioned in a paragraph at the very start or the very end of the given “short term memory” and degrades when LLMs have to access knowledge in the middle of the context. It also seems that LLMs have been partially optimized to put higher weight on the last paragraphs of the input.

To get around the “short term memory” problem, many startups are strategically fine-tuning LLMs for specific tasks; a good analogy is a doctor who, after med school, specializes further via a medical residency program. The challenge with fine-tuning is that it requires a significant amount of additional training data. Moreover, fine-tuning is costly because it changes the weights of the model, which means training the model again on powerful GPUs. Besides fine-tuning, a widely used method introduced by Meta researchers is Retrieval-Augmented Generation (RAG). RAG inserts an additional step before the LLM: an encoder (also an LLM) translates additional documents from a context library, which is far larger than the “short term memory” of the LLM, into vector representations (essentially just lists of numbers) and stores these vectors. They can then be compared with the (likewise encoded) query to retrieve the most relevant data points. These retrieved snippets are added alongside the actual query into the “short term memory”, for example to answer questions in a customer success chatbot. RAG is a nuanced art in itself and has led to a boom of adjacent tech. Nevertheless, both RAG and fine-tuning do not overcome the “short term memory” problem; they just navigate around it.
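
As an illustration, here is a deliberately crude toy sketch of the RAG idea, with a bag-of-words encoder standing in for a real embedding model and an in-memory list standing in for a vector database (all names and example documents are made up):

```python
import re
import numpy as np
from collections import Counter

# Toy "encoder": a bag-of-words vector over a tiny fixed vocabulary.
# Real RAG systems use a learned embedding model (also an LLM) instead.
VOCAB = ["refund", "shipping", "password", "reset", "delivery", "invoice"]

def embed(text: str) -> np.ndarray:
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return np.array([counts[w] for w in VOCAB], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else float(a @ b / denom)

# 1) Offline: encode the context library (far larger than the context window) and store the vectors.
library = [
    "Refunds are processed within five business days after we receive the item.",
    "You can reset your password via the account settings page.",
    "Shipping and delivery usually take two to four business days.",
]
index = [(doc, embed(doc)) for doc in library]

# 2) Online: encode the query and retrieve the most similar document.
query = "How do I reset my password?"
best_doc, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# 3) Add the retrieved snippet next to the actual query in the "short term memory".
prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```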

A way to fix this could revolutionize AI conversations a second time after ChatGPT. One remarkable approach out of Joseph Gonzalez’s lab at UC Berkeley treats the “LLM as an operating system” and cleverly borrows memory techniques from operating systems, using memory hierarchies to make the idea behind RAG even more effective. This approach divides memory into different levels: a main context for immediate information and an external context for additional data. The model can then move data between these contexts, allowing it to handle much more information than typical language models.
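
A highly simplified, hypothetical sketch of such a memory hierarchy (the class and method names are illustrative, not the actual API of the Berkeley project): old information is paged out of a token-limited main context into an unbounded external store and paged back in when it becomes relevant again:

```python
class HierarchicalMemory:
    """Toy illustration of an OS-style memory hierarchy for an LLM.

    main_context   -> the limited prompt window actually sent to the model
    external_store -> unbounded storage the model can page data in and out of
    """

    def __init__(self, main_budget_tokens: int = 100):
        self.main_budget_tokens = main_budget_tokens
        self.main_context: list[str] = []
        self.external_store: list[str] = []

    def _tokens(self) -> int:
        # Crude token count: whitespace-separated words.
        return sum(len(chunk.split()) for chunk in self.main_context)

    def add(self, chunk: str) -> None:
        """Add new information, evicting the oldest chunks to external storage
        whenever the main context exceeds its token budget (like OS paging)."""
        self.main_context.append(chunk)
        while self._tokens() > self.main_budget_tokens and len(self.main_context) > 1:
            self.external_store.append(self.main_context.pop(0))

    def page_in(self, keyword: str) -> None:
        """Pull matching chunks back from external storage into the main context."""
        hits = [c for c in self.external_store if keyword.lower() in c.lower()]
        for chunk in hits:
            self.external_store.remove(chunk)
            self.add(chunk)

    def prompt(self) -> str:
        return "\n".join(self.main_context)
```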

An alternative approach co-developed by Yoshua Bengio’s lab claims that “attention may not be all we need” and introduces an alternative architecture that replaces the attention mechanism with sub-quadratic operators, which scale much better for longer context windows but have not yet been demonstrated on a large network.

3) Navigating the AI Boom: Opportunities in the Post-ChatGPT Landscape

As a Series A venture capitalist, my perspective on the AI surge is mixed: optimistic, but with a bit of caution. The rise of AI post-ChatGPT has sparked great interest from the public, as evidenced by Google Trends data for AI-related searches. While I firmly believe in AI’s lasting impact, I expect a recalibration of enthusiasm to more sustainable levels, similar to other very interesting fields such as quantum computing. Thus, current market valuations seem slightly inflated, even for AI startups with robust technology and substantial market traction, which poses a potential challenge for startups to grow into these valuations. In my opinion, this is particularly the case for the foundational layer, where competition from open-source models such as those by MistralAI already seems quite strong. Better investment opportunities are, in my opinion, still developing at the AI application and AI ops layer, in particular around LLM integration frameworks that make it easier to deploy LLMs.

Google Trends Ranking for “Quantum Computing” (blue) and “Generative AI” (red) in the last 12 months

As discussed above, deploying LLMs in production via Retrieval-Augmented Generation (RAG) depends on the use of vector databases. Therefore, another trend on the AI ops layer has emerged: several startups are building vector databases, with leading companies including Qdrant, Milvus, Weaviate and Pinecone. While the potential of stand-alone vector databases is intriguing, it remains to be seen whether they can outperform vector extensions of established database technologies, such as pgvector for PostgreSQL.
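
For illustration, the “extension” route looks roughly like this with pgvector (a sketch assuming a running PostgreSQL instance with the pgvector extension installed; the database name, table and embeddings are made up):

```python
import psycopg2

# Assumes a local PostgreSQL instance where the pgvector extension can be enabled.
conn = psycopg2.connect("dbname=rag_demo user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs ("
    "id bigserial PRIMARY KEY, body text, embedding vector(3));"
)

# Store a document together with its (toy, 3-dimensional) embedding.
cur.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector);",
    ("Refunds are processed within five business days.", "[0.1, 0.9, 0.2]"),
)
conn.commit()

# Retrieve the nearest neighbours of a query embedding by cosine distance (the <=> operator).
cur.execute(
    "SELECT body FROM docs ORDER BY embedding <=> %s::vector LIMIT 3;",
    ("[0.2, 0.8, 0.1]",),
)
print(cur.fetchall())
```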

The longer context windows discussed above could also enable other new applications, for example in bioinformatics, where a whole human genome could be used as context, enabling an LLM-based version of 23andMe. I see this research on larger context windows as a blueprint for the next generation of AI, but whether it will be commercialized by startups or by established companies such as OpenAI is yet to be determined. Other applications are agents delivering real-time information, or personal virtual assistants based on LLMs combined with wearables that are far more powerful than current “smart” assistants such as Siri.

4) GPU Gold Rush: Strategies in Securing Hardware Resources

Recently, GPUs have become as sought-after as rare earth metals, with AI startups going to remarkable lengths to secure them. Companies are navigating wait lists, leveraging government grants, and even repurposing crypto mining infrastructure to achieve a competitive edge. This scramble for GPUs has been fueled by the AI boom, with generative AI applications creating unprecedented demand. The graph below, compiled by the State of AI Report, shows a comparison of AI compute resources across various entities, quantified by the number of NVIDIA A100 GPUs in each cluster. Surprisingly, a significant number of these clusters are in the EU, which has started a program to make these resources available to startups. The next application phase ends in December.

Clusters by A100 GPU Count (via State of AI Report)

5) Hybrid quantum advantage expected to emerge within the next three years

Machine learning models, particularly Large Language Models (LLMs), require massive amounts of training data. In situations where such data is scarce, AI’s effectiveness is limited, yet other emerging fields might offer a solution. In the next three years, I expect another “ChatGPT” moment — from the field of Quantum Computing (QC). Imagine a future where the next Wegovy — the drug that propelled Novo Nordisk to the status of the most valuable company in Europe — is discovered not through traditional research methods, but via the advanced capabilities of QC. In this scenario, quantum computers analyze molecular interactions at an unprecedented speed, identifying potential drug candidates. Anticipating breakthroughs like these, I believe that investments in QC are highly promising, especially in fields where limited data is available for training AI models.

The future of QC is likely to be shaped significantly by hybrid models, both in software and in hardware. This dual approach not only melds the strengths of classical and quantum computing in software but also combines different quantum hardware technologies to leverage their individual advantages. On the software side, hybrid computing uses classical optimization to prepare problems for quantum processing. These problems are then tackled using quantum algorithms designed specifically for Noisy Intermediate-Scale Quantum (NISQ) devices, a class of quantum computers that operate with a limited number of qubits and represent the current stage of QC technology. These hybrid approaches are practical because they forgive the current limitations of quantum hardware. I firmly believe that profitable examples of quantum advantage will be achieved by this approach within the next 36 months.
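
To illustrate the hybrid software loop in the abstract, here is a toy sketch in which a classical optimizer (SciPy) tunes the parameter of a tiny simulated one-qubit circuit; variational algorithms in the VQE/QAOA family follow the same pattern, except that the expectation value is estimated on actual NISQ hardware rather than via the NumPy simulation below:

```python
import numpy as np
from scipy.optimize import minimize

# Toy one-qubit "circuit": a rotation RY(theta) applied to |0>,
# followed by measuring the expectation value of Pauli-Z.
def expectation_value(theta: np.ndarray) -> float:
    # State after RY(theta)|0> is [cos(theta/2), sin(theta/2)].
    state = np.array([np.cos(theta[0] / 2), np.sin(theta[0] / 2)])
    pauli_z = np.array([[1, 0], [0, -1]])
    # On real hardware, this number would come from repeated shots, not exact algebra.
    return float(state @ pauli_z @ state)

# Classical outer loop: a conventional optimizer proposes new circuit parameters,
# while the "quantum" part only evaluates the cost function.
result = minimize(expectation_value, x0=np.array([0.1]), method="COBYLA")

print(f"optimal theta ≈ {result.x[0]:.3f}")  # an angle with cos(theta) = -1, e.g. ≈ pi
print(f"minimal <Z>   ≈ {result.fun:.3f}")   # ≈ -1, i.e. the state |1> was prepared
```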

Moreover, the hardware side of QC will also be hybrid, although this will, in my opinion, take significantly longer than 36 months. Hybrid QC hardware involves integrating different QC technologies to create a more powerful and versatile system. For instance, combining the long qubit lifetimes of neutral atom qubits, the all-to-all interactions facilitated by ion trap qubits, and the semiconductor-industry manufacturing processes used for superconducting qubits could lead to a quantum computer that surpasses the capabilities of any single-technology approach. In the long run, this hybrid hardware approach could address many of the current limitations faced by quantum computers, such as coherence times, error rates, and scalability challenges.

In conclusion, the future of quantum computing lies not just in developing better quantum algorithms or building larger quantum systems, but in combining the unique strengths of different quantum and classical computing technologies. For an overview of the German QC landscape, have a look at our DTCF Deep Dive by my colleague Lena, and stay tuned for our next quantum blog post.


Jonas Sommer
DeepTech & Climate Fonds

Doing VC at DTCF in Berlin. Studied physics and economics via Munich/Berkeley/Peking. Ex-data engineer at Palantir. Data-geek and boba tea addict.