The AI Accelerators Blog Series: A Year's Retrospective

Adi Fuchs
Mar 31, 2023


It has been a little over a year since I published the five-part blog series on AI accelerators, followed by another blog post outlining the challenges and next stages for AI hardware acceleration. In the final part of the series, I teased the idea of a retrospective to see how well the series stood the test of time. Well, since last year, the rapid advancement of AI seems to have accelerated even further, and it is on the verge of redefining the relationship between humanity and technology. So now we are a few months into 2023, and it seems like a good time to cover recent events and see where the interplay of AI and hardware is going next (which I will mostly elaborate on in a follow-up post).

Disclosure: I did not use any generative AI tools to write these paragraphs. Maybe I should have?

I. AI Commoditization: Revolution at Unmatched Scales

It seems that the biggest game-changer in AI, and perhaps in technology as a whole, has been the public release of generative AI tools like Stability AI's Stable Diffusion, Midjourney, and, of course, ChatGPT. The rapid adoption of generative AI is unlike anything seen before, but what were the main contributors to its runaway success?

Time to Get to a Million Users (data source: statista.com)

First of all, as mentioned in the last blog post, commoditization is a driving force behind great technological revolutions, and generative AI seems to be no different. The transformative capabilities of large language models were already demonstrated over two years ago by GPT-3, which could write whole new articles from scratch, Shakespeare-style poetry, and more. However, while GPT-3 was almost as powerful as ChatGPT (RLHF and other refinements aside) and thus possessed the same revolutionary potential, it did not gain the same traction because its availability was limited: it required developers to sign up for a waiting list, which constrained how people could interact with it, and it had a narrow target audience.

In contrast, the new generative AI tools are practically free for everyone (at least in their basic forms). To use ChatGPT, you just sign up with your email or a Google / Microsoft account and start experimenting. To use Midjourney, you sign up for a Discord account and start generating wild images with every prompt imaginable. No tedious registration process, no waiting lists, and no strings attached. It is now easy for everyone to share their experiences with these powerful tools.

How Does the Compute That Powers AI Fit Into This World?

Bing/DALL-E Illustration (prompt: “a computer chip holding a ball which is half brain and half earth, realistic”… close enough.)

Arguably, the most important factor for successful commoditization is solid (and very expensive) infrastructure, as the current scale of generative AI services requires an abundance of compute power to provide high availability. I would dare to assume that two years ago, OpenAI did not possess infrastructure with the capacity to withstand the computational costs of a service like ChatGPT, or at least could not have provided the same quality of service (had ChatGPT taken, say, 30 minutes to answer each prompt, it would probably not have been nearly as successful; see “the binge example”).

Once again, the ability to scale up compute and ship it easily is a key driver for commoditization. A lot of money is being spent, and demands on compute infrastructure will only rise. However, the AI accelerator startup landscape has yet to fully benefit from this trend… and why is that?

II. A Very Rich Landscape at Times of Turmoil

Midjourney Illustration (prompt: “computer chips and dollar bills in a stormy weather”)

Another Disclosure: As a former employee at SambaNova, I was able to both participate in and witness the strides taken in turning an early-stage accelerator startup into a potential AI game-changer, so I cannot be entirely unbiased when writing the following paragraphs. On the flip side, I also worked at Mellanox, which was acquired by NVIDIA and became the basis of their DPU architecture. Therefore, I can say that I am fortunate enough to testify to the strengths of these two high-quality organizations.

One of the posts from the AI accelerator blog series that gained significant traction was the one about the “very rich landscape,” published around the time the AI accelerator landscape was boiling hot. 2022 was a tricky year for many tech companies, and the AI accelerator landscape was no different; NVIDIA’s stock was down from its peak by about 60% (though it now seems to have bounced back to where it was around mid-2021), and as VC money became scarce, we did not see any significant funding rounds in the past year.

Consequently, AI accelerator startups had to adapt and shift their focus accordingly. Several startups laid off employees, some saw an informal drop in valuation, and some even had to shut down their activity altogether.

Why did that happen?

Chip startups can be viewed as an extreme version of how typical startups are created and how they operate. While the common perception of a typical startup is a few friends gathering in a garage and investing a few months to materialize their great ideas, that is not the case for a chip startup; it typically takes a team of dozens of domain specialists, equipped with expensive CAD tools, three to four years and 10–50 million dollars to get to an MVP (minimum viable product), which is the company’s first chip. So compared to your typical startup, chip companies embark on a longer journey that requires an even bigger leap of faith. On the flip side, if they deliver a solid solution, they have cleared a high bar and are less likely to face much competition, since the long time-to-MVP makes the landscape move slowly.

Back in 2020 and 2021, when the VC business was booming, many startups raised a lot of money, which they invested in building strong teams of experts to develop higher-quality chips, architectures, and software stacks, all in service of a long-term vision. However, in 2022 and 2023, investors were not as patient; they needed to see more immediate revenue to justify further spending.

What About the Future of Startups?

The explosion of generative AI in the past few months presents a huge opportunity and re-opens the race for many startups, if they play it right. AI accelerator startups have the advantage of bringing disruptive innovation to the AI landscape, since they were built on different architectural assumptions and have different constraints and strengths. While they can beat NVIDIA on several fronts, they might be less of a game-changer if they choose to compete head-to-head on what NVIDIA’s GPUs already do pretty well, namely the common CNN and transformer pipelines. What can they do instead?

Vertical Integration Based on Foundation Models: As major parts of the IT landscape gravitate toward the new age of generative AI, the number of potential use cases is huge. Much like there is no one-size-fits-all for every application, there can be no “one-architecture-fits-all” or “one-stack-fits-all.” AI startups should form partnerships with organizations within given sectors and understand how their acceleration platform delivers a differentiating factor for that particular sector. Startups have already started doing so, in drug development (Cerebras, Graphcore, and Groq) and banking (SambaNova), and there will be opportunities in other large sectors. The race has only just begun.

Last Mile Service: Even after years of software, compiler, and library development, we still cannot reach full out-of-the-box performance, due to the “user-to-hardware expressiveness” gap mentioned in the first blog series. AI has many “last mile” challenges in comprehending what the user or programmer wants, and AI acceleration is no different. Much like with model performance (i.e., accuracy), in terms of accelerator performance (i.e., execution times) we are “almost there” in many cases, but we are also “not even close” in many others. I can personally testify that this was the case for me when I experimented with new models on several accelerators, from non-startups as well as startups. While transformers and LLMs have been with us for about five years already, it is impossible to map out all possible use cases, as each case has its own particular characteristics, and it is often hard to anticipate at compile time how a given workload will stress the different parts of the compute, memory, and communication stack. That is why organizations should invest not only in data processing, orchestration, and engineering, but also a great deal in infrastructure adaptation. A good example is the engineering effort that went into improving Microsoft’s Azure infrastructure utilization for large-scale AI and LLM workloads, as shared in an interview with Nidhi Chappell, who leads the workload-optimized organization at Microsoft Azure.
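To make the “almost there / not even close” point a little more concrete, here is a minimal, hypothetical sketch (assuming PyTorch 2.x; the model, shapes, and iteration counts are made up purely for illustration) that times the same toy transformer block out of the box versus after graph compilation.

```python
import time
import torch
import torch.nn as nn

# Hypothetical toy workload: a single transformer encoder block.
# All sizes below are arbitrary and chosen only for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
block = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True).to(device).eval()
x = torch.randn(8, 512, 1024, device=device)

def bench(fn, iters=20, warmup=5):
    # Warm up first (this also triggers compilation for the compiled variant),
    # then average the time per forward pass.
    with torch.no_grad():
        for _ in range(warmup):
            fn(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1e3 / iters

eager_ms = bench(block)                    # out-of-the-box ("eager") execution
compiled_ms = bench(torch.compile(block))  # graph capture + kernel fusion
print(f"eager: {eager_ms:.2f} ms/iter | compiled: {compiled_ms:.2f} ms/iter")
```

On some model/shape combinations the compiled path is dramatically faster; on others it barely helps. How much that gap moves around across workloads is exactly the kind of last-mile tuning accelerator vendors end up doing for each customer.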

From 2019 to 2022, AI accelerator startups kept an ambitious cadence of publicly releasing a new chip roughly every 1.5 years. I think they have been squeezing a lot out of their silicon, and it might now be better to invest more time in expanding their ad-hoc teams of engineers and specialists who help organizations maximize the potential of their architectures. That would also give startups the opportunity to learn more about the behavior of their toolchains and how they can be refined and generalized for more use cases, and maybe to hold off on newer chips until they better understand what can be radically changed in their architectures, rather than releasing another chip on an evolutionary-but-not-revolutionary roadmap. Does that potential exist? For sure. To that end, I am working on my next post, on the need for a new acceleration architecture and how it will determine the future of AI.
