Memory Leak — #30

Astasia Myers · Published in Memory Leak · 5 min read · Oct 6, 2023


VC Astasia Myers’ perspectives on machine learning, cloud infrastructure, developer tools, open source, and security. Sign up here.

🚀 Products

Assistant With Bard: A Step Toward a More Personal Assistant

Assistant with Bard is a personal assistant powered by generative AI. It combines Bard’s generative and reasoning capabilities with Assistant’s personalized help. You can interact with it through text, voice, or images — and it can even help take actions for you. As an example of what the tool can do, Google says you can “float the Bard overlay on top” of a photo you want to post on social media and ask Assistant with Bard to create a caption.

Why does this matter? AI assistants that leverage agents running background processes have been one of the most interesting areas of innovation. Assistant with Bard extends Google's work on its Assistant product, whose conversational interface is most often used in Google Home products. We have seen a number of AI assistant products that are local applications or browser extensions.

Retool Workflows Is Now Generally Available

Retool Workflows is a visual automation product that lets you compose APIs and database queries with code (JavaScript or Python) to automate work. It combines the ease of drag and drop with the extensibility and reliability of code that engineers need for production-grade work. On the surface, you might find it similar to Zapier or Workato — except it's self-hostable and designed to handle complex workflows that are often difficult or impossible to build in those low-code-only tools. Under the hood, Workflows runs on Temporal, a highly scalable, durable, and reliable execution engine that supports jobs running as long as you want — whether that's hours, days, or even months.
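The shape of such a workflow, an API step feeding a database step feeding a code step, can be sketched in plain Python. This is an illustrative sketch only, not Retool's actual SDK; `fetch_new_signups` and the table schema are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical stand-in for an API step; Retool's real connectors differ.
def fetch_new_signups():
    return [{"email": "a@example.com"}, {"email": "b@example.com"}]

def run_workflow(conn):
    # Step 1: call an API (stubbed above).
    signups = fetch_new_signups()
    # Step 2: upsert the results into a database.
    conn.execute("CREATE TABLE IF NOT EXISTS signups (email TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO signups (email) VALUES (?)",
        [(s["email"],) for s in signups],
    )
    # Step 3: a code step computes a summary for downstream steps.
    count = conn.execute("SELECT COUNT(*) FROM signups").fetchone()[0]
    return f"{count} signups synced"

conn = sqlite3.connect(":memory:")
print(run_workflow(conn))  # prints "2 signups synced"
```

The upsert makes re-runs idempotent, which matters for a durable engine like Temporal that may retry steps.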

Why does this matter? Retool started in the internal application development category, particularly admin panels. Now it considers itself "a development platform for building business software." The release of Retool Workflows demonstrates the company is trying to own more layers of the internal software stack. Workato operates in the Integration Platform as a Service (iPaaS) market, which Gartner states is over $3.5B. Moving into iPaaS gives Retool a huge new market to address.

Introducing Magic Studio: The Power of AI, All in One Place

Canva launched a suite of Magic products that help with inspiration, creation, and editing. Magic Design turns your ideas into designs in an instant. No matter what you're making, all you need is a written prompt, or you can start by uploading your media. Magic Design then identifies exactly what you need and quickly creates sophisticated, curated designs — just for you, in a matter of seconds.

Why does this matter? Since generative AI is nondeterministic, many of its best use cases are in creative fields. Adobe Firefly moved quickly to adopt GenAI functionality, and Canva is not to be outdone. My favorite Magic release was Magic Design for Presentations. Just type your idea in a few words, and watch slides fill with a unified story, outline, and content. Canva has been used to build presentations for years, and Magic Design for Presentations makes it a direct alternative to Tome.

📰 Content

KPMG 2023 U.S. CEO Outlook

KPMG conducted a CEO survey and found that 72% of CEOs ranked investment in generative AI as a top priority for their organizations. The majority said they are placing more capital investment in buying new technology (57%) than in developing their workforce's skills and capabilities (43%). And 85% of U.S. CEOs say AI can help detect cyberattacks while also providing new attack strategies to adversaries, playing on perennial doubts that any organization can develop an enduring defense against cyberattacks.

Why does this matter? During our buyer calls, we continue to hear that adopting AI-enabled products or infrastructure to build AI has the most attention from executives. Check writers have urgency to purchase products. The market is moving quickly. It’s early enough in the process that a good new solution is competitive with a bad known solution. There isn’t vendor lock-in yet. We expect the new AI infrastructure stack to crystallize over the next year. Regarding security, we are most concerned about deepfakes being applied to commit fraud and phishing.

ChatGPT-Owner OpenAI Is Exploring Making Its Own AI Chips

OpenAI, the company behind ChatGPT, is exploring making its own artificial intelligence chips and has gone as far as evaluating a potential acquisition target, according to people familiar with the company’s plans. The company has not yet decided to move ahead, according to recent internal discussions described to Reuters. The effort to get more chips is tied to two major concerns Altman has identified: a shortage of the advanced processors that power OpenAI’s software and the “eye-watering” costs associated with running the hardware necessary to power its efforts and products.

Why does this matter? Nvidia, which has a market cap of $1.1T, controls more than 80% of the global market for the chips best suited to run AI applications. There has been a GPU shortage for AI businesses; many are "GPU poor." The cost of GPUs also severely impacts companies' margin structures. For large companies, building your own processors isn't new: when I worked at Cisco, it was ASICs for Cisco switches, and Google has Tensor Processing Units (TPUs). OpenAI is not alone in considering building its own chips. Amazon is also making custom chips to catch up in the GenAI race.

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao of Together proposes FlashAttention-2, an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM. In particular, they (1) tweak the algorithm to reduce the number of non-matmul FLOPs, (2) parallelize the attention computation, even for a single head, across different thread blocks to increase occupancy, and (3) within each thread block, distribute the work between warps to reduce communication through shared memory. These changes yield around a 2× speedup compared to FlashAttention, reaching 50–73% of the theoretical maximum FLOPs/s on an A100 and getting close to the efficiency of GEMM operations. They empirically validate that, when used end-to-end to train GPT-style models, FlashAttention-2 reaches training speeds of up to 225 TFLOPs/s per A100 GPU (72% model FLOPs utilization).
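The tiling idea at the heart of FlashAttention can be sketched in NumPy: process K/V in blocks while carrying running max and sum statistics, so the softmax is exact without ever materializing the full n×n score matrix. This is a simplified single-head, unbatched sketch for intuition; the actual implementation is a fused CUDA kernel with the warp-level work partitioning described above.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full (n x n) score matrix: O(n^2) memory.
    S = (Q @ K.T) / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Processes K/V one block at a time, keeping only running per-row
    # statistics (max m, denominator l), so memory stays O(n * block).
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((n, d))
    m = np.full(n, -np.inf)   # running row max (for numerical stability)
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        S = (Q @ K[j:j + block].T) * scale        # (n, block) score tile
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)                 # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]
```

Both functions return identical results; the tiled version simply never holds more than one score tile at a time, which is what lets the real kernel keep the working set in on-chip SRAM.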

Why does this matter? Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and video generation. The attention layer is the main bottleneck in scaling to longer sequences, as its runtime and memory increase quadratically in the sequence length. FlashAttention-2 is 2× faster than FlashAttention, which means we can train models with a 16k context for the same price as previously training an 8k context model, for the same number of tokens.
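The "16k for the price of 8k" claim is back-of-envelope arithmetic: for a fixed token budget, doubling the context halves the number of sequences but quadruples per-sequence attention cost, so total attention cost doubles, which a 2× faster kernel exactly offsets. A rough sketch, ignoring non-attention FLOPs:

```python
# Attention FLOPs per sequence scale as n^2 in context length n, and a
# fixed token budget T yields T / n sequences, so total attention cost
# scales as (T / n) * n^2 = T * n, i.e. linearly in n.
def total_attention_cost(token_budget, context_len):
    num_sequences = token_budget // context_len
    return num_sequences * context_len ** 2

T = 1 << 24                        # illustrative fixed training-token budget
cost_8k = total_attention_cost(T, 8 * 1024)
cost_16k = total_attention_cost(T, 16 * 1024)

print(cost_16k / cost_8k)          # 2.0: doubling context doubles attention cost
print((cost_16k / 2) / cost_8k)    # 1.0: a 2x-faster kernel cancels it out
```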

💼 Jobs

⭐️ DragonflyDB — React Tech Lead, Dragonfly Cloud

⭐️ Chroma — Member of Technical Staff

⭐️ Speakeasy — Founding Engineer

Astasia Myers

General Partner @ Felicis, previously Investor @ Redpoint Ventures, Quiet Capital, and Cisco Investments