Unlocking the Promise of Artificial Intelligence with Green Data Infrastructure

Published in Generation Investment Management, Nov 15, 2022

Artificial intelligence (AI), machine learning (ML) and high-performance computing (HPC) workloads have the potential to power much of the innovation we need for a net-zero, prosperous, equitable, healthy and safe society. This includes technologies like hyper-efficient manufacturing, which uses computer vision for automated error detection to cut down on waste, and electric self-driving vehicles for passengers and freight, which are safer and less carbon-intensive.

But there is a catch. Today, these technologies are bottlenecked by legacy data and storage architectures. This means that models can take hours (even days) to run. Moreover, the datacentres these large models depend on could consume ~7% of the world’s energy by 2030, up from 1–2% today¹. We urgently need to find ways to make the AI infrastructure stack far more efficient.

This is why we couldn’t be more excited to announce that we are leading WEKA’s $135 million Series D round. We’re excited not only by the opportunity to partner with an exceptional team, but also because, now more than ever, we need faster and more efficient models to build a more sustainable world.

Legacy data and storage architecture is not up to scratch for the most important workloads.

Compute, networking and storage are the three foundational building blocks underpinning every enterprise data centre. In the past 20 years, performance bottlenecks were associated primarily with compute and networking, so that’s where much of the innovation has focused. However, next-generation workloads like AI, ML and HPC can only move as fast as their weakest link. Today, these workloads are also powered by costly GPUs, which are underutilised up to 70% of the time, leaving data scientists waiting hours and even days for a new model to finish training. As great leaps continue to be made in accelerated compute and networking, great leaps also need to be made in storage.

From a sustainability perspective, this is a huge problem. Not only do underutilised GPUs consume enormous amounts of energy while they remain idle, but stalled AI and HPC deployments are slowing the pace of critical research and business innovation.
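To make the idle-GPU point concrete, here is a rough back-of-the-envelope sketch. Only the 70% figure comes from the text above (where it is an upper bound); the cluster size, the idle power draw per GPU and the time window are hypothetical placeholders chosen purely for illustration, not measurements from WEKA or any vendor. The sketch also treats "underutilised" time as fully idle time, which is a simplifying assumption.

```python
def idle_energy_kwh(gpu_count: int, idle_fraction: float,
                    idle_power_watts: float, hours: float) -> float:
    """Energy in kWh drawn by a GPU fleet during the time it sits idle.

    All inputs are assumptions supplied by the caller.
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    if gpu_count < 0 or idle_power_watts < 0 or hours < 0:
        raise ValueError("gpu_count, idle_power_watts and hours must be non-negative")
    # watts * hours = watt-hours; divide by 1000 to get kilowatt-hours
    return gpu_count * idle_fraction * idle_power_watts * hours / 1000


# Hypothetical cluster: 100 GPUs idle 70% of the time (upper bound from the text),
# each drawing an assumed 50 W while idle, over one year (8760 hours).
wasted = idle_energy_kwh(gpu_count=100, idle_fraction=0.7,
                         idle_power_watts=50.0, hours=8760)
print(f"Energy drawn while idle: {wasted:,.0f} kWh per year")
```

Swapping in measured values for a real cluster would turn this from an illustration into an estimate; the point is only that idle time scales directly into wasted energy.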

We urgently need to ‘green’ large-scale enterprise data — including AI, ML and HPC models.

[Figure omitted. Source: McKinsey & Co, The green IT revolution: A blueprint for CIOs to combat climate change, September 2022]

Enterprise technology is estimated to be responsible for 350 to 400 megatons of carbon emissions each year — roughly equivalent to the entire annual emissions of the UK². Data creation and replication are expected to grow at greater than 20% annually through 2025, when there will be some 16 zettabytes of data in the world, up from less than 7 zettabytes in 2020.
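As a quick sanity check on the growth figures above, the sketch below computes the compound annual growth rate implied by moving from about 7 zettabytes in 2020 to about 16 zettabytes in 2025. The function name and the use of exactly 7 ZB as the starting point are assumptions for illustration; the text only says "less than 7".

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures from the text: under 7 ZB in 2020, some 16 ZB in 2025.
# Using 7.0 as the start is an assumption; a smaller start implies faster growth.
rate = implied_annual_growth(start=7.0, end=16.0, years=5)
print(f"Implied growth from 7 ZB: {rate:.1%} per year")
```

With 7 ZB as the starting point the implied rate is about 18% a year. Because the 2020 figure is stated as less than 7 ZB, the actual rate would be higher; a hypothetical starting point of 6.4 ZB, for example, gives roughly 20%.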

For AI, ML, and HPC workloads, siloed applications, extensive data movement, and the need to oversize an environment to meet performance goals will drive even greater energy consumption and higher carbon emissions. It’s a vicious cycle, and if countries, research organisations and enterprises worldwide are going to meet their net-zero commitments, maintaining the status quo won’t cut it. Data infrastructures need to be re-architected and made green.

Deploying big data models will only get more complex as data environments evolve.

One of the best (and best-documented) ways to reduce data’s carbon footprint has been to shift workloads to the cloud, where CO2e emissions intensity is lower (we’ve written about that previously here). But we don’t believe that the transition to the cloud will be immediate or total: organisations are going to have to operate in a hybrid world, leveraging a mix of cloud and on-premises data centres. As such, workloads and data will continue to sprawl across disparate systems, and data will need to be ingested from multiple sources and protocols. Today, over 85% of IT organisations employ three or more different enterprise storage vendors³. Organisations need solutions that serve multiple data environments seamlessly.

CIOs, CTOs and all those who work in enterprise data centres face a dual challenge: the need to decarbonise while also running faster and more efficient large-scale models that use significant amounts of energy. The challenge grows as they manage ever larger volumes of data across hybrid environments that span both cloud and on-premises storage. What they need are products and tools that allow them to work seamlessly across multiple clouds, public and private, and on-premises environments.

We believe WEKA is spearheading a green data revolution by streamlining data pipelines and providing significant performance and efficiency gains for next-generation workloads.

The WEKA team shares our belief that the data solutions needed for a more equitable and net-zero world need to run next-generation workloads efficiently, be cloud-native, work across multiple clouds and on-premises environments, and scale linearly to many petabytes of data.

The WEKA® Data Platform fits the bill. It is software-based and has been architected from the ground up for modern workloads like AI, ML and HPC. The platform can drive 10–100x performance and efficiency improvements. In doing so, it can significantly reduce the energy required to run these workloads and dramatically improve GPU and storage throughput. This means fewer chips and boxes are needed to run the same workloads faster, reducing the AI/ML stack’s embedded carbon and required energy use. Moreover, WEKA’s platform also enables its customers to transition to the cloud more seamlessly than before, with the ability to span across on- and off-premise storage.

[Figure omitted. Source: WEKA.io]

WEKA is helping leading research organisations, government agencies and global enterprises — including eight of the Fortune 50 — to turn ideas into outcomes and unlock the power of AI-driven use cases. In their own words:

“The types of performance we were getting earlier for 1GB files, it would take about 319ms to write a particular 1GB file. With WEKA we saw the time drop by a factor of 40x. It was just an amazing increase in performance” — VP Technology, Atomwise
“We looked at our legacy architecture and instead of taking an evolutionary step and upgrading every component, we took the revolutionary approach. WEKA cost-effectively enables both the use of POSIX and object storage with performance and latency that is far superior to any other solution” — Chief Information Officer, Cerence
“We built a GPU farm, and we needed a high-speed data pipe to feed it. We evaluated open source solutions, HDFS, and the public cloud. We chose WEKA for its ability to provide cost-effective, high-bandwidth I/O to our GPUs, product maturity, customer references, and stellar on-demand support.” — Engineering Operations Lead, WeRide
“I’ve never seen anything like it (WEKA). I’ve never used storage where the numbers just sold the product by themselves” — Head of Technology, Preymaker

From a holistic, systems perspective, we are just as excited about what WEKA can potentially unlock. The platform can enable the operationalisation of the AI-driven use cases needed for a safer, fairer world, from computer vision AI for more efficient manufacturing and chip design, to personalised medicine, self-driving electric vehicles and fundamental science research.

This is only the start of the green data revolution: we’re excited for what’s next.

We’ve followed the journey of WEKA CEO and co-founder Liran Zvibel and the rest of WEKA’s talented team for nearly two years and we couldn’t be more proud to be partnering with them for the company’s next phase of growth and beyond. At scale, WEKA can help organisations avoid millions of tons of carbon as data volumes continue to explode. Our preliminary life cycle analysis suggests that over the next 10 years, WEKA could save its customers tens of millions of tons of carbon emissions annually. We will have more exciting news to share here soon.

The opportunities involved in shifting to a more efficient enterprise data stack are staggering. The Generation Growth Equity team is always excited to meet, share perspectives with and invest in growth-stage companies that are shaping our world for the better, particularly where doing so provides a sustainable competitive advantage. If this sounds like you, please reach out.

***

¹ Nature, How to stop data centres from gobbling up the world’s electricity, September 2018

² McKinsey & Co, The green IT revolution: A blueprint for CIOs to combat climate change, September 2022

³ IDC, Portfolio Survey, September 2020

Written by Generation Investment Management