Memory Leak — #7
VC Astasia Myers’ perspectives on machine learning, cloud infrastructure, developer tools, open source, and security. Sign up here.
InfluxDB IOx is a cloud-native, real-time, columnar database optimized for time series data, built in Rust on top of Apache Arrow and DataFusion. The team has also deployed its next-generation storage engine, built on InfluxDB IOx, in the InfluxDB Cloud platform. InfluxDB Cloud customers using the new storage engine have their cardinality limits minimized. Users can write any kind of event data with infinite cardinality and slice and dice the data on any dimension without sacrificing performance. This opens up use cases such as events, tracing, and all sorts of ephemeral, unbounded-cardinality data.
Why does this matter? This announcement represents the largest leap forward for InfluxDB since they introduced their TSM storage engine in 2016. The new storage engine represents the next phase of InfluxDB that brings metric data and event data time series into a single database core, giving users the ability to create time series on the fly from raw, high-precision event data.
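To make the slice-and-dice idea concrete, here is a minimal in-memory sketch (plain Python, not InfluxDB's actual API): event records carry arbitrary, high-cardinality tags such as a trace ID, yet a time series can still be derived on the fly along any dimension chosen at query time. All names and values below are illustrative.

```python
from collections import defaultdict

# Toy event records with arbitrary tags; trace_id is effectively
# unbounded-cardinality, like the tracing data the announcement mentions.
events = [
    {"trace_id": "a1", "service": "api", "region": "us-east", "latency_ms": 12},
    {"trace_id": "b2", "service": "api", "region": "eu-west", "latency_ms": 30},
    {"trace_id": "c3", "service": "db",  "region": "us-east", "latency_ms": 5},
]

def mean_latency_by(dimension, records):
    """Group raw events on the fly by any tag and average latency."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[dimension]].append(rec["latency_ms"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

print(mean_latency_by("service", events))  # slice by service
print(mean_latency_by("region", events))   # re-slice the same raw data by region
```

The point of the sketch is that no dimension is privileged at write time; the grouping key is picked at read time, which is what "creating time series on the fly from raw, high-precision event data" refers to.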
Snaplet and Supabase’s Postgres WASM
Postgres-wasm is a PostgreSQL server that runs inside a browser. It provides a full suite of features, including persisting state to the browser, restoring from pg_dump, and logical replication from a remote database.
Why does this matter? WebAssembly (WASM) is a binary instruction format for a stack-based virtual machine. WASM is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications. The use of WASM is a huge trend right now for browser-based services. We are also seeing other execution engines, like DuckDB for analytical workloads, run in the browser via DuckDB-wasm. Running processes in the browser can accelerate queries and provide a better end user experience.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It enables transcription in multiple languages, as well as translation from those languages into English. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer.
Why does this matter? OpenAI is in the process of trying to commercialize its research, so Whisper being free and open source stands out. We expect models to continue to be open sourced to the community, like Stability.ai’s Stable Diffusion, Meta’s OPT-175B, and HuggingFace’s BLOOM.
This piece by DX CEO Abi Noda highlights 10 different types of technical debt, including code quality, testing, and coupling, among others. He also underscores signs that technical debt is a bottleneck, such as longer value lead time and impact on end users. There are a few ways to address technical debt, including clear end-to-end ownership and empowering teams to address the issues as part of their natural workflow.
Why does this matter? Engineering teams must strike the balance between new development and technical debt. Technical debt can significantly limit engineer productivity and worsen end user experience. With the rise of large language models, we hypothesize that new tools will emerge to help solve challenges around technical debt.
DevOps Is Dead. Embrace Platform Engineering
While an eye-catching title, this piece highlights the emerging field of platform engineering. The post emphasizes that developers are tired of dealing with operational issues, such as provisioning resources to run an app. This has been the focus of DevOps, but platform engineering works to remove as much of the developer burden as possible.
Why does this matter? Over 100,000 people clicked on Humanitec’s post to find out what killed DevOps. It was The New Stack’s most popular post for September. Clearly, there is increasing momentum around platform engineering. Additional evidence from HashiCorp here.
An Engineer’s Guide to Data Contracts
Chad Sanderson and Adrian Kreuziger from Convoy lay out the technical aspects of a data contract. Data contracts are API-based agreements between software engineers who own services and data consumers who understand how the business works, made in order to generate well-modeled, high-quality, trusted data. It is a three-part series that discusses: 1) entities, 2) application events, and 3) semantics.
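As a rough illustration of the idea, a data contract for an application event might look like a versioned, typed schema that the producing service validates against before the event reaches downstream consumers. The sketch below is hypothetical (the event name, fields, and checks are invented for illustration, not Convoy's actual schema):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ShipmentCreated:
    """Hypothetical contract v1: fields, types, and semantics agreed
    between the owning service and downstream consumers."""
    shipment_id: str       # entity key, stable across systems
    carrier_id: str        # reference to the Carrier entity
    created_at: datetime   # event time; semantics require UTC

def validate(event: ShipmentCreated) -> None:
    """Reject events that violate the contract's semantics before
    they are published to consumers (BI, data apps, ML)."""
    if not event.shipment_id:
        raise ValueError("shipment_id must be non-empty")
    if event.created_at.tzinfo != timezone.utc:
        raise ValueError("created_at must be UTC")

ok = ShipmentCreated("shp-1", "car-9",
                     datetime(2022, 10, 1, tzinfo=timezone.utc))
validate(ok)  # passes silently; a violating event would raise ValueError
```

The design choice the series argues for is that these checks live at the producer, so schema and semantic changes are negotiated through the contract rather than discovered by broken dashboards downstream.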
Why does this matter? The topic of data contracts picked up steam this summer when Chad and Monte Carlo began writing posts about the subject. The topic reflects cloud data warehouses becoming a source of truth for many data types, including data generated by software engineers. It speaks to how upstream decisions around data collection and schema changes can have a resounding impact on downstream use cases like BI, data apps, and ML that software engineers may not have visibility into.
⭐️Dragonfly — Developer Advocate (fully remote)
⭐️Diagrid — Developer Advocate (fully remote)
⭐️Omni — Full Stack Engineer (fully remote)