Memory Leak — #22

Astasia Myers
Published in Memory Leak
4 min read · Apr 29, 2023


VC Astasia Myers’ perspectives on machine learning, cloud infrastructure, developer tools, open source, and security. Sign up here.

🚀 Products

ElevenLabs’ Eleven Multilingual V1: New Speech Synthesis Model

Eleven Multilingual v1, an advanced speech synthesis model, now supports seven new languages: French, German, Hindi, Italian, Polish, Portuguese, and Spanish. Users can generate speech in multiple languages from a single prompt while maintaining each speaker’s unique voice characteristics.

Why does this matter? Continuously enhancing models is critical in the AI model arms race. Making the model multilingual broadens both its scope and its addressable market. ElevenLabs targets content creators, game developers, publishers, educational institutions, and accessibility organizations.
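
For developers curious what this looks like in code, here is a minimal sketch of calling the multilingual model through ElevenLabs’ REST API. The endpoint path, header, and model identifier reflect the public docs at the time of writing, and the voice ID and key are placeholders, so treat this as illustrative rather than authoritative.

```python
import requests

VOICE_ID = "your-voice-id"   # placeholder: any voice from your ElevenLabs account
API_KEY = "your-xi-api-key"  # placeholder: your ElevenLabs API key

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        # One prompt can mix languages; the model keeps the speaker's voice consistent.
        "text": "Bonjour! Guten Tag! Hello from a single, consistent voice.",
        "model_id": "eleven_multilingual_v1",
    },
)
resp.raise_for_status()

# The endpoint returns raw audio bytes (MPEG by default).
with open("speech.mp3", "wb") as f:
    f.write(resp.content)
```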

Hugging Face Presents HuggingChat, Open Source Alternative to ChatGPT

HuggingChat is a new AI-powered chatbot available for testing on Hugging Face. It can carry out many of the tasks that have attracted so much interest to ChatGPT recently, including drafting articles, solving coding problems, and answering questions. HuggingChat has 30 billion parameters and was developed on the latest LLaMA model. It enforces a strict privacy model: messages are stored only to display them to the user and are not shared even for research or training purposes.

Why does this matter? Every week we see new models come to market, like Databricks Dolly, Vicuna, StableLM, and HuggingChat, so builders have their choice of options. The barrier to building a new model keeps falling, and the pace at which new models are released keeps rising. The commoditization of algorithms discussed a few years ago under the “data-centric” ML movement has extended to LLMs. Given the pressure on OpenAI to be more mindful of privacy, it is interesting to see HuggingChat take more precautions at launch.
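
HuggingChat itself is a web app rather than an API, but the open models behind it can be queried programmatically. Below is a minimal sketch using Hugging Face’s hosted Inference API; the model ID is a placeholder for whichever open chat model you want to test, and availability of large LLaMA-based checkpoints on the hosted API varies.

```python
import requests

# Placeholder model ID; swap in an open chat model available on the Hub.
API_URL = "https://api-inference.huggingface.co/models/your-org/your-chat-model"
HF_TOKEN = "hf_your_token"  # placeholder Hugging Face access token

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "Draft a short launch announcement for an open source chatbot."},
)
resp.raise_for_status()
# Text-generation models typically return a list like [{"generated_text": "..."}].
print(resp.json())
```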

Nvidia Releases a Toolkit to Make Text-Generating AI ‘Safer’

Nvidia released NeMo Guardrails, an open source toolkit aimed at making AI-powered apps more “accurate, appropriate, on topic and secure.” Guardrails includes code, examples, and documentation to “add safety” to AI apps that generate text as well as speech. Specifically, Guardrails can be used to prevent, or at least attempt to prevent, models from veering off topic, responding with inaccurate information or toxic language, and connecting to “unsafe” external sources.

Why does this matter? AI tools like ChatGPT work by predicting the strings of words they think best match a query. They can lack the reasoning to apply logic or catch factual inconsistencies, which produces “hallucinations.” Developers can add guardrails to control what kinds of responses a model gives. Some prevent it from spewing offensive diatribes, while others stop it from taking leaps of logic or hallucinating fake historical facts. The ability to regulate output is especially important for production environments where the information could be driving business decisions or processes. Importantly, prejudiced outputs can have a negative impact on brand, reputation, and trust. We are strong believers in AI safety tooling like Nvidia NeMo Guardrails and Guardrails.ai.
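
Below is a minimal sketch of the toolkit in use, based on the project’s README at launch: a topical rail is defined in Colang, Guardrails’ DSL, and wrapped around an OpenAI model. The example rail and message names are our own; check the NeMo Guardrails repo for the current API.

```python
# pip install nemoguardrails; requires an OPENAI_API_KEY in the environment.
from nemoguardrails import LLMRails, RailsConfig

# A topical rail in Colang: steer the bot away from competitor talk.
colang_content = """
define user ask about competitors
  "What do you think of other vendors?"

define bot refuse competitor talk
  "I can only help with questions about our own products."

define flow competitors
  user ask about competitors
  bot refuse competitor talk
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
app = LLMRails(config)

# Off-topic queries are intercepted by the rail instead of reaching the model unchecked.
print(app.generate(messages=[{"role": "user", "content": "Is vendor X better than you?"}]))
```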

📰 Content

dbt’s State of Analytics Engineering

The report is based on a survey of 567 data practitioners worldwide, conducted in October and November 2022. Two findings stand out: 1) 46% of respondents plan to invest more in data quality and observability this year, the most popular area for future investment, and 2) 71% of respondents rated data team productivity and agility positively, while data ownership ranked as a top concern for most.

Why does this matter? As teams adopt the modern data stack, they fulfill the lower levels of the data hierarchy of needs. Now data quality vendors like Monte Carlo have become a higher priority. Teams also highlighted the lack of data contracts as one of the biggest problems still facing the modern data stack. Data contracts try to help improve data ownership and collaboration.
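
As a concrete illustration of the idea (not any particular vendor’s product), a data contract can be as simple as a versioned, owned schema that producer records are validated against before they land downstream. The stream and field names below are hypothetical; the sketch uses pydantic.

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError

class OrderEventV1(BaseModel):
    """Hypothetical contract for an `orders` stream, owned by the checkout team."""
    order_id: str
    amount_cents: int
    currency: str
    created_at: datetime

def validate_record(raw: dict):
    try:
        return OrderEventV1(**raw)
    except ValidationError as err:
        # A violation is surfaced to the owning team instead of silently
        # breaking downstream dashboards and models.
        print(f"Contract violation: {err}")
        return None

# "12.5" is not a valid integer cent amount, so validation fails loudly.
validate_record({"order_id": "o-1", "amount_cents": "12.5",
                 "currency": "USD", "created_at": "2023-04-29T00:00:00Z"})
```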

Building a Validity Scorecard in the Age of AI

The Adobe-led Content Authenticity Initiative (CAI) is a consortium of around 1,000 members, ranging from big-name media outlets and camera makers to tech startups, collectively working to craft a set of standards that they claim could guarantee the authenticity of digital content. Adobe first formed the group in 2019, but its mandate has become more urgent as new AI tools have made content manipulation more sophisticated and accessible over the past year or so.

Why does this matter? Deepfakes are becoming more realistic and can be used as a form of social manipulation. CAI is focused on a standard that it claims would certify the authenticity of a photo or video from the source, using “cryptographic asset hashing to provide verifiable, tamper-evident signatures,” which would then be updated to reflect any alterations. The group compares the standard to a “nutritional label” for digital content. Projects like the CAI have arisen because AI’s development has so far outpaced regulators’ ability to keep up with the fast-changing space.
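
To make the hashing idea concrete, here is a toy sketch of the core mechanism (hash the asset, sign the digest, verify later); the real C2PA/CAI specification is far richer. It uses Python’s hashlib and the cryptography package.

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_asset(asset: bytes, key: Ed25519PrivateKey):
    # Hash the asset and sign the digest at the source (e.g., in-camera).
    digest = hashlib.sha256(asset).digest()
    return digest, key.sign(digest)

def verify_asset(asset: bytes, digest: bytes, signature: bytes, public_key) -> bool:
    # Any edit to the asset changes its hash, so a stale signature no
    # longer verifies: the provenance record is tamper-evident.
    if hashlib.sha256(asset).digest() != digest:
        return False
    try:
        public_key.verify(signature, digest)
        return True
    except InvalidSignature:
        return False

key = Ed25519PrivateKey.generate()
asset = b"raw image bytes from the camera"
digest, sig = sign_asset(asset, key)
assert verify_asset(asset, digest, sig, key.public_key())
assert not verify_asset(asset + b" edited", digest, sig, key.public_key())
```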

APIs vs SDKs: Why You Should Always Have Both

This piece defines APIs and SDKs and highlights the difference between the two. It stresses that without an SDK, developers have to handle the API integration process manually, which is time-consuming and error-prone. SDKs offer several advantages over APIs alone: faster development, standardized data and type definitions, consistency, breaking-change mitigation, and streamlined documentation.

Why does this matter? API-first vendors have proliferated over the past few years, and existing offerings are now starting to ship SDKs alongside their APIs. SDKs have emerged as a mechanism to improve developer experience. We are also starting to see SDKs built for internal use cases.
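
A small sketch of the difference, using a hypothetical “widgets” service: the raw-API caller hand-rolls HTTP, auth, and parsing, while a thin SDK wraps the same endpoint with typed results and one place to absorb breaking changes.

```python
from dataclasses import dataclass
import requests

BASE_URL = "https://api.example.com"  # hypothetical service
TOKEN = "your-token"                  # hypothetical credential

# Without an SDK: every caller repeats HTTP plumbing and gets an untyped dict back.
def get_widget_raw(widget_id: str) -> dict:
    resp = requests.get(
        f"{BASE_URL}/v1/widgets/{widget_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()  # a key typo downstream only fails at runtime

# With an SDK: typed results, shared auth, and one seam for breaking changes.
@dataclass
class Widget:
    id: str
    name: str

class WidgetsClient:
    def __init__(self, token: str):
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def get_widget(self, widget_id: str) -> Widget:
        resp = self.session.get(f"{BASE_URL}/v1/widgets/{widget_id}")
        resp.raise_for_status()
        body = resp.json()
        return Widget(id=body["id"], name=body["name"])
```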

💼 Jobs

⭐️Claypot — Founding Engineer (Infra)

⭐️Chroma — Member of Technical Staff — Distributed Systems

⭐️Speakeasy — Founding UX Lead


Astasia Myers

General Partner @ Felicis, previously Investor @ Redpoint Ventures, Quiet Capital, and Cisco Investments