5 AI Open Source Repos

Open Source AI will dominate the future of AI research

C. L. Beard
OpenSourceScribes
3 min read · Feb 29, 2024

Photo by Igor Omilaev on Unsplash

The open source community is driving remarkable innovations in artificial intelligence. Talented developers are creating and contributing impressive AI projects that push boundaries and unlock new possibilities. Several standout offerings focus on AI capabilities for natural language processing.

For example, Andrej Karpathy’s minbpe provides clean code for the byte pair encoding algorithm used in language model tokenization. Google open sourced gemma.cpp, a lightweight inference engine for its Gemma models, tailored for experimentation and research. For animating characters with deep learning, Sebastian Starke’s AI 4 Animation targets the Unity platform. Evil Socket’s specialized database server SUM offers fast in-memory linear algebra operators for machine learning applications, while their Ergo tool aims to simplify core AI development tasks.

With source code access, documentation, and community support, these open source AI projects enable exploration and customization that expands on their considerable baseline utilities. Their availability also promotes transparency and democratization of leading-edge AI.

From a former OpenAI researcher, Andrej Karpathy, comes minbpe

Minimal, clean code for the (byte-level) Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. The BPE algorithm is “byte-level” because it runs on UTF-8 encoded strings.

This algorithm was popularized for LLMs by the GPT-2 paper and the associated GPT-2 code release from OpenAI. Sennrich et al. 2015 is cited as the original reference for the use of BPE in NLP applications. Today, most modern LLMs (e.g. GPT, Llama, Mistral) use this algorithm, or a close variant of it, to train their tokenizers.
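To make the idea concrete, here is a minimal sketch of byte-level BPE in Python. It is not minbpe's actual code or API: the function names (`pair_counts`, `merge`, `train`, `encode`, `decode`) are hypothetical and chosen only for illustration, and it leaves out details a real tokenizer needs, such as regex pre-splitting and special tokens. It assumes the usual scheme of starting from the 256 byte values and repeatedly merging the most frequent adjacent pair into a new token id.

```python
from collections import Counter


def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges = {}  # (id, id) -> new id, kept in the order the merges were learned
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + step  # new ids start right after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Turn text into token ids by replaying the merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Calling `train("aaabdaaabac", 3)` learns three merges, after which `encode` compresses the 11-byte string into fewer tokens and `decode` recovers the original text. Because the base vocabulary is raw bytes, any UTF-8 string can be encoded without an "unknown token" fallback.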

From Google we get gemma.cpp

Modern LLM inference engines are sophisticated systems, often with bespoke capabilities extending beyond traditional neural network runtimes. With this come opportunities for research and innovation through co-design of high-level algorithms and low-level computation. However, there is a gap between deployment-oriented C++ inference runtimes, which are not designed for experimentation, and Python-centric ML research frameworks, which abstract away low-level computation through compilation.

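gemma.cpp, a lightweight, standalone C++ inference engine for Google's Gemma models, aims to fill this gap by targeting experimentation and research use cases.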

Sebastian Starke brings us AI 4 Animation

This project focuses on using deep learning for character animation and control, particularly within the Unity platform.

Evil Socket is developing SUM

SUM is a specialized database server designed for linear algebra and machine learning applications. It offers data persistence, fast in-memory operators with multiple backends (currently supporting blas32, with CUDA support coming soon), and a scripting engine for easy access to these features.

To install SUM, one can download the latest binary release, create the necessary certificates for authentication and encryption, and proceed with the installation steps provided on the repository page. Additionally, users can compile SUM from source, run tests, benchmarks, and install it as a systemd service. The repository provides detailed instructions on how to set up and run SUM nodes and masters for efficient database operations.

And Evil Socket is also developing Ergo

Ergo is a command-line tool designed to simplify machine learning workflows; its repository describes it as making machine learning with Keras easier. It aims to streamline common AI-related tasks, making them more accessible and manageable.

The vibrant community expanding open source AI delivers remarkable tools to empower innovation. Whether crafting optimized infrastructure like inference engines and databases or unlocking capabilities in animation, tokenization, and more, contributors across various specialties are advancing the AI field. Minbpe, Gemma, AI 4 Animation, SUM, and Ergo represent only a sample of the high-quality projects moving AI forward through open source collaboration. The accessibility, transparency, and standardization inherent to these offerings drive progress and growth.

Lower barriers make AI experimentation and customization more achievable for researchers and developers alike. Both inspiration and code for tackling AI challenges can transfer into commercial applications as well. With AI poised to transform many industries, open source projects smooth paths for researchers to push boundaries and developers to drive practical progress. The commendable drive toward open access and democratization leads the way into an AI-integrated future.

I am a writer living on the Salish Sea. I also publish my own AI newsletter at https://brainscriblr.beehiiv.com/. Come check it out.