State of LLaMA 2023/Q1
Here’s a mind map of the current state of AI/ML around ChatGPT and LLaMA (and Rust 🦀), compiled for my own use and still growing.
Apr 9, 2023
- 🐍 llama: Open and Efficient Foundation Language Models.
- 🐍 LLaMA_MPS: Run LLaMA (and Stanford-Alpaca) inference on Apple Silicon GPUs.
- 🐇 llama.cpp: Inference of LLaMA model in pure C/C++.
- 🐇 alpaca.cpp: This combines the LLaMA foundation model with an open reproduction of Stanford Alpaca, a fine-tuning of the base model to obey instructions (akin to the RLHF used to train ChatGPT), and a set of modifications to llama.cpp that add a chat interface.
- 🦀 llama-rs: Do the LLaMA thing, but now in Rust 🦀🚀🦙
- 🐍 alpaca: Stanford Alpaca: An Instruction-following LLaMA Model
- 🐍 codealpaca: An Instruction-following LLaMA Model trained on code generation instructions.
- 🐍 alpaca-lora: Low-Rank LLaMA Instruct-Tuning (see the LoRA sketch after this list).
// trains in about 1 hour on a single RTX 4090
- 🐥 llama-node: Node.js client library for the llama LLM, built on top of llama-rs. It uses napi-rs for communication between Node.js and native code.
- 🦀 RLLaMA: Rust+OpenCL+AVX2 implementation of LLaMA inference code.
- 🐍 Dolly: This fine-tunes the GPT-J 6B model on the Alpaca dataset using a Databricks notebook.
- 🐍 Flan-Alpaca: Instruction Tuning from Humans and Machines.
- 🐇 bloomz.cpp: Inference of HuggingFace’s BLOOM-like models in pure C/C++, built on top of the amazing llama.cpp.
- 🐍 BLOOM-LoRA: Low-Rank LLaMA Instruct-Tuning.
- 🐍 RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, low VRAM usage, fast training, “infinite” ctx_len, and free sentence embedding.
- 🦀 smolrsrwkv: A very basic example of the RWKV approach to language models, written in Rust by someone who knows basically nothing about math or neural networks.
- 🐍 gpt4all-lora: A chatbot trained on a massive collection of clean assistant data including code, stories, and dialogue.
- 🐍 Lit-LLaMA: Independent implementation of LLaMA that is fully open source under the Apache 2.0 license. This implementation builds on nanoGPT.
// The finetuning requires a GPU with 40 GB memory (A100). Coming soon: LoRA + quantization for training on a consumer-grade GPU!
- 🐇 rwkv.cpp: a port of BlinkDL/RWKV-LM to ggerganov/ggml. The end goal is to allow 4-bit quantized inference on the CPU.
// WIP
- 🐍 LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model.
// 1 hour for fine-tuning on 8 A100 GPUs.
- 🐍 vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality.
- 🐍 koala: a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.
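To make the LoRA entries above a little more concrete, here is a minimal Rust sketch of the low-rank update idea behind alpaca-lora and the other adapters: keep the base weight frozen and train only two small matrices. The matrix layout, dimensions, and the `merge_lora` helper are my own illustration, not code from any of the repos above.

```rust
// Sketch of the LoRA idea: instead of updating a frozen weight matrix W
// (d_out x d_in), train two small matrices A (r x d_in) and B (d_out x r)
// and use W_eff = W + (alpha / r) * B * A in the forward pass. Only A and B
// are trained, so trainable parameters drop from d_out*d_in to r*(d_out + d_in).
// Plain row-major Vec<f32> matrices; illustrative only.

fn matmul(a: &[f32], b: &[f32], n: usize, k: usize, m: usize) -> Vec<f32> {
    // (n x k) * (k x m) -> (n x m)
    let mut out = vec![0.0; n * m];
    for i in 0..n {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..m {
                out[i * m + j] += a_ip * b[p * m + j];
            }
        }
    }
    out
}

/// Effective weight after merging the low-rank update into the frozen base.
fn merge_lora(w: &[f32], b: &[f32], a: &[f32],
              d_out: usize, d_in: usize, r: usize, alpha: f32) -> Vec<f32> {
    let delta = matmul(b, a, d_out, r, d_in); // B * A, shape d_out x d_in
    let scale = alpha / r as f32;
    w.iter().zip(delta.iter()).map(|(wi, di)| wi + scale * di).collect()
}

fn main() {
    let (d_out, d_in, r) = (4, 4, 2);
    let w = vec![0.0; d_out * d_in]; // frozen base weight (stand-in values)
    let b = vec![0.0; d_out * r];    // B is zero-initialised, so the adapter
    let a = vec![0.1; r * d_in];     // starts out as a no-op
    let w_eff = merge_lora(&w, &b, &a, d_out, d_in, r, 16.0);
    println!("{:?}", &w_eff[..d_in]);
}
```

Because B starts at zero, the adapter contributes nothing at initialization, which is what makes it safe to bolt onto a pretrained model and cheap to train on a single consumer GPU.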
Tools
- 🐍 langchain: Building applications with LLMs through composability.
- 🐥 langchainjs: langchain in js.
- 🐥 langchain-alpaca: Run alpaca LLM fully locally in langchain.
- 🐇 whisper.cpp: High-performance inference of OpenAI’s Whisper automatic speech recognition (ASR) model.
- 🐍 whisper-small: Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.
- 🐇 talk: Talk with Artificial Intelligence in your terminal.
- 🐍 chatgpt-retrieval-plugin: The ChatGPT Retrieval Plugin lets you easily search and find personal or work documents by asking questions in everyday language (see the retrieval sketch after this list).
- 🐍 llama-retrieval-plugin: LLaMA retrieval plugin script using OpenAI’s retrieval plugin.
- 🦀 llm-chain: prompt templates and multi-step prompt chains, for tasks like summarizing lengthy texts or performing advanced data processing (see the chaining sketch after this list).
- 🐍 petals: Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
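As a rough illustration of what the retrieval plugins above do, here is a small Rust sketch of embedding-based document retrieval: rank documents by cosine similarity to the query embedding and hand the top hits to the LLM as context. The tiny hand-written vectors and the `top_k` helper are made up for illustration; the real plugins call an embedding model and a vector database.

```rust
// Rank documents by cosine similarity between the query embedding and each
// document embedding, then return the top-k documents to use as LLM context.

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb + 1e-9)
}

fn top_k<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = docs
        .iter()
        .map(|(text, emb)| (cosine(query, emb), *text))
        .collect();
    scored.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap()); // highest score first
    scored.into_iter().take(k).map(|(_, t)| t).collect()
}

fn main() {
    // Toy document embeddings; in practice these come from an embedding model.
    let docs = vec![
        ("Meeting notes from Monday", vec![0.9, 0.1, 0.0]),
        ("Quarterly budget spreadsheet", vec![0.1, 0.8, 0.1]),
        ("Team offsite photos", vec![0.0, 0.2, 0.9]),
    ];
    let query_embedding = vec![0.85, 0.2, 0.05]; // "what did we decide on Monday?"
    for hit in top_k(&query_embedding, &docs, 2) {
        println!("context: {hit}");
    }
}
```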
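And a generic sketch of the multi-step prompt chaining that llm-chain and langchain provide: fill a prompt template, call the model, and feed the answer into the next template. The `Llm` trait and `EchoModel` below are stand-ins of my own, not the actual llm-chain API.

```rust
// Generic multi-step prompt chaining: each step's template has an {input}
// placeholder that is filled with the previous step's model output.

trait Llm {
    fn complete(&self, prompt: &str) -> String;
}

// Stand-in "model" so the example runs without any real backend.
struct EchoModel;

impl Llm for EchoModel {
    fn complete(&self, prompt: &str) -> String {
        format!("[model answer to: {prompt}]")
    }
}

/// Run a sequence of prompt templates; `{input}` is replaced by the previous
/// step's output (or by the initial input for the first step).
fn run_chain(model: &dyn Llm, templates: &[&str], input: &str) -> String {
    let mut current = input.to_string();
    for template in templates {
        let prompt = template.replace("{input}", &current);
        current = model.complete(&prompt);
    }
    current
}

fn main() {
    let model = EchoModel;
    let templates = [
        "Summarize the following text in three bullet points:\n{input}",
        "Translate this summary into French:\n{input}",
    ];
    let result = run_chain(&model, &templates, "A long article about llamas.");
    println!("{result}");
}
```

The same pattern scales to summarizing long texts: split the text into chunks, run the summarize step per chunk, then chain a final step that merges the partial summaries.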
Demo
- 🤗 Raven-RWKV-7B: 7B. Raven is RWKV 7B, a 100% RNN RWKV-LM fine-tuned to follow instructions.
- 🤗 ChatRWKV-gradio: 14B. RWKV-4-Pile-14B-20230313-ctx8192-test1050.
- 🤗 Code Alpaca: 13B. The Code Alpaca models are fine-tuned from 7B and 13B LLaMA models on 20K instruction-following examples generated with the techniques in the Self-Instruct paper, with some modifications. Evals are still a todo.
- 🤗 Alpaca-LoRA-Serve: 7B. Instruction fine-tuned version of LLaMA from Meta AI. Alpaca-LoRA is Low-Rank LLaMA Instruct-Tuning inspired by the Stanford Alpaca project. This demo currently runs the 7B version on a T4 instance.
- 🤗 LLaMA-Adapter: 7B + 1.2M. The official demo for LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
- 🤖 Alpaca-LoRA Playground: 30B. Alpaca-LoRA, an instruction fine-tuned version of LLaMA. This demo currently runs the 30B version on a 3×A6000 instance at Jarvislabs.ai.
- 🤖 Koala: 13B. A chatbot fine-tuned from LLaMA on user-shared conversations and open-source datasets. It performs similarly to Vicuna.
I will continue to map random knowledge I find here, especially Rust-related projects.