Mamba Complete Guide on Colab

Yogendra Sisodia
Dec 30, 2023
Mamba vs. Transformers for Language Modeling
Mamba as an alternative to Transformers

Introduction

Researchers have unveiled an innovative model called Mamba, which aims to disrupt the current dominance of transformer-based architectures in the field of deep learning.

Their research presents Mamba as a state-space model (SSM) that performs strongly across modalities such as language, audio, and genomics. As an illustration, the researchers conducted language-modeling experiments with the Mamba-3B model: it surpassed Transformer-based models of the same size and matched Transformer models twice its size, both during pretraining and in downstream evaluation.

Mamba’s uniqueness lies in its linear-time sequence processing, its selective SSM layer, and a hardware-aware design inspired by FlashAttention.
These features let Mamba outperform Transformers while dispensing with the traditional attention and MLP blocks.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
https://arxiv.org/pdf/2312.00752.pdf

Here are its three key features:
Selective SSMs: These allow Mamba to filter out irrelevant information and focus on the relevant, important data, improving its handling of long sequences (a toy sketch of the underlying recurrence follows this list).

Hardware-aware Algorithm: Mamba uses a parallel algorithm that’s optimized for modern hardware, especially GPUs.

Simplified Architecture: By integrating selective SSMs and eliminating attention and MLP blocks, Mamba offers better scalability and performance.
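
To make the "selective SSM" idea more concrete, here is a minimal, purely illustrative sketch of a discretized state-space recurrence in which the step size and the B/C parameters are computed from the input (the selection mechanism). The shapes, projection layers, and names below are my own simplifications for illustration, not the actual mamba_ssm implementation.

import torch

def selective_ssm_step(x_t, h, A, dt_proj, B_proj, C_proj):
    """One step of a toy 'selective' SSM recurrence (illustration only).

    x_t : (d,)   current input vector
    h   : (d, n) hidden state
    A   : (d, n) state-transition parameter
    """
    # Selection: the step size and the B/C parameters depend on the input,
    # letting the model decide per token what to store and what to forget.
    dt = torch.nn.functional.softplus(dt_proj(x_t))   # (d,)
    B = B_proj(x_t)                                   # (n,)
    C = C_proj(x_t)                                   # (n,)

    # Zero-order-hold style discretization with the input-dependent step.
    dA = torch.exp(dt.unsqueeze(-1) * A)              # (d, n)
    dB = dt.unsqueeze(-1) * B                         # (d, n)

    # Linear recurrence: update the state, then read it out through C.
    h = dA * h + dB * x_t.unsqueeze(-1)               # (d, n)
    y_t = (h * C).sum(dim=-1)                         # (d,)
    return y_t, h

In the real model this recurrence is evaluated with a hardware-aware parallel scan on the GPU rather than a Python loop, which is where the speed advantage comes from.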

In this article, we will walk through the code for the following:

Original Mamba: state-spaces/mamba-2.8b

This is the original Mamba, trained on the Pile dataset.
Checkpoints range from 130 million to 2.8 billion parameters.

mamba-chat : havenhq/mamba-chat

Mamba-Chat is the first chat language model based on Mamba and not a transformer.
Mamba-Chat is based on Mamba-2.8B and was fine-tuned on 16,000 samples of the ultrachat_200k dataset.

Walkthrough

Minimal Library Installation

!pip install causal-conv1d==1.0.0
!pip install mamba-ssm==1.0.1
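
These packages need a CUDA GPU, so switch the Colab runtime to a GPU before installing. A quick optional sanity check after the install (my own snippet, not part of the original walkthrough) might look like this:

import torch
import mamba_ssm  # fails here if the pip install did not build correctly

# mamba_ssm's optimized selective-scan kernels only run on CUDA devices.
assert torch.cuda.is_available(), "Switch the Colab runtime to a GPU (Runtime > Change runtime type)."
print("CUDA device:", torch.cuda.get_device_name(0))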
## state-spaces/mamba-2.8b

import torch
import os
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained(os.path.expanduser("state-spaces/mamba-2.8b"), device="cuda", dtype=torch.bfloat16)

tokens = tokenizer("What is the meaning of life", return_tensors="pt")
input_ids = tokens.input_ids.to(device="cuda")
max_length = input_ids.shape[1] + 80
out = model.generate(
    input_ids=input_ids, max_length=max_length, cg=True,
    return_dict_in_generate=True, output_scores=True,
    enable_timing=False, temperature=0.1, top_k=10, top_p=0.1,
)
print(tokenizer.decode(out.sequences[0]))

### Output
## What is the meaning of life?" "What is my purpose?" "I have been asking myself that question ever since I first took the oath to defend and protect our land." "The answer is not easy." "Sometimes we are given a gift to serve and protect." "Sometimes we are given a gift to destroy." "Sometimes we are given a gift... to take what belongs to someone else." "You are all warriors."

## We load the GPT-NeoX tokenizer and the Mamba model,
## tokenize the prompt, and generate a continuation.
## More model sizes are available at: https://huggingface.co/state-spaces
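
For convenience, the generation call can be wrapped in a small helper so different prompts and lengths are easy to try. This is just a sketch around the same model.generate call shown above; the function name and default values are my own, not part of the mamba_ssm API.

def generate_text(prompt, new_tokens=80, temperature=0.1, top_k=10, top_p=0.1):
    # Tokenize the prompt and move it to the GPU where the model lives.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    out = model.generate(
        input_ids=input_ids,
        max_length=input_ids.shape[1] + new_tokens,
        cg=True,
        return_dict_in_generate=True,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
    )
    return tokenizer.decode(out.sequences[0])

print(generate_text("Mamba is a state-space model that"))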

## havenhq/mamba-chat

import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"
tokenizer = AutoTokenizer.from_pretrained("havenhq/mamba-chat")
tokenizer.eos_token = "<|endoftext|>"
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").chat_template

model = MambaLMHeadModel.from_pretrained("havenhq/mamba-chat", device="cuda", dtype=torch.float16)


messages = []
user_message = """
What is the date for announcement
On August 10 said that its arm JSW Neo Energy has agreed to buy a portfolio of 1753 mega watt renewable energy generation capacity from Mytrah Energy India Pvt Ltd for Rs 10,530 crore.
"""

messages.append(dict(role="user",content=user_message))
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(out)
messages.append(dict(role="assistant",content=decoded[0].split("<|assistant|>\n")[-1]))
print("Model:", decoded[0].split("<|assistant|>\n")[-1])


### Output
## The announcement for the purchase of renewable energy generation capacity by JSW Neo Energy from Mytrah Energy India Pvt Ltd was made on August 10, 2019.<|endoftext|>

## We load the Mamba-Chat tokenizer (with the Zephyr chat template) and model,
## apply the chat template to the conversation, and generate a reply.
## Model card: https://huggingface.co/havenhq/mamba-chat
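
Because each assistant reply is appended back into messages, the same pattern extends naturally to multi-turn chat. Here is a sketch of a follow-up turn that reuses the objects defined above; the follow-up question itself is my own example.

# Ask a follow-up question; the earlier turns stay in `messages`,
# so the model sees the whole conversation through the chat template.
messages.append(dict(role="user", content="Which company is the seller in that deal?"))

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")
out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
reply = tokenizer.batch_decode(out)[0].split("<|assistant|>\n")[-1]

messages.append(dict(role="assistant", content=reply))
print("Model:", reply)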

Complete Colab Link:

YouTube Video:

