Kaggle Model Upload Made Easy

Imran Roshan
Google Cloud - Community
10 min read · Sep 7, 2024

How to upload your models on Kaggle without anything fancy

Kaggle Models promotes innovation, teamwork, and knowledge exchange by enabling you to distribute your trained machine learning models to a larger audience. By uploading your model to Kaggle Models, you add to the shared pool of machine learning resources, which may help others with their projects and studies.

But hey! You might be thinking that this requires toolsets beyond the budget of your wallet. Well, looking at my account statements, I thought the same. Turns out you do not need to spend a penny to get started!!

Uploading models as Datasets vs Kaggle Model upload

Why upload models as datasets?

  1. No direct API in the classic CLI: As of right now, the classic Kaggle API does not offer a direct command for uploading models to the Models section, so sharing your model files under the Datasets section is the easiest way to automate an upload (the newer KaggleHub route is covered later in this post). Otherwise, uploads to the Models area need to be done by hand.
  2. Distributing Pretrained Models: Uploading a model as a dataset essentially shares its pretrained weights and configuration. Users can download the model files (e.g., config.json, vocab.json, and the .safetensors weight files) and load them straight into their own applications.
  3. Model Reusability in Kaggle Kernels: Once the model has been posted as a dataset, it can be used in Kaggle notebooks (kernels) by you or by other Kaggle users, for example to run training or inference on Kaggle’s robust infrastructure. This makes it simple for others to use the model in their Kaggle projects (see the short sketch after this list).
  4. Ease of Access: Users can easily download files or attach datasets to their Kaggle notebooks through Kaggle’s Datasets section, which makes it easier to work with models in a collaborative setting where notebooks and datasets are tightly connected.
  5. File Size: Since Kaggle’s Datasets section makes it simple to store and distribute larger files (up to 20 GB per dataset), some users upload their models as datasets.
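
For example, once such a dataset is attached to a Kaggle notebook, its files show up under /kaggle/input/ and can be loaded directly. Here is a minimal sketch, assuming a hypothetical dataset slug and a standard transformers checkpoint:

# Minimal sketch: load model files that were uploaded as a Kaggle dataset.
# "amazing-lora-chat-model" is a hypothetical dataset slug; datasets attached to a
# Kaggle notebook are mounted read-only under /kaggle/input/.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/kaggle/input/amazing-lora-chat-model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)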

Why upload models under Models section?

  1. For structured machine learning models, versioning, tagging, and tracking metrics (such as accuracy, loss, etc.) are included in the Models section.
  2. If you want people to be able to track updates, give input on various iterations, or actively engage with your model, the Models area is a better fit.

What is the final conclusion?

If you want to share the model with others so they can use it for other purposes (such as fine-tuning, loading it for inference, or integrating it into projects), publishing it to Kaggle as a dataset is still a viable and helpful method. Uploading manually to the Models area is a better option, though, if you’re looking for structured model management with versions, model evaluation, and direct tracking of model usage.

You will need to manually upload the files through the Kaggle Models interface if you only want the model in the Models section for things like versioning and metrics.

Why should you contribute?

  • Cooperation: Make connections with other machine learning aficionados and data scientists who may offer suggestions, analysis, and possible enhancements.
  • Acknowledgment: Establish your reputation in the machine learning community by showcasing your knowledge. This is your chance to be Batman.
  • Community Contribution: By allowing others to use your models, you may help the field of machine learning grow.
  • Possibilities for Learning: Take advantage of the insightful conversations and feedback that your model generates.

We’ll walk you through the entire process of publishing your model to Kaggle Models in this blog post, so that the wider community may see your priceless work. We shall be using two methods to make our way through this:

  1. Standalone — Directly uploading our models from our local environments/Kaggle notebooks.
  2. Utilizing Huggingface — Using a middleman to achieve our goal.

Models Via Huggingface

Prerequisites

  • Google Colab
  • Huggingface account
  • Kaggle account
  • Brains

Process

For this example, we shall be using Unsloth (https://github.com/unslothai/unsloth) and following its simple instructions to make our own model/chatbot.

Let us initialize our environment and get started with the installation. The first step installs the unsloth package, configured specifically for Google Colab setups; this package optimizes the fine-tuning of large language models. The code then checks the installed version of PyTorch, the deep learning framework we rely on, and installs a known-compatible version of the xformers package if the PyTorch version is older than 2.4.0. Lastly, it installs a number of additional packages (trl, peft, accelerate, bitsandbytes, triton) related to fine-tuning transformers, quantization, and hardware acceleration.

%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# We have to check which Torch version for Xformers (2.3 -> 0.0.27)
from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton

Using the FastLanguageModel class from the unsloth library, we now load the "Meta-Llama-3.1-8B" large language model (LLM) from the Hugging Face Hub. A tokenizer is also set up to translate text into a format that the model can understand. By selecting the right data type and using 4-bit quantization, the code keeps memory usage low. In the end, it returns both the model and the tokenizer for later use.

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit", # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit", # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct", # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit", # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

Next, we apply PEFT (Parameter-Efficient Fine-Tuning) to prepare the model for fine-tuning. Taking the loaded language model as input, we apply LoRA (Low-Rank Adaptation) to certain layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj), adapting the model to a new task while keeping the number of trainable parameters small. The code specifies hyperparameters such as the rank of the LoRA matrices, the dropout rate, and the bias setting. In addition, we set the random state for reproducibility and enable gradient checkpointing for memory efficiency.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

It’s time to upload some data. We won’t dive deep into explaining a simple dataset import from Google Drive. Here’s the code snippet:
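
A minimal sketch, assuming your question-answer pairs live in a CSV on Google Drive (the file path and the "question"/"answer" column names are illustrative and should match your own data):

import pandas as pd
from datasets import Dataset
from google.colab import drive

# Mount Google Drive in the Colab runtime (you will be asked to authorize access)
drive.mount('/content/drive')

# Read a simple question/answer CSV and wrap it in a Hugging Face Dataset
df = pd.read_csv("/content/drive/MyDrive/qa_dataset.csv")  # hypothetical path
dataset = Dataset.from_pandas(df)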

Onto the next step: the following code iterates over question-answer pairs, formats them into the alpaca_prompt template, and appends an End-of-Sequence token (EOS_TOKEN) to indicate that the desired response has ended. Lastly, it maps the formatting function (formatting_prompts_func) over the whole dataset in batches (batched=True). This gets the data ready for training the LLM to produce suitable answers in response to prompts.

import pandas as pd
from datasets import load_dataset
from datasets import Dataset

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    inputs = examples["question"]
    outputs = examples["answer"]
    texts = []
    for input, output in zip(inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = dataset.map(formatting_prompts_func, batched = True,)

Printing our dataset:
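
For instance, a quick sanity check on the first formatted example:

# Inspect the first formatted training example to confirm the prompt template and EOS token
print(dataset["text"][0])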

We proceed with initializing an instance of SFTTrainer to fine-tune the language model. The model and tokenizer are passed in using the trl and transformers libraries, and training settings such as batch size, gradient accumulation steps, learning rate, and optimizer are specified. Depending on hardware support, the code also decides whether mixed-precision training should use fp16 or bf16, so we don’t have to worry about that. Other training parameters, such as the number of training steps, the logging frequency, and the output directory, are defined in the trainer as well.

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 20,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train() # Kick off the fine-tuning run

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

Now, finally, we run inference. The if False block shows how you would reload a previously saved model called "lora_model" using the FastLanguageModel class from the unsloth library, with the same quantization, data type, and sequence-length settings, and then switch it into the faster inference mode. Next, we use the tokenizer to prepare an input prompt, feed it into the model for generation, and decode the generated response back into text with the tokenizer.

if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "Explain the concept of inertia with an example", # input (the question)
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

Now that we are done with our creative process, it’s time to upload.

Uploading our model to Huggingface

Make sure you note down the very intricate code below:

model.push_to_hub_merged("os_lora_chat_model", tokenizer, save_method = "merged_16bit", token = "<your-token>")

Phew! That was a huge chunk (sarcastically).

Uploading to Kaggle Models

Uploading as a dataset

The following code provides clear instructions on how to upload the model to Kaggle as a dataset straight from Huggingface, without having to download the files to your own machine and make things heavy on yourself and your computer.

Start off by generating a Kaggle token and uploading it to the Colab files pane:

You can do this by clicking on your profile icon > Settings > API > Create New Token; this gives you a new JSON token (kaggle.json).

# Step 1: Install necessary libraries
!pip install -q kaggle transformers

# Step 2: Load and Save the Model Using Optimizations
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define model name and directory
model_name = "<your huggingface model ID>"
save_directory = "./huggingface_model"

# Use CPU and half-precision to save memory
device = torch.device('cpu')

# Load the model and tokenizer (AutoModelForCausalLM keeps the language-model head intact)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save the model and tokenizer
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)

# Step 3: Authenticate with Kaggle: Make sure to store your Kaggle token on Colab files
from google.colab import files

# Upload your kaggle.json file here
files.upload()

# Move it to the correct directory
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Step 4: Create and Upload a Dataset on Kaggle

# Create a dataset metadata file
import os
import json

dataset_metadata = {
    "title": "Amazing LoRA Chat Model",
    "id": "<your-kaggle-username>/<dataset-slug>", # use only lowercase letters, numbers, and hyphens in the slug
    "licenses": [{"name": "CC0-1.0"}]
}

# Create the directory for the metadata file
os.makedirs(save_directory, exist_ok=True)

# Write the metadata to a JSON file
with open(os.path.join(save_directory, "dataset-metadata.json"), "w") as f:
    json.dump(dataset_metadata, f, indent=4)

# Upload the dataset to Kaggle
!kaggle datasets create -p ./huggingface_model

# Step 5: Verify the Upload

# List your datasets to verify the upload
!kaggle datasets list --mine

VOILA!! You just published your dataset!!

Uploading as a Model

We shall utilize KaggleHub for this operation, starting off by initializing the Kaggle credentials.

Follow the instructions provided in the KaggleHub getting started guide; feel free to change the framework and customize it according to your requirements.
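
Here is a minimal sketch of that flow, assuming the merged model files sit in a local folder; the handle components (username, model slug, framework, variation), the folder path, and the version notes are illustrative placeholders:

# Minimal sketch: publish a local model folder to Kaggle Models with kagglehub.
# The handle format is "<owner>/<model-slug>/<framework>/<variation>"; every value
# below is a placeholder to adapt to your own account and model.
import kagglehub

kagglehub.login()  # prompts for your Kaggle username and API key

kagglehub.model_upload(
    "<your-kaggle-username>/os-lora-chat-model/transformers/default",
    "./huggingface_model",  # local folder with the model files saved earlier
    version_notes="Initial upload of the fine-tuned chat model",
)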

Final output on successfully publishing the model

To Conclude

It might seem daunting at first, but hey! That was simple, right? Go ahead, pick your poison, and get going!

Link to sample Colab Notebook: https://colab.research.google.com/drive/1amwRZNNxPrppbuK0RgyIbzC3E8KVItsj?usp=sharing

Connect With Me

https://linktr.ee/imranfosec
