From Mobile Development to Training Large Language Models (LLMs): A Developer’s Journey.

Asiri Piyajanaka
15 min read · Apr 5, 2024


When you work in any tech area, hearing about new developments before the rest of the world is not so much likely as automatic; it just happens. Well, that is what happened to me.

As a mobile developer, jumping into new things is more of a habit than a hobby. So, when everyone around me was talking about AI, GPTs, models, and other weird things, what do you think happened? Yes, it happened again; I jumped.

At first, there was a lot of head-scratching and staring at screens. What happened next is in this article, in a shorter form, along with some basic examples of how to work with LLMs.

Here we go!

Step 1: Understanding the Basics

In life, it is always better to know what you are dealing with before you start dealing with it. So, keeping that in mind, I spent time studying the backbone of LLMs: the Transformer architecture.

This architecture plays a major role in natural language processing and was introduced in the paper “Attention Is All You Need” by Vaswani et al. in mid-2017. It was a huge step up in how models process sequences, using a mechanism called self-attention to weigh the importance of different parts of the input data.

Unlike Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), which process data one step after another and struggle with long-range dependencies, transformers process entire sequences simultaneously! This parallel processing allows transformers to quickly capture complex relationships in data, making them particularly adept at tasks like language translation and summarization at Sonic speed. Sonic is next level, right? :)

This section breaks down the key concepts of the transformer architecture into more bite-sized cookies.

Attention Mechanism

The key idea behind the transformer model is the attention mechanism.

This allows the model to weigh the importance of words in a sentence relative to each other. Yes, sounds a bit expensive, right?

Here is something that I learned.

This attention mechanism works by calculating a weighted sum of input vectors (numerical representations of the input sequences), where the weights show the importance of each input vector. This is done through a process called scaled dot-product attention, which involves three main components: queries, keys, and values.

Each word in the input sequence is associated with a query, key, and value vector.

The attention scores are calculated by taking the dot product of the query and key vectors, which is then scaled (divided) by the square root of the dimension of the key vectors to prevent large dot products from dominating the softmax function.

These scores are passed through a softmax function to get the attention weights, which are used to calculate a weighted sum of the value vectors. This weighted sum represents the output of the attention mechanism for each word in the sequence, capturing the contextual information of the input sequence in a way that is sensitive to the relative positions of the words.
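To make queries, keys, and values a bit more concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The dimensions are toy values I made up for illustration, and PyTorch is assumed to be installed.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    d_k = keys.size(-1)
    # Dot products of queries and keys, scaled by the square root of the key dimension
    scores = queries @ keys.transpose(-2, -1) / (d_k ** 0.5)
    # Softmax turns the scores into attention weights that sum to 1 for each word
    weights = F.softmax(scores, dim=-1)
    # Each output vector is a weighted sum of the value vectors
    return weights @ values

# Toy "sentence" of 4 tokens, each represented by an 8-dimensional vector.
# Self-attention means Q, K, and V all come from the same input.
x = torch.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([4, 8])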

This is very important for understanding context and generating text that is not gibberish.

Think of it like this:

Take the sentence: “The bank of the river is filled with money.” It is clear to a human that “bank” refers to the land alongside a river, not a ‘Bank’ bank, where we keep our money. The transformer model uses attention to determine the context in which “bank” is used by considering all the other words in the sentence.

Self-Attention

Self-attention is the specific attention mechanism that helps transformers process anything made of letters. It allows each word in the input sentence to attend to all the other words to better understand the context and relationships between words.

Think back to a time when you were reading a book (or start reading one now). Remember coming across a word that made you think about something else?

Something like: When you’re reading about Batman and you see the word “Batmobile,” all of a sudden, you are thinking about the Batmobile’s engine sound when it's going 200mph to beat Superman.

That’s what self-attention does with words in a sentence. It helps the computer to understand what each word means by looking at all the other words around it. This way, the computer can figure out what’s important and what’s not in the sentence, just like how you understand a story better when you pay attention to different parts of it.

Positional Encoding

Since transformers do not crunch data in sequence, they use positional encoding to understand the order of words in a sentence. This makes sure that the model doesn’t just understand the context but also knows how a word's position changes its meaning.
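For the curious, here is a small sketch of the sinusoidal positional encoding from the original Transformer paper (GPT-2, which we will use later, learns its position embeddings instead). The dimensions below are toy values, and PyTorch is assumed.

import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # One vector per position; even dimensions use sine, odd dimensions use cosine
    positions = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model))
    encoding = torch.zeros(seq_len, d_model)
    encoding[:, 0::2] = torch.sin(positions * div_term)
    encoding[:, 1::2] = torch.cos(positions * div_term)
    return encoding

# 10 positions, 16-dimensional embeddings: each position gets a unique pattern
# that is added to the corresponding word embedding before the attention layers.
print(sinusoidal_positional_encoding(10, 16).shape)  # torch.Size([10, 16])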

Step 2: Exploring Pre-trained Models

While studying, and after getting a reasonable understanding of how LLMs work, it became clear to me that, instead of training an LLM on my own, seeing a pre-trained one in action would be better for my exhausted motivation. And it was, believe me.

Why? Well, for starters, saving money and time and not needing Mr. Stark’s computer was enough for me.

These pre-trained models are machine learning models that have been trained on large datasets and pre-configured to perform specific tasks, such as image classification, natural language understanding, or speech recognition.

They are trained using vast amounts of data and computational resources, typically by organizations or researchers with significant expertise in machine learning.

The one I used, the Hugging Face Transformers library, is a gold pass for anyone interested in leveraging the power of LLMs like GPT (Generative Pre-trained Transformer). With the superb training and billions of tokens these models have been exposed to, they can perform a wide array of tasks.

Let’s go meet an LLM with Python (not the long and scary one).

Setting Up Your Environment

To work with pre-trained models in Python, especially in machine learning or deep learning, you must set up a Python environment on your computer. This section will walk you through the basic steps to prepare your machine, install the packages you need, and finally, see a pre-trained model in action. Please keep in mind that there could be unexpected errors when following this, depending on your setup. But that's just a thing in coding, right? So, if you face something like this, just search the internet. I'm also there ;)

After this section, you’ll have the correct version of Python installed, a basic understanding of using pip to manage Python packages, and libraries such as Hugging Face’s transformers installed for accessing pre-trained models.

Baby-Step 1: Installing Python

First, make sure that Python is installed on your system. Python 3.8 or newer is recommended to ensure smooth integration with most machine-learning libraries, including recent versions of the transformers library. This is how to check your Python version and install Python if necessary:

Check Existing Python Version:

Open a terminal (Command Prompt on Windows, Terminal app on macOS and Linux/Unix) and type:
python --version

or

python3 --version

If all is good, you’ll see a version number starting with 3.8 or higher. If Python is not installed or an older version is installed, follow the next steps to install it.

Download Python:

For Windows and macOS: Visit the official Python website and download the installer for the latest version of Python. Run the installer and follow the instructions. Make sure that you select the option to “Add Python to PATH” during installation on Windows.

For Linux/Unix: Most Linux distributions come with Python pre-installed. Use your distribution's package manager if you need to install or upgrade it. For example, on Ubuntu, you can install Python 3 using apt:

sudo apt update
sudo apt install python3

Baby-Step 2: Installing pip

‘pip’ is the package installer for Python. It allows you to install and manage libraries and dependencies not distributed as part of the standard library. Python 3.4 and later include pip by default, so if you’ve installed Python from the official site or via a package manager on Linux, you most likely already have pip.

Check if pip is installed:

pip --version

or, for Python 3 specifically:


python3 -m pip --version

Install or Upgrade pip:

If pip is not installed or you need to upgrade to the latest version, you can do that by using the following command:

python3 -m pip install --upgrade pip

Heck yeah! Your Python environment is ready to rock.

Believe it or not, you now have almost everything you need to access pre-trained models. With Python and pip set up, you can install libraries that provide access to these models. Exciting, right? There are just a few more tiny things.

To see these models in action, you need a Python environment where you can run scripts. This could be a simple text editor with a terminal or an Integrated Development Environment (IDE) like PyCharm, Visual Studio Code, or Jupyter Notebook.

This guide will show you how to use Visual Studio Code (the one I’m using for this journey), a powerful and highly customizable editor that will make your time more valuable.

Note: By starting VS Code in a folder, that folder becomes your “workspace”.

Baby-Step 3: Install the ‘transformers’ Library and run the Script

You can use the Hugging Face transformers library for this, which offers various pre-trained models for Natural Language Processing (NLP) tasks. Here is how:

pip install transformers

This command will get the latest version of the transformers library (4.38.2 at the time of writing) and install it along with its dependencies.

Note: If you are familiar with Python, you might have heard about Python virtual environment. If you are not, using virtual environments will make your life much easier. In either case, here is the Guide to installing Hugging Face Transformers in a virtual environment.
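If you want to go the virtual-environment route, the typical commands on macOS/Linux look roughly like this (the folder name .venv is just a convention; on Windows, activate with .venv\Scripts\activate instead of the source command):

python3 -m venv .venv

source .venv/bin/activate

pip install transformers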

After installing this library, you may also need the following supporting packages to proceed (the brew commands are for macOS, where Homebrew is the package manager):

brew install sentencepiece

brew install cmake

brew install pkg-config

brew install protobuf

pip install 'transformers[tf-cpu]'

python3 -m pip install numpy

When installing the ‘transformers’ library, you might face errors such as ‘ModuleNotFoundError’. Just install the missing libraries and you will be good to go. Always use the latest versions and the official documentation for the installations.

With the transformers library installed, you can use pre-trained models. The following Python script will generate text using the GPT-2 model, one of the pre-trained models available through Hugging Face.

Create a new file within your workspace, give it a name followed by a ‘.py’ extension, and paste this code.

from transformers import pipeline

# Load a pre-trained GPT model
generator = pipeline('text-generation', model='gpt2')

# Generate text
input_text = "As a mobile developer, the future of mobile apps is"
generated_text = generator(input_text, max_length=50, num_return_sequences=1)
print(generated_text[0]['generated_text'])

Understanding the Code

from transformers import pipeline: 

This line imports the ‘pipeline’ function from the ‘transformers’ library into your script. The pipeline is a high-level utility that makes working with pre-trained models much easier.

generator = pipeline('text-generation', model='gpt2')

This line creates a text generation pipeline using the GPT-2 model, initializing the model and making it ready for use.

input_text = "As a mobile developer, the future of mobile apps is"

The ‘input_text’ variable is defined with a prompt that the model will use as a starting point for text generation.

generated_text = generator(input_text, max_length=50, num_return_sequences=1)

This line generates text based on the input_text. The max_length parameter controls the maximum length of the generated text, and num_return_sequences determines how many different sequences will be generated.

print(generated_text[0]['generated_text'])

Finally, this prints the generated text to the console.
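If you want to play with the output, the pipeline also passes generation options through to the underlying model. Here is a small, hedged variation of the same call; the parameter values are just illustrative starting points, not recommendations.

generated_text = generator(
    input_text,
    max_length=50,
    num_return_sequences=1,
    do_sample=True,      # sample instead of always picking the most likely token
    temperature=0.7,     # lower = more predictable, higher = more creative
    top_k=50,            # only consider the 50 most likely next tokens at each step
)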

Run and Behold!

To run this code, save it as a .py file. If you’re using a text editor and terminal, navigate to your script’s folder, open the console, and run it using:

python your_script_name.py

Replace your_script_name.py with the name of your Python script.

If you’re using an IDE, there should be an option to run the script directly within the environment, like the ▶️ button in Visual Studio Code, which is available for every ‘.py’ file.

Now, with everything in place, we can test our first large language model (LLM) project. This moment isn’t just a test, trust me.

After running the above script, you should see something like this in your terminal.

Terminal output of the generated text:

/Users/asiripiyajanaka/Documents/Workspace/py-transformar/.venv/bin/python /Users/asiripiyajanaka/Documents/Workspace/py-transformar/transformer_gpt2.py
model.safetensors: 100% 548M/548M [03:59<00:00, 2.29MB/s]
generation_config.json: 100% 124/124 [00:00<00:00, 1.61MB/s]
tokenizer_config.json: 100% 26.0/26.0 [00:00<00:00, 260kB/s]
vocab.json: 100% 1.04M/1.04M [00:00<00:00, 1.26MB/s]
merges.txt: 100% 456k/456k [00:00<00:00, 924kB/s]
tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.15MB/s]
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
As a mobile developer, the future of mobile apps is largely driven by mobile device needs. However, as the smartphone technology becomes more and more popular, it is often difficult for developers to create robust and effective mobile apps.
For example, mobile code editors

Well, cheers to our code! We just used a pre-trained model and generated text from it. Here’s an overview of what happens behind the scenes.

Initialization of the Pipeline:

The pipeline function the script imports serves as a high-level interface for running models. By specifying the task ('text-generation') and the model ('gpt2'), the script requests the creation of a pipeline optimized for text generation using the pre-trained GPT-2 model.

Model and Tokenizer Downloading:

If this is the first time you’re using the GPT-2 model, the transformers library will automatically download the model and its tokenizer from the Hugging Face model repository. This download includes all the necessary configuration files, the model’s weights, and the tokenizer’s vocabulary files. The model needs these components to understand the input text and generate meaningful outputs.
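If you prefer to trigger that download explicitly, or control where the files are stored, you can load the pieces yourself. This is just a sketch, and the cache_dir path is an arbitrary example.

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Downloads (or reuses the already cached) GPT-2 weights, config, and tokenizer files
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', cache_dir='./hf_cache')
model = GPT2LMHeadModel.from_pretrained('gpt2', cache_dir='./hf_cache')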

Text Generation Process:

The script then proceeds to the text generation with the model and tokenizer. The input text provided ("As a mobile developer, the future of mobile apps is") serves as a prompt to the GPT-2 model.

The tokenizer first converts this prompt into a series of tokens (numerical representations) that the model can understand. Then, the model uses these tokens as a base to generate new text, predicting the next token in the sequence iteratively until it reaches the specified max_length of 50 tokens or produces a stop token.

During this process, the model considers multiple possible continuations at each step and selects one based on probabilities learned during its training phase. The num_return_sequences=1 parameter makes sure that only one sequence of generated text is returned.
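To make the token idea concrete, here is a tiny sketch of how the tokenizer turns our prompt into IDs and back. The exact IDs depend on GPT-2’s vocabulary, so don’t worry about the specific numbers.

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
prompt = "As a mobile developer, the future of mobile apps is"

token_ids = tokenizer.encode(prompt)               # text -> list of integer token IDs
print(token_ids)
print(tokenizer.convert_ids_to_tokens(token_ids))  # the sub-word pieces behind those IDs
print(tokenizer.decode(token_ids))                 # and back to the original text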

Output Processing:

Once the generation is complete, the output tokens are converted back into human-readable text using the same tokenizer that encoded the input text.

The script finally prints the generated text to the console. This text continues the provided prompt, which the model expanded based on its training.

Step 3: Training Large Language Models (LLMs)

Training an LLM from scratch requires high-end machines and massive amounts of data; think Mr. Stark’s machine. So, I chose to fine-tune a pre-trained model for my specific needs. This approach is more practical for individual developers or small teams.

I identified a dataset that matches my application’s domain and began fine-tuning the model on it. The idea was to improve the model’s ability to generate text that would be useful for my task; for example, fine-tuning the model on health and fitness-related text so a fitness app can provide tips.

Finding and Preparing a Dataset

The first crucial step in this process is identifying a dataset that matches the application’s domain. For example, if you’re developing a fitness app, you’d want a dataset rich in health, fitness, nutrition, and workout-related content.

  1. Public Dataset Repositories: Websites like Kaggle, Google Dataset Search, and the UCI Machine Learning Repository host large collections of datasets for various domains.
  2. Scraping Data: Sometimes, you might not find the best dataset ready-made. Scraping data from websites or using APIs from related services can be valuable in such cases. Respect copyright laws and terms of service.
  3. Generating Synthetic Data: For certain tasks, synthetic data generated to mimic the real world can be useful, especially when dealing with sensitive information concerning privacy.

Once you have your dataset, the next step is to prepare it for training. This involves cleaning the data (removing irrelevant parts, handling missing values) and formatting it to be compatible with the model.
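Here is a minimal cleaning sketch, assuming your raw text sits in a file called raw_fitness_data.txt. The file names and the filtering rules are placeholders; adapt them to your own data.

# Strip whitespace, drop empty lines and duplicates, and write one training example per line
seen = set()
cleaned_lines = []
with open('raw_fitness_data.txt', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            cleaned_lines.append(line)

with open('path_to_your_dataset.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(cleaned_lines))

The output file name matches the ‘path_to_your_dataset.txt’ used in the training script below.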

Here is a cool guide for data preparation

Fine-Tuning the Model

Fine-tuning a model means adjusting a pre-trained model to perform a specific task, such as generating text relevant to your app’s domain.

  • Resources: Fine-tuning LLMs is computationally expensive. Access to GPUs or cloud computing resources (e.g., Google Colab, AWS) can speed up the process.
  • Monitoring Performance: Keep an eye on the loss during training. If it’s not decreasing, you may need to adjust the learning rate and batch size or try a different optimization strategy.
  • Experimentation: Fine-tuning involves a lot of trial and error. Don’t hesitate to experiment with different hyperparameters, datasets, and model architectures.

Here is sample Python code to showcase fine-tuning an LLM, including the data preparation steps.

from transformers import GPT2Tokenizer, GPT2LMHeadModel, AdamW, get_linear_schedule_with_warmup
import torch
from torch.utils.data import Dataset, DataLoader

# Define a custom dataset class
class MyCustomDataset(Dataset):
    def __init__(self, tokenizer, file_path, block_size=128):
        with open(file_path, encoding="utf-8") as f:
            lines = [line for line in f.read().splitlines() if (len(line) > 0 and not line.isspace())]
        # Truncate and pad every line to block_size so batches can be stacked into tensors
        self.examples = tokenizer.batch_encode_plus(
            lines,
            add_special_tokens=True,
            max_length=block_size,
            truncation=True,
            padding="max_length",
        )["input_ids"]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return torch.tensor(self.examples[i])

# Load tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Load dataset
dataset = MyCustomDataset(tokenizer, 'path_to_your_dataset.txt')
data_loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Setup training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
num_training_steps = len(data_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500, num_training_steps=num_training_steps)

model.train()

for epoch in range(num_epochs):
    for step, batch in enumerate(data_loader):
        inputs = batch.to(device)
        outputs = model(inputs, labels=inputs)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

        if step % 100 == 0:
            print(f"Epoch: {epoch}, Step: {step}, Loss: {loss.item()}")

In this example, a ‘MyCustomDataset’ class is defined to handle loading and formatting the dataset. It’s then used with a ‘DataLoader’ for batch processing, which is crucial for handling large datasets efficiently. A learning rate scheduler is also introduced to adjust the learning rate throughout training, which can help improve model performance.

Understanding the Code

from transformers import GPT2Tokenizer, GPT2LMHeadModel, AdamW, get_linear_schedule_with_warmup

Imports the necessary components from the Hugging Face Transformers library: GPT2Tokenizer for tokenizing input text into tokens that GPT-2 can understand, GPT2LMHeadModel, which is the GPT-2 model with a language modeling head on top (useful for generating text), AdamW which is an optimizer with weight decay fixed, and get_linear_schedule_with_warmup for a learning rate scheduler that decreases the learning rate linearly after a warmup period.

import torch
from torch.utils.data import Dataset, DataLoader

Imports PyTorch, a deep learning framework that provides flexibility and speed in modeling, along with Dataset and DataLoader from PyTorch, which are essential for handling and batching data for training.

class MyCustomDataset(Dataset):

Defines a custom dataset class inherited from PyTorch’s ‘Dataset’ class.

def __init__(self, tokenizer, file_path, block_size=128):
    with open(file_path, encoding="utf-8") as f:
        lines = [line for line in f.read().splitlines() if (len(line) > 0 and not line.isspace())]
    self.examples = tokenizer.batch_encode_plus(
        lines,
        add_special_tokens=True,
        max_length=block_size,
        truncation=True,
        padding="max_length",
    )["input_ids"]

The initializer method takes a tokenizer, the path to the dataset file, and an optional block size, which specifies the maximum length of the tokens. The ‘with open’ block opens the dataset file, reads the lines, and filters out empty lines or lines with only whitespace. The ‘self.examples’ assignment tokenizes the lines using the provided tokenizer, adding special tokens, truncating anything longer than the block size, and padding shorter lines so every example has the same length. The tokenized inputs are stored as input IDs.

def __getitem__(self, i):
    return torch.tensor(self.examples[i])

The i-th item is retrieved from the dataset and returned as a PyTorch tensor.

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = GPT2LMHeadModel.from_pretrained('gpt2')

Loads a pre-trained GPT-2 tokenizer and model, and sets the tokenizer’s padding token (GPT-2 does not define one by default) so that shorter lines can be padded into equal-length batches. The tokenizer and model are used to tokenize input text and generate text, respectively.

dataset = MyCustomDataset(tokenizer, 'path_to_your_dataset.txt')
data_loader = DataLoader(dataset, batch_size=4, shuffle=True)

Initializes the custom dataset with the specified tokenizer and dataset file, then creates a DataLoader for batching and optionally shuffling the dataset during training.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Sets up the device (GPU or CPU) for training and moves the model to the selected device.

optimizer = AdamW(model.parameters(), lr=5e-5)
num_epochs = 3
num_training_steps = len(data_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=500, num_training_steps=num_training_steps)

Initializes the optimizer and learning rate scheduler. The learning rate warms up over the first 500 steps, then decays linearly to zero over the remaining training steps.

for epoch in range(num_epochs):
    for step, batch in enumerate(data_loader):
        inputs = batch.to(device)
        outputs = model(inputs, labels=inputs)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

Performs the training over three epochs. For each batch, it moves the batch to the device, computes the output and loss, performs backpropagation, updates the model parameters, updates the learning rate, and resets the optimizer gradients. This code effectively fine-tunes a GPT-2 model on a custom dataset, making it more adept at generating text relevant to the specific context or domain.
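One thing the script above doesn’t show: once training finishes, you’ll usually want to save the result so you can reuse it without retraining. Here is a minimal sketch; the output directory name is arbitrary.

# Save the fine-tuned model and tokenizer so they can be reloaded later
model.save_pretrained('./my-finetuned-gpt2')
tokenizer.save_pretrained('./my-finetuned-gpt2')

# Reload them with the same pipeline API we used earlier, pointing at the local folder
from transformers import pipeline
generator = pipeline('text-generation', model='./my-finetuned-gpt2', tokenizer='./my-finetuned-gpt2')
print(generator("A quick tip for staying consistent with workouts is", max_length=40)[0]['generated_text'])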

Step 4: Integrating the Fine-Tuned Model into Applications

The true value of LLMs comes from implementing them in applications, making their capabilities accessible to users. Given the huge compute resources required to run LLMs efficiently, hosting these models directly within client-side applications is not the way to go. Instead, deploying the model on a server and accessing it through APIs is more scalable and efficient.

Deploying the Model on a Server

Cloud-based services provide an ideal environment for hosting LLMs. These platforms offer scalable computing resources, making it possible to handle requests from multiple users simultaneously without degrading performance. One must consider response time, availability, and security factors when deploying the model to ensure a seamless user experience.

The deployment process typically involves containerizing the model using tools like Docker, which packages the model and its dependencies into a container image. This image can then be deployed on cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, which provide managed services to run containerized applications.
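To make that concrete, here is a rough sketch of what such an API could look like. FastAPI and the endpoint shape are my own assumptions here, not a requirement; any web framework works, and a real deployment would add authentication, batching, and error handling.

# A minimal text-generation API sketch (assumes: pip install fastapi uvicorn)
# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline('text-generation', model='./my-finetuned-gpt2')  # or 'gpt2'

class GenerationRequest(BaseModel):
    prompt: str
    max_length: int = 50

@app.post("/generate")
def generate(request: GenerationRequest):
    result = generator(request.prompt, max_length=request.max_length, num_return_sequences=1)
    return {"generated_text": result[0]["generated_text"]}

Your mobile app then only needs to make a regular HTTP POST to /generate, exactly like any other backend call, and the Docker image can bundle this script together with the model files.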

Conclusion

The transition from mobile development to training LLMs is a huge shift, but all the resources and help available on the internet made this journey much easier. This article is here to show you how to do it. Through this process, developers can enhance their applications, making them more interactive and responsive to users.

Integrating these models into mobile applications involves deploying them on servers and creating APIs for communication. This approach opens new ways of creating dynamic and intelligent app features and is already changing mobile app development. Understanding and applying AI and machine learning technologies is key to staying ahead in the industry.

Cheers!
