Let's ride Llama 2 with me!

Nando Teddy · Published in Nando Teddy Lab · 11 min read · Jun 13, 2024

Next gen: actually, we already have Llama 3 now, lol

### Becoming a Learning Machine in the Era of Machine Learning

In this fast-paced era of technology, I have no choice but to evolve and become a "learning machine" to keep up with advancements in machine learning. The next generation, including our children, will face intense competition centered around technological progress. It's our responsibility to prepare them for this transition.

We won't be here forever, so it's crucial to equip the next generation with the skills they need. Let’s embark on this exciting adventure together, breaking down complex concepts to make them easier to understand. By becoming lifelong learners ourselves, we can help guide our children and others through the technological challenges ahead.

Ollama.ai: The Local Gateway to Next-Generation AI

Llama (acronym for Large Language Model Meta AI, and formerly stylized as LLaMA) is a family of autoregressive large language models released by Meta AI starting in February 2023. The latest version is Llama 3, released in April 2024.

Llama 2 operates by leveraging a vast dataset of 2 trillion “tokens” drawn from publicly accessible sources, including Common Crawl, Wikipedia, and public domain books from Project Gutenberg. Each token represents a word or semantic fragment that enables the model to comprehend text and predict subsequent content plausibly. This enables Llama 2 to discern relationships between concepts, like understanding that “Apple” and “iPhone” are closely related but distinct from “apple,” “banana,” and “fruit.”
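
To make the idea of tokens concrete, here is a minimal sketch using the Hugging Face transformers library. The freely available GPT-2 tokenizer is used as a stand-in (Llama's own tokenizer sits behind Meta's license acceptance, and exact splits differ per model):

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer as a stand-in for illustration; each model splits text differently.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["Apple iPhone", "apple banana fruit"]:
    # tokenize() shows the word and sub-word fragments the model actually sees.
    print(f"{text!r} -> {tokenizer.tokenize(text)}")
```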

These models provide a robust foundation for customization, offering users the flexibility to tailor Llama 2 to their organization’s distinct style or voice. This is achieved through extensive training with diverse examples, enabling the generation of article summaries that resonate with the company’s unique identity. Additionally, users can refine chat-optimized models to better address customer support inquiries by incorporating relevant information, such as FAQs and chat logs.

Many prominent Large Language Models (LLMs), including OpenAI’s GPT-3 and GPT-4, Google’s PaLM and PaLM 2, and Anthropic’s Claude, typically operate as closed-source systems. While researchers and businesses may access these models through official APIs and adapt them for specific use cases, there remains a lack of transparency regarding the internal mechanisms of these models.

In contrast, Llama 2 distinguishes itself through its commitment to openness. Individuals interested in the model can access a comprehensive research paper detailing its development and training methodologies. Moreover, the model is available for download, allowing users with the requisite technical proficiency to deploy it on their own systems or inspect its code.

Foundational Understanding

Glossary

(Figure: a simple feed-forward network with an input layer of 3 nodes, hidden layers of 4 nodes each, and an output layer of 1 node.)
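
As a rough sketch of that figure's topology (the weights below are random placeholders, purely for illustration), a forward pass through such a network looks like this in plain NumPy:

```python
import numpy as np

# Random placeholder weights for the figure's topology:
# 3 input nodes -> hidden layers of 4 nodes each -> 1 output node.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h1 = np.tanh(x @ W1 + b1)   # first hidden layer (4 nodes)
    h2 = np.tanh(h1 @ W2 + b2)  # second hidden layer (4 nodes)
    return h2 @ W3 + b3         # output layer (1 node)

print(forward(np.array([0.1, 0.2, 0.3])))  # a single output value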

General Architecture

What is WSL 2?
WSL 2 is the default distro type when installing a Linux distribution. WSL 2 uses virtualization technology to run a Linux kernel inside of a lightweight utility virtual machine (VM). Linux distributions run as isolated containers inside of the WSL 2 managed VM. Linux distributions running via WSL 2 will share the same network namespace, device tree (other than /dev/pts), CPU/Kernel/Memory/Swap, /init binary, but have their own PID namespace, Mount namespace, User namespace, Cgroup namespace, and init process.

RAG

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
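
As a toy illustration of that retrieve-then-generate flow (the "knowledge base" here is just an in-memory list scored by naive word overlap, standing in for a real vector database; all names are invented for the example):

```python
# Toy RAG flow: retrieve relevant documents, augment the prompt, then generate.
def retrieve(question, documents, top_k=2):
    # Score documents by naive word overlap with the question.
    q_words = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_prompt(question, context_docs):
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Trello is used for project tracking and repository management.",
    "P1 incidents are critical.",
    "Node 16 is required for development.",
]
question = "Which tool is used for project tracking?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```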

Possible Use Cases

  1. Enhanced Market Research and Analysis: Llama 2 can analyze vast amounts of customer reviews, social media sentiment, and market research data to identify trends, understand customer preferences, and inform strategic decision-making.
  2. Intelligent Document Processing: Automating document processing is a game-changer. Llama 2 can be fine-tuned to process contracts, legal documents, or financial reports, extracting key information and summarizing content for faster review and analysis.
  3. Personalized Customer Experiences: Imagine a customer service experience tailored to each individual. Llama 2 can power chatbots that personalize interactions, recommend relevant products, and answer questions in a way that feels natural and helpful.
  4. Data-Driven Content Creation: Content marketing is essential, but creating fresh content can be time-consuming. Llama 2 can assist with content strategy by generating drafts, summarizing complex topics, or personalizing content for different audiences.
  5. Sales and Marketing Automation: Llama 2 can automate repetitive tasks in sales and marketing, such as generating personalized email campaigns, following up with leads, and analyzing marketing campaign performance.

Objective of This Exploration

  • Research how LLMs work with Ollama
  • Explore and learn how to set up an LLM on WSL2
  • Explore how to set up Flask (Python) with Chroma DB
  • Explore how to utilize a large language model with our own data set (RAG)

Tech Stack

To set up a local LLM, you will need:

  1. WSL2
  2. Linux CLI
  3. Certificate Authority Import
  4. Bash
  5. Docker
  6. VSCode
  7. Python — Flask
  8. Hugging Face

Recommended

  1. CUDA
  2. NVIDIA GPU

Main Setup

To set up LLama2 on a local machine, there are two primary methods: using Docker containers or Windows Subsystem for Linux (WSL). This guide will focus on utilizing WSL as the hypervisor to streamline the process. Integrating Visual Studio Code (VS Code) with the appropriate plugins further simplifies this setup, making it more efficient and developer-friendly.

### Setting Up LLama2 with WSL

**1. Install WSL:**
First, ensure that WSL is installed on your Windows machine. You can enable WSL through PowerShell by executing the following command:

```powershell
wsl --install
```

This command installs WSL and sets up the default Linux distribution (Ubuntu). You can choose a different distribution, such as Debian, with `wsl --install -d <DistroName>`.

**2. Set Up Your Linux Environment:**
Once WSL is installed, you need to update and configure your Linux environment. Open your WSL terminal and run:

```bash
sudo apt update
sudo apt upgrade
```

This ensures that your system is up-to-date with the latest packages and security updates.

**3. Install LLama2 Dependencies:**
Next, you need to install the dependencies required for LLama2. This typically includes Python, pip, and other necessary libraries. Run the following commands:

```bash
sudo apt install python3 python3-pip
```

**4. Configure VS Code with WSL:**
To leverage the power of VS Code with your WSL environment, install the "Remote - WSL" extension in VS Code. This extension allows you to open any folder in the WSL environment and work with it directly in VS Code.

**5. Integrate and Debug with VS Code:**
VS Code, with its extensive range of plugins, significantly enhances the development experience. For instance, debugging a Flask application within the WSL environment becomes seamless. You can install the Python and Flask extensions from the VS Code marketplace to streamline this process.
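
For example, a minimal Flask app (a hypothetical `app.py`, purely to smoke-test the toolchain) can be run and debugged directly inside WSL from VS Code:

```python
# app.py - a minimal Flask app to verify the WSL + VS Code setup.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    # Returning a dict makes Flask respond with JSON automatically.
    return {"status": "ok", "message": "Flask is running inside WSL2"}

if __name__ == "__main__":
    # Run with `python3 app.py`, then browse to http://localhost:5000
    app.run(debug=True)
```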

### Benefits of Using WSL and VS Code Integration

- **Seamless Linux Integration:** WSL provides a full-fledged Linux kernel that runs directly on your Windows machine, offering a robust development environment without the need for dual-booting or using virtual machines.
- **Enhanced Productivity:** VS Code, combined with the "Remote - WSL" extension, allows you to utilize the full power of Linux alongside the extensive features and plugins of VS Code. This includes debugging, linting, and other development tools that make your workflow more efficient.
- **Ease of Use:** Setting up and managing your development environment becomes significantly easier. You can leverage the power of the Linux command line and the GUI features of Windows simultaneously.
- **Cross-Platform Compatibility:** As a developer, understanding and working within a Linux environment is crucial. WSL bridges the gap for Windows users, providing an easy transition to Linux-based development.

Setting up LLama2 on a local machine using WSL and integrating it with VS Code is a highly efficient approach for developers working in a Windows environment. This setup not only simplifies the process but also enhances productivity by combining the strengths of both Linux and Windows. It is essential for developers, especially those using Windows daily, to familiarize themselves with WSL to fully leverage the benefits of a robust Linux ecosystem alongside the extensive capabilities of VS Code.

https://code.visualstudio.com/docs/remote/wsl

WSL Remote extension

Setting Up WSL

https://learn.microsoft.com/en-us/windows/wsl/install

Setting Up Certificate Import

Export Host Certificates into a Single File

Since we are using Windows as our host and WSL runs underneath it with a different networking mode, WSL cannot detect the host's certificates. We therefore need to export them from Windows and import them into WSL manually so that WSL can run and reach services within our network.

Run this in PowerShell on the host:

```powershell
# Collect certificates from the host's trusted stores.
$certificateType = [System.Security.Cryptography.X509Certificates.X509Certificate2]
$includedStores = @("TrustedPublisher", "Root", "CA", "AuthRoot")
$certificates = $includedStores.ForEach({
    Get-ChildItem Cert:\CurrentUser\$_ | Where-Object { $_ -is $certificateType }
})
# Convert each certificate to PEM format.
$pemCertificates = $certificates.ForEach({
    $pemCertificateContent = [System.Convert]::ToBase64String($_.RawData, 1)
    "-----BEGIN CERTIFICATE-----`n${pemCertificateContent}`n-----END CERTIFICATE-----"
})
# De-duplicate and write everything into a single bundle file.
$uniquePemCertificates = $pemCertificates | Select-Object -Unique
($uniquePemCertificates | Out-String).Replace("`r", "") | Out-File -Encoding UTF8 $HOME\ca-certificates.crt
```

Then copy the certificate bundle into our WSL instance and refresh the certificate store (on Ubuntu/Debian, `update-ca-certificates` is what actually registers the new bundle):

```bash
sudo cp /mnt/c/Users/Herna/ca-certificates.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
```

This assumes $HOME above resolves to your Windows user folder. We copy via the /mnt/c mount, since WSL can read files from the host that way.

Once that is done, we need to set up DNS to make sure WSL can resolve names through our VPN's Domain Name Server.

Setting Up Local DNS inside WSL

To test network connectivity, run this command:

```bash
ping google.com
```

If you see "Destination Host Unreachable", here is why: WSL is unfortunately not the same instance as our host; it has its own network configuration for internet access. Hence, some setup needs to be done before WSL2 can resolve names.

Disable resolv.conf generation in WSL:

```bash
sudo nano /etc/wsl.conf
```

Add this text to wsl.conf:

```ini
[network]
generateResolvConf = false
```

Run nslookup on the host using cmd/PowerShell; you will get the IPv4 address of your nameserver. Copy this address.

Then update the following file with that address:

```bash
sudo nano /etc/resolv.conf
```

```
nameserver (your nslookup IP)
```

Restart WSL in PowerShell:

```powershell
wsl.exe --shutdown
```

Once it has restarted, test connectivity from WSL2 to the internet again; once that succeeds, move on to the next step.

Setting Up Ollama

Go to https://ollama.com/download/linux.

Copy the install command from that page and run it inside our WSL2 instance. Since WSL is a Linux environment, the script will download and install Ollama along with everything it needs.

Then run this command to pull one of the base models, Llama 2. (By the time I did this research, Llama 3 was already available, and if you want to explore other LLM models you can also try Mistral.)

```bash
ollama pull llama2
```

Be patient: the image itself is around 3.8 GB.

Test The API

Send a POST request to http://localhost:11434/api/generate with the following JSON body:

```json
{
  "model": "llama2",
  "prompt": "Which planet is closest to the sun?"
}
```

The response comes back as a stream of JSON objects, one per generated fragment, with timing statistics in the final object:

```json
{
  "model": "llama2",
  "created_at": "2024-06-10T08:53:49.091281897Z",
  "response": "The",
  "done": false
}
{
  "model": "llama2",
  "created_at": "2024-06-10T08:53:49.591561235Z",
  "response": " planet",
  .....................
  "total_duration": 50261658582,
  "load_duration": 25748385579,
  "prompt_eval_count": 29,
  "prompt_eval_duration": 8310253000,
  "eval_count": 37,
  "eval_duration": 16198240000
}
```
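
To consume that stream from code, here is a minimal sketch using the Python requests library; the model name and prompt are just the example values above:

```python
import json
import requests

# Ollama streams newline-delimited JSON objects until "done" is true.
payload = {"model": "llama2", "prompt": "Which planet is closest to the sun?"}
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()  # generation finished; the final chunk carries the timing stats
```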

Data Embedding

I will use my own data set for the embedding step and introduce an application layer on top of it. In this case, I will use this data format:

```text
Frequently Asked Questions

Incident Reporting: For any incidents, please refer here to obtain detailed
information regarding the issue.

Incident Categories: Incidents are classified into five priority levels:
  P1: Critical
  P2: High
  P3: Moderate
  P4: Low
  P5: Planning

DevOps Tools: Our current DevOps toolkit includes:
  Trello for project tracking and repository management.
  GCR for artifact storage and distribution.
  NPM for package management.

Software Installation Requirements: The following software is required for development:
  Visual Studio 2022
  .NET 8
  MSSQL 2019
  Angular 15
  Node 16

API Integration: My API is planned with the following external APIs:
  XX for enterprise resource planning.
  YY for technical application support.
```

The result is a successful end-to-end setup, showcasing how we integrated a large language model like LLaMA into our LangChain framework using Chroma DB for vector embeddings. This process involved several crucial steps, including the assembly and configuration of various components to ensure seamless functionality.

Initially, we focused on incorporating the large language model into LangChain. This required configuring the model to work efficiently with Chroma DB, which we selected as our vector embedding database due to its robust performance and scalability. Chroma DB's capabilities allow us to index and vectorize the dataset effectively, which is a critical aspect of our overall setup.

The integration process involves facilitating interactions among multiple systems: Hugging Face for model hosting and management, Chroma DB for embedding storage and retrieval, and LlamaIndex for efficient querying and indexing. By utilizing LangChain as the backbone of this integration, we can streamline these interactions, enabling a cohesive and efficient workflow. This ensures that data flows smoothly between components, enhancing the system's overall performance. A sketch of this wiring follows below.
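
Here is a minimal sketch of that wiring, assuming the langchain-community integrations available at the time of writing and a hypothetical faq.txt file holding the data set shown earlier; treat the package, model, and file names as placeholders for your own setup:

```python
# pip install langchain langchain-community chromadb sentence-transformers
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# Load the FAQ data set (hypothetical file) and split it into chunks.
with open("faq.txt") as f:
    raw_text = f.read()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(raw_text)

# Embed the chunks with a Hugging Face model and index them in Chroma DB.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma.from_texts(chunks, embeddings, persist_directory="./chroma_db")

# Wire the local Llama 2 model served by Ollama into a retrieval chain.
llm = Ollama(model="llama2")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectordb.as_retriever())

print(qa.invoke({"query": "What software is required for development?"}))
```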

One of the significant advantages of this setup is its ability to support Retrieval-Augmented Generation (RAG). This approach allows us to harness the full potential of LLaMA by enabling deeper and more thorough analysis. RAG combines the strengths of retrieval and generation, improving the model's ability to generate relevant and accurate responses by leveraging external knowledge bases.

It is important to note that our current implementation operates without GPU support. We are running the entire setup on an Intel Core i7 9th generation CPU with 16GB of memory. Despite the lack of GPU acceleration, the system performs admirably, with processing times ranging from 3 to 5 minutes for various tasks. This demonstrates the efficiency of our configuration and the capabilities of the hardware in use.

However, we are keen to explore the potential performance improvements that GPU acceleration could bring. Our next step involves testing the setup with an NVIDIA GPU. We anticipate that this will significantly reduce processing times and enhance the overall efficiency of our system. By leveraging the parallel processing power of GPUs, we expect to achieve faster indexing, vectorization, and querying processes.

In conclusion, our current implementation represents a significant milestone in integrating large language models with robust data handling frameworks. The successful end-to-end setup showcases the potential of combining LLaMA with LangChain and Chroma DB, creating a powerful tool for deep analysis and exploitation of language models. We look forward to the next phase of testing with GPU support and sharing further updates on our progress. Stay tuned for more developments as we continue to refine and optimize our integration. Till then, see you in the next iteration!

References

https://en.wikipedia.org/wiki/Llama_(language_model)
https://learn.microsoft.com/en-us/windows/wsl/install
https://learn.microsoft.com/en-us/windows/wsl/about
https://github.com/microsoft/WSL/issues/3161
https://medium.com/@shahip2016/llama-2-explained-in-simple-step-by-step-process-5076e072cb69
https://aws.amazon.com/what-is/retrieval-augmented-generation/

API Reference

https://ollama.com/library/llama2
https://github.com/ollama/ollama/blob/main/docs/api.md

Online Learning

https://www.linkedin.com/learning/introduction-to-large-language-models/what-are-parameters
https://www.youtube.com/watch?v=5sLYAQS9sWQ
https://www.youtube.com/watch?v=dBoQLktIkOo
