What I learned in my first 60 days working with LLMs

Filipe Pacheco
6 min read · Oct 3, 2023


A brief knowledge share about what I learned working with LLMs over the last 60 days: from zero knowledge to knowing how to choose the best approach for a specific LLM application.

In the last episode

This is my second post on Medium. In my first post, I talked about the possible obsolescence of data scientists in three years and how to upskill to avoid it. I mentioned that I would start an upskilling plan and divide it into three stages:

  • LLM — Large Language Models
  • Upskill in ML on AWS
  • Become MultiCloud Practitioner

Today, I’m excited to share my insights, experiences, and learnings from the first stage of my upskilling plan: the use of LLMs.

How I did it

If you’ve checked out my LinkedIn account, you may have noticed that I use Databricks daily. As a Data Scientist (DS), I believe that Databricks is one of the top five tools for deploying Data & Analytics applications, especially for those involving Streaming or Big Data.

Recently, I’ve been reading about the huge hype around LLMs, driven largely by the popularity of ChatGPT. However, until a few months ago, I didn’t know where to begin. Once again, Databricks came to my aid, specifically the Databricks Academy (link below), where I completed three self-paced training classes:

  • Generative AI Fundamentals
  • LLM: Foundation Models from the Ground Up
  • LLM: Application through Production
Databricks Learning

Needless to say, watching a training class taught by Matei Zaharia, the CTO of Databricks, is amazing. After completing the three classes and spending several hours on the lab notebooks (homework :D ), I can personally guarantee that these courses can take you from the ground level to being capable of sharing informed opinions about possible uses of LLMs, including their benefits and the difficulties of deploying them.

In the next section, which I’ve divided into four parts, I will discuss the four main possibilities I see today for bringing LLMs into production. I’ve tried not to go too deep into technical details, but I’ve provided links for those who are interested in digging deeper.

LLM from Scratch

If the title is as above, the subtitle could be, “Don’t go this way”. Unless you are being paid to conduct research or to develop a breakthrough in this technology, creating and training an LLM from scratch is extremely expensive, time-consuming, and uncertain, as with any new technology whose heuristics are not widely known and come from trial and error.

In the GitHub link below, I’ve shared the repository with the code I used during the Databricks Academy training classes. Training an LLM is not at all comparable to training a regular (tabular) ML model. It is closer to training a deep learning model, since LLMs are themselves deep neural networks.

I often say that a problem is only a problem if it has no solution, and here there are alternatives. The first and most straightforward option is to use an already trained LLM. The leading platform for finding one is the Hugging Face portal, linked below.
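As a minimal, hedged sketch of that first option, reusing a ready-made model from the Hugging Face Hub with the transformers library might look like this; the model name is only an illustrative choice, not a recommendation:

```python
# Sketch: reuse an already trained model instead of building one from scratch.
# "gpt2" is just an example; any generative model on the Hub would work here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```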

Fine-Tuning an LLM

Perhaps the most important thing I’ve learned so far in my upskilling is that something small and specialized is often worth more than an extremely large and generalist model. In the context of LLMs, fine-tuning a model means updating its final layers to perform a specific task, such as classifying whether a review is toxic or not. This is the same idea behind the transfer learning approach used in some computer vision and machine learning applications.
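As a rough sketch of that idea (not the exact recipe from the Databricks course), freezing a pretrained backbone and training only the classification head with the transformers library could look like this; the model name and the toy two-example dataset are placeholders:

```python
# Sketch: fine-tune only the final layers of a pretrained model for toxicity
# classification. Model name and the tiny dataset are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # assumption: any encoder model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the pretrained backbone so only the new classification head is trained,
# mirroring the transfer-learning idea described above.
for param in model.base_model.parameters():
    param.requires_grad = False

# Toy labeled reviews (0 = not toxic, 1 = toxic).
data = Dataset.from_dict({
    "text": ["Great product, works as expected.", "This is awful, you are all idiots."],
    "label": [0, 1],
})
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                        padding="max_length", max_length=64),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicity-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```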

Hugging Face — The AI community building the future.

Then comes Hugging Face. This community is amazing and already plays an important role in the LLM realm. The portal hosts over 350k models, created and maintained by the open community, divided into six categories (Multimodal, Computer Vision, NLP, Audio, Tabular, and Reinforcement Learning) and 36 sub-categories. There are also more than 60k datasets to make it easier for you to train or fine-tune an LLM.

On the Hugging Face portal, you can find many variations of foundation models that someone else has already fine-tuned for specific applications. Before you start to consider building something yourself, try searching this portal, as sketched below. With this plethora of models, it’s entirely possible to find a specialized model that will cut down the time needed to deploy an LLM.
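For example, a quick programmatic search of the Hub with the huggingface_hub client can surface existing candidates; the search terms below are just an assumption for the toxicity use case:

```python
# Sketch: look for an existing specialized model before building anything yourself.
from huggingface_hub import HfApi

api = HfApi()
candidates = api.list_models(search="toxic", filter="text-classification",
                             sort="downloads", direction=-1, limit=5)
for model in candidates:
    print(model.modelId, model.downloads)
```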

Pay To Use

Another possibility, and the natural way to use LLMs today, is to pay for GIANT LLMs, such as ChatGPT-4. Sometimes size does matter, and comparisons between ChatGPT-3/3.5 and 4 only confirm it. The main benefit is that you don’t have to worry about training or fine-tuning the model, or about LLMOps (the MLOps equivalent for LLMs). Depending on the service you’re paying for, you may also get access to new versions of the LLM.
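For reference, a minimal pay-to-use call might look like the sketch below, assuming the openai Python SDK in its pre-1.0 interface and an API key stored in your environment; model names and availability depend on your subscription:

```python
# Sketch: calling a hosted GIANT LLM instead of running your own (pre-1.0 openai SDK).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumption: key kept in the environment
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response["choices"][0]["message"]["content"])
```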

On the other hand, the cost of this approach can easily spiral out of control, depending on your application. However, if you’re developing something in a company environment and pay-to-use is an option for you, make sure to look at tutorials that teach you how to estimate these costs; a rough sketch follows below.
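As a back-of-the-envelope example, you can count tokens with tiktoken and multiply by the provider’s price list; the per-1K-token prices below are placeholders, so check the current pricing page before relying on them:

```python
# Sketch: rough per-call cost estimate. Prices are illustrative placeholders only.
import tiktoken

PRICE_PER_1K_INPUT = 0.03    # assumption: illustrative USD value, not current pricing
PRICE_PER_1K_OUTPUT = 0.06   # assumption: illustrative USD value, not current pricing

encoder = tiktoken.encoding_for_model("gpt-4")
prompt = "Classify the sentiment of this review: the service was slow but friendly."
prompt_tokens = len(encoder.encode(prompt))
expected_output_tokens = 200  # assumption: rough guess for this kind of task

cost = (prompt_tokens / 1000) * PRICE_PER_1K_INPUT \
       + (expected_output_tokens / 1000) * PRICE_PER_1K_OUTPUT
print(f"{prompt_tokens} input tokens, estimated cost per call: ${cost:.4f}")
```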

RAG — Retrieval Augmented Generation

After evaluating all of these possibilities, there is one more interesting approach that can make small, simple models outperform giant models like ChatGPT: RAG.

The idea behind this approach is quite simple, and it can use either a vector library or a vector database. Documents or sentences related to a specific subject are embedded into vectors using the same strategy an LLM uses to create its own predictions.

When I prompt an LLM, an embedding technique such as Word2Vec turns the prompt’s words into a vector. The LLM processes this vector and generates the output by doing the reverse process, as illustrated below.

Example of LLM’s architecture, image from Databricks Academy: Generative AI Fundamentals

After the output is generated, all of this intermediate data is discarded. What a vector library or database does is store domain-specific content to augment and improve the output generated by an LLM. Simply put, RAG uses an information store in vector format to improve the answer given to the user, without the need for fine-tuning.
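A minimal sketch of the retrieval half of RAG, using sentence-transformers for embeddings and FAISS as the vector library (both are assumptions; any embedding model or vector store could be swapped in), might look like this:

```python
# Sketch: embed a small document collection, retrieve the closest match to a question,
# and prepend it to the prompt so a small general model can answer from it.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, from 9am to 6pm.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: example embedding model
doc_vectors = np.array(embedder.encode(documents), dtype="float32")

index = faiss.IndexFlatL2(doc_vectors.shape[1])  # in-memory vector library
index.add(doc_vectors)

question = "Can I return a product after two weeks?"
query_vector = np.array(embedder.encode([question]), dtype="float32")
_, ids = index.search(query_vector, 1)  # retrieve the single closest document

# The retrieved context is injected into the prompt, no fine-tuning required.
prompt = f"Context: {documents[ids[0][0]]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```

The resulting prompt would then be sent to whichever LLM you already use; only the retrieval layer changes, which is exactly why this route is cheaper and faster than fine-tuning.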

This technique stopped me from going down the fine-tuning route and drew my attention to RAG. This approach makes it easier to deploy an LLM as a specialist in answering questions about a very narrow knowledge domain, at a significantly lower cost and in less time than fine-tuning an LLM.

Conclusion

I’m extremely happy with the progress I’ve made in the first 60 days of my upskilling journey. I can now deploy LLMs available on the Hugging Face portal for specific solutions. In fact, I gave my first lecture about LLMs last month.

However, what makes me happiest is working with my co-workers on developing two new applications using LLMs. One of them involves LangChain and an LLM agent, a great topic we can explore in a future article. The other involves the RAG approach, which has shown itself to be a promising technique for improving the output generated by LLMs.

In conclusion, this is my first faithful report about my upskilling journey. Stay tuned to read about my next one ;)


Filipe Pacheco

Senior Data Scientist | AI, ML & LLM Developer | MLOps | Databricks & AWS Practitioner