Exploring the potential of Small Language Models

Abhishikta Bandyopadhyay
Data Science at Microsoft
5 min read · Aug 13, 2024

In today’s world where we see new technology emerge every day, language models and generative AI have captured the interest of many people. Large Language Models (LLMs) like GPT-4 help people accomplish their goals by generating content that is human-like, insightful, and easy to understand. At the same time, LLMs can prove to be computationally expensive and require considerable resources (such as GPUs and NPUs) and power to operate.

Small Language Models (SLMs), on the other hand, might not be as accurate and robust as LLMs, but they can also prove to be useful, especially in specialized domains. They are computationally less expensive compared to LLMs and they can easily be run on personal devices or on devices with limited computational resources. An area where SLMs can be extremely beneficial is summarizing health logs and reports for effective monitoring of system health.

In this article I delve into some details around SLMs and how I leveraged Phi-3-mini, an SLM developed by Microsoft, to monitor system health on my laptop. I also highlight some of the differences between LLMs and SLMs and show some ways in which SLMs may have advantages compared to LLMs when it comes to specialized tasks and smaller information analytics.

Overview of LLMs and SLMs

In this section, I offer brief descriptions of LLMs and SLMs and compare and contrast some of their key aspects.

What are LLMs?

Large Language Models (LLMs) like GPT-4 are designed to handle extensive datasets and complex language tasks. They are trained on vast amounts of data and require significant computational power, often requiring specialized hardware like GPUs or NPUs to function effectively. LLMs perform well in tasks requiring high levels of pattern recognition and summarization, such as sophisticated text generation, deep semantic analysis, and complex problem solving.

What are SLMs?

Small Language Models (SLMs), such as Phi-3, are scaled-down versions of their larger counterparts. They are optimized to perform specific tasks efficiently while requiring substantially fewer resources. SLMs can be deployed on personal devices without the need for high-end hardware, making them accessible for everyday use. Despite their smaller size, SLMs like Phi-3 can still deliver impressive performance in targeted applications.

Technical comparison

In this section, I present a series of comparisons between some aspects of the GPT-4 LLM and the Phi-3 SLM.

Architecture

  • GPT-4: Features a massive architecture; OpenAI has not disclosed its parameter count, but it is widely estimated to run into the hundreds of billions or more. It leverages deep neural networks to capture intricate language patterns, and its depth and complexity allow it to perform a wide range of language tasks with high accuracy.
  • Phi-3: Features a more streamlined architecture: Phi-3-mini has 3.8 billion parameters, with larger variants in the family reaching 14 billion. The design focuses on efficiency and speed, making it more suitable for deployment on personal devices without the need for extensive computational resources.

Computational requirements

  • GPT-4: Requires significant computational power, often necessitating dedicated GPUs or NPUs to handle its processing demands. Running GPT-4 on a typical personal device is impractical due to its high resource consumption.
  • Phi-3: Designed to be resource-efficient, Phi-3 can be deployed on standard personal devices. In my experience with deploying Phi-3-mini-128K on a laptop with a 13th-generation Intel Core i7-1365U processor, 32 GB of RAM, and 1 TB of storage, I found it able to function smoothly without specialized hardware.

Performance and accuracy

  • GPT-4: Excels in performance and accuracy across a wide array of tasks due to its extensive training and large parameter count. It is the go-to choice for tasks requiring top-tier language processing capabilities.
  • Phi-3: While not as powerful as GPT-4, Phi-3 offers commendable performance in specific tasks. Its smaller size allows for quicker processing times and lower latency, making it ideal for applications where speed and efficiency are crucial.

Use case

  • GPT-4: Suited for tasks requiring deep understanding and complex problem solving, such as advanced natural language processing and comprehension.
  • Phi-3: Ideal for applications needing efficient, on-device processing, such as mobile applications and edge computing.

Multimodal capabilities

  • GPT-4: Primarily focused on text-based tasks, though newer variants such as GPT-4 with vision also accept image inputs.
  • Phi-3: Phi-3-vision integrates both language and visual inputs, making it versatile for tasks like image captioning and OCR.

My experience with Phi-3

Deploying Phi-3-mini-128k on my personal laptop was a straightforward process using Phi-3 ONNX models that are hosted in a collection on Hugging Face.

To generate a concise summary of the system health, I collected the system logs from multiple sources and tools such as Microsoft Event Viewer, Battery Report, Windows Resource Monitor, Windows Firewall with Advanced Security, and Windows Management Instrumentation. The logs were in various formats such as .csv, .txt and .html.

Next, I converted these log files to Pandas DataFrames to facilitate easy data handling and analysis. Converting the logs to DataFrames also helped me normalize the information into a consistent format and made it easier to retrieve relevant log entries through Pandas queries.
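To make this step concrete, here is a minimal sketch of loading an Event Viewer-style export into a DataFrame and filtering it down to the entries worth summarizing. The column names and sample rows are hypothetical stand-ins for a real export (real exports carry more columns); .html reports such as the Battery Report can be parsed similarly with pd.read_html, which returns a list of tables.

```python
import io
import pandas as pd

# Hypothetical Event Viewer CSV export (real exports have more columns).
sample_csv = io.StringIO(
    "Level,Date and Time,Source,Event ID\n"
    "Error,2024-08-01 09:14,Disk,7\n"
    "Warning,2024-08-01 10:02,DNS Client Events,1014\n"
    "Information,2024-08-01 10:05,Service Control Manager,7036\n"
)

# CSV logs load directly into a DataFrame.
events = pd.read_csv(sample_csv, parse_dates=["Date and Time"])

# Keep only the entries worth surfacing in a health summary.
problems = events[events["Level"].isin(["Error", "Warning"])]
print(problems[["Level", "Source"]].to_string(index=False))
```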

After analyzing the relevant data from the DataFrames, I created a .txt file, which I then passed as an input prompt to my SLM (phi-3-mini). Along the way, I found that one of the most impressive aspects of doing this was the ability to fine-tune the model parameters to monitor my system health efficiently.
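The step of flattening the analyzed DataFrame rows into a plain-text prompt file might look like the following sketch. The DataFrame contents and the file name health_report.txt are illustrative assumptions, not the exact data or naming from my setup.

```python
import pandas as pd

# Hypothetical filtered log rows; in practice these come from the
# DataFrames built in the previous step.
problems = pd.DataFrame(
    {
        "Level": ["Error", "Warning"],
        "Source": ["Disk", "DNS Client Events"],
        "Message": ["Bad block detected", "Name resolution timed out"],
    }
)

# Flatten the relevant rows into plain text the model can summarize.
lines = ["System health report:"]
for row in problems.itertuples(index=False):
    lines.append(f"- {row.Level} from {row.Source}: {row.Message}")
prompt = "\n".join(lines)

# Write the report to a .txt file to pass to the model as input.
with open("health_report.txt", "w", encoding="utf-8") as f:
    f.write(prompt)
```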

By providing the system health report in the input prompt and tuning parameters such as top_p, top_k, repetition penalty, and temperature, I could generate accurate and concise reports. For example, the top_p parameter controls nucleus sampling: at each step the model samples only from the smallest set of tokens whose cumulative probability reaches top_p, trimming away low-probability tokens and keeping the output focused. The top_k parameter restricts sampling to the k most likely tokens at each step.

The repetition penalty discourages the model from reusing the same tokens too frequently in the output, while temperature controls the randomness and creativity of the generated text.
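The top_k and top_p filters described above can be illustrated with a small toy function. This is a simplified sketch of the idea, not the model's actual implementation: it cuts the distribution to the top-k tokens, then to the smallest prefix whose cumulative probability reaches top_p, and renormalizes.

```python
import numpy as np

def filter_top_k_top_p(probs, top_k, top_p):
    """Apply a top-k cut, then a nucleus (top-p) cut, and renormalize."""
    order = np.argsort(probs)[::-1]   # token indices, most likely first
    order = order[:top_k]             # top-k: keep only the k most likely
    cum = np.cumsum(probs[order])
    # top-p: keep the smallest prefix whose cumulative probability >= top_p
    keep = order[: np.searchsorted(cum, top_p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Five-token toy distribution: top_k=4 drops the rarest token,
# then top_p=0.9 keeps only the first three tokens.
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
print(filter_top_k_top_p(probs, top_k=4, top_p=0.9))
```

Lower top_p or top_k values make the output more deterministic and focused, which suited the factual, report-style summaries I wanted.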

Entering the prompt, “Give me some interesting events that have happened in my system today?” yielded insightful results that helped me keep track of my system’s performance without spending too much time analyzing the logs, while allowing me to identify the problem areas with ease.

Conclusion

In conclusion, while LLMs like GPT-4 offer significant language processing power, their high resource requirements make them less accessible for everyday use on personal devices. SLMs like Phi-3, on the other hand, strike a balance between performance and efficiency, making them a practical choice for specialized tasks. My experience with deploying and fine-tuning Phi-3-mini-128k demonstrated its potential in providing valuable insights into system health without the need for high-end hardware. For those seeking a powerful yet resource-efficient language model, I believe that Phi-3 is a compelling option.

The author would like to thank Qiuyi Duan and Ankit Srivastava for contributing to and reviewing the work.

Abhishikta Bandyopadhyay is on LinkedIn.
