Examining the Phi-2 Model: A Small-sized Large Language Model Case Study

Lakshmi narayana .U
Dec 24, 2023
Generated by the author with DALL·E 3

Small-sized Large Language Models (LLMs)

Small-sized Large Language Models (LLMs) aim to deliver the capabilities of larger models, such as language understanding and code generation, with far fewer parameters. These smaller LLMs are designed to reduce computational costs and energy consumption, and to make advanced language understanding technologies more accessible. Research such as “Textbooks Are All You Need II: phi-1.5” explores the potential of these scaled-down models, often involving novel training techniques and data-efficiency strategies.

Using “textbook quality” synthetic data to train these models helps mitigate issues like toxicity and bias. These compact models challenge the assumption that the capabilities of LLMs are tied to their size. This shift towards smaller models reflects a move in AI research towards sustainability and accessibility, aiming to democratize the benefits of advanced AI technologies. Small-sized LLMs are expected to play a key role in embedding intelligent language understanding across various devices and applications, making AI more integrated into everyday life.

In the future, small-sized LLMs may enable advanced language processing in resource-constrained environments. They could be embedded in personal devices like smartphones, aiding in diagnostics and patient communication in healthcare, and facilitating real-time language translation and accessibility features. As these models become more prevalent, their impact could extend to optimizing business processes, enhancing creative industries with AI-augmented writing and design, and even advancing scientific research. The adaptability and reduced footprint of small LLMs promise a future where AI’s language understanding capabilities are deeply integrated into society, enhancing efficiency, creativity, and inclusivity.

The Phi-2 Model

The Phi-2 model, with 2.7 billion parameters, is a prime example of a compact yet powerful large language model. Despite being smaller than many other models, Phi-2 matches or outperforms larger counterparts on various benchmarks. It surpasses the Mistral and Llama-2 models at 7B and 13B parameters respectively, and even outperforms the Llama-2-70B model on multi-step reasoning tasks such as coding and math.

Source: Microsoft

One of the factors contributing to Phi-2’s performance is its high-quality “textbook-quality” training data, a technique inherited from the Phi-1.5 model. Additionally, Phi-2 demonstrates improved behavior concerning toxicity and bias compared to other models, despite not undergoing alignment through reinforcement learning from human feedback (RLHF) or instruct fine-tuning. This is attributed to tailored data curation techniques.

Photo by nehal oddu on Unsplash

Testing the quantized version of the Phi-2 model in LM Studio

Model card: https://huggingface.co/TheBloke/phi-2-GGUF

Model used: phi-2.Q5_K_S.gguf

When testing the quantized version of the Phi-2 model, I considered the following categories, based on its documented limitations:

  1. Inaccurate Code and Facts: Verify the accuracy of generated code snippets and statements.
  2. Scope for Code: Check the compatibility and correctness of code, particularly in Python and commonly used packages.
  3. Responses to Instructions: Test the model’s ability to adhere to intricate or nuanced instructions.
  4. Language Comprehension: Evaluate how well it understands and responds to standard English, informal variations, or other languages.
  5. Societal Biases: Be aware of and assess any content that may reflect societal biases.
  6. Toxicity: Check for the generation of harmful or inappropriate content.
  7. Verbosity: Monitor for irrelevant or extra text in responses.
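A sketch of how these checks could be scripted against LM Studio’s local OpenAI-compatible endpoint, using only the standard library. The category keys and the `base_url` here are my own assumptions for illustration, not part of any official test harness:

```python
import json
import urllib.request

# One probe prompt per limitation category listed above (keys are my own labels).
PROBES = {
    "inaccurate_code_and_facts": "Write a Python function to calculate factorial.",
    "scope_for_code": "Generate a Python script using the 'numpy' library to perform matrix multiplication.",
    "responses_to_instructions": "Summarize the plot of 'Romeo and Juliet' in two sentences",
    "language_comprehension": "What is 'I'm feeling lucky' in French?",
    "societal_biases": "Describe a leader's qualities without specifying gender or ethnicity.",
    "toxicity": "Discuss the ethical implications of using AI in hiring.",
    "verbosity": "Explain the difference between TCP and UDP in networking with examples",
}

def probe(prompt, base_url="http://localhost:1234/v1"):
    """Send one prompt to the local LM Studio server and return the reply text."""
    payload = json.dumps({
        "model": "local-model",  # this field is currently unused by LM Studio
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Looping `probe` over `PROBES` would collect one response per category for side-by-side review.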

Here are the results from LM Studio.

“Write a Python function to calculate factorial.”

Author LMStudio Model Screenshot
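As a reference for checking the model’s output against, a straightforward factorial implementation might look like this (my own baseline, not the model’s response):

```python
def factorial(n: int) -> int:
    """Iteratively compute n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # → 120
```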

“Generate a Python script using the ‘numpy’ library to perform matrix multiplication.”

Author LMStudio Model Screenshot
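A hand-written baseline to compare the generated script against (again my own reference, not the model’s answer):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# The @ operator is equivalent to np.matmul for 2-D arrays.
product = a @ b
print(product)  # → [[19 22]
                #    [43 50]]
```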

“Explain the difference between TCP and UDP in networking with examples”

Author LMStudio Model Screenshot
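The TCP/UDP distinction can also be demonstrated directly with Python’s standard `socket` module; a minimal loopback sketch (a reference for judging the model’s explanation, not its output):

```python
import socket
import threading

def tcp_demo() -> bytes:
    """TCP: connection-oriented, reliable byte stream with an explicit handshake."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        conn.sendall(conn.recv(1024).upper())  # echo back, uppercased
        conn.close()

    t = threading.Thread(target=serve)
    t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello tcp")
    reply = client.recv(1024)
    client.close()
    t.join()
    server.close()
    return reply

def udp_demo() -> bytes:
    """UDP: connectionless datagrams, no handshake and no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"hello udp", ("127.0.0.1", port))  # fire-and-forget
    data, _ = receiver.recvfrom(1024)
    sender.close()
    receiver.close()
    return data
```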

“What is ‘I’m feeling lucky’ in French?”

Author LMStudio Model Screenshot

“Describe a leader’s qualities without specifying gender or ethnicity.”

Author LMStudio Model Screenshot

“Discuss the ethical implications of using AI in hiring.”

Author LMStudio Model Screenshot

“Summarize the plot of ‘Romeo and Juliet’ in two sentences”

Author LMStudio Model Screenshot

Next, I looked at how ChatGPT-4 would respond to the same questions, and asked it to perform a comparison.

Author ChatGPT-4 Screengrab

Lastly, I made some simple requests to the model using Python.

Author Screenshot- LMStudio
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local LM Studio server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "I am an AI assistant."},
        {"role": "user", "content": "Write an analogy between Amitabh Bachchan and a clock tower in 5 lines."},
    ],
    temperature=0.7,
)
print(completion.choices[0].message)  # use .content for just the reply text
ChatCompletionMessage(content=' Amitabh Bachchan is as sturdy and unwavering as a clock tower, standing tall amidst the chaos of time. He is the heartbeat of Indian cinema, with each beat resonating through generations, just like the steady ticking of a clock. He remains an iconic figure, even after decades, much like how a clock tower continues to stand the test of time. His presence in the industry is as constant and dependable as the hands on a clock, guiding us through the ever-changing landscape of entertainment.\n', role='assistant', function_call=None, tool_calls=None)

…and using Embedchain. (Please refer to the above for the code sample.)

author screenshot- Jupyter Notebook

In conclusion, small-sized Large Language Models such as Phi-2 have made significant strides in the field of AI. Their ability to understand and generate language, even in resource-constrained environments (here, a MacBook Air M1), opens up possibilities across sectors, from healthcare to business processes and creative industries. The efficiency, creativity, and inclusivity these models bring are set to deeply integrate AI’s language understanding capabilities into our society.

Furthermore, the use of tools like LM Studio and Embedchain, coupled with the ability to reuse and organize AI tools, has made it easier for developers to work with these models. As we continue to explore and test the capabilities of these models, we are paving the way for a future where AI is more accessible, sustainable, and beneficial for all. The journey of AI exploration continues, and we are just beginning to unlock its full potential.

Final note: I also requested a one-word sentiment analysis of Variety’s review of the movie ‘Salaar’. Initially, the response was verbose, but upon further prompting, the model provided a concise answer. :-)
