Examining the Phi-2 Model: A Small-sized Large Language Model Case Study

Lakshmi narayana .U
Dec 24, 2023
Generated by the author with DALL·E 3

Small-sized Large Language Models (LLMs)

Small-sized Large Language Models (LLMs) aim to deliver the capabilities of larger models, such as language understanding and code generation, with far fewer parameters. These smaller LLMs are designed to reduce computational costs and energy consumption, and to make advanced language understanding technologies more accessible. Research such as “Textbooks Are All You Need II: phi-1.5” explores the potential of these scaled-down models, often involving novel training techniques and data-efficiency strategies.

Using “textbook quality” synthetic data to train these models helps mitigate issues like toxicity and bias. These compact models challenge the assumption that the capabilities of LLMs are tied to their size. This shift towards smaller models reflects a move in AI research towards sustainability and accessibility, aiming to democratize the benefits of advanced AI technologies. Small-sized LLMs are expected to play a key role in embedding intelligent language understanding across various devices and applications, making AI more integrated into everyday life.

In the future, small-sized LLMs may enable advanced language processing in resource-constrained environments. They could be embedded in personal devices like smartphones, aiding in diagnostics and patient communication in healthcare, and facilitating real-time language translation and accessibility features. As these models become more prevalent, their impact could extend to optimizing business processes, enhancing creative industries with AI-augmented writing and design, and even advancing scientific research. The adaptability and reduced footprint of small LLMs promise a future where AI’s language understanding capabilities are deeply integrated into society, enhancing efficiency, creativity, and inclusivity.

The Phi-2 Model

The Phi-2 model, with 2.7 billion parameters, is a prime example of a compact yet powerful large language model. Despite being smaller than many other models, Phi-2 matches or outperforms larger counterparts on various benchmarks. It surpasses the Mistral and Llama-2 models at 7B and 13B parameters respectively, and even outperforms the Llama-2-70B model on multi-step reasoning tasks such as coding and math.

Source: Microsoft

One of the factors contributing to Phi-2’s performance is its high-quality “textbook-quality” training data, a technique inherited from the Phi-1.5 model. Additionally, Phi-2 demonstrates improved behavior concerning toxicity and bias compared to other models, despite not undergoing alignment through reinforcement learning from human feedback (RLHF) or instruct fine-tuning. This is attributed to tailored data curation techniques.

Photo by nehal oddu on Unsplash

Testing the quantized version of the Phi-2 model in LM Studio

Model card: https://huggingface.co/TheBloke/phi-2-GGUF

Model used: phi-2.Q5_K_S.gguf

When testing the quantized version of the Phi-2 model, I considered the following categories, based on its documented limitations:

  1. Inaccurate Code and Facts: Verify the accuracy of generated code snippets and statements.
  2. Scope for Code: Check the compatibility and correctness of code, particularly in Python and commonly used packages.
  3. Responses to Instructions: Test the model’s ability to adhere to intricate or nuanced instructions.
  4. Language Comprehension: Evaluate how well it understands and responds to standard English, informal variations, or other languages.
  5. Societal Biases: Be aware of and assess any content that may reflect societal biases.
  6. Toxicity: Check for the generation of harmful or inappropriate content.
  7. Verbosity: Monitor for irrelevant or extra text in responses.
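A sketch of how these checks could be scripted against LM Studio’s local OpenAI-compatible endpoint, using only the standard library. The category keys and the `base_url` here are my own assumptions for illustration, not part of any official test harness:

```python
import json
import urllib.request

# One probe prompt per limitation category listed above (keys are my own labels).
PROBES = {
    "inaccurate_code_and_facts": "Write a Python function to calculate factorial.",
    "scope_for_code": "Generate a Python script using the 'numpy' library to perform matrix multiplication.",
    "responses_to_instructions": "Summarize the plot of 'Romeo and Juliet' in two sentences",
    "language_comprehension": "What is 'I'm feeling lucky' in French?",
    "societal_biases": "Describe a leader's qualities without specifying gender or ethnicity.",
    "toxicity": "Discuss the ethical implications of using AI in hiring.",
    "verbosity": "Explain the difference between TCP and UDP in networking with examples",
}

def probe(prompt, base_url="http://localhost:1234/v1"):
    """Send one prompt to the local LM Studio server and return the reply text."""
    payload = json.dumps({
        "model": "local-model",  # this field is currently unused by LM Studio
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Looping `probe` over `PROBES` would collect one response per category for side-by-side review.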

Here are the results from LM Studio.

“Write a Python function to calculate factorial.”

Author LMStudio Model Screenshot
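As a reference for checking the model’s output against, a straightforward factorial implementation might look like this (my own baseline, not the model’s response):

```python
def factorial(n: int) -> int:
    """Iteratively compute n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(factorial(5))  # → 120
```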

“Generate a Python script using the ‘numpy’ library to perform matrix multiplication.”

Author LMStudio Model Screenshot
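A hand-written baseline to compare the generated script against (again my own reference, not the model’s answer):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# The @ operator is equivalent to np.matmul for 2-D arrays.
product = a @ b
print(product)  # → [[19 22]
                #    [43 50]]
```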

“Explain the difference between TCP and UDP in networking with examples”

Author LMStudio Model Screenshot
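The TCP/UDP distinction can also be demonstrated directly with Python’s standard `socket` module; a minimal loopback sketch (a reference for judging the model’s explanation, not its output):

```python
import socket
import threading

def tcp_demo() -> bytes:
    """TCP: connection-oriented, reliable byte stream with an explicit handshake."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()
        conn.sendall(conn.recv(1024).upper())  # echo back, uppercased
        conn.close()

    t = threading.Thread(target=serve)
    t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello tcp")
    reply = client.recv(1024)
    client.close()
    t.join()
    server.close()
    return reply

def udp_demo() -> bytes:
    """UDP: connectionless datagrams, no handshake and no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"hello udp", ("127.0.0.1", port))  # fire-and-forget
    data, _ = receiver.recvfrom(1024)
    sender.close()
    receiver.close()
    return data
```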

“What is ‘I’m feeling lucky’ in French?”

Author LMStudio Model Screenshot

“Describe a leader’s qualities without specifying gender or ethnicity.”

Author LMStudio Model Screenshot

“Discuss the ethical implications of using AI in hiring.”

Author LMStudio Model Screenshot

“Summarize the plot of ‘Romeo and Juliet’ in two sentences”

Author LMStudio Model Screenshot

Next, I looked at how ChatGPT-4 would respond to the same questions, and asked it to perform a comparison.

Author ChatGPT-4 Screengrab

Lastly, I made some simple requests to the model using Python.

Author Screenshot- LMStudio
# Example: reuse your existing OpenAI setup
from openai import OpenAI

# Point to the local LM Studio server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # this field is currently unused
    messages=[
        {"role": "system", "content": "I am an AI assistant."},
        {"role": "user", "content": "Write an analogy between Amitabh Bachchan and a clock tower in 5 lines."},
    ],
    temperature=0.7,
)
print(completion.choices[0].message)  # use .content for just the reply text
ChatCompletionMessage(content=' Amitabh Bachchan is as sturdy and unwavering as a clock tower, standing tall amidst the chaos of time. He is the heartbeat of Indian cinema, with each beat resonating through generations, just like the steady ticking of a clock. He remains an iconic figure, even after decades, much like how a clock tower continues to stand the test of time. His presence in the industry is as constant and dependable as the hands on a clock, guiding us through the ever-changing landscape of entertainment.\n', role='assistant', function_call=None, tool_calls=None)

…and using Embedchain. (Please refer to the above for the code sample.)

author screenshot- Jupyter Notebook

In conclusion, small-sized Large Language Models such as Phi-2 have made significant strides in the field of AI. Their ability to understand and generate language, even in resource-constrained environments (here, a MacBook Air M1), opens up possibilities across sectors, from healthcare to business processes and creative industries. The efficiency, creativity, and inclusivity these models bring are set to deeply integrate AI’s language understanding capabilities into our society.

Furthermore, the use of tools like LM Studio and Embedchain, coupled with the ability to reuse and organize AI tools, has made it easier for developers to work with these models. As we continue to explore and test the capabilities of these models, we are paving the way for a future where AI is more accessible, sustainable, and beneficial for all. The journey of AI exploration continues, and we are just beginning to unlock its full potential.

Final note: I also requested a one-word sentiment analysis of Variety’s review of the movie ‘Salaar’. Initially, the response was verbose, but upon further prompting, the model provided a concise answer. :-)
