Phi-3-Mini: Microsoft’s Compact Powerhouse for Mobile and Local AI

Lakshmi narayana .U
Published in Stackademic
21 min read · Apr 26, 2024


Image generated by the author with DALL·E 3

Introducing Phi-3-Mini: A Compact yet Powerful Language Model for Mobile AI Applications

Microsoft’s research paper introduces phi-3-mini, a groundbreaking language model notable for its compact size and robust capabilities. Despite having just 3.8 billion parameters (trained on 3.3 trillion tokens), phi-3-mini rivals larger models such as Mixtral 8x7B and GPT-3.5 on benchmarks, scoring 69% on MMLU and 8.38 on MT-bench. Its key innovation lies in the training dataset, a mix of heavily filtered web data and synthetic data, which lets a model small enough to run on a modern smartphone reach this level of performance.

The introduction of phi-3-mini represents a significant stride in making powerful AI technologies accessible on mobile devices. The model uses a standard transformer decoder architecture. Quantized to 4 bits, it occupies about 1.8GB of memory and, according to the technical report, runs natively and fully offline on an iPhone 14 at more than 12 tokens per second. By training on a highly curated dataset, phi-3-mini shows that a small model can still be highly capable. This departs from the usual reading of scaling laws, which ties better performance mainly to larger models, and demonstrates that strategic use of data can yield competitive results even at small scale.
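The Phi-3 technical report puts the 4-bit model at roughly 1.8GB of memory, and that figure follows from simple arithmetic on the weights. A minimal sketch, assuming every one of the 3.8 billion parameters is stored at exactly the given bit width and ignoring quantization scales, the KV cache, and runtime overhead:

```python
def quantized_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert bytes to GiB (1024**3 bytes)."""
    return n_bytes / 1024**3


if __name__ == "__main__":
    n_params = 3.8e9  # phi-3-mini parameter count
    for bits in (16, 4):
        size = quantized_weight_bytes(n_params, bits)
        print(f"{bits:>2}-bit: {size / 1e9:.2f} GB ({to_gib(size):.2f} GiB)")
```

Run as a script, it prints 7.60 GB (7.08 GiB) at 16 bits and 1.90 GB (1.77 GiB) at 4 bits. The 4-bit result is close to the report's figure, while the 16-bit one shows why the unquantized weights would not fit comfortably on a phone.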

Phi-3-mini’s ability to run locally on a phone opens up on-device AI applications, an exciting development for both developers and users. A 128K-context variant supports long-context interactions, and the model adapts to a variety of AI tasks, showing that advanced AI can be both portable and powerful. Its post-training (supervised fine-tuning followed by direct preference optimization) targets helpfulness as well as safety and robustness, making phi-3-mini a pioneering tool in the mobile AI space.

While the phi-3-mini model competes with larger models in language understanding and reasoning, its smaller size limits its capacity for factual knowledge storage, as seen in its lower performance on TriviaQA. However, integrating it with a search engine could mitigate this weakness. The model is primarily English-focused, but expanding its multilingual capabilities is a promising next step.
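One way to pair the model with a search engine is to paste retrieved snippets into the prompt, so the model reads facts instead of recalling them. A minimal sketch; the search step itself is left out, and the snippet format, instruction wording, and `build_grounded_prompt` name are assumptions for illustration, not the setup used in the Phi-3 report:

```python
from typing import Sequence


def build_grounded_prompt(question: str, snippets: Sequence[str]) -> str:
    """Prepend numbered search snippets to a question.

    The snippets would come from a search engine (not shown); the model is
    asked to answer only from them and to cite the snippet numbers it used.
    """
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))
    return (
        "Answer the question using only the numbered sources below. "
        "Cite the source numbers you used. If the sources do not contain "
        "the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_grounded_prompt(
        "Which river flows through Paris?",
        ["The Seine flows through the centre of Paris."],
    ))
```

The model then only has to read and restate the relevant snippet, which plays to its reasoning strength rather than its limited stored knowledge.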

Phi-3 Mini is available on Azure, Hugging Face, Ollama, and LM Studio. Microsoft plans to release two larger models, Phi-3 Small (7B parameters) and Phi-3 Medium (14B parameters), soon.
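On the Ollama route, a locally running Ollama server exposes a REST chat endpoint. A minimal sketch using only the Python standard library, assuming Ollama is listening on its default port 11434 and the model has been pulled under the `phi3` tag (`ollama pull phi3`); the helper names are hypothetical:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint


def build_ollama_request(prompt: str, model: str = "phi3") -> dict:
    """Build a non-streaming /api/chat request body for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }


def ask_ollama(prompt: str, model: str = "phi3", timeout: float = 120.0) -> str:
    """Send the prompt to a local Ollama server and return the reply text."""
    body = json.dumps(build_ollama_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    # Non-streaming replies carry the answer under message.content
    return reply["message"]["content"]
```

With the server running, `print(ask_ollama("In one sentence, what is Phi-3 Mini?"))` returns the model's answer. Setting `stream` to false keeps the sketch simple at the cost of waiting for the full reply.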

Running the Phi-3 Mini on a Mobile Phone

As a simple test, I used LLM Farm, an iOS app that runs inference on a downloaded model. Here are the steps:

1. Download the model from Hugging Face (link in the references section). I selected Phi-3-mini-4k-instruct-q4.gguf, which uses the Q4_K_M quantization method (4-bit) and is 2.2GB in size.
Source: Model card on Hugging Face

2. In LLM Farm, add a chat and link it to the model, either by downloading the model in the app or by browsing to the file you have already downloaded.

Source: Author’s mobile
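The 2.2GB file size from step 1 can be sanity-checked against the parameter count. A small sketch, assuming "2.2GB" means 2.2 × 10^9 bytes and ignoring the small amount of metadata the GGUF file also holds; the average lands above 4 bits because Q4_K_M stores per-block scale factors and keeps some tensors at higher precision:

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter in a model file."""
    return file_size_bytes * 8 / n_params


if __name__ == "__main__":
    # Phi-3-mini-4k-instruct-q4.gguf: about 2.2 GB on disk, about 3.8B parameters
    print(f"{bits_per_weight(2.2e9, 3.8e9):.2f} bits per weight")
```

It prints 4.63 bits per weight, so "4-bit" describes the dominant format rather than an exact per-weight cost.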

I tested a few queries and received responses.

Sample output. Source: Author-LLM Farm

So the model does run on a mobile phone (mine has 6GB of RAM). However, LLM Farm's support for Phi-3 Mini is limited for now, and the app crashed after a couple of queries.
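Besides the 2.2GB model file, the runtime also needs memory for the attention key/value (KV) cache, which grows with context length. A back-of-the-envelope sketch, assuming the architecture figures from the Phi-3 technical report (32 layers, hidden size 3072), standard multi-head attention, and a 16-bit cache filled to the full 4K-token context. Runtimes may use a shorter context or a quantized cache, so this is an upper-end estimate, not what LLM Farm actually allocates:

```python
def kv_cache_bytes(n_layers: int, hidden_size: int, n_ctx: int, bytes_per_value: int = 2) -> int:
    """Memory for the key and value caches across all layers.

    Each layer stores one key vector and one value vector of size
    hidden_size per token, hence the factor of 2.
    """
    return 2 * n_layers * n_ctx * hidden_size * bytes_per_value


if __name__ == "__main__":
    size = kv_cache_bytes(n_layers=32, hidden_size=3072, n_ctx=4096)
    print(f"{size / 1024**3:.2f} GiB for a full 4K-token context at 16 bits")
```

For a full 4K context this comes to 1.50 GiB. Added to roughly 2GB of weights, plus the operating system and the app itself, that leaves limited headroom on a 6GB phone.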

The phone starts to heat up when the model is running — it’s like a built-in, sizzling hot performance indicator! :-)

Running the Phi-3 Mini on a laptop with LM Studio

Intended use of phi-3 mini

Based on the above and general model characteristics, I chose the following approach to evaluate the model locally on LM Studio.

1. Memory/Compute Constrained Environments:

- Initial Query: “Can you summarize a 500-word article into just 50 words? Answer yes or no.”

- If Yes, Provide the Article: Once the model confirms its capability, provide a specific 400-500 word article on Bollywood. Then instruct, “Please summarize this 500-word article into no more than 50 words.”

- Testing Area: Tests the model’s ability to perform complex tasks like summarization under strict computational and memory constraints, which is crucial for deployment in environments with limited resources.

2. Latency in Real-Time Applications:

- Prompt: “Provide real-time, step-by-step guidance on how to perform CPR.”

- Testing Area: Evaluates the model’s ability to deliver immediate and accurate responses, essential for applications where timely information is critical, such as medical or emergency scenarios.

3. Strong Reasoning (Math and Logic):

- Prompt: “If a warehouse sells pallets of water bottles, with each pallet holding 250 bottles and each bottle costing $1.20, what would be the total cost for 3 pallets?”

- Testing Area: Assesses the model’s mathematical reasoning and accuracy, important for scenarios requiring financial calculations or logical problem-solving.

4. Long Context Understanding:

- Prompt 1: “Describe the events leading up to World War II.”

- Prompt 2: “Given those events, how did they contribute to the start of the war?”

- Testing Area: Tests how well the model retains and processes extensive historical information over multiple queries, crucial for educational and research applications.

5. Research on Language Models:

- Prompt: “Critique the impact of zero-shot learning capabilities on the utility of language models in unsupervised tasks.”

- Testing Area: Checks the model’s ability to discuss advanced machine learning concepts and their implications, demonstrating its utility in academic and research settings.

6. Multimodal Model Foundation:

- Prompt: “Discuss how integrating text and image data can improve the performance of AI models in recognizing emotional cues in social media posts.”

- Testing Area: Although phi-3-mini is primarily a language model, this prompt tests its theoretical knowledge about multimodal interactions, pertinent to its potential use in developing more comprehensive AI systems.

7. Generative AI Features:

- Prompt: “Create a detailed plan for a marketing campaign aimed at promoting renewable energy solutions in urban areas.”

- Testing Area: Evaluates the model’s ability to generate creative and strategic content, which is essential for marketing and advertising applications where generative AI can provide significant advantages.

These prompts and tests are designed to thoroughly assess the phi-3-mini model’s abilities, covering both its technical features and practical applications.
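To make these tests repeatable, the prompts can be sent to LM Studio's local server, which accepts OpenAI-style chat-completion requests. A minimal sketch using only the standard library, assuming the server runs on LM Studio's default `http://localhost:1234/v1` (the same base URL used in the LangChain code later in this article) with Phi-3-mini loaded; the helper names and the model identifier string are illustrative. Passing the growing `history` list back in is what lets the follow-up question in test 4 see the first answer, and the timing gives a rough latency figure for test 2:

```python
import json
import time
import urllib.request

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # LM Studio default


def build_chat_request(history: list, prompt: str, model: str = "phi-3-mini") -> dict:
    """Return an OpenAI-style chat request with prior turns plus a new user turn.

    The history list itself is not modified here.
    """
    return {
        "model": model,  # illustrative; use the identifier LM Studio shows for the loaded model
        "messages": history + [{"role": "user", "content": prompt}],
    }


def ask(history: list, prompt: str) -> tuple:
    """Send one turn, append both sides to history, and return (reply, seconds)."""
    body = json.dumps(build_chat_request(history, prompt)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=300) as resp:
        data = json.load(resp)
    elapsed = time.perf_counter() - start
    reply = data["choices"][0]["message"]["content"]
    history.extend([
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": reply},
    ])
    return reply, elapsed
```

For test 4, start with `history = []`, call `ask(history, "Describe the events leading up to World War II.")`, then `ask(history, "Given those events, how did they contribute to the start of the war?")`.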

Here are the results in LM Studio:

Source: Author-LM Studio

Note: Do enable GPU acceleration if your laptop/PC allows it.

1.

Initial Query: “Can you summarize a 500-word article into just 50 words? Answer yes or no.”

- If Yes, Provide the Article: Once the model confirms its capability, provide a specific 400–500 word article on Bollywood. Then instruct, “Please summarize this 500-word article into no more than 50 words.”

USER: Can you summarize a 500-word article into just 50 words? Answer yes or no.
AI: Yes, I can condense the content of a 500-word article into about 50 words while retaining its core message and essence.
Summary: A brief encapsulation of an extensive 500-word article focusing on key points, main ideas, or conclusions presented by the author. (Exact word count may slightly vary)
USER: Please summarize this 500-word article into no more than 50 words. Dazzling Dreams and Dramatic Dances: A Deeper Look at Bollywood
Bollywood, a vibrant tapestry woven with song, dance, romance, and drama, is synonymous with Indian cinema for many. Based in Mumbai (formerly Bombay), it's the world's largest film producer, churning out hundreds of movies annually. But Bollywood is more than just prolific; it's a cultural phenomenon that has captivated audiences worldwide, offering a glimpse into Indian society, traditions, and dreams.
From Humble Beginnings to Technicolor Extravaganzas
The roots of Bollywood trace back to the early 20th century, with the first silent film, "Raja Harishchandra," released in 1913. This period laid the foundation for storytelling techniques that would become Bollywood staples – themes of social justice, family bonds, and overcoming adversity. With the arrival of sound in the 1930s, musicals took center stage. This fusion of storytelling with catchy music and elaborate dance routines became the signature style of Bollywood. Early musicals established the industry's penchant for grand visuals, historical epics, and larger-than-life characters.
The Power of Song and Dance: A Language Beyond Words
Music and dance are not mere embellishments; they are integral to the narrative. Songs express emotions, advance the plot, and create escapism. Lavishly choreographed dance sequences are a visual feast. The music itself blends traditional Indian melodies with contemporary influences. Popular playback singers achieved iconic status, their voices carrying generations of emotions.
Beyond Masala Movies: Evolving with the Times
While the classic "masala movie" formula remains popular, Bollywood has evolved. Independent filmmakers tackle social issues like poverty and women's rights. Films like "Lagaan" explored colonialism and "Taare Zameen Par" shed light on dyslexia. Bollywood stars increasingly use their platform for social change.
A Global Phenomenon: Crossing Borders and Hearts
Bollywood's storytelling transcends language barriers. The industry has a dedicated diaspora audience and is finding new fans through streaming platforms. Foreign collaborations and remakes are becoming more commonplace.
The Star System: Powerhouses of Glamour and Influence
The industry is a complex ecosystem with a star-driven culture. The Khans – Shah Rukh Khan, Salman Khan, and Aamir Khan – are household names, commanding immense fan loyalty. Actresses like Deepika Padukone and Alia Bhatt have achieved success. These stars are brand ambassadors, influencers, and cultural icons.
The Future of Bollywood: Embracing Change and Innovation
As technology advances, Bollywood is embracing visual effects and animation. There's a growing focus on portraying realistic storylines alongside the fantastical. Independent cinema continues to flourish. One thing remains constant – Bollywood's ability to entertain, inspire, and transport audiences. So, the next time you experience Bollywood's magic, appreciate its artistry.
AI: Bollywood is India’s cinematic powerhouse known for song-dance narratives, cultural impact, global reach, iconic stars, and evolving storytelling while maintaining a star-driven industry and embracing technological innovation.
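The 50-word limit is easy to check mechanically. A small sketch that counts whitespace-separated words (so hyphenated terms such as "song-dance" count once), applied to the model's summary above:

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


summary = (
    "Bollywood is India’s cinematic powerhouse known for song-dance narratives, "
    "cultural impact, global reach, iconic stars, and evolving storytelling while "
    "maintaining a star-driven industry and embracing technological innovation."
)
n = word_count(summary)
print(f"{n} words; within the 50-word limit: {n <= 50}")
```

It reports 27 words, comfortably within the limit.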

2.

Prompt: “Provide real-time, step-by-step guidance on how to perform CPR.”

Source: Author-LM Studio; decent execution time.
Performing cardiopulmonary resuscitation (CPR) involves the following steps:

1. Assess the scene for safety before approaching the person in need.
2. Check responsiveness by gently tapping their shoulder and shouting, "Are you okay?"
3. Call emergency services (dial 911 or the relevant local emergency number) if there is no response.
4. If the individual is unresponsive and not breathing normally, begin CPR immediately:
a. Place the person on their back on a firm surface.
b. Kneel beside them and place the heel of one hand in the center of their chest.
c. Place your other hand on top of the first and interlock your fingers.
d. Keeping your elbows straight, use your upper body weight to administer compressions at a depth of 2 inches (5 cm) for adults, with a rate of 100-120 compressions per minute. Allow full chest recoil between compressions.
e. Perform CPR cycles (30 compressions followed by 2 rescue breaths). Continue until emergency medical services arrive or the person shows signs of life.

3.

Prompt: “If a warehouse sells pallets of water bottles, with each pallet holding 250 bottles and each bottle costing $1.20, what would be the total cost for 3 pallets?”

To calculate the total cost for 3 pallets of water bottles:

1. Determine the number of bottles in one pallet: 250 bottles.
2. Multiply this by the price per bottle: 250 * $1.20 = $300 (cost for one pallet).
3. Since there are 3 pallets, multiply the cost for one pallet by 3: $300 * 3 = $900.

The total cost for 3 pallets of water bottles is $900.
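The model's arithmetic is easy to verify independently. Using `Decimal` rather than floats keeps the currency values exact:

```python
from decimal import Decimal

bottles_per_pallet = 250
price_per_bottle = Decimal("1.20")
pallets = 3

cost_per_pallet = bottles_per_pallet * price_per_bottle  # 300.00
total_cost = cost_per_pallet * pallets                    # 900.00
print(f"Cost per pallet: ${cost_per_pallet}, total: ${total_cost}")
```

It prints `Cost per pallet: $300.00, total: $900.00`, matching the model's answer.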

4.

Prompt 1: “Describe the events leading up to World War II.”

The events leading up to World War II were complex and multifaceted, with a combination of political, economic, social, and technological factors contributing to its outbreak in 1939. Key elements include:

1. Treaty of Versailles (1919): The treaty that ended World War I imposed heavy reparations on Germany, stripped it of territory, limited its military capabilities, and placed blame for the war squarely on Germany and Austria-Hungary. These punitive measures created resentment among Germans and set the stage for political unrest and a desire to overturn these terms.

2. Rise of Totalitarian Regimes (1930s): The global economic downturn following the Great Depression provided fertile ground for extreme ideologies, such as fascism in Italy under Benito Mussolini, Nazi Germany under Adolf Hitler, and militaristic Japan to gain influence. These regimes promoted nationalism, military expansion, and territorial conquests, which eventually led to conflicts with neighboring countries.

3. Expansionist Policies: The desire for territorial expansion was a key motivation behind the aggression of Nazi Germany (German invasion of Poland), Fascist Italy (Invasion of Ethiopia in 1935 and Albania in 1939), and Imperial Japan (Japanese invasions of China, Manchuria, Korea). These actions violated international agreements and strained relationships with neighboring countries.

4. Failure of the League of Nations: The League of Nations was established after World War I to promote peace and prevent future conflicts. However, its effectiveness was limited by its inability to enforce decisions or sanctions on aggressor states, as well as the absence of key world powers such as the United States from the organization. This weakness allowed for unchecked expansionism by fascist regimes.

5. Non-Aggression Pact (Molotov-Ribbentrop Pact, 1939): The Soviet Union and Nazi Germany signed a non-aggression pact in August 1939. This agreement allowed Hitler to plan his invasion of Poland without fearing interference from the Soviets, while providing Stalin with time for military preparations against potential future Western threats.

6. Invasion of Poland (September 1, 1st 1939): The immediate spark that ignited World War II was Nazi Germany's invasion of Poland on September 1, 1939. This violated numerous international agreements and prompted Britain and France to declare war against Germany two days later, officially launching the global conflict.

In summary, a combination of factors such as unresolved issues from World War I, economic hardship, totalitarian ideologies, territorial ambitions, weakened diplomatic institutions, strategic alliances, and provocative military actions led to the outbreak of World War II in 1939.

Prompt 2: “Given those events, how did they contribute to the start of the war?”

The key events leading up to World War II contributed significantly towards its onset through a complex interplay between economic conditions, political ideologies, power dynamics, and military actions:

1. Treaty of Versailles: The harsh terms imposed upon Germany after World War I bred resentment among Germans, who felt humiliated by the perceived unjust punishment. This anger created a fertile environment for extremist ideologies to flourish and eventually contributed to Adolf Hitler's rise to power with promises of revitalizing German pride and restoring national greatness.

2. Rise of Totalitarian Regimes: The emergence of totalitarian governments such as Nazi Germany, Fascist Italy, and Imperial Japan introduced aggressive foreign policies centered around expansionism and militarization. These regimes sought to establish dominance in Europe, Africa, and Asia by forcefully annexing neighboring territories, violating international laws, and disregarding global peace agreements.

3. Expansionist Policies: The aggressive military campaigns initiated by the fascist states demonstrated their intentions to redraw borders through conquest rather than negotiation or compromise. Germany's annexation of Austria (Anschluss) and Czechoslovakia's Sudetenland, along with Italy's invasions of Ethiopia and Albania, exposed a dangerous trend towards global territorial ambitions that threatened regional stability.

4. Failure of the League of Nations: The inability of the League of Nations to prevent or deter aggressive actions by fascist regimes undermined faith in international diplomatic institutions and collective security mechanisms. This weakness, along with its lack of enforcement power, emboldened leaders like Hitler, Mussolini, and Tojo to pursue their expansionist goals without significant fear of repercussions from the global community.

5. Molotov-Ribbentrop Pact: The Soviet Union's decision to sign a non-aggression pact with Nazi Germany allowed Hitler to invade Poland without immediate Soviet opposition, while granting Stalin time to build up his military strength against potential future threats from the West. This strategic maneuver further complicated international efforts at maintaining peace and stability, as it appeared that two of Europe's most powerful nations had aligned themselves against each other or remained indifferent to one another’s interests.

6. Invasion of Poland: The invasion of Poland by Nazi Germany was a direct trigger for World War II. This act of aggression defied existing agreements and shocked the world, prompting Britain and France to declare war on Germany in support of Poland's sovereignty. As nations mobilized their forces and declared neutrality withdrawn, a global conflict ensued that would involve more than 30 countries within less than two years.

In conclusion, the events leading up to World War II created an atmosphere conducive to aggression, mistrust among world powers, unchecked ambitions by authoritarian states, and ultimately, a failure of diplomatic mechanisms to prevent war. These factors combined in 1939 to ignite a devastating conflict that would shape the course of the 20th century.
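Test 4 only works because the second prompt is sent together with the first exchange, and with a 4K-context variant that history fills up quickly: the two World War II answers above already run to several thousand characters. A minimal sketch of dropping the oldest turns to stay within a budget, assuming a rough 4-characters-per-token heuristic for English (a real tokenizer count would be more accurate) and OpenAI-style message dicts; the helper names are hypothetical:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def trim_history(messages: list, max_tokens: int = 4096, reserve: int = 1024) -> list:
    """Drop the oldest messages until the rest fit in max_tokens - reserve.

    `reserve` leaves room for the model's reply. The most recent message
    (normally the new question) is always kept.
    """
    budget = max_tokens - reserve
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if kept and used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Trimming from the oldest end preserves the most recent exchange, which is usually what a follow-up question like Prompt 2 depends on.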

5.

Prompt: “Critique the impact of zero-shot learning capabilities on the utility of language models in unsupervised tasks.”

Zero-shot learning refers to a model's ability to understand and perform tasks without having seen any example from that specific task during training. This capability has significantly impacted the utility of language models, especially when it comes to unsupervised tasks where labeled data is scarce or nonexistent. Here’s a critique of this impact:


**Positive Aspects:**

1. **Flexibility and Generalization**: Zero-shot learning enhances the flexibility of language models, allowing them to generalize across tasks without task-specific tuning. This means that with zero-shot capabilities, a single model can potentially handle multiple unsupervised tasks such as semantic analysis, summarization, or topic detection, which is particularly useful when labeled data for every specific task is not available.

2. **Efficiency in Data Usage**: Unsupervised learning often struggles with the lack of labeled datasets; however, zero-shot capabilities can mitigate this issue by leveraging pre-training on diverse and extensive datasets, thus making efficient use of existing resources to tackle new tasks without additional training data.

3. **Innovation in Task Design**: With a robust zero-shot learning capacity, the design of unsupervised tasks could shift from creating extensive annotated datasets toward developing more creative and abstract task descriptions that models can understand and perform autonomously. This may lead to innovations in how we approach problem solving with language models.


**Critical Concerns:**

1. **Quality of Output**: Zero-shot learning might not always produce results as high quality as those achieved by supervised methods, which are specifically tuned for certain tasks. There is a risk that the generalization ability could lead to unreliable or less accurate outcomes on specific tasks due to overarching abstraction without task-specific refinements.

2. **Lack of Tailored Solutions**: While zero-shot learning allows models to work with various tasks, it might fail to provide tailor-made solutions for certain niche applications where specialized adjustments could significantly enhance performance. This is particularly critical in fields like healthcare or cybersecurity, where precision and specificity are paramount.

3. **Misalignment of Expectations**: There's a possibility that users might overestimate the zero-shot capabilities based on successes in certain areas, leading to disappointing results when applied to more complex tasks or domains with unique challenges not well-represented in the training data.

4. **Dependence on Pre-trained Models**: The effectiveness of zero-shot learning is heavily dependent on the breadth and quality of pre-training datasets. If the pre-trained model doesn't cover a wide enough array of concepts or linguistic structures, its performance in zero-shot scenarios could be limited despite theoretical capabilities.


In conclusion, while zero-shot learning significantly expands the utility of language models for unsupervised tasks by reducing the need for task-specific data and allowing for broader applications, it also raises concerns regarding output quality, precision for niche tasks, user expectations, and reliance on extensive pre-training. These factors should be considered when deploying these models in real-world scenarios to ensure that they meet desired standards of performance and utility.

6.

Prompt: “Discuss how integrating text and image data can improve the performance of AI models in recognizing emotional cues in social media posts.”

Integrating both textual and visual information, often referred to as multimodal learning, has shown promising improvements in various areas of artificial intelligence (AI), including emotion recognition from social media content. Here's a comprehensive discussion on the subject:


**Enhanced Context Understanding**:

Combining text and image data allows AI models to gain more context for interpreting emotional cues. Text alone can provide explicit expressions of sentiment, but visual elements such as facial expressions or body language often offer additional information about a user's true feelings that may not be fully expressed in the written content.


**Increased Accuracy**:

When AI models process both text and images concurrently, they can achieve higher accuracy levels compared to analyzing each modality separately. The complementarity of textual and visual data helps overcome individual limitations—text might be ambiguous or lack emotional depth, while an image could have misleading information due to poor lighting or other factors.


**Real-time Feedback Loop**:

Emotion recognition systems that integrate both modalities can create a feedback loop where the interpretation of visual cues informs text analysis and vice versran:

Integrating text and image data can improve AI's performance in recognizing emotional cues from social media posts by leveraging the complementary strengths of each modality. Here are several ways this integration enhances emotion recognition systems:


**Contextual Interpretation**:

Text often lacks contextual information, which can be vital for understanding emotions accurately. Images can provide cues such as facial expressions and scene setting that give more depth to the text analysis. For example, a post with seemingly positive words might actually convey sarcasm or disappointment if paired with an unhappy expression in the accompanying image.


**Cross-modal Confirmation and Corroboration**:

By analyzing both modalities simultaneously, AI systems can cross-verify emotional cues across text and images to reach more reliable interpretations. If a post's language indicates happiness but the corresponding image shows closed eyes (typically associated with sadness or tiredness), the system might weigh these conflicting signals and arrive at an interpretation that considers both aspects before making a judgment.


**Enhanced Learning from Underrepresented Emotions**:

Some emotions are less frequently expressed in textual form but can be vividly depicted through images, such as shock or surprise (e.g., someone with wide eyes). AI systems that process both modalities can better learn and represent these subtler emotional states than those focusing on a single modality.


**Robustness to Data Variability**:

Social media data is highly variable, including different languages, slang, emojis, image quality, and contextual nuances. Multimodal systems are more robust as they can leverage text for language processing while using images to fill in the gaps or clarify ambiguities that arise from text alone.


**Innovative Feature Engineering**:

The integration of multimodal data encourages innovative feature engineering, where features are derived not just from text and image separately but also from their interactions. For example, a system might learn specific patterns linking certain phrases with corresponding facial expressions or post content to enhance emotion recognition accuracy.


**Challenges in Integration**:

While the benefits of multimodal integration are substantial, it also presents several challenges including data synchronization (ensuring text and images align correctly), computational demands for processing large amounts of diverse data, and designing algorithms that can effectively combine features from both modalities.


In conclusion, integrating text and image data in emotion recognition AI models enhances their performance by providing a richer contextual understanding and enabling more accurate interpretations through cross-validation between the two types of information. Despite its challenges, this approach is pivotal for creating sophisticated systems capable of nuanced social media analysis.

7.

Prompt: “Create a detailed plan for a marketing campaign aimed at promoting renewable energy solutions in urban areas.”

Title: Urban Renewal - A Marketing Campaign for Promoting Renewable Energy Solutions

Campaign Objective: To raise awareness and promote the adoption of renewable energy solutions, such as solar panels, wind turbines, and geothermal systems in urban areas. The campaign aims to highlight benefits like cost savings, environmental impact, and government incentives for both residential and commercial property owners.

Target Audience: Urban residents and businesses with an interest in sustainable living and green energy solutions.

Campaign Duration: Six months leading up to Earth Day (April 22)

Key Messages:
1. Transforming cities through renewable energy - making a positive environmental impact while saving money on energy costs.
2. Government incentives available for adopting green solutions in urban settings.
3. Enhancing the quality of life and promoting community pride by embracing clean, efficient energy sources.
4. Solar power as an affordable solution to reduce carbon footprints without compromising on comfort or convenience.
5. Wind energy: harnessing nature's power in urban spaces for sustainable growth and development.
6. Geothermal heating/cooling systems - reliable, efficient, eco-friendly alternatives to conventional HVAC solutions.

Key Components of the Marketing Campaign:

1. Branding & Visual Identity: Develop a consistent brand identity with an emphasis on urban sustainability and green living. The visuals should communicate energy efficiency and modernity through clean lines, earth tones, and vibrant illustrations showcasing renewable energy solutions in action.

2. Multi-Channel Marketing Strategy: Launch the campaign across various platforms to maximize reach and engagement - social media (Facebook, Twitter, Instagram), email marketing, print ads, event sponsorships, webinars/podcasts, influencer partnerships, and outdoor advertising.
Written content should be designed with an urban perspective in mind while emphasizing benefits like cost-savings, reduced environmental impact, government incentives, increased property values, community pride, and improved living conditions.
3. Educational Content & Webinars: Host webinars featuring experts discussing the importance of renewable energy in urban areas, showcasing successful implementation case studies from similar cities, and addressing common concerns related to costs, installation, and maintenance. Provide downloadable resources such as guides on calculating ROI, government incentives available, and step-by-step installation instructions for solar panels or geothermal systems.
4. Events & Partnerships: Host green energy expos/conferences featuring keynote speakers from industry leaders, renewable energy companies, NGOs, and municipal officials promoting urban sustainability initiatives. Establish partnerships with eco-friendly businesses and local influencers to broaden the campaign's reach through cross-promotions and sponsorships.
5. Digital Advertising: Utilize targeted online ads (Google, social media) emphasizing key benefits of adopting renewable energy solutions in urban areas such as cost savings, reduced environmental impact, government incentives, and community pride. Create share-worthy content featuring compelling visuals and testimonials from early adopters to encourage engagement and virality on social media platforms.
6. Print Advertising: Partner with popular urban lifestyle magazines for print advertisements highlighting the benefits of renewable energy solutions in an accessible, relatable manner. Develop eye-catching posters featuring iconic city landmarks and images of clean, green technology to be displayed at public transportation hubs, shopping centers, and community gathering spots.
7. Influencer Marketing: Collaborate with popular local influencers who are passionate about sustainability for social media takeovers, sponsored posts, and lifestyle vlogs focused on eco-friendly urban living. Leverage their reach to engage the target audience more effectively.
8. Earth Day Event: Organize a high-profile event showcasing successful renewable energy solutions in urban areas featuring live demonstrations, panel discussions, workshops, and interactive activities for attendees. This will serve as the campaign's culmination point while reinforcing key messages around community pride and environmental responsibility.
9. Measurement & Analysis: Establish KPIs (key performance indicators) such as website traffic, social media engagement, event attendance, lead generation, conversion rates, cost per acquisition, return on investment to measure campaign success and adjust strategies accordingly.

These responses were evaluated against GPT-4's responses to the same prompts, producing the following table.

Source: Author-ChatGPT-4

In this evaluation, phi-3-mini and GPT-4 were each tested across the seven domains above, which span summarization under resource constraints, real-time guidance, reasoning, long-context understanding, and more specialized AI topics. Each domain had a specific prompt designed to test the skills relevant to that area. ChatGPT (GPT-4) then rated both models’ responses on a scale of 1 to 5 according to how effectively and comprehensively they addressed the prompt. The aim was to compare the effectiveness, accuracy, and depth of understanding each model showed on complex queries and tasks, with the ratings providing a simple quantitative measure of performance across the tested areas.
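The rating step can be scripted as well. A minimal sketch of an LLM-as-judge setup: the rubric wording, the `Score: N` reply format, and the helper names are assumptions for illustration, not the exact prompt used for the table above. The parser rejects anything outside 1 to 5, so a malformed judge reply raises an error instead of being silently counted:

```python
import re
from statistics import mean

JUDGE_TEMPLATE = (
    "Rate how effectively and comprehensively the response below addresses "
    "the prompt, on a scale of 1 (poor) to 5 (excellent). "
    "Reply with a short justification followed by a final line 'Score: N'.\n\n"
    "Prompt:\n{prompt}\n\nResponse:\n{response}"
)


def build_judge_prompt(prompt: str, response: str) -> str:
    """Fill the rubric template with one prompt/response pair."""
    return JUDGE_TEMPLATE.format(prompt=prompt, response=response)


def parse_score(judge_reply: str) -> int:
    """Extract the last 'Score: N' from the judge's reply (N must be 1 to 5)."""
    matches = re.findall(r"Score:\s*([1-5])\b", judge_reply)
    if not matches:
        raise ValueError("no 1-5 score found in judge reply")
    return int(matches[-1])


def average_scores(scores: dict) -> dict:
    """Average each model's list of per-domain scores."""
    return {model: mean(values) for model, values in scores.items()}
```

Asking for the score on a fixed final line keeps parsing reliable while still letting the judge explain its rating.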

Using Phi-3-mini with LM Studio, Neo4j Knowledge graph, Langchain and Langsmith

Please refer to my previous article on the same topic, as we will reference the same knowledge graph here.

Here is the additional code to enable Langsmith tracing (registration for a basic account is required).

import os

# Update with your API URL if using a hosted instance of Langsmith.
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "your-key" # Update with your API key
os.environ["LANGCHAIN_TRACING_V2"] = "true"
project_name = "project_name" # Update with your project name
os.environ["LANGCHAIN_PROJECT"] = project_name # Optional: "default" is used if not set

Modified retriever code, pointing the chain at Phi-3-mini served by LM Studio's local server:

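from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI

# Create a retriever from the Neo4j vector store built in the previous article
windowless_retriever = neo4j_vector_store.as_retriever()

# Create a chatbot Question & Answer chain from the retriever.
# ChatOpenAI talks to LM Studio's local OpenAI-compatible server;
# the api_key is only a placeholder string.
windowless_chain = RetrievalQAWithSourcesChain.from_chain_type(
    ChatOpenAI(model="microsoft/Phi-3-mini-4k-instruct-gguf", base_url="http://localhost:1234/v1", api_key="lm-studio"),
    chain_type="stuff",
    retriever=windowless_retriever,
)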

Output:

Source: Author-Langsmith

The output is pretty decent, and inference takes less than 7 seconds, which is quite good for a small model running locally. Response accuracy could be improved by enriching the knowledge graph and experimenting with other chain types.

In conclusion, the Phi-3-mini model represents a significant advancement in the field of AI, demonstrating that smaller models can indeed deliver high performance. Its compact size makes it suitable for mobile applications, opening up a host of possibilities for on-device AI applications. Despite limitations in certain areas, such as factual knowledge storage and multilingual capabilities, the model’s overall performance is commendable. The ability to run locally on a phone and access additional context through external applications is an added advantage, and its potential in various AI tasks is promising.

References:

1. microsoft/Phi-3-mini-4k-instruct-gguf, Hugging Face
2. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, arXiv:2404.14219
3. Neo4j Graph Database & Analytics | Graph Database Management System
4. Laksh-star, GitHub
