Exploring Mistral Large with Function Calling — Code

azhar
azhar labs
10 min read · Feb 29, 2024


In the ever-evolving world of artificial intelligence, Mistral AI has emerged as a noteworthy contender with its latest model release. This article delves into the unique aspects of Mistral AI’s approach and how it stands out, especially in comparison to other giants like OpenAI. We’ll examine its capabilities in normal prompting and function calling, and how it presents itself as a viable alternative in the AI domain.

Before we proceed, let’s stay connected! Please consider following me on Medium, and don’t forget to connect with me on LinkedIn for a regular dose of data science and deep learning insights. 🚀📊🤖

Mistral AI Company: A Rapid Rise to Prominence

Mistral AI, though a relatively new player in the field, has quickly made a name for itself. In less than a year, the company has not only developed proprietary models but also released a significant open-source 7-billion-parameter model. This feat becomes even more impressive considering that they achieved it with a budget of just 20 million euros, a fraction of what companies like OpenAI spend on similar endeavors.

Key Points of Interest

  1. Cost-Effectiveness: Mistral AI’s ability to produce high-quality models with considerably lower budgets.
  2. Rapid Development: The impressive speed of development, releasing advanced models in under 10 months.
  3. Open Source and Proprietary Models: Offering both open-source and proprietary models, catering to a wide range of needs.

Mistral AI’s latest model, Mistral Large, symbolizes their venture into new territories in the AI landscape. What sets them apart is their straightforward approach to release and distribution. They’ve partnered with Azure for easy accessibility, yet the model remains proprietary.

A Strategic Advantage

A crucial differentiating factor for Mistral AI is its openness to on-premises deployment. This flexibility is a game-changer for clients with sensitive data, like hedge funds, who require on-prem hosting for security reasons. This strategy could give Mistral AI a significant edge in attracting a niche but substantial client base.

Performance and Accessibility

The Mistral AI API, which I had the opportunity to test, proved to be remarkably fast and efficient. The model is accessible via both their platform, La Plateforme, and Azure, providing flexibility and ease of access to various users.

Benchmarking and Comparisons

When evaluating the performance of Mistral’s model, it’s important to remember that benchmarking in the AI field often involves comparing different models under varying conditions, which is not always a like-for-like comparison. Nonetheless, benchmarks provide insight into the model’s capabilities in domains such as natural language understanding, generation, and function calling.

MMLU Benchmark

Mistral AI’s latest release places commendably second to GPT-4 on the MMLU benchmark. However, a broader comparison that includes models like Gemini Ultra and Gemini Pro 1.5 repositions it at fourth. This shift underscores a significant issue in the AI field: benchmarks can often be misleading or inadequate at capturing a model’s true capabilities. While Mistral AI’s model is robust and arguably in the same class as GPT-4 or an unnerfed Gemini model, benchmarks alone cannot encapsulate its distinct features and strengths.

Multilingual Capabilities

One of the standout features of Mistral AI’s model is its multilingual nature. It focuses primarily on Western European languages and, like many AI models, bypasses Asian languages. Although limited in linguistic diversity, its proficiency in multiple European languages is a notable advance beyond English-centric models.

Enhanced Context Window

The 32k context window of the model is noteworthy. While not groundbreaking, it is a substantial improvement over smaller context windows, allowing for more nuanced and complex interactions. Future iterations may even expand this window, enhancing the model’s depth and versatility.

Precise Instruction Following

A key advantage of Mistral AI’s model is its ability to precisely follow instructions. This trait not only improves user experience but also offers developers the flexibility to set their own moderation policies. This aspect raises questions about the ease of jailbreaking prompts, a concern for any AI system.

Native Function Calling and Reasoning

The model’s strength in reasoning, particularly on benchmarks like GSM8K, is commendable. Its ability to perform native function calling indicates a shift towards more intuitive and context-aware AI interactions. This functionality could set a new standard in the AI landscape, especially compared to other models that may compromise reasoning ability for safety and alignment training.

Introducing “Le Chat”

Alongside the model, Mistral AI introduced “Le Chat,” a chat platform that enables users to interact with different models, including testing outputs and function calling capabilities. This platform serves as a practical tool for users to directly experience the model’s strengths and idiosyncrasies.

Exploring Mistral Large with LangChain

Setup and API Integration

import os

# Set your Mistral API key (obtainable from La Plateforme)
os.environ["MISTRAL_API_KEY"] = ''

To evaluate Mistral Large, the latest model from Mistral AI, we use LangChain for API calls. Users need to acquire a Mistral key and integrate it.

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_mistralai.chat_models import ChatMistralAI

from IPython.display import display, Markdown

We interact with Mistral Large through LangChain’s standard Runnable methods, such as invoke, stream, and batch.
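
For instance, streaming a reply token by token might look like this (a minimal sketch; the model name and prompt are placeholders):

chat = ChatMistralAI(model="mistral-large-latest")

# Print each chunk of the reply as it arrives
for chunk in chat.stream("Tell me about Mistral AI"):
    print(chunk.content, end="", flush=True)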

Testing and Responses

chat = ChatMistralAI(
    model="mistral-large-latest",
    temperature=0.7,
    max_tokens=128,
)

messages = [HumanMessage(content="knock knock")]
response = chat.invoke(messages)

display(Markdown(response.content))

>> Who's there? (This is how you respond to a "knock knock" prompt in a conversation, indicating that you're ready to hear the name or punchline that follows.)

The model exhibits a unique response style, distinct from OpenAI, Google, and Anthropic models. Interestingly, its response structure and style differ from the Mistral 7B model, likely due to enhanced training on more extensive data sets.
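
Next, we drive the model through a small prompt chain. A minimal sketch of how the chain can be assembled from the already-imported ChatPromptTemplate (the exact prompt wording here is an assumption):

# Hypothetical prompt template piped into the chat model defined above
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | chat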

response = chain.invoke({"topic": "Space"})

display(Markdown(response.content))

>> Why don't planets like to gossip?

Because they know that in space, nothing is truly private - even whispers can travel across the vast emptiness and reach unintended ears. Plus, they're always focused on maintaining their orbits and not causing any cosmic chaos! But here's a lighter, more traditional joke for you:

Why did the sun go to school?

To get a little brighter!

Test prompts, ranging from analogies to specific instructions, reveal Mistral Large’s adeptness at following detailed system prompts and producing diverse, structured responses.

Instruction-Based Responses
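
The examples in this section call a small generate helper. A minimal sketch of such a helper, assuming it pairs a system prompt with the user prompt and renders the model’s reply as Markdown (the implementation details are inferred from the call sites):

def generate(prompt: str, system_prompt: str = "", max_length: int = 512) -> None:
    # Hypothetical helper: max_length caps the number of output tokens
    chat = ChatMistralAI(model="mistral-large-latest", max_tokens=max_length)
    messages = [SystemMessage(content=system_prompt),
                HumanMessage(content=prompt)]
    response = chat.invoke(messages)
    display(Markdown(response.content))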

generate('Write a short email to Sam Altman giving reasons to open source GPT-4',
         system_prompt="You are Mistral Large, a large language model trained by Mistral. Write out your reasoning step-by-step to be sure you get the right answers!",
         max_length=512)

>>
Subject: The Compelling Case for Open-Sourcing GPT-4

Dear Sam Altman,

I trust this message finds you well. I am writing to discuss a potential strategy that could significantly benefit the AI community and OpenAI's reputation: open-sourcing GPT-4.

Accelerating AI Development: Open-sourcing GPT-4 would allow developers worldwide to contribute to its development, leading to a faster pace of innovation. This collective effort could lead to breakthroughs that might have taken longer to achieve in a closed-source environment.

Transparency and Trust: Open-sourcing would demonstrate OpenAI's commitment to transparency, a key aspect of building trust in AI. It would allow the community to understand how the model works, its limitations, and potential biases, fostering a more informed conversation around AI ethics and safety.

Community Engagement: Open-source projects often foster a vibrant, engaged community. By open-sourcing GPT-4, OpenAI could attract a diverse range of talent and perspectives, enriching the AI community and potentially leading to the development of new use cases and applications.

Educational Opportunities: GPT-4's source code could serve as a valuable educational resource for those interested in AI, helping to cultivate the next generation of AI researchers and developers.

Feedback and Improvement: An open-source GPT-4 would allow the community to provide feedback and suggestions for improvement, potentially leading to a more robust and reliable model.

I understand that open-sourcing a project as significant as GPT-4 is a complex decision with potential risks. However, I believe the benefits could far outweigh these risks, and I would be happy to discuss this further if you're interested.

Thank you for considering this proposal.

Best regards,

[Your Name]

Tests for creating analogies, explaining differences, and writing emails demonstrate Mistral Large’s precision in instruction-following. Unlike some models that might focus solely on reasoning or output, Mistral Large balances content generation with explicit instructions, adding a layer of user-directed customization.

Creative Writing and Code Generation

generate("x + 2x + 4x =  847 What is x?",
system_prompt="You are Mistral Large, a large language model trained by Mistral. Write out your reasoning step-by-step to be sure you get the right answers!",
max_length=2048)

>>
Sure, let's solve this step by step:

Step 1: Combine like terms on the left side of the equation. The terms x, 2x, and 4x are like terms, so we can add them together:

x + 2x + 4x = 1x + 2x + 4x = 7x

So, the equation becomes:

7x = 847

Step 2: To solve for x, we need to isolate x. We can do this by dividing both sides of the equation by 7:

7x / 7 = 847 / 7

So, x = 847 / 7

Step 3: Perform the division:

x = 121

So, the solution to the equation x + 2x + 4x = 847 is x = 121.

While Mistral Large performs well in creative writing tasks, it may not be the best in this domain compared to other models. However, its reasoning and code generation capabilities are outstanding; on GSM8K-style math problems such as the one above, it matches the performance level of GPT-4.

Comparing Mistral Medium and Large

In addition to Mistral Large, we explored Mistral Medium, which, while similar in some respects, showed differences in logic and reasoning capabilities. Mistral Medium seems more reliant on step-by-step logic, akin to Mistral 7B, and in some cases even outperforms the larger model at staying in character for specific prompts.

Performance on GSM8K (Medium)

generate("x + 2x + 4x =  847 What is x?",
system_prompt="You are Mistral Medium, a large language model trained by Mistral. Write out your reasoning step-by-step to be sure you get the right answers!",
max_length=2048)

>>
Step 1: Combine like terms on the left side of the equation. x + 2x + 4x = (1x + 2x + 4x) = 7x

Step 2: The equation now reads 7x = 847. To solve for x, divide both sides of the equation by 7.

Step 3: x = 847 / 7

Step 4: Perform the division to find the value of x. x ≈ 121.0

So, x is approximately equal to 121.0.

Both models excel at GSM8K questions, with the medium version showing different but equally effective reasoning patterns. This suggests that Mistral AI’s training approach maintains a high standard of quality across model sizes.

Setting Up for Function Calling

def take_away_order(food_items: str) -> str:
    return f"Your {food_items} is on the way!"

def online_booking(day: str, time: str) -> str:
    return f"Your booking is set for {day} at {time}. No need to reconfirm; please arrive on time."

Mistral Large showcases its prowess in function calling, a feature becoming increasingly vital in practical AI applications. We simulate a restaurant scenario with two functions: a takeaway order and an online booking system. The setup involves creating simple tools that can be extended for actual API calls or integrations with point-of-sale systems.
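
Called directly, these stubs simply echo their arguments, which makes it easy to verify the wiring before involving the model (a hypothetical quick check):

print(take_away_order("fish and chips"))
# Your fish and chips is on the way!

print(online_booking("Friday", "7pm"))
# Your booking is set for Friday at 7pm. No need to reconfirm; please arrive on time.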

JSON Schema for Function Arguments

tools = [
    {
        "type": "function",
        "function": {
            "name": "take_away_order",
            "description": "Put an order in for the food that you want to pick up and take away",
            "parameters": {
                "type": "object",
                "properties": {
                    "food_items": {
                        "type": "string",
                        "description": "The food items you want to order",
                    }
                },
                "required": ["food_items"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "online_booking",
            "description": "Placing a booking at the restaurant for lunch or dinner",
            "parameters": {
                "type": "object",
                "properties": {
                    "day": {
                        "type": "string",
                        "description": "The day you want to book to come and eat at the restaurant",
                    },
                    "time": {
                        "type": "string",
                        "description": "The time you want to book for lunch or dinner",
                    },
                },
                "required": ["day", "time"],
            },
        },
    },
]

Creating a JSON schema for each tool is crucial. It defines the arguments that the model needs to process, like food items for the takeaway order and day and time for the online booking. This schema guides the model in understanding and extracting necessary information from user interactions.
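
Because each parameters block is standard JSON Schema, the arguments the model extracts can also be validated before dispatch (a hypothetical extra step using the third-party jsonschema package):

from jsonschema import validate

# Raises jsonschema.ValidationError if the arguments do not match the
# schema declared for the take_away_order tool
validate(
    instance={"food_items": "fish and chips"},
    schema=tools[0]["function"]["parameters"],
)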

Conversation Flow and Tool Integration

import functools
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# The function-calling examples below use Mistral's native Python client
model = "mistral-large-latest"
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

# Map the tool names declared in the schema to the Python callables above
names_to_functions = {
    'take_away_order': functools.partial(take_away_order),
    'online_booking': functools.partial(online_booking),
}

messages = [
    ChatMessage(role="user", content="Hi Can I put in a food order to takeaway pick up in 30min please")
]

We observe the model’s ability to handle a conversation, identify when to call a function, and extract the required arguments. The model demonstrates proficiency in understanding user requests, asking clarifying questions, and processing the information to trigger the appropriate tool. For instance, in the booking scenario, the model skillfully extracts the day and time from the conversation and uses these details to make a booking.

Handling Multiple Arguments and Conditional Logic

response = client.chat(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

response

>>
ChatCompletionResponse(id='78c0fc06243d4687af0e317d85324a87', object='chat.completion', created=1709108038, model='mistral-large-latest', choices=[ChatCompletionResponseChoice(index=0, message=ChatMessage(role='assistant', content='', name=None, tool_calls=[ToolCall(id='null', type=<ToolType.function: 'function'>, function=FunctionCall(name='take_away_order', arguments='{"food_items": "fish and chips and a souvlaki"}'))]), finish_reason=<FinishReason.stop: 'stop'>)], usage=UsageInfo(prompt_tokens=262, total_tokens=294, completion_tokens=32))

The model efficiently handles scenarios where multiple arguments are required or when some arguments are initially missing. It engages in a logical and natural conversation flow to gather all necessary information before executing a tool call. This capability shows the model’s adaptability and potential in various interactive and user-facing applications.
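
To act on the model’s request, we can parse the tool call, dispatch it through the names_to_functions mapping, and feed the result back for a final natural-language reply (a minimal sketch following the native client's tool-message convention):

import json

# Pull the requested tool call out of the response above
tool_call = response.choices[0].message.tool_calls[0]
function_name = tool_call.function.name
function_args = json.loads(tool_call.function.arguments)

# Run the matching Python function
function_result = names_to_functions[function_name](**function_args)

# Append the assistant's tool call and the tool result, then ask the
# model for the final user-facing message
messages.append(response.choices[0].message)
messages.append(ChatMessage(role="tool", name=function_name, content=function_result))

final = client.chat(model=model, messages=messages)
print(final.choices[0].message.content)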

Conclusion

Mistral Large represents a significant advancement in the realm of large language models, particularly in the context of practical application and function calling. Its ability to process and execute complex tasks while maintaining a natural conversational flow is a testament to its sophisticated design and capabilities.

Future developments, such as the anticipated function-calling feature in Gemini models, will further enrich the landscape of AI applications. For now, Mistral Large offers a compelling alternative, especially for those seeking a cost-effective yet powerful AI tool.

As the AI field continues to evolve, Mistral Large sets a high standard in versatility and efficiency. It’s a model worth exploring for various use cases, promising to enhance and streamline numerous tasks and processes across industries.

azhar
azhar labs

Data Scientist | Exploring interesting research papers and concepts. LinkedIn: https://www.linkedin.com/in/mohamed-azharudeen/