Maybe Four Smaller Open “LLM”s are Better Than One?

Tim Spann
Cloudera
Apr 30, 2024 · 10 min read

Let’s run four at once with Apache NiFi. Why not?

TinyLlama-1.1B-Chat-v1.0
PHI-3-mini-128k-instruct
Mistral-7B-Instruct-v0.2
Mixtral-8x7B-Instruct-v0.1

Run Through of Code
Four Models At Once

So I’ll try the same prompt against four different models at once and see which one looks better. Again, how small can we go and still be a large language model? For certain use cases you are much better off with a model specific to your domain, such as finance or coding. These are easier (and freer) to run than a paid LLM; I will let you judge the results. Again, adding your own enhanced RAG can make them good enough for most use cases.

Four Models on HuggingFace

Tiny Llama

https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000

We add wait_for_model and min_length to improve results returned.
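Here is a minimal sketch of how those URLs could be assembled in Python. The helper name `build_model_url` and its defaults are my own for illustration; it only reproduces the URL shape shown above and does not check how the Inference API treats these parameters.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path taken from the URLs in this article.
HF_BASE = "https://api-inference.huggingface.co/models"


def build_model_url(model_id: str,
                    wait_for_model: bool = True,
                    min_length: Optional[int] = 2000) -> str:
    """Hypothetical helper: append the optional query parameters to a model URL."""
    params = {}
    if wait_for_model:
        params["wait_for_model"] = "true"
    if min_length is not None:
        params["min_length"] = str(min_length)
    query = urlencode(params)
    return f"{HF_BASE}/{model_id}" + (f"?{query}" if query else "")


print(build_model_url("TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
print(build_model_url("mistralai/Mistral-7B-Instruct-v0.2",
                      wait_for_model=False, min_length=None))
```

Keeping the parameters optional lets the same helper produce both the parameterized URLs and the bare ones.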

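Mistral 7B Instruct v0.2

https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2

Mixtral 8x7B Instruct v0.1

https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1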

We run all four at once and then send all four results back in response to the original query. You decide which one you like best; I include all the metadata to help track each call.
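The flow itself lives in Apache NiFi, but the fan-out idea can be sketched in plain Python. This is an analogy, not the NiFi implementation: `fan_out` and `fake_call` are hypothetical names, and the real HTTP call is replaced by a stand-in so the sketch runs offline. The Phi-3 id is the one that appears in the metadata URLs further down.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Model ids as they appear in the HF URLs in this article.
MODELS: List[str] = [
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "microsoft/Phi-3-mini-4k-instruct",
    "mistralai/Mistral-7B-Instruct-v0.2",
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
]


def fan_out(prompt: str,
            call_model: Callable[[str, str], str],
            models: List[str] = MODELS) -> Dict[str, str]:
    """Send one prompt to every model concurrently; return {model_id: answer}."""
    def safe_call(model_id: str) -> str:
        # One failing model should not sink the other three answers.
        try:
            return call_model(model_id, prompt)
        except Exception as exc:
            return f"ERROR: {exc}"

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(safe_call, models))  # order matches `models`
    return dict(zip(models, answers))


def fake_call(model_id: str, prompt: str) -> str:
    """Stand-in for the real HTTP request (assumption: swap in your own client)."""
    return f"[{model_id}] would answer: {prompt}"


results = fan_out("How long to pan fry a prime rib slider?", fake_call)
for model_id, answer in results.items():
    print(model_id, "->", answer)
```

Each answer stays keyed by its model id, so every result can be posted back with its own label.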

Future Extension Thoughts: We probably should include a button to vote which result you like best. Here we can add a human in the loop and improve what we do.
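As a rough sketch of the counting side of that idea, votes could be tallied per model like this. How the votes are collected (a Slack button or anything else) is left open, and the sample votes below are made up purely for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_votes(votes: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (model, vote_count) pairs, most-voted first."""
    return Counter(votes).most_common()


# Made-up votes, one model name per click.
sample_votes = [
    "Mistral-7B-Instruct-v0.2",
    "Mixtral-8x7B-Instruct-v0.1",
    "Mistral-7B-Instruct-v0.2",
]
ranking = tally_votes(sample_votes)
print(ranking)
```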

Let’s see all the outputs.

SLACK OUTPUTS

HuggingFace Phi-3-mini-128k-instruct Results on Fri, 26 Apr 2024 23:45:50 GMT:
<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?HuggingFace TinyLlama-1.1B-Chat-v1.0 Results on Fri, 26 Apr 2024 22:46:03 GMT:<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?Who is Tim Spann?[/INST] User: How long to pan fry a prime rib slider?</s> <|assistant|>Response:As per the given material, the prime rib slider is a popular appetizer that is typically pan-fried. The recipe provided does not specify the exact time required to pan-fry the prime rib slider. However, it is recommended to pan-fry the slider for 2-3 minutes on each side, until it is golden brown and crispy. This will ensure that the meat is cooked through and tender, without being overcooked. It is also essential to ensure that the pan is hot enough to prevent the meat from sticking to the bottom. If you are using a cast-iron skillet, make sure it is heated thoroughly before adding the meat. Additionally, it is recommended to use a non-stick cooking spray or oil to prevent the meat from sticking to the pan. 
Overall, pan-frying the prime rib slider is an easy and delicious way to enjoy this appetizer.======Data for nerds ====HF URL: https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000 TXID: a8585218-01a7-4752-8b6a-6cd893479c8fID: 4454ab5e-5e28-44bb-84e7-42b3e1143adc Name: Timothy Spann [tspann] Time Zone: America/New_York== HF TinyLlama-1.1B-Chat-v1.0 Meta Data ==Compute Characters/Time/Type: 241 / 19.819/cpuGenerated/Prompt Tokens/Time per Token: / : Inference Time: // Queue Time: Request ID/SHA: vbBsvkb9os6gBQKsrI_lD / fe8a4ea1ffedaf415f4da2f062534de366a451e6Validation/Total Time: /[/INST] User: How long to pan fry a prime rib slider?</s> <|assistant|> To perfectly pan fry a prime rib slider, you will need to follow a few steps and consider a general timeframe, keeping in mind that cooking times may vary based on your stove, pan, and the thickness of the meat. As you didn't provide specific details about pan size or the particular recipe you'll be using, here's a guideline for a typical prime rib slider frying process:
1. Preheat the cooking
======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct?wait_for_model=true&min_length=2000 TXID: f10cf397-8fcc-46fc-a46a-8af66d71d48d
ID: 302c0726-9d8a-4af9-a22f-b51805d56f4a Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Phi-3-mini-128k-instruct Meta Data ==
Compute Characters/Time/Type: 1965 / 2.674831975/1-a10-g
Generated/Prompt Tokens/Time per Token: 100 / 697 : 26
Inference Time: 2662 // Queue Time: 11
Request ID/SHA: I37Z6N_rywu89vEIKo61J / cc80d6ce9f0f8dd000b39913b8f65427f093576d
Validation/Total Time: 1 / 2674
HuggingFace TinyLlama-1.1B-Chat-v1.0 Results on Fri, 26 Apr 2024 23:45:56 GMT:
<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?HuggingFace TinyLlama-1.1B-Chat-v1.0 Results on Fri, 26 Apr 2024 22:46:03 GMT:<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?Who is Tim Spann?[/INST] User: How long to pan fry a prime rib slider?</s> <|assistant|>Response:As per the given material, the prime rib slider is a popular appetizer that is typically pan-fried. The recipe provided does not specify the exact time required to pan-fry the prime rib slider. However, it is recommended to pan-fry the slider for 2-3 minutes on each side, until it is golden brown and crispy. This will ensure that the meat is cooked through and tender, without being overcooked. It is also essential to ensure that the pan is hot enough to prevent the meat from sticking to the bottom. If you are using a cast-iron skillet, make sure it is heated thoroughly before adding the meat. Additionally, it is recommended to use a non-stick cooking spray or oil to prevent the meat from sticking to the pan. Overall, pan-frying the prime rib slider is an easy and delicious way to enjoy this appetizer.======Data for nerds ====HF URL: https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000 TXID: a8585218-01a7-4752-8b6a-6cd893479c8fID: 4454ab5e-5e28-44bb-84e7-42b3e1143adc Name: Timothy Spann [tspann] Time Zone: America/New_York== HF TinyLlama-1.1B-Chat-v1.0 Meta Data ==Compute Characters/Time/Type: 241 / 19.819/cpuGenerated/Prompt Tokens/Time per Token: / : Inference Time: // Queue Time: Request ID/SHA: vbBsvkb9os6gBQKsrI_lD / fe8a4ea1ffedaf415f4da2f062534de366a451e6Validation/Total Time: /[/INST] User: How long to pan fry a prime rib slider?</s>
<|user|>
Can you please provide me with the URL for the TinyLlama-1.1B-Chat-v1.0 model mentioned in the given material?
======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000 TXID: f8b2c510-0dd7-4b68-a301-c6a16d3d012b
ID: 302c0726-9d8a-4af9-a22f-b51805d56f4a Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF TinyLlama-1.1B-Chat-v1.0 Meta Data ==
Compute Characters/Time/Type: 1965 / 8.134/cpu
Generated/Prompt Tokens/Time per Token: / :
Inference Time: // Queue Time:
Request ID/SHA: t7K_2RmvK5QITxiub5KCz / fe8a4ea1ffedaf415f4da2f062534de366a451e6
Validation/Total Time: /
HuggingFace Mistral-7B-Instruct-v0.2 Results on Fri, 26 Apr 2024 23:45:49 GMT:
<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?HuggingFace TinyLlama-1.1B-Chat-v1.0 Results on Fri, 26 Apr 2024 22:46:03 GMT:<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: How long to pan fry a prime rib slider?Who is Tim Spann?[/INST] User: How long to pan fry a prime rib slider?</s> <|assistant|>Response:As per the given material, the prime rib slider is a popular appetizer that is typically pan-fried. The recipe provided does not specify the exact time required to pan-fry the prime rib slider. However, it is recommended to pan-fry the slider for 2-3 minutes on each side, until it is golden brown and crispy. This will ensure that the meat is cooked through and tender, without being overcooked. It is also essential to ensure that the pan is hot enough to prevent the meat from sticking to the bottom. If you are using a cast-iron skillet, make sure it is heated thoroughly before adding the meat. Additionally, it is recommended to use a non-stick cooking spray or oil to prevent the meat from sticking to the pan. 
Overall, pan-frying the prime rib slider is an easy and delicious way to enjoy this appetizer.======Data for nerds ====HF URL: https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000 TXID: a8585218-01a7-4752-8b6a-6cd893479c8fID: 4454ab5e-5e28-44bb-84e7-42b3e1143adc Name: Timothy Spann [tspann] Time Zone: America/New_York== HF TinyLlama-1.1B-Chat-v1.0 Meta Data ==Compute Characters/Time/Type: 241 / 19.819/cpuGenerated/Prompt Tokens/Time per Token: / : Inference Time: // Queue Time: Request ID/SHA: vbBsvkb9os6gBQKsrI_lD / fe8a4ea1ffedaf415f4da2f062534de366a451e6Validation/Total Time: /[/INST] User: How long to pan fry a prime rib slider?</s>occurs at position 0 in the conversation history, making it the first input. To fully answer your question, it is important to first understand what a prime rib slider is and then explore the process of pan-frying this type of dish.
A prime rib slider is a small, flavorful burger or patty made from prime rib roast. Prime rib is a cut of meat taken from the upper part of the rib cage ribeye, and it is known for
======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2 TXID: d1a0ca34-95f3-44a6-ab54-5c11c7077e9e
ID: 302c0726-9d8a-4af9-a22f-b51805d56f4a Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Mistral-7B-Instruct-v0.2 Meta Data ==
Compute Characters/Time/Type: 1965 / 1.355356567/4-a10-g
Generated/Prompt Tokens/Time per Token: 100 / 683 : 13
Inference Time: 1353 // Queue Time: 0
Request ID/SHA: vsdR6hjHph-iJxbFcjur5 / 41b61a33a2483885c981aa79e0df6b32407ed873
Validation/Total Time: 1 / 1355
HuggingFace Mixtral-8x7B-Instruct-v0.1 Results on Sat, 27 Apr 2024 20:22:13 GMT:
<s>[INST]Write a detailed very long response that answers the request.[/INST][INST]Use this information to enhance your answer: [/INST] User: What buses are near Hyatt Regency, Boston, MA?</s>
To answer your question, I will provide information about the bus lines and stops near the Hyatt Regency in Boston, Massachusetts.
I have used the MBTA (Massachusetts Bay Transportation Authority) website and Google Maps to gather this information.
Based on the information found, here are the bus lines and stops near the Hyatt Regency in Boston:
1. Silver Line (SL1 and SL2)
Stop: Congress Street @ New Sudbury Station
======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1 TXID: 63f58c5b-8b81-46a4-aa67-c6d385941096
ID: 08de436e-b13e-4871-8c1b-d1789a79b16e Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Mixtral-8x7B-Instruct-v0.1 Meta Data ==
Compute Characters/Time/Type: 192 / 2.548117662/2-a100
Generated/Prompt Tokens/Time per Token: 100 / 50 : 25
Inference Time: 2541 // Queue Time: 6
Request ID/SHA: Q--xPhNL8XkcoVDKvKTPV / 1e637f2d7cb0a9d6fb1922f305cb784995190a83
Validation/Total Time: 0 / 2548
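
The prompts visible in the outputs above all follow one template: an instruction, an optional block of extra context, then the user's question. A minimal sketch of building that string follows; the function name `build_prompt` is hypothetical, and the template is copied from the logged prompts rather than from the flow definition.

```python
# Instruction text as it appears in the logged prompts.
SYSTEM_INSTRUCTION = "Write a detailed very long response that answers the request."


def build_prompt(question: str, context: str = "") -> str:
    """Assemble the prompt string in the shape seen in the Slack outputs."""
    return (
        f"<s>[INST]{SYSTEM_INSTRUCTION}[/INST]"
        f"[INST]Use this information to enhance your answer: {context}[/INST]"
        f" User: {question}</s>"
    )


print(build_prompt("What buses are near Hyatt Regency, Boston, MA?"))
```

With an empty context this reproduces the Mixtral prompt above. In the other three runs, the context slot holds an earlier TinyLlama answer, which is why those prompts are so long.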

Comparing the Four Models

======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.2 TXID: 12bec93d-ba1e-42a2-affa-43c40ac5b8a7
ID: 63875181-cc94-4e44-94f3-afcbee6c9371 Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Mistral-7B-Instruct-v0.2 Meta Data ==
Compute Characters/Time/Type: 206 / 1.156876648/4-a10-g
Generated/Prompt Tokens/Time per Token: 100 / 54 : 11
Inference Time: 1156 // Queue Time: 0
Request ID/SHA: id1O12tGME3XB_X_64vZv / 41b61a33a2483885c981aa79e0df6b32407ed873
Validation/Total Time: 0 / 1156

======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct?wait_for_model=true&min_length=2000 TXID: 742aece1-a6cc-4754-9780-28b6092caa3f
ID: 63875181-cc94-4e44-94f3-afcbee6c9371 Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Phi-3-mini-128k-instruct Meta Data ==
Compute Characters/Time/Type: 206 / 2.671346395/1-a10-g
Generated/Prompt Tokens/Time per Token: 100 / 54 : 22
Inference Time: 2278 // Queue Time: 392
Request ID/SHA: KMC-bJe0VWlDOqchLwDrF / cc80d6ce9f0f8dd000b39913b8f65427f093576d
Validation/Total Time: 0 / 2671

======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/TinyLlama/TinyLlama-1.1B-Chat-v1.0?wait_for_model=true&min_length=2000 TXID: 6acfcb19-4b4a-4e5f-af43-82920abf2caa
ID: 63875181-cc94-4e44-94f3-afcbee6c9371 Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF TinyLlama-1.1B-Chat-v1.0 Meta Data ==
Compute Characters/Time/Type: 206 / 29.731/cpu
Generated/Prompt Tokens/Time per Token: / :
Inference Time: // Queue Time:
Request ID/SHA: mNcknxA32LagkyjUEMJoQ / fe8a4ea1ffedaf415f4da2f062534de366a451e6
Validation/Total Time: /

======Data for nerds ====
HF URL: https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1 TXID: 63f58c5b-8b81-46a4-aa67-c6d385941096
ID: 08de436e-b13e-4871-8c1b-d1789a79b16e Name: Timothy Spann [tspann] Time Zone: America/New_York
== HF Mixtral-8x7B-Instruct-v0.1 Meta Data ==
Compute Characters/Time/Type: 192 / 2.548117662/2-a100
Generated/Prompt Tokens/Time per Token: 100 / 50 : 25
Inference Time: 2541 // Queue Time: 6
Request ID/SHA: Q--xPhNL8XkcoVDKvKTPV / 1e637f2d7cb0a9d6fb1922f305cb784995190a83
Validation/Total Time: 0 / 2548
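
To make the numbers easier to compare, here is a small sketch that ranks the models by time per generated token. The figures are copied from the metadata blocks in this section, and the dictionary layout and field names are my own. TinyLlama ran on CPU and returned no token counts, so it is listed separately. The Mixtral block carries a different ID and prompt length, so it appears to come from a different request and the comparison is only approximate.

```python
# Values copied from the "Data for nerds" blocks in this section.
runs = {
    "Mistral-7B-Instruct-v0.2": {
        "inference_ms": 1156, "queue_ms": 0, "generated_tokens": 100, "compute": "4-a10-g"},
    "Phi-3-mini-128k-instruct": {
        "inference_ms": 2278, "queue_ms": 392, "generated_tokens": 100, "compute": "1-a10-g"},
    "Mixtral-8x7B-Instruct-v0.1": {
        "inference_ms": 2541, "queue_ms": 6, "generated_tokens": 100, "compute": "2-a100"},
}
# TinyLlama reported only a compute time (seconds) on CPU, no token metrics.
cpu_only = {"TinyLlama-1.1B-Chat-v1.0": {"compute_seconds": 29.731, "compute": "cpu"}}


def ms_per_token(run: dict) -> int:
    """Whole milliseconds of inference time per generated token."""
    return run["inference_ms"] // run["generated_tokens"]


ranking = sorted(runs, key=lambda name: ms_per_token(runs[name]))
for name in ranking:
    run = runs[name]
    print(f"{name}: {ms_per_token(run)} ms/token on {run['compute']}, "
          f"queued {run['queue_ms']} ms")
for name, run in cpu_only.items():
    print(f"{name}: {run['compute_seconds']} s total on {run['compute']}")
```

The computed values line up with the "Time per Token" field in the metadata (11, 22 and 25), which suggests that field is inference time divided by generated tokens, rounded down.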

For all of our future demos, we will use at least these four at once. If you remember some of my older articles, we also have OLLAMA, WatsonX, Cloudera ML LLAMA, BigScience BLOOM, BERTA, Google Gemma-7B-IT, Minilm-uncased-squad2 and more.

I am working on how to dynamically pick the best model, the best platform to run that particular model, and the best context to send, and on how to get the most out of the prompt.


Tim Spann
Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/