Develop LLM Apps with OS models using Azure ML Model Catalog and PromptFlow

Ozgur Guler
Microsoft Azure
5 min read, Jul 26, 2023
Spoilt for choice: an LLM robot shopping for models

It is now possible to deploy open-source (OS) foundation models from the Hugging Face Model Hub or Microsoft-curated open-source models (including the newcomer LLaMA 2 and the Falcon variants) on Azure ML, as well as import your own models, with the Azure ML Model Catalog. Announced at MS Build in May, the Azure ML Model Catalog (AMLMC) gives LLM developers a one-click deploy experience on Azure ML compute, where deployed models are exposed as REST APIs that can then be used in Azure PromptFlow like any other "LLM tool", such as the AOAI GPT models.

Azure ML Model Catalog

The Azure ML Model Catalog helps you filter models based on use case. (An interesting feature to add would be an LLM recommendation engine based on user requirements; with the increasing number of OS models, it gets ever more difficult to find the right model for your use case.)

facebook-opt-2.7b is one of the few higher-quality text-generation models that can run on a CPU without quantization (albeit very slowly). In this blog, for the sake of demonstration, I will deploy facebook-opt-2.7b on a non-GPU VM instance and use the model as an LLM tool within PromptFlow (since I do not currently have access to GPU instances).

a- Deploy the OS model to AzureML, expose with a REST API

Find the model in Azure ML Catalog and click “Deploy”
deploying facebook-opt-2.7b from Azure ML Model Catalog

For some of the models in the Model Catalog, "model cards" include performance benchmarks (inference latency) tested on different types of VMs. VM types mentioned in the model card should be preferred for deployment, since they have already been tested with your specific model.

You can deploy the models as they are (if they are capable enough to have good zero-shot performance) or after fine-tuning with your own data. I will cover fine-tuning of OS LLMs on Azure ML in a separate post.

Don't forget to test your LLM endpoint once deployment is complete, to avoid "no healthy upstream"-type responses from your endpoint, which are difficult to troubleshoot and indicate a problem with your model deployment. You will find a sample test input in the "model overview" shown when you click any model in the catalog.

test the endpoint before moving to PromptFlow

b- Create a PromptFlow custom connection to make the model available in PromptFlow

To use the model like any other LLM in PromptFlow, we need to create a "custom connection" to the model endpoint.

On the Azure ML portal, go to "Endpoints" and choose your deployment; in our case this is the endpoint named "epfbookopt2".

Collect the information required to create a custom connection from the endpoint's "Consume" tab.

Copy the model API key and the model-endpoint URL.

You are also given Python, C#, and R code to integrate your OS model endpoint into PromptFlow. Copy this as well for the next step.

Add a custom PromptFlow connection with the model endpoint URL and the key.
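If you prefer the CLI to the portal, the open-source PromptFlow tooling also accepts a connection spec as a YAML file (a sketch under assumptions: the URL is a placeholder, and the `key`/`url` field names must match what your flow code reads from the connection object):

```yaml
# custom_connection.yaml -- create with: pf connection create --file custom_connection.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/CustomConnection.schema.json
name: fbook_opt_llm
type: custom
configs:
  url: "https://<your-endpoint>.<region>.inference.ml.azure.com/score"  # placeholder
secrets:
  key: "<your-endpoint-api-key>"  # placeholder; stored as a secret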

c- Create a PromptFlow sample standard flow to test the model

Add a simple prompt step and a Python step to integrate facebook-opt-2.7b.

Add the Python skeleton code below to the flow's Python module.

import requests
from promptflow import tool
from promptflow.connections import CustomConnection

@tool
def fbook_opt_model(prompt: str, fbook_opt_llm: CustomConnection):
    # Forward the flow's prompt to the deployed facebook-opt-2.7b endpoint.
    query = {
        "inputs": prompt
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + fbook_opt_llm.key,
    }

    response = requests.post(fbook_opt_llm.url, headers=headers, json=query)

    # Return the raw response text; downstream steps can parse it as needed.
    return response.text

Click on "Validate and parse input" and choose your custom connection for the facebook-opt-2.7b endpoint in the inputs section below the code section.

PromptFlow Python module to use the OS model from within PromptFlow
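The skeleton tool above returns the endpoint's raw response text. If you want the flow to emit just the generated string, a small parsing helper can unwrap it; the payload shape assumed below ([{"0": "generated text"}]) is a guess, so check it against the sample output on your endpoint's "Test" tab first.

```python
import json


def extract_text(raw: str) -> str:
    """Unwrap the generated text from the endpoint's JSON response.

    Assumes a payload like [{"0": "generated text"}]; adjust to the
    sample output shown on your endpoint's "Test" tab.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON at all: hand back the raw text unchanged.
        return raw
    if isinstance(data, list) and data and isinstance(data[0], dict):
        # Take the first value of the first result object.
        return next(iter(data[0].values()))
    # Unknown shape: fall back to the raw text.
    return raw
```

Call this on response.text inside the tool before returning, and the flow's output will be the generated text alone.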

Once the PromptFlow flow runs end-to-end, we can confirm the model responds correctly.

This has been a simple demonstration of how you can now deploy OS LLMs onto Azure ML through the Azure ML Model Catalog and use these LLMs from within your PromptFlow flows. As we move into a multi-model world, where each application must strike a balance between cost, inference latency, and output quality, it looks increasingly likely that LLM apps going forward will combine multiple models.

Hope you found the content useful…

About The Author

Ozgur Guler is a Solutions Architect at MS, where he works with Startups & Digital Natives. You can connect with him on Medium, Twitter, or LinkedIn.

Subscribe to the AzureOpenAI Builders Newsletter on LinkedIn here, where we cover the latest on building with #AzureOpenAI.

References

[MS Learn] How to use Open Source foundation models curated by Azure Machine Learning (preview) [link]

[MS Learn] PromptFlow custom connection tool reference [link]
