Procedural Access to Azure OpenAI Token Usage

Chaz Peterson
CarMax Engineering Blog
5 min read · Jun 3, 2024

Generative AI can be expensive and is definitely worth monitoring. Since costs are tied directly to token usage, it’s important to be aware of your implementation’s current usage. This guide details how to set up procedural access to an Azure OpenAI resource’s monitoring metrics to acquire the Prompt Token Usage and Generation Token Usage metrics. Doing this allows for programmatic decision making in your app based on token usage.

Step 1: Install Required NuGet Packages

To access these metrics we’ll be using two separate NuGet packages, so we’ll start by installing the Azure.ResourceManager and Azure.ResourceManager.Monitor packages.
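For example, using the .NET CLI:

dotnet add package Azure.ResourceManager
dotnet add package Azure.ResourceManager.Monitor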

Step 2: ArmClient Initialization

Importantly, in order for your app to access this data, it will need authorization to access your OpenAI resource. Hopefully this is already set up in your app, but in case it isn’t, you’ll want to initialize the ArmClient in Program.cs with your Azure credentials. I recommend the DefaultAzureCredential class from the Azure.Identity library; see Microsoft’s documentation on it for more information.

The end goal of this step is to have an ArmClient available for use such as:

ArmClient armClient = new(myAzureCredentials);
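For instance, a minimal sketch using DefaultAzureCredential (assuming the Azure.Identity package is also installed):

using Azure.Identity;
using Azure.ResourceManager;

// DefaultAzureCredential walks the standard credential chain:
// environment variables, managed identity, Visual Studio, Azure CLI, etc.
ArmClient armClient = new(new DefaultAzureCredential());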

Step 3: Preparing the ArmResourceGetMonitorMetricsOptions

The method that we’ll be calling can take a number of options and requires some purposeful setup (see the Microsoft documentation for the full list). While there are plenty of Azure Monitor related options, the two most relevant to this task are:

A) Metricnames: a comma-separated list of the metrics that you wish to acquire for this resource

There are numerous metrics listed as available for CognitiveServices resources, but the most relevant ones in my experience have been the two token usage ones: ProcessedPromptTokens and GeneratedTokens. The AzureOpenAIRequests metric may be of interest, too. There’s also an AzureOpenAITimeToResponse and many more metrics available, though most are more niche.

While TokenTransaction may be useful in a general sense, since it provides your total token usage for the resource, I’ve found that combining ProcessedPromptTokens with GeneratedTokens gives a more precise breakdown of input versus output tokens, which in turn allows for a more accurate cost estimate. If you use just one prompt with the deployment, you may even be able to automate a trustworthy cost-per-request by using these in conjunction with the AzureOpenAIRequests metric. The end result of step 3A will be something like:

Metricnames = "ProcessedPromptTokens,GeneratedTokens";

B) Timespan: a string in the format startTime/endTime, where both times are expressed in ISO 8601 format.

If no Timespan is provided, it defaults to the most recent 60 minutes, broken into one-minute segments. You can redefine the total time covered by the returned time series by setting the Timespan property, however.

The ISO requirement is simple enough to satisfy using the ‘o’ (round-trip) format specifier, resulting in:

Timespan = $"{startTime:o}/{endTime:o}";
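For example, to build a Timespan covering the last 30 minutes (an arbitrary window chosen for illustration):

// Build an ISO 8601 start/end pair for the last 30 minutes.
DateTime endTime = DateTime.UtcNow;
DateTime startTime = endTime.AddMinutes(-30);
string timespan = $"{startTime:o}/{endTime:o}";
// e.g. "2024-06-03T14:00:00.0000000Z/2024-06-03T14:30:00.0000000Z"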

There is also an Interval property that you may optionally include. These metrics default to one-minute intervals, which is relevant when it comes to accessing the data; more about intervals is discussed in the Access section below.

The final result of Step 3 is a fully formed ArmResourceGetMonitorMetricsOptions object:

ArmResourceGetMonitorMetricsOptions options = new()
{
    Timespan = $"{currentHour:o}/{currentTime:o}",
    Metricnames = $"{PromptTokensMetricName},{GeneratedTokensMetricName}",
};
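If you do want coarser granularity, you could also set the Interval property on the same object; as a sketch, with a hypothetical five-minute bucket:

options.Interval = TimeSpan.FromMinutes(5); // five-minute buckets instead of the default one minute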

Step 4: Getting Your ResourceIdentifier

Now that we’ve got the options set up, let’s look at the method we’ll actually be using: GetMonitorMetricsAsync. This method calls out to Azure and retrieves the same metrics you can see in the portal by selecting the OpenAI resource and opening the “Monitoring” tab. The GetMonitorMetricsAsync method is available on the ArmClient and takes two parameters: the ArmResourceGetMonitorMetricsOptions that we just set up above, and a ResourceIdentifier.

The ResourceIdentifier is simply a new ResourceIdentifier(ID), where the ID is the proper ID of your OpenAI resource. The easiest way to find this is by navigating in the Azure portal to your specific OpenAI resource, then clicking the “JSON View” link in the upper right:

OpenAI Resource ID: JSON View

The ID in the resulting JSON will begin with “/subscriptions” and end with the name of your OpenAI resource. It’s pretty long, but there’s also a copy button right there:

OpenAI Resource ID: Copy

Since that resource ID contains your subscription ID, resource group name, and resource name, it may be preferable not to introduce it into your codebase directly, but rather to keep it in your app configuration and import it from there.
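For example, assuming a hypothetical configuration key named "OpenAIResourceId" and an IConfiguration instance called configuration:

// Read the resource ID from app configuration rather than hard-coding it.
string resourceId = configuration["OpenAIResourceId"]
    ?? throw new InvalidOperationException("OpenAIResourceId is not configured.");
ResourceIdentifier resourceIdentifier = new(resourceId);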

Step 5: Making the API Call

Now that we have the setup out of the way, the call that you’ll be making is pretty straightforward:

return await _armClient.GetMonitorMetricsAsync(
        new ResourceIdentifier(resourceId), options)
    .ToDictionaryAsync(mm => mm.Name.Value);

Calling the ArmClient’s GetMonitorMetricsAsync method with the ResourceIdentifier and the options containing your preferred metrics returns an AsyncPageable<MonitorMetric>. This is readily converted into a more easily accessible object via .ToDictionaryAsync(mm => mm.Name.Value), which produces a Dictionary<string, MonitorMetric> keyed by the name of each metric. (Note that ToDictionaryAsync over an IAsyncEnumerable<T> comes from the System.Linq.Async package, so you may need to add that reference as well.)
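Put together, a minimal sketch of a wrapper might look like the following (the class and method names are illustrative, not part of the Azure SDK):

using Azure.Core;
using Azure.ResourceManager;
using Azure.ResourceManager.Monitor;
using Azure.ResourceManager.Monitor.Models;

public class TokenUsageMetricsClient
{
    private readonly ArmClient _armClient;

    public TokenUsageMetricsClient(ArmClient armClient) => _armClient = armClient;

    // Fetches the requested metrics for the resource and keys them by metric name.
    public async Task<Dictionary<string, MonitorMetric>> GetMetricsAsync(
        string resourceId, ArmResourceGetMonitorMetricsOptions options)
    {
        return await _armClient.GetMonitorMetricsAsync(
                new ResourceIdentifier(resourceId), options)
            .ToDictionaryAsync(mm => mm.Name.Value);
    }
}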

Step 6: Accessing the Data

All that remains is to utilize the data that’s been returned! It’s nested fairly deeply, but each metric’s value contains a list of Timeseries. Though it is declared as a list, I have yet to discover how to acquire data for multiple Timeseries in one call, so the list defaults to containing just one lone Timeseries. Within the Timeseries is its associated Data, divvied up into one-minute intervals (by default, at least). Lastly, each data object contains a Total property that can be summed for analysis. With that in mind, below is an example of summing up the entirety of a returned metric’s data:

private static int GetTotalFromMetric(string nameOfMetric, Dictionary<string, MonitorMetric> metrics)
{
    metrics.TryGetValue(nameOfMetric, out MonitorMetric? matchingMetric);

    // Take the lone Timeseries and sum the Total of each of its data points.
    double metricSumTotal = matchingMetric?.Timeseries?.FirstOrDefault()?.Data.Sum(d => d.Total) ?? 0;

    return Convert.ToInt32(metricSumTotal);
}

Though this uses FirstOrDefault(), in my experience there is only ever the lone Timeseries, and the interval duration you set within the options only affects the intervals within the Data. Should you encounter multiple Timeseries, please comment below and share what options created such a scenario. By default there is only the one, and this example should provide a good starting point.
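To tie it all together, here’s a short usage sketch with the metric names from Step 3 and hypothetical per-1,000-token prices (substitute your deployment’s actual rates):

int promptTokens = GetTotalFromMetric("ProcessedPromptTokens", metrics);
int generatedTokens = GetTotalFromMetric("GeneratedTokens", metrics);

// Hypothetical prices per 1,000 tokens -- check your own deployment's pricing.
double estimatedCost = (promptTokens / 1000.0) * 0.005
                     + (generatedTokens / 1000.0) * 0.015;
Console.WriteLine($"Prompt: {promptTokens}, Generated: {generatedTokens}, Est. cost: ${estimatedCost:F4}");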

Conclusion

I hope that this has provided some clarity into the procedural acquisition of metrics related to Azure OpenAI token usage. In my experience, it has enabled decision making around overall usage that would not otherwise be possible. The token usage metrics seem to take just 2–3 minutes to update, allowing for near-real-time decision making as a result.
