Getting an image description using GPT-4o or GPT-4 Turbo

Published in medialesson, May 18, 2024

Let’s see how easily you can get an image description using the new GPT-4o or GPT-4 Turbo, and compare the two models.

Introduction

At the beginning of the week, during its Spring Update keynote, OpenAI announced its latest large language model, GPT-4o. I have already published a post on Medium comparing the new model with other models available from OpenAI. In this post, I take a closer look at the model's vision capabilities.

Let’s code

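I will create a simple .NET console application that uses the Spectre.Console, Azure.AI.OpenAI, and SkiaSharp NuGet packages. Open Visual Studio, create a new console application targeting .NET 8, and add these three NuGet packages. The code in this post uses the OpenAIClient API of the 1.0.0-beta (prerelease) versions of Azure.AI.OpenAI, so make sure to include prerelease packages when installing it.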

We create a folder called Utils and add a new file named Statics.cs to it. This class contains constants for the two hosts, Azure OpenAI and OpenAI, and for the names of the two OpenAI vision models.

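namespace OpenAIImageDescription.Utils;

internal class Statics
{
    // Hosts the user can choose from (const, so they can be used as switch cases)
    public const string AzureOpenAIKey = "Azure OpenAI";
    public const string OpenAIKey = "OpenAI";

    // Model names sent to the OpenAI API
    public const string GPT4TurboKey = "gpt-4-turbo";
    public const string GPT4oKey = "gpt-4o";
}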

Next, we create the file ConsoleHelper.cs in the Utils folder. This class is a helper for working with Spectre.Console. It provides methods to create a header, let the user select from a list of options (such as the host or the model), and read a string or a URL from the console. It also offers methods to write an error message or a general message to the console.

using Spectre.Console;

namespace OpenAIImageDescription.Utils;

internal class ConsoleHelper
{
    public static void CreateHeader()
    {
        AnsiConsole.Clear();

        Grid grid = new();

        grid.AddColumn();

        grid.AddRow(
            new FigletText("Image Description").Centered().Color(Color.Red));

        grid.AddRow(
            Align.Center(
                new Panel("[red]Sample by Thomas Sebastian Jensen ([link]https://www.tsjdev-apps.de[/])[/]")));

        AnsiConsole.Write(grid);
        AnsiConsole.WriteLine();
    }

    public static string SelectFromOptions(
        List<string> options, string prompt)
    {
        CreateHeader();

        return AnsiConsole.Prompt(
            new SelectionPrompt<string>()
            .Title(prompt)
            .AddChoices(options));
    }

    public static string GetUrlFromConsole(
        string prompt)
    {
        CreateHeader();

        return AnsiConsole.Prompt(
            new TextPrompt<string>(prompt)
            .PromptStyle("white")
            .ValidationErrorMessage("[red]Invalid prompt[/]")
            .Validate(prompt =>
            {
                if (prompt.Length < 3)
                {
                    return ValidationResult.Error("[red]URL too short[/]");
                }

                if (prompt.Length > 250)
                {
                    return ValidationResult.Error("[red]URL too long[/]");
                }

                if (Uri.TryCreate(prompt, UriKind.Absolute, out Uri? uri)
                    && uri.Scheme == Uri.UriSchemeHttps)
                {
                    return ValidationResult.Success();
                }

                return ValidationResult.Error("[red]No valid URL[/]");
            }));
    }

    public static string GetStringFromConsole(
        string prompt)
    {
        CreateHeader();

        return AnsiConsole.Prompt(
            new TextPrompt<string>(prompt)
            .PromptStyle("white")
            .ValidationErrorMessage("[red]Invalid prompt[/]")
            .Validate(prompt =>
            {
                if (prompt.Length < 3)
                {
                    return ValidationResult.Error("[red]Value too short[/]");
                }

                if (prompt.Length > 200)
                {
                    return ValidationResult.Error("[red]Value too long[/]");
                }

                return ValidationResult.Success();
            }));
    }

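    public static void WriteErrorMessageToConsole(
        string message)
    {
        // Escape the text so square brackets are not parsed as Spectre.Console markup
        AnsiConsole.MarkupLine($"[red]{Markup.Escape(message)}[/]");
    }

    public static void WriteMessageToConsole(
        string message)
    {
        // Model output can contain [ and ], which would otherwise break the markup
        AnsiConsole.MarkupLine($"[white]{Markup.Escape(message)}[/]");
    }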
}
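The endpoint check in GetUrlFromConsole does not depend on Spectre.Console, so it can be tried on its own. The sketch below pulls the same three rules (3 to 250 characters, an absolute URI, the https scheme) into a plain function. The name IsValidEndpointUrl and the resource name in the examples are hypothetical.

```csharp
using System;

// Hypothetical helper mirroring the validation in GetUrlFromConsole:
// length between 3 and 250, an absolute URI, and the https scheme.
static bool IsValidEndpointUrl(string value)
{
    if (value.Length < 3 || value.Length > 250)
    {
        return false;
    }

    return Uri.TryCreate(value, UriKind.Absolute, out Uri? uri)
        && uri.Scheme == Uri.UriSchemeHttps;
}

// Example inputs (the resource name is made up)
Console.WriteLine(IsValidEndpointUrl("https://my-resource.openai.azure.com/")); // True
Console.WriteLine(IsValidEndpointUrl("http://my-resource.openai.azure.com/"));  // False: not https
Console.WriteLine(IsValidEndpointUrl("my-resource.openai.azure.com"));          // False: no scheme
```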

Now let’s open the Program.cs file to create the main logic. The program is straightforward. First, we ask the user to select the host. If the user selects OpenAI, we will ask for the OpenAI API key and let the user select the vision model. If the user selects Azure OpenAI, we will ask for the endpoint, the API key, and the deployment name.

Next, we let the user enter the path to a local image file. We check that the file exists and has an image extension, then use SkiaSharp to resize the image to 640 × 480 pixels and re-encode it as a JPEG, so that the request uses fewer tokens.
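Why does resizing save tokens? At the time of writing, OpenAI's vision documentation described the cost of a high-detail image roughly like this: the image is fitted within 2048 × 2048 pixels, scaled so its shortest side is 768 pixels, and then charged 170 tokens per 512-pixel tile plus a base of 85 tokens. The sketch below encodes that description. EstimateImageTokens is a hypothetical helper, the formula may have changed since, and the sketch assumes images are only ever scaled down, never up, so treat its results as estimates rather than what the API will report.

```csharp
using System;

// Rough token estimate for a "high detail" image, following the tiling rule
// OpenAI documented for its vision models (may have changed since).
// Assumption: images are only scaled down, never up.
static int EstimateImageTokens(int width, int height)
{
    double w = width;
    double h = height;

    // Step 1: fit within a 2048 x 2048 square
    double fit = Math.Min(1.0, 2048.0 / Math.Max(w, h));
    w = Math.Round(w * fit);
    h = Math.Round(h * fit);

    // Step 2: shrink so the shortest side is at most 768 px
    double shrink = Math.Min(1.0, 768.0 / Math.Min(w, h));
    w = Math.Round(w * shrink);
    h = Math.Round(h * shrink);

    // Step 3: 170 tokens per 512 x 512 tile, plus a base of 85 tokens
    int tiles = (int)(Math.Ceiling(w / 512.0) * Math.Ceiling(h / 512.0));
    return 85 + 170 * tiles;
}

Console.WriteLine(EstimateImageTokens(4000, 3000)); // 765
Console.WriteLine(EstimateImageTokens(640, 480));   // 425
```

Under these assumptions, resizing a 4000 × 3000 photo to 640 × 480 lowers the image estimate from 765 to 425 tokens.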

Then we create the ChatCompletionsOptions, passing the base64-encoded image to the API as a data URL. Finally, we print the image description to the console, together with the number of prompt tokens and completion tokens.
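Data URLs have the form data:<MIME type>;base64,<payload>. Path.GetExtension returns the extension with its leading dot (for example .jpg), so it is not a valid MIME subtype on its own. Because the program re-encodes every image as JPEG before converting it to base64, the matching MIME type is image/jpeg; an extension lookup like the one in this sketch only matters if you send the original file bytes instead. GetImageMimeType and BuildDataUrl are hypothetical helpers.

```csharp
using System;

// Hypothetical helper: map an extension as returned by Path.GetExtension
// (including the leading dot) to an image MIME type.
static string GetImageMimeType(string extension) =>
    extension.ToLowerInvariant() switch
    {
        ".jpg" or ".jpeg" => "image/jpeg",
        ".png" => "image/png",
        ".gif" => "image/gif",
        ".bmp" => "image/bmp",
        _ => throw new ArgumentException($"Unsupported extension: {extension}")
    };

// Hypothetical helper: build a data URL of the form data:<mime>;base64,<payload>
static string BuildDataUrl(byte[] bytes, string mimeType) =>
    $"data:{mimeType};base64,{Convert.ToBase64String(bytes)}";

Console.WriteLine(GetImageMimeType(".JPG"));                           // image/jpeg
Console.WriteLine(BuildDataUrl(new byte[] { 1, 2, 3 }, "image/jpeg")); // data:image/jpeg;base64,AQID
```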

using Azure;
using Azure.AI.OpenAI;
using OpenAIImageDescription.Utils;
using SkiaSharp;

List<string> imageExtensions = [".jpg", ".jpeg", ".png", ".gif", ".bmp"];

// Show header
ConsoleHelper.CreateHeader();

// Get Host
string host =
    ConsoleHelper.SelectFromOptions(
        [Statics.OpenAIKey, Statics.AzureOpenAIKey],
        "Please select the [yellow]host[/].");

// OpenAI Client
OpenAIClient? client = null;
string deploymentName = Statics.GPT4oKey;

switch (host)
{
    case Statics.OpenAIKey:

        // Get OpenAI Key
        string openAIKey =
            ConsoleHelper.GetStringFromConsole(
                $"Please enter your [yellow]{Statics.OpenAIKey}[/] API key:");

        // Get Model
        deploymentName =
            ConsoleHelper.SelectFromOptions(
                [Statics.GPT4oKey, Statics.GPT4TurboKey],
                "Please select the [yellow]model[/].");

        // Create OpenAI client
        client = new(openAIKey);

        break;

    case Statics.AzureOpenAIKey:
        // Get Endpoint
        string endpoint =
            ConsoleHelper.GetUrlFromConsole(
                "Please enter your [yellow]Azure OpenAI endpoint[/]:");

        // Get Azure OpenAI Key
        string azureOpenAIKey =
            ConsoleHelper.GetStringFromConsole(
                $"Please enter your [yellow]{Statics.AzureOpenAIKey}[/] API key:");

        // Create Azure OpenAI client
        client =
            new(new Uri(endpoint), new AzureKeyCredential(azureOpenAIKey));

        // Get deployment name
        deploymentName =
            ConsoleHelper.GetStringFromConsole(
                "Please enter the [yellow]deployment name[/] of the model:");

        break;
}

if (client is null)
{
    return;
}

while (true)
{
    // Show header
    ConsoleHelper.CreateHeader();

    // Get path to file
    string imageFilePath =
        ConsoleHelper.GetStringFromConsole(
            "Please enter the [yellow]full path[/] to your picture:");

    // Show header
    ConsoleHelper.CreateHeader();

    // Validate that the file exists; otherwise let the user try again
    if (!File.Exists(imageFilePath))
    {
        ConsoleHelper.WriteErrorMessageToConsole(
            "Path doesn't exist. Press any key to try again.");
        Console.ReadKey();
        continue;
    }

    // Check that the file has a supported image extension
    string fileExtension = Path.GetExtension(imageFilePath).ToLowerInvariant();
    if (!imageExtensions.Contains(fileExtension))
    {
        ConsoleHelper.WriteErrorMessageToConsole(
            "The file is not a supported image. Press any key to try again.");
        Console.ReadKey();
        continue;
    }

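    // Decode the image; SKBitmap.Decode returns null if the file can't be decoded
    using SKBitmap? originalBitmap = SKBitmap.Decode(imageFilePath);
    if (originalBitmap is null)
    {
        ConsoleHelper.WriteErrorMessageToConsole(
            "The image could not be decoded. Press any key to try again.");
        Console.ReadKey();
        continue;
    }

    // Resize to 640x480 to reduce the image tokens,
    // then re-encode as JPEG (quality 75) and convert to base64
    SKImageInfo resizedInfo = new(640, 480);
    using SKBitmap resizedBitmap = new(resizedInfo);
    originalBitmap.ScalePixels(resizedBitmap, SKFilterQuality.High);
    using SKImage image = SKImage.FromBitmap(resizedBitmap);
    using SKData data = image.Encode(SKEncodedImageFormat.Jpeg, 75);
    byte[] imageArray = data.ToArray();
    string base64Image = Convert.ToBase64String(imageArray);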

    // Create ChatCompletionsOptions
    ChatCompletionsOptions chatCompletionsOptions = new()
    {
        Messages =
        {
            new ChatRequestUserMessage("What's in this image?"),
            new ChatRequestUserMessage(
                new List<ChatMessageContentItem>
                {
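                    // The image was re-encoded as JPEG above, so the MIME type is image/jpeg
                    // (the file extension includes a leading dot and is not a MIME subtype)
                    new ChatMessageImageContentItem(
                        new Uri($"data:image/jpeg;base64,{base64Image}"))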
                }
            )
        },
        MaxTokens = 1000,
        Temperature = 0.7f,
        DeploymentName = deploymentName
    };

    // Make request
    Response<ChatCompletions> result =
        await client.GetChatCompletionsAsync(chatCompletionsOptions);

    // Write Output
    ConsoleHelper.WriteMessageToConsole(
        result.Value.Choices[0].Message.Content);
    ConsoleHelper.WriteMessageToConsole(
        "");
    ConsoleHelper.WriteMessageToConsole(
        $"Prompt Tokens: {result.Value.Usage.PromptTokens}");
    ConsoleHelper.WriteMessageToConsole(
        $"Completion Tokens: {result.Value.Usage.CompletionTokens}");
    ConsoleHelper.WriteMessageToConsole(
        "");
    ConsoleHelper.WriteMessageToConsole(
        "Press any key to restart.");
    Console.ReadKey();
}
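The resize step scales every image to exactly 640 × 480 pixels, which stretches portrait and widescreen pictures. If that distortion matters, one option is to compute a size that fits within 640 × 480 while keeping the aspect ratio. FitWithin below is a hypothetical helper, not part of SkiaSharp, and it never enlarges small images.

```csharp
using System;

// Hypothetical helper: the largest size that fits within maxWidth x maxHeight
// while keeping the original aspect ratio. Images are never enlarged.
static (int Width, int Height) FitWithin(int width, int height, int maxWidth, int maxHeight)
{
    double scale = Math.Min(1.0, Math.Min((double)maxWidth / width, (double)maxHeight / height));
    int newWidth = Math.Max(1, (int)Math.Round(width * scale));
    int newHeight = Math.Max(1, (int)Math.Round(height * scale));
    return (newWidth, newHeight);
}

Console.WriteLine(FitWithin(4000, 3000, 640, 480)); // (640, 480)
Console.WriteLine(FitWithin(3000, 4000, 640, 480)); // (360, 480)
Console.WriteLine(FitWithin(1920, 1080, 640, 480)); // (640, 360)
```

Inside the loop, the result could replace the fixed size, for example var (width, height) = FitWithin(originalBitmap.Width, originalBitmap.Height, 640, 480); followed by new SKImageInfo(width, height).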

Screenshots

Let’s run the application. First, we need to select the host.

If we choose OpenAI, we also have to select the model.

Here is the output using GPT-4 Turbo.

And here is the output using GPT-4o.

Conclusion

As you have seen, the results of GPT-4 Turbo and GPT-4o are quite similar, but GPT-4o is much faster and also much cheaper: at the time of writing, 1 million input tokens cost $5 with GPT-4o instead of $10 with GPT-4 Turbo. So it makes perfect sense to use the new GPT-4o.
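To turn these input prices into numbers, the cost of the prompt tokens reported by the program is tokens / 1,000,000 × price per million. The sketch below uses the $5 and $10 per million input tokens quoted above (May 2024 prices, which may have changed) and a hypothetical prompt of 1,000 tokens. Output tokens are billed at separate rates and are not included.

```csharp
using System;

// Input-token cost in USD, given a price per 1 million input tokens.
// decimal avoids binary floating-point rounding in money calculations.
static decimal InputCost(long tokens, decimal pricePerMillion) =>
    tokens / 1_000_000m * pricePerMillion;

long promptTokens = 1_000; // hypothetical prompt size

Console.WriteLine($"GPT-4o:      ${InputCost(promptTokens, 5m)}");
Console.WriteLine($"GPT-4 Turbo: ${InputCost(promptTokens, 10m)}");
```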

You will find the source code in my GitHub repository.


Written by Sebastian Jensen