How I built an AI aggregator using Semantic Kernel & .NET Aspire


As a developer, I previously relied on Google, Stack Overflow, or documentation for information; however, I now frequently turn to ChatGPT or similar Large Language Models (LLMs) for help with my tasks at work or when coding in my spare time. These new AI tools can be immensely helpful, but I often run into message limits for the more powerful models, and I have noticed that some models seem to be better at certain tasks than others. For instance, I found Claude to be superior for frontend code and ChatGPT to be better at .NET. I wanted a way to access these different models without running into message limits or paying for multiple subscriptions. At the same time, I had been reading up on .NET Aspire and Microsoft's Semantic Kernel framework and was looking for a project that would give me hands-on experience with these new technologies. Combining these interests, I decided to build a multi-LLM chat service that leverages the providers' APIs through Semantic Kernel and .NET Aspire orchestration.

Introducing kaleidoprompt

Figure 1: The main chat interface for kaleidoprompt, shown here with three different models open.

kaleidoprompt is a micro-SaaS that lets the user simultaneously interact with multiple LLMs stacked side by side in a single interface. Sites that let users interact with multiple AI models in one place are also known as AI aggregators. There is no subscription in kaleidoprompt; instead, users deposit funds, and the cost of each chat interaction is deducted from their balance. The tech stack is based on Microsoft .NET 9.0 and hosted in the Azure cloud. The main chat interface of kaleidoprompt can be seen in [Figure 1].

Key features include:

  • Chat simultaneously with one or more LLMs in the same interface.
  • Invoke models from OpenAI, Google, Anthropic, xAI, DeepSeek, Meta, and more.
  • Pay based on usage with no subscription required.
  • Branch conversations at any point to a different model.
  • Handle a wide range of file formats as input.
  • Access full chat thread history.
  • Rename, delete, or favorite chat threads.
  • Create custom profiles to set system messages, temperature, and top-p values.
  • View detailed graphs of your token usage and costs.
  • Download full usage reports.
  • See input costs calculated and displayed in real time.
  • Export chat threads as Markdown.
  • Delete all your data or your entire account.

The side-by-side interface allows users to quickly gain diverse insights from their prompts. The branch-out feature is particularly useful when a problem cannot be solved satisfactorily by one of the cheaper models: the user can simply branch the conversation and get a better answer from one of the more powerful models. kaleidoprompt is under active development, and new features and more chat models will be added in the future.

Azure Architecture

Figure 2: Azure architecture for kaleidoprompt.

The cloud architecture for kaleidoprompt is outlined in [Figure 2]. Azure was selected as the cloud provider because .NET Aspire currently focuses primarily on Azure.

The focus has been on maintaining a simple, scalable cloud architecture with low operational expenditures. Some of the more expensive cloud services often found in enterprise solutions have therefore been excluded.

The kaleidoprompt application is divided into a Blazor Server frontend and an API service, both hosted in Azure Container Apps. The Blazor Server application acts as the frontend, while the backend service handles all data access. This approach enhances decoupling and adheres to the separation-of-concerns principle. It also makes it possible to replace the frontend framework in the future if so desired.

User data is stored in a SQL database, chat threads are saved as JSON documents in Cosmos DB, while uploaded files are stored as blobs in Azure Blob Storage. Azure Service Bus queues handle asynchronous communication and further increase the decoupling of components. Azure Key Vault securely stores secrets, and Azure Container Registry stores and manages the application’s container images. A Managed Identity with Role-Based Access Control (RBAC) is configured for the Container Apps in the Container App Environment to securely access required Azure services.
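
To illustrate the managed identity approach, the sketch below shows an Azure SDK client authenticating with DefaultAzureCredential instead of a connection string; the storage account URI is a placeholder:

using Azure.Identity;
using Azure.Storage.Blobs;

// DefaultAzureCredential resolves to the Container App's Managed Identity in
// Azure and falls back to developer credentials (Visual Studio, Azure CLI) locally.
var credential = new DefaultAzureCredential();

// Requires an RBAC role assignment (e.g., Storage Blob Data Contributor)
// on the storage account for the managed identity.
var blobServiceClient = new BlobServiceClient(
    new Uri("https://<storage-account>.blob.core.windows.net"),
    credential);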

kaleidoprompt also integrates with services outside of Azure. Secure payment processing is handled by Stripe. Multiple AI providers are integrated by leveraging Microsoft’s Semantic Kernel framework.

Identity provider

Figure 3: Sign In for kaleidoprompt.

Azure Active Directory B2C (Azure AD B2C) was chosen as the identity provider due to its straightforward integration with the .NET ecosystem and its support for social logins. A significant factor in this decision was cost-effectiveness: Azure AD B2C's free tier covers the first 50,000 monthly active users, aligning with the project's goal of low operational costs.

In kaleidoprompt, the user can either create a custom account or log in with one of three social accounts: Microsoft, Google, or Amazon; see [Figure 3]. The underlying authentication mechanisms adhere to the standard OpenID Connect and OAuth 2.0 protocols.

The authentication sequence for the Blazor Server frontend involves redirecting the user to Azure AD B2C. Upon successful authentication, B2C returns the user to a designated callback URI within the Blazor application. The ASP.NET Core authentication middleware securely handles this callback. It validates the user’s identity and establishes an authenticated session. This user session with the Blazor Server application itself is maintained using a standard, secure ASP.NET Core authentication cookie issued to the browser.

During the authentication process, the Blazor Server backend acquires the following critical tokens from Azure AD B2C:

  • Access Token: A short-lived JSON Web Token (JWT) used as a bearer token. The Blazor Server backend includes this token in the Authorization header when making calls to the backend API service, thereby authenticating the request.
  • Refresh Token: A long-lived token used to obtain new access tokens from Azure AD B2C after the current access token expires, without requiring the user to re-authenticate. To ensure persistence across server restarts and facilitate token renewal in potentially scaled-out environments, this refresh token is securely stored in the application’s distributed cache, which in this architecture utilizes a dedicated table within the Azure SQL database.

This configuration allows kaleidoprompt to leverage Azure AD B2C for secure identity management and token issuance, maintain the interactive user session with the Blazor Server frontend via standard cookie mechanisms, and utilize access and refresh tokens for secure, long-lived delegated access to the backend service.
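
For illustration, this wiring in ASP.NET Core typically looks like the sketch below, using the Microsoft.Identity.Web library; the configuration section name and API scope are placeholders rather than kaleidoprompt's actual values:

using Microsoft.AspNetCore.Authentication.OpenIdConnect;
using Microsoft.Identity.Web;

// Cookie-based sign-in against Azure AD B2C via OpenID Connect, plus
// acquisition of access/refresh tokens for calling the backend API.
builder.Services.AddAuthentication(OpenIdConnectDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApp(builder.Configuration.GetSection("AzureAdB2C"))
    .EnableTokenAcquisitionToCallDownstreamApi(
        new[] { "https://<tenant>.onmicrosoft.com/api/access" }) // placeholder scope
    .AddDistributedTokenCaches(); // token cache (incl. refresh tokens) in IDistributedCache

The AddDistributedTokenCaches() call is what places the token cache in the distributed cache, which in this architecture is backed by a table in the Azure SQL database.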

.NET Aspire

.NET Aspire is an opinionated, cloud-ready stack of tools, templates, and packages for building observable, production-ready, distributed applications.

kaleidoprompt is built upon the Aspire template in Visual Studio. When creating a .NET Aspire solution, two Aspire-specific projects are added.

  • App Host: An orchestrator project designed to connect and configure the different projects and services of your app.
  • Service Defaults: A shared project to manage configurations that are reused across the projects in your solution related to resilience, service discovery, and telemetry. Each service opts into these defaults as shown below.
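
In each service project, the shared defaults are enabled with a single call. The snippet below shows the standard pattern from the Aspire template:

var builder = WebApplication.CreateBuilder(args);

// From the Service Defaults project: OpenTelemetry, default health checks,
// service discovery, and standard HTTP resilience handlers.
builder.AddServiceDefaults();

var app = builder.Build();

// Maps health check endpoints (e.g., /health and /alive) in development.
app.MapDefaultEndpoints();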

In the App Host project, the Aspire app model is defined within the Program.cs file. The app model outlines the resources in our .NET Aspire solution and their relationships. Adding the storage and messaging services is easily achieved with the following lines of code.

var builder = DistributedApplication.CreateBuilder(args);

// Storage and messaging resources in the Aspire app model.
var cosmosdb = builder.AddAzureCosmosDB("cosmosdb");
var sqldb = builder.AddAzureSqlServer("sqldb-server").AddDatabase("sqldb");
var blobs = builder.AddAzureStorage("storage").AddBlobs("blobs");
var messaging = builder.AddAzureServiceBus("messaging");

Secrets, such as API keys, are also added to the app model.

var apiKeysOpenAI = builder.AddParameter("ApiKeys-OpenAI", secret: true);
var apiKeysGoogle = builder.AddParameter("ApiKeys-Google", secret: true);
...

Adding the backend API project and registering its dependencies:

var apiService = builder.AddProject<Projects.kaleidoprompt_Backend_ApiService>("apiservice")
    .WithReference(cosmosdb)
    .WithReference(sqldb)
    .WithReference(messaging)
    .WithReference(blobs)
    .WithEnvironment("ApiKeys-OpenAI", apiKeysOpenAI)
    .WithEnvironment("ApiKeys-Google", apiKeysGoogle)
    ...

Finally, the frontend project is added along with its dependencies:

builder.AddProject<Projects.kaleidoprompt_Frontend_Web>("webfrontend")
    .WithExternalHttpEndpoints()
    .WithReference(sqldb) // Only for distributed caching
    .WithReference(apiService)
    .WithEnvironment("AzureAdB2C-ClientSecret-Frontend", azureAdB2CClientSecretFrontend);

builder.Build().Run();

Once the app model has been defined, .NET Aspire orchestration takes care of service discovery and connection-string management, which simplifies the developer experience. For local development, connection strings and secrets are defined in the secrets.json file using the Secret Manager tool in Visual Studio.
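
On the consuming side, each project resolves these named resources through Aspire's client integration packages. The sketch below shows what this can look like in the API service, assuming the same resource names as in the app model (the exact registrations in kaleidoprompt may differ):

// Named resources from the app model are resolved to configured clients;
// Aspire injects the connection strings at runtime.
builder.AddAzureCosmosClient("cosmosdb");      // Aspire.Microsoft.Azure.Cosmos
builder.AddSqlServerClient("sqldb");           // Aspire.Microsoft.Data.SqlClient
builder.AddAzureServiceBusClient("messaging"); // Aspire.Azure.Messaging.ServiceBus
builder.AddAzureBlobClient("blobs");           // Aspire.Azure.Storage.Blobs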

One of the great features of .NET Aspire is that the app model can easily be converted into Infrastructure as Code with the command azd infra synth. This takes the setup and dependencies defined in the app model and generates the appropriate Bicep files that mirror the app model’s configuration. In order to limit the hosting costs for kaleidoprompt, the Bicep files were edited to deploy lower-cost versions of the services needed. For instance, both the SQL DB and the Service Bus were scaled down to the Basic tier.

Software Architecture

The backend service for kaleidoprompt is organized as a modular monolith employing the vertical slice architecture, see [Figure 4]. The kaleidoprompt frontend, on the other hand, is built around a more traditional Onion architecture. Communication between the frontend and backend service is facilitated by REST-based minimal APIs.

Figure 4: Vertical slice architecture for the backend service.

The vertical slice architecture was chosen for the backend as it improves cohesion and decoupling and is well-suited for endpoints based on minimal APIs. The Messages slice in [Figure 4], for instance, exposes message-related endpoints and handles all message-related logic such as streaming replies and saving messages to Cosmos DB. Direct coupling between slices is kept to a minimum by leveraging Azure Service Bus queues. Keeping this coupling low makes the code easier to maintain and simplifies extracting slices into their own microservices should future scaling require it.

Message Processing Flow

The Messages slice handles the primary processing of chat messages. A message can be either a user prompt or a bot reply. The main flow for when a user sends a prompt and the subsequent reply is processed is illustrated in [Figure 5].

Figure 5: Message Processing Flow Diagram.

The processing sequence is as follows (a condensed code sketch appears after the list):

  1. Receive Request: Accept the incoming user prompt via a POST request. The request payload includes the prompt text and other relevant metadata.
  2. Validate Input: Verify that the parameters within the incoming request object are valid. If validation fails, return an error response.
  3. Load Context: Retrieve the chat thread context, consisting of the system message and the history of previous user prompts and bot replies for the current conversation.
  4. Check Funds: Calculate the estimated cost for processing the user’s prompt and verify if the user has sufficient funds remaining. If funds are insufficient, return an error response.
  5. Perform Moderation: Submit the user’s prompt to the OpenAI moderation endpoint to check for content policy violations (e.g., harassment, violence, harmful content).
  6. Evaluate Moderation: Check the result from the moderation endpoint. If the content is flagged as violating policy, return an error response.
  7. Invoke Model & Start Streaming: Send the validated user prompt, along with the loaded chat context, to the language model using the Microsoft Semantic Kernel framework. Begin streaming the generated response back to the user interface in real-time for display.
  8. Verify Stream Completion: Once the response stream ends, confirm that it completed without any errors during transmission. If errors occurred, return an error response.
  9. Persist Messages: Atomically save both the user’s prompt message and the complete streamed bot reply message to the corresponding chat thread in the Cosmos DB database. This transactional approach ensures data integrity, guaranteeing that every persisted prompt has an associated reply.
  10. Check Save Success: Confirm that the save operation to Cosmos DB was successful. If the save failed, return an error response.
  11. Publish Events: Asynchronously publish messages containing metadata related to the processed interaction (e.g., cost, token usage, user ID, thread ID, file upload info if applicable) to the appropriate message bus queues.
  12. Return Success: Send a success response (e.g., HTTP 200 OK) back to the client, indicating the request was processed successfully.
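
Condensed into code, the happy path of this sequence looks roughly like the sketch below; the request type and helper methods are hypothetical stand-ins for the real slice logic:

// Illustrative minimal API endpoint; types and helpers are hypothetical.
app.MapPost("/api/threads/{threadId}/messages", async (string threadId, PromptRequest request) =>
{
    if (!Validate(request))                                        // 2. Validate Input
        return Results.BadRequest("Invalid request.");

    var context = await LoadChatContextAsync(threadId);            // 3. Load Context

    if (!await HasSufficientFundsAsync(request.UserId, request))   // 4. Check Funds
        return Results.BadRequest("Insufficient funds.");

    if (await IsFlaggedByModerationAsync(request.Prompt))          // 5.-6. Moderation
        return Results.BadRequest("Content policy violation.");

    var reply = await StreamReplyAsync(context, request);          // 7.-8. Invoke model & stream
    await SaveMessagesAtomicallyAsync(threadId, request, reply);   // 9.-10. Persist to Cosmos DB
    await PublishUsageEventsAsync(request, reply);                 // 11. Publish events

    return Results.Ok();                                           // 12. Return success
});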

A critical aspect of this architecture occurs in step 11: publishing event data to Azure Service Bus queues. This asynchronous approach is preferred over direct database writes for several reasons. Primarily, it enhances system performance and perceived responsiveness by allowing the main API call, responsible for handling the user request and streaming the response, to return more quickly, as these subsequent tasks are managed asynchronously. Furthermore, the message queues significantly bolster resilience and scalability. They function as a robust buffer, absorbing sudden spikes in load. Should downstream services, such as the SQL database, become temporarily overloaded or unavailable, messages are securely persisted in the queue for later processing, thereby preventing data loss and improving overall fault tolerance. Moreover, this design promotes architectural decoupling by separating the core message processing logic from various auxiliary tasks like updating user funds, logging token usage, registering associated file uploads, etc. Such modularity simplifies system maintenance and allows different parts of the system (slices) to evolve independently.
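
As an example of step 11, publishing an event with the Azure.Messaging.ServiceBus SDK looks like the sketch below; the queue name and payload shape are illustrative:

using Azure.Messaging.ServiceBus;
using System.Text.Json;

// The ServiceBusClient is injected via DI (registered by the Aspire integration).
await using ServiceBusSender sender = serviceBusClient.CreateSender("token-usage");

var payload = JsonSerializer.Serialize(new
{
    UserId = userId,
    ThreadId = threadId,
    InputTokens = inputTokens,
    OutputTokens = outputTokens,
    Cost = cost
});

await sender.SendMessageAsync(new ServiceBusMessage(payload));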

This flow ensures user interactions are handled efficiently, securely, and reliably, while maintaining system responsiveness and architectural flexibility.

Semantic Kernel

Semantic Kernel (SK) is an open-source SDK created by Microsoft. Its primary goal is to make it easier for developers to integrate LLMs into their applications. Instead of manually integrating APIs from several different LLM providers, kaleidoprompt leverages SK. In the code snippet below, taken from Program.cs, we register the SK services with the dependency injection (DI) container using the AddKernel method. The Kernel object is then available for injection into other parts of the code. Subsequent method calls are chained onto the Kernel, such as .Add<Provider>ChatCompletion(…), for each of the LLMs we want to support in kaleidoprompt. Please note that the snippet does not show all the models supported by kaleidoprompt.

builder.Services.AddKernel()
    .AddOpenAIChatCompletion(
        BotModel.gpt_4o.ToModelString(), apiKey: builder.Configuration["ApiKeys-OpenAI"]!)
    .AddGoogleAIGeminiChatCompletion(
        BotModel.gemini_2dot5_pro.ToModelString(), apiKey: builder.Configuration["ApiKeys-Google"]!)
    .AddGoogleAIGeminiChatCompletion(
        BotModel.gemini_2dot0_flash.ToModelString(), apiKey: builder.Configuration["ApiKeys-Google"]!)
    .AddAzureAIInferenceChatCompletion(
        BotModel.meta_llama_3_1_405B.ToModelString(),
        endpoint: new Uri("xxx"),
        apiKey: builder.Configuration["ApiKeys-AzureAIStudio-Meta-Llama-3-1-405B-Instruct"]!);

With the kernel registered in the DI container, the Kernel object and its configured services become available throughout the application. The method below returns a specific chat completion service based on the input bot model.

public static IChatCompletionService GetChatCompletionService(Kernel kernel, string botModel)
{
    // Resolve all registered chat completion services and pick the one
    // whose ModelId attribute matches the requested bot model.
    var chatCompletionServices = kernel.GetAllServices<IChatCompletionService>();
    var chatCompletionService = chatCompletionServices.FirstOrDefault(
        x => x.Attributes.Contains(new KeyValuePair<string, object?>("ModelId", botModel)))
        ?? throw new InvalidOperationException("Chat completion service not found.");

    return chatCompletionService;
}

The chat completion service in SK can return either the complete model reply at once or stream the reply as it is generated. We choose to stream the reply for a better, more responsive user experience. The code snippet below shows how the streaming is handled. This is equivalent to step 7 in the Message Processing Flow.

var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
    chatHistory,
    executionSettings: promptExecutionSettings,
    kernel: kernel);

var iterator = result.GetAsyncEnumerator();
bool more = true;

while (more)
{
    try
    {
        // Fetch the next chunk of the streamed reply.
        more = await iterator.MoveNextAsync();
    }
    catch (HttpOperationException e)
    {
        LogHttpOperationException(e);
        more = false; // Stop iterating; the stream is broken.
        errorMessage = "<br><br>&emsp;<span style='color:red;'>ERROR:</span> " +
            e.Message +
            "<br><br>";
    }
    catch (Exception e)
    {
        LogException(e);
        more = false; // Stop iterating; the stream is broken.
        errorMessage = "<br><br>&emsp;<span style='color:red;'>ERROR:</span> " +
            e.Message +
            "<br><br>";
    }

    if (!string.IsNullOrWhiteSpace(errorMessage))
    {
        yield return errorMessage;
        // Loop terminates naturally because 'more' was set to false in the catch block.
    }
    else if (more && iterator.Current is not null)
    {
        if (!string.IsNullOrEmpty(iterator.Current.Content))
        {
            botReply.Append(iterator.Current.Content);
            yield return iterator.Current.Content;
        }
    }
}

To catch and handle exceptions mid-stream, we use GetAsyncEnumerator() and MoveNextAsync(). The call to MoveNextAsync() is where the next item in the stream is fetched. By wrapping it in a try-catch block, we can react appropriately: log the exception, stop the iteration, and return an error message to the user.

SQL Database & Entity Framework

kaleidoprompt leverages several different data stores, choosing the most suitable store for each type of data. For files, an Azure Blob Storage account (Hot tier) has been added. Message data, in JSON format, is stored in Azure Cosmos DB. User data and message metadata are stored in a managed Azure SQL Database. Azure ensures that all data stored in Azure SQL Database, Cosmos DB, and Blob Storage is automatically encrypted at rest by default.

A code-first approach using Entity Framework (EF) Core is applied for table configuration. The code-first approach has the benefits of a developer-centric workflow with a single source of truth for the database schema residing within the migration scripts. It also allows for quick changes to the database schema and iterative improvements.
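
As a minimal illustration of the code-first pattern, the sketch below defines a simplified entity and its configuration; the entity and property names are stand-ins, not the actual kaleidoprompt schema:

using Microsoft.EntityFrameworkCore;

// Simplified, hypothetical entity for chat thread metadata.
public class ChatThreadMetadata
{
    public Guid Id { get; set; }
    public Guid UserId { get; set; }
    public string Name { get; set; } = string.Empty;
    public bool IsFavorite { get; set; }
    public DateTime CreatedUtc { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<ChatThreadMetadata> Threads => Set<ChatThreadMetadata>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Composite index supporting "list a user's threads, newest first".
        modelBuilder.Entity<ChatThreadMetadata>()
            .HasIndex(t => new { t.UserId, t.CreatedUtc });
    }
}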

Figure 6: ER-diagram for the SQL DB.

In [Figure 6], the database schema is visualized with an Entity-Relationship (ER) diagram. This schema, defined through C# entity classes, maps directly to the tables within the Azure SQL Database. The Users table serves as a central entity, linked via one-to-many relationships to tables such as BotProfiles, Files, Funds, Thread, and Tokens. History tables provide an audit trail of user activity related to file uploads, funds, and tokens.

For distributed caching, a Redis cache is often used, but to keep operational costs to a minimum, I have instead opted for a cache table within the SQL database. Choosing between Redis and a SQL cache table wasn't obvious, but prioritizing minimal operational cost led me to the SQL table approach, accepting the potential performance trade-off.
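
Registering the SQL-backed distributed cache is straightforward with the Microsoft.Extensions.Caching.SqlServer package; the schema and table names below are illustrative:

// IDistributedCache backed by a SQL table instead of Redis.
builder.Services.AddDistributedSqlServerCache(options =>
{
    options.ConnectionString = builder.Configuration.GetConnectionString("sqldb");
    options.SchemaName = "dbo";
    options.TableName = "Cache";
});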

To ensure efficient data retrieval from the Azure SQL Database, several optimization techniques are employed. Indexes, including composite indexes, are added where appropriate. LINQ queries incorporate joins and projections to optimize query speed and memory usage. For read-only queries, the .AsNoTracking() method is applied; this speeds up the query and reduces the memory footprint by telling EF Core not to track entity changes.
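
As an example, a read-only query combining these techniques might look like the sketch below, written against the simplified entities from earlier:

// Projection + AsNoTracking: fetch only the columns needed and skip
// change tracking for a faster, lighter read-only query.
var threads = await dbContext.Threads
    .AsNoTracking()
    .Where(t => t.UserId == userId)
    .OrderByDescending(t => t.CreatedUtc)
    .Select(t => new { t.Id, t.Name, t.IsFavorite })
    .ToListAsync();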

By combining the developer-friendly code-first approach of EF Core with careful consideration of database performance techniques, kaleidoprompt aims to provide a responsive user experience supported by a well-structured and optimized SQL database layer.

Cosmos DB

Complementing the Azure SQL database, Azure Cosmos DB serves as the dedicated repository for chat message data within kaleidoprompt. Messages, represented naturally as JSON documents, are well-suited for Cosmos DB’s NoSQL document model.

A primary driver for this choice is performance and scalability, particularly the need for rapid retrieval of entire chat threads. Every time a user sends a new prompt, the application must quickly load the relevant message history to provide context for the LLM. Storing this potentially large volume of message data in Cosmos DB also strategically offloads the primary Azure SQL database, reserving its relational capabilities primarily for user data and metadata. Furthermore, Azure Cosmos DB offers a generous free tier, aligning with the goal of minimizing operational costs.

To maximize read performance for chat histories, ThreadId is designated as the partition key for the messages container. This is a crucial design decision. It ensures that all messages belonging to the same conversation thread are grouped together within the same logical partition. As a result, fetching the entire history for a specific thread translates into a highly efficient single-partition query in Cosmos DB. This approach guarantees consistently low latency for retrieving chat context, essential for maintaining a responsive user experience as the number of users and threads grows.
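
With the Cosmos SDK, such a read can be pinned to the thread's partition, as in the sketch below; the container, type, and property names are illustrative:

using Microsoft.Azure.Cosmos;

// Query only the logical partition for this thread; Cosmos DB routes the
// request to a single partition instead of fanning out across the container.
var query = new QueryDefinition(
        "SELECT * FROM c WHERE c.ThreadId = @threadId ORDER BY c.Timestamp")
    .WithParameter("@threadId", threadId);

var options = new QueryRequestOptions { PartitionKey = new PartitionKey(threadId) };

var messages = new List<ChatMessage>();
using var feed = container.GetItemQueryIterator<ChatMessage>(query, requestOptions: options);
while (feed.HasMoreResults)
{
    foreach (var message in await feed.ReadNextAsync())
    {
        messages.Add(message);
    }
}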

This specialized use of Cosmos DB allows kaleidoprompt to leverage a scalable NoSQL database for its high-volume, read-intensive chat data, while maintaining a structured relational store (Azure SQL Database) for other critical application data.

Summary and Conclusion

I developed kaleidoprompt partly out of a personal desire to simplify AI-driven coding and research, and partly out of curiosity for new .NET concepts. By combining .NET Aspire’s structured approach with Microsoft’s Semantic Kernel, I could bring multiple large language models together in one interface, streamlining everyday tasks like brainstorming, coding support, and creative writing.

Although I’m pleased with how kaleidoprompt currently handles conversations, the real excitement lies in the fact that kaleidoprompt is constantly growing. The plan is to keep adding new AI providers, enhanced features, and refined user experiences to ensure this service remains a powerful, convenient tool for everyone seeking flexible multi-LLM interaction.

I’m excited to keep improving kaleidoprompt and would love any feedback. Please feel free to connect with me on LinkedIn.
