Generating applications from sketches with LLMs

An implementation with LangChain and GPT-3.5-turbo

Valentina Alto
Published in Microsoft Azure
7 min read · Sep 8

In recent months, we’ve witnessed the impressive capabilities of LLMs in the context of code understanding and generation. However, the most remarkable scenario I’ve seen so far was shown during the GPT-4 Developer Livestream, where the multi-modal capabilities of GPT-4 were leveraged to generate the HTML code for a web portal from a hand-drawn sketch.

Source: GPT-4 Developer Livestream — YouTube

While waiting for GPT-4’s vision capabilities to become available, I’ve tried to replicate the demo on my own. In fact, multi-modality can also be achieved by combining single-modal models or tools into one agent or sequential chain.

In my latest article, we covered how to achieve multi-modality with the Azure Cognitive Services toolkit available in LangChain using an agentic approach. This means that we leverage the backend LLM to decide which tool to use depending on the user’s request.

In this article, we will use a similar approach, with the difference that we maintain a hard-coded execution strategy using LangChain’s sequential chains.

The idea is the following:

  • Using the Image Analysis API from Cognitive Services to generate a detailed description of a picture of a webpage.
  • Using the GPT-3.5-turbo API to generate the HTML code for the webpage, given the image description produced in the previous step.
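The two steps above form a simple hard-coded sequence: image in, description out, then description in, HTML out. Here is a minimal sketch of that flow, where `describe_image` and `generate_html` are hypothetical placeholders standing in for the actual Cognitive Services and GPT-3.5-turbo calls:

```python
# Minimal sketch of the two-step pipeline. Both helpers below are
# placeholders: in the real implementation, describe_image would call the
# Azure Image Analysis API and generate_html would prompt GPT-3.5-turbo.

def describe_image(image_path: str) -> str:
    # Placeholder: would send the image bytes to the Image Analysis API
    # and return the generated natural-language description.
    return f"a webpage sketch with a header, a search bar and two buttons ({image_path})"

def generate_html(description: str) -> str:
    # Placeholder: would prompt GPT-3.5-turbo with the description and
    # return the generated HTML.
    return f"<!-- generated from: {description} -->\n<html><body></body></html>"

def sketch_to_html(image_path: str) -> str:
    # Hard-coded sequence: image -> description -> HTML.
    description = describe_image(image_path)
    return generate_html(description)

print(sketch_to_html("webpage_sketch.png"))
```

This mirrors what LangChain’s sequential chains do for us: the output of the first step is piped verbatim as the input of the second, with no agent deciding the order at runtime.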

Let’s start!

Initializing the Agent

The first thing to do to enable the agent to describe a picture is to create a multi-service resource for Cognitive Services (to create your own multi-service resource, you can follow the instructions here). You can then retrieve your keys and endpoint from the left-hand bar:
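With the key and endpoint at hand, the image-description call reduces to a single REST request against the Computer Vision image-analysis route. Below is a hedged sketch of how the request could be assembled; the `vision/v3.2/analyze` path and `Ocp-Apim-Subscription-Key` header follow the Computer Vision REST conventions, while the endpoint and key values are placeholders:

```python
def build_analyze_request(endpoint: str, key: str):
    # Assemble URL, query parameters and headers for the Computer Vision
    # image-analysis endpoint. The caller then POSTs the raw image bytes.
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze"
    params = {"visualFeatures": "Description,Tags,Objects"}
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/octet-stream",
    }
    return url, params, headers

# Placeholder endpoint and key: substitute your own resource values.
url, params, headers = build_analyze_request(
    "https://my-cognitive-services.cognitiveservices.azure.com",
    "<your-cognitive-services-key>",
)
print(url)
```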

In a similar way, you can retrieve the keys and endpoint of your Azure OpenAI instance.
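These Azure OpenAI credentials are typically exposed to LangChain through environment variables read by the OpenAI SDK. A minimal sketch, where the resource URL and key are placeholders you replace with your own values (`2023-05-15` is one of the generally available Azure OpenAI API versions):

```python
import os

# Placeholder values: substitute your own Azure OpenAI resource details.
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://my-aoai-resource.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "<your-azure-openai-key>"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"

print(os.environ["OPENAI_API_TYPE"])
```

With these set, LangChain’s Azure OpenAI wrappers can pick up the configuration without hard-coding secrets in the script.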
