From Idea to Reality: Building the Instant Web with Gemini (Part 1)

Thu Ya Kyaw
Google Cloud - Community
6 min read · Sep 4, 2024

Inspiration

When I was a web developer, I often had to implement wireframes provided by the design team. It was fun for a while, but sooner or later, the repetition in the work started to feel mundane. At that time, I wished I could automate some of those repetitive tasks and use my time on more interesting problems, like web optimization or implementing new features.

I explored various low-code/no-code platforms, hoping to automate some of those repetitive tasks, but they didn’t quite fit my needs. You see, I’m a developer at heart — I love coding, and I want the flexibility to change the code as I please. I was looking for automation that would complement my abilities, not abstract them away completely.

Then came Google’s Gemini models, a family of Generative AI models that can handle diverse data types, including text, images, video, and audio. This sparked my curiosity: could I build an application to generate working websites (HTML, CSS, and JavaScript) directly from wireframes?

Experimentation

To ensure I was on the right track, I started by exploring different prompt combinations and wireframe variations in Vertex AI Studio. In case you aren’t aware, Vertex AI Studio is currently available without a Google Cloud account or credit card, so it’s easy to get started!

Here’s a glimpse into some of the prompts that proved successful:

Using this wireframe, produce a website

  • This prompt is straightforward, but sometimes Gemini left out JavaScript or CSS code. Explicitly requesting them in the prompt helped.

Using this wireframe as a reference, produce an html page. Use CSS and JavaScript where appropriate

  • This prompt requests JavaScript and CSS explicitly. While the output now includes both, the quality of the generated code could still be better.

Gemini, you are an experienced web developer. Your task is to produce a web page using the provided wireframe. Make sure to design and place all the elements exactly as shown in the wireframe. Also use CSS and JavaScript where appropriate.

  • In this prompt, I added more refined details, such as a level of expertise and the expectation that all elements be designed and placed exactly as shown in the wireframe. This prompt worked well with various types of wireframes.

If you’re interested in trying the prompts, remember a wireframe is required. You may choose to use this wireframe or create your own. I have provided a detailed guide to walk you through the process.

Step 1: Navigate to Vertex AI Studio

After accepting the Terms of Use, you are all set to use Vertex AI Studio. No login or credit card is needed.

Figure 1: Vertex AI Studio

Step 2: Insert the wireframe and write the prompt

Upload your wireframe using the “Insert Media” dropdown. Then, paste one of the earlier prompts. If everything works, you should see a result similar to this.

Figure 2: Vertex AI Studio, with wireframe and prompt

To process your wireframe and prompt, click the right-arrow button at the bottom-right corner. This will send your input to the Gemini model.

Step 3: Render the response

The response is returned in text format, but you will notice that any HTML code within it is wrapped in backticks with the ‘html’ decorator (```html). To view the rendered HTML, copy the code and either paste it into an online HTML renderer like W3Schools or save it as an .html file and open it in your browser.
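If you would rather script this step than copy and paste by hand, the fenced HTML can be pulled out with a few lines of Python. This is a sketch of my own; the function names are hypothetical:

```python
import re
import webbrowser
from pathlib import Path

def extract_html(response_text: str) -> str:
    """Pull the HTML out of a ```html ... ``` fenced block, if one is present."""
    match = re.search(r"```html\s*(.*?)```", response_text, re.DOTALL)
    return match.group(1).strip() if match else response_text.strip()

def save_and_open(response_text: str, path: str = "output.html") -> str:
    """Save the extracted HTML to a file and open it in the default browser."""
    html = extract_html(response_text)
    Path(path).write_text(html, encoding="utf-8")
    webbrowser.open(Path(path).resolve().as_uri())
    return html
```

Calling save_and_open(response_text) writes the extracted HTML to output.html and opens it in your default browser, which saves a round trip to an online renderer.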

Here’s an example of the output you can expect, based on the input wireframe that I have provided. Due to the nature of Generative AI, your results will vary. Each iteration offers a unique and creative take, with different colors and styles, etc. Pro tip: Save the HTML file to revisit specific iterations later.

Figure 3.1: Input Wireframe
Figure 3.2: Output Webpage

Making It Happen

After numerous experiments, I was ready to create an application that can convert hand-drawn wireframes into working websites. I chose Flask as my backend because I’m more comfortable with Python, and it integrates easily with most AI/ML libraries. (Of course, other languages like Java, JavaScript, or Go would work too!)

The backend is integrated with the Gemini family of models for the core functionality. For the frontend, I went with HTML5 and JavaScript to form a base website and Tailwind CSS to make everything look polished. The upload page of the application is shown below:
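To make this concrete, here is a minimal sketch of how such a Flask backend might be wired up. The route and helper names are my own placeholders, and the Gemini call is stubbed out, so treat this as illustrative rather than the actual app code:

```python
from flask import Flask, request

app = Flask(__name__)

def generate_site(image_bytes: bytes, prompt: str) -> str:
    """Stub for the Gemini call: the real app sends the wireframe image
    and prompt to a Gemini model and returns the raw response text."""
    return "```html\n<html><body>stub</body></html>\n```"

@app.route("/generate", methods=["POST"])
def generate():
    # The frontend posts the wireframe image and (optionally) a custom prompt.
    wireframe = request.files["wireframe"].read()
    prompt = request.form.get("prompt", "Using this wireframe, produce a website")
    return generate_site(wireframe, prompt)
```

The frontend only needs a form that posts the image and prompt to /generate; everything model-related stays server-side.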

Figure 4: Upload Page

Of course, it wasn’t all smooth sailing; I encountered a couple of challenges along the way. For instance, if a user uploaded a screenshot of an actual website instead of a hand-drawn wireframe, the model would sometimes flag it as potential copyright infringement and refuse to generate code. To minimize this issue, I advise users to upload only original hand-drawn content rather than screenshots of live websites.

I also noticed that in false-positive cases (i.e. when the safety filters wrongly flagged a wireframe as copyright infringement), simply retrying would resolve the issue.

Additionally, the output from Gemini could be enclosed in a code block (```html ... ``` fences), requiring a post-processing step to prevent formatting issues. These hurdles were overcome with careful prompt engineering and some additional code logic, including str.replace().
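That post-processing step can be as small as a single helper. Here is a sketch; the app’s actual code may differ:

```python
def strip_code_fences(raw: str) -> str:
    """Remove the ```html ... ``` markers that Gemini wraps around generated code."""
    return raw.replace("```html", "").replace("```", "").strip()
```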

Once the demo was functional, the application was dockerized and hosted on Cloud Run. Cloud Run’s instance autoscaling let me effortlessly handle increased traffic without any complex configuration. The “always-on” instances feature also eliminated cold start delays, ensuring a smooth user experience.
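For reference, a deployment along these lines can be done straight from the source directory. The service name and region below are placeholders, and --min-instances 1 is what keeps an instance warm to avoid cold starts:

```shell
# Hypothetical deploy command: build the container from source and
# keep one instance always on (service name and region are placeholders).
gcloud run deploy wireframe-to-website \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --min-instances 1
```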

Demo in Action

Earlier this year, I had a chance to present the demo at Google Cloud Next’s Innovator’s Hive area. The initial version was straightforward: users would draw a wireframe, take a picture, upload it, and the system would generate the corresponding HTML, CSS, and JavaScript code.

However, I felt something was missing — the creative element. So, I added a feature allowing users to customize the prompt. This simple change opened up a world of possibilities, leading to incredibly creative combinations of prompts and wireframes.

Figure 5: Generate Page

One highlight was when a game designer cleverly submitted a sequence of frames depicting a bouncing ball, along with a prompt requesting HTML5 and anime.js for animation. The result? A dynamic canvas element showing the ball in motion, perfectly rendered and brought to life on the screen. It was a testament to the creative possibilities unlocked by user-defined prompts. Brilliant! Take a look:

Figure 6: Output Animation

Next Steps

Want to understand how this all works? This codelab walks you through the entire process of building a sample wireframe-to-website converter. You’ll gain insights into the various tools and how they fit together, plus you’ll have a functional website to experiment with.

The demo was a hit and a solid starting point, but I’m keen to take it further. In future iterations, I plan to add prompt improvement, a retry mechanism for handling errors, and the ability to process live webcam feeds directly. Stay tuned for Part 2!
