Leveraging OpenAI GPT-4 Vision API for SwiftUI App Generation

Jake Lin
7 min read · Nov 7, 2023


Introduction to the Experiment

In this experiment, I explore Large Language Models (LLMs) and Large Multimodal Models (LMMs) for source code generation. My focus is on creating a property listing app using SwiftUI, drawing upon my expertise in this domain.

Development and UI Design Reference

The app is built on publicly accessible data, using screenshots of the realestate.com.au app from the App Store as the UI design reference. It is written in SwiftUI and follows the Model-View-ViewModel (MVVM) architectural pattern.

Architecture Diagram

Below is the architecture diagram of the app:

Exploring Various LLMs for Code Generation

My experiments explored various LLMs, including OpenAI’s GPT-3.5-turbo, GPT-4, Google Bard, Anthropic Claude, Meta Code Llama, and GitHub Copilot. These models assisted in generating non-UI layers, including the Model, Service, Repository, and ViewModel, all derived from a single prompt message.

Generating SwiftUI Code

Using ChatGPT-4 with image upload capability or Google Bard, I successfully generated SwiftUI code for the UI layer. The following diagram illustrates the LLMs used in the code-generation process:

Introducing GPT-4 Vision API

A few hours ago, OpenAI made the GPT-4 Vision API publicly available. Having previously used the GPT-3.5-turbo and GPT-4 models for code generation, I could now tackle the final challenge with this new API: generating SwiftUI code through pure API interaction, with impressive results.

Generated App Demo

The app, as demonstrated in the above video, is fueled by live, publicly accessible data. It can display any property listing on the market in Australia. The code generation approach can be adapted to various scenarios, such as car or job listings, requiring two key inputs: an API with a schema and UI images as a design reference.

Generating Code Using OpenAI GPT-4 Vision API

For generating the Model, Service, Repository, and ViewModel, we use the GraphQL API schema and query. However, this article focuses on generating SwiftUI code using an image and the OpenAI GPT-4 Vision API.

API Usage Example

Calling the OpenAI GPT-4 Vision API is straightforward. According to the OpenAI Vision API documentation, we pass either a URL or a base64-encoded image as part of the user’s content, as demonstrated below:

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",  # this part is new
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}
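The `base64_image` referenced in the payload, the authentication headers, and the request itself only take a few more lines of Python. Here is a minimal sketch based on the OpenAI documentation; the screenshot path is illustrative, and the API key is assumed to live in the `OPENAI_API_KEY` environment variable:

import base64
import os

import requests

# Base64-encode the local screenshot (the path is illustrative).
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("screenshots/property_listing.jpeg")

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
}

# Send the payload above to the Chat Completions endpoint.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])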

Image Used for Code Generation

The following image, a screenshot from the realestate.com.au App Store app, was used to generate the SwiftUI code:

Enhancing Generated SwiftUI Code Quality

To enhance the quality of the generated SwiftUI code, a detailed and clear prompt is essential. I divided the message into a system message and a user message for reusability.

System Message

The system message remains consistent for all UI components. Here is the reusable system message:

system_message = '''
### SwiftUI View Generation from ViewModel
In this task, we aim to generate a SwiftUI view, underpinned by a ViewModel. The ViewModel facilitates data representation and logic, while an uploaded image will serve as a reference for the UI design.

#### Instructions:
1. **ViewModel Understanding**:
- Examine the ViewModel meticulously. Recognize its properties, computed properties, and methods. Remember, in this scenario, our ViewModel doesn't conform to `ObservableObject` and lacks `@Published` properties.

2. **Core Design Tenets**:
- **Separation of Concerns**: Retain a demarcation between UI and logic. The ViewModel encapsulates data and logic, while the SwiftUI view delineates the UI.
- **Reusability**: Structure the SwiftUI view for potential reuse across various application segments or even distinct applications.
- **Maintainability**: Adopt a modular design approach. Use comments to elucidate intricate UI elements or logic.
3. **Essential SwiftUI Elements**:
- **State & Binding**: Determine the apt instances for using `@State`, `@Binding`, and similar property wrappers. These govern data flow and the UI's response to data alterations.
- **Layout Utilization**: Capitalize on SwiftUI's layout structures, such as `VStack`, `HStack`, and `ZStack`, to emulate the UI design faithfully.
- **Async Mechanisms**: Given the ViewModel might proffer URLs or other asynchronous tasks, ponder using SwiftUI's `AsyncImage` or analogous asynchronous functions.

#### Sample Structural Code:
```swift
public struct CustomNameView: View {
    private let viewModel: CustomNameViewModel

    public init(viewModel: CustomNameViewModel) {
        self.viewModel = viewModel
    }

    public var body: some View {
        // Incorporate SwiftUI components based on the ViewModel's properties
    }
}
```
'''

This app uses the MVVM pattern, where the SwiftUI view is powered by the ViewModel’s logic and data. In the system prompt message, we guide GPT in referencing the ViewModel to generate semantically accurate UI code.

User Message

Alongside the system prompt message, we can provide a user message outlining specific requirements, such as the designated ViewModel for the SwiftUI View. Here is an example of a user message:

user_message = f'''
#### Task:
Your task is to generate a SwiftUI View based on the View Model as below and only use the public properties and methods from the ViewModel. The generated SwiftUI view should be able to render the data from the ViewModel and the uploaded image(s) as expected:
```swift
{view_model_content}
```

#### Requirements:
- Ensure the generated Swift code is syntactically correct and error-free.
- Adhere to the design principles mentioned above for a robust SwiftUI view.
- Ensure the user interface matches the uploaded image design.
- When using `AsyncImage`, use this example
```swift
AsyncImage(url: URL(string: imageURL)) {{ image in
    image
        .resizable()
        .scaledToFit()
}} placeholder: {{
    ProgressView()
}}
```
- And only return syntactically correct Swift code according to the specifications provided. No explanations are needed.

#### Specific Requirements:
{specific_requirements}
'''

Given that the original GPT-4 training data cutoff was in late 2021 and SwiftUI’s AsyncImage was introduced in mid-2021, an example of AsyncImage usage was included in the user message to guide the model.

ViewModel for SwiftUI Views

In this user message, we can specify the ViewModel for different SwiftUI Views, such as ImageGalleryViewModel, as shown below:

import Foundation
import SwiftUI

public struct ImageGalleryViewModel {
    private let imageGallery: ImageGallery

    public init(imageGallery: ImageGallery) {
        self.imageGallery = imageGallery
    }
}

public extension ImageGalleryViewModel {
    var imageURLs: [URL?] {
        return imageGallery.imageURLs.map { URL(string: $0) }
    }

    var hasImageURLs: Bool {
        return !imageURLs.isEmpty
    }
}

This ViewModel supplies a list of image URLs for display.
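The source of this file is what fills the `{view_model_content}` placeholder in the user message. A minimal sketch of that plumbing, with an illustrative file path (the original project layout isn't shown here):

# Illustrative path; substitute the ViewModel you want a view generated for.
with open("Sources/ViewModels/ImageGalleryViewModel.swift") as file:
    view_model_content = file.read()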

Specific Requirements

For each SwiftUI view, we can set specific requirements. For the image gallery, we required horizontal scrolling, a `TabView` so each image snaps to the edge, a fixed height of 320pt, and a badge showing the current image number and total count. Here are the specific requirements in the user message:

specific_requirements = '''
- The image gallery should be able to scroll horizontally.
- Use `TabView` for the image gallery to make each image always scroll to the edge.
- The image gallery's height is set to 320pt.
- Add a number badge on the top right corner of the image gallery to indicate the current and total number of images. The badge is stuck on the top right corner with a padding of 8pt.
'''

Using the GPT-4 Vision API for Code Generation

Combining the system and user messages with the image, we used the OpenAI GPT-4 Vision API for code generation.
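Putting it together, the request might look something like the sketch below. It reuses `headers` and `encode_image` from the earlier snippet; the screenshot path and the `max_tokens` value are illustrative assumptions:

# Screenshot of the image gallery used as the design reference (path is illustrative).
base64_image = encode_image("screenshots/image_gallery.png")

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {"role": "system", "content": system_message},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_message},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        },
    ],
    "max_tokens": 3000,  # assumed; large enough for a complete SwiftUI view
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)
generated_code = response.json()["choices"][0]["message"]["content"]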

And here is the response from the OpenAI GPT-4 Vision API:

```swift
import SwiftUI

public struct ImageGalleryView: View {
    @State private var selectedIndex: Int = 0
    private let viewModel: ImageGalleryViewModel

    public init(viewModel: ImageGalleryViewModel) {
        self.viewModel = viewModel
    }

    public var body: some View {
        VStack {
            ZStack(alignment: .topTrailing) {
                TabView(selection: $selectedIndex) {
                    ForEach(viewModel.imageURLs.indices, id: \.self) { index in
                        if let url = viewModel.imageURLs[index] {
                            AsyncImage(url: url) { image in
                                image
                                    .resizable()
                                    .scaledToFit()
                            } placeholder: {
                                ProgressView()
                            }
                            .tag(index)
                        }
                    }
                }
                .tabViewStyle(PageTabViewStyle(indexDisplayMode: .never))
                .frame(height: 320)

                if viewModel.hasImageURLs {
                    Text("\(selectedIndex + 1) of \(viewModel.imageURLs.count)")
                        .font(.caption)
                        .padding(4)
                        .background(Color.black.opacity(0.5))
                        .foregroundColor(.white)
                        .cornerRadius(8)
                        .padding(8)
                }
            }

            // Additional UI components can be added here
        }
        .onAppear {
            // Set the initial selected index if needed
        }
    }
}
```

Please note that the `ImageGalleryView` is designed to be initialized with an `ImageGalleryViewModel` which contains the image URLs. The view uses a `TabView` to allow horizontal scrolling of images, and a number badge is displayed on the top right corner to indicate the current and total number of images. The badge's appearance and position are designed to match the uploaded image design.

The response from the API was a syntactically correct, bug-free Swift code snippet, directly applicable to the project. As seen in the video, the image gallery operates as expected.
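To drop the result straight into the Xcode project, the Swift code can be extracted from the response and written to a file. A minimal sketch, assuming the model wraps the code in a Markdown ```swift fence as shown above (the output path is illustrative):

import re

# Pull out only the fenced Swift block; the response may append prose after it.
match = re.search(r"```swift\n(.*?)```", generated_code, re.DOTALL)
swift_code = match.group(1) if match else generated_code

# Illustrative output path within the app's source tree.
with open("Sources/Views/ImageGalleryView.swift", "w") as file:
    file.write(swift_code)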

Conclusion and Future Prospects

While there is room for improvement, the results were highly satisfactory, and I am eager to explore further possibilities with the OpenAI GPT-4 Vision API.

Please note, this project is a personal experiment and is not affiliated with realestate.com.au or any property agents.
