Making an AI-Assisted Game in 14 Days: Concept (Part 1/4)

Jumping On That Hype Train

Michael Hsieh
10 min read · Feb 5, 2024
Photo by Mohamed Nohassi on Unsplash

Why?

There’s a lot of attention placed on “AI” these days. People make all sorts of claims. Supposedly, a developer working together with current AI tools can make apps faster than one working without them.

In that sense, AI “augments” a person’s ability, similar to how chess players now work with computer chess-playing systems to reach new levels of competition.

In that sense, we’re all cyborgs now.

This is you.

Is that true?

I can’t speak for anyone else, but I wanted to find out for myself.

Around 2 weeks ago, I was sick and lying in bed when an idea popped into my head: how fast can someone make an app from scratch with the help of free AI?

I challenged myself to make one, as much as possible, in 14 days.

Purpose

Some questions I wanted to answer:

  • How should one begin making an app with AI assistance? Given AI tools are relatively new and there’s a confusing flood of info out there, I thought a beginner’s step-by-step workflow would be helpful for people.
  • Given the current state of “AI” (machine learning, mainly deep learning), what are practical problems and pitfalls people will face when attempting to build something with them?
  • What advantages do different types of AI bring, compared to building things on your own?

This series of posts is for beginners, written from a beginner’s perspective on using AI tools.

This first post sets out beginner concepts and the AI tools I chose. If you want to see highlights of using the tools themselves, skip to the next post.

The next parts document highlights from the build itself, and at the end of the series I will list takeaways and future considerations from my experimenting.

Concept

The app concept I settled on was a game I’ll call Chibi Toss. It’s very simple: Toss the chibis onto the bed.

Chibi Toss gameplay

There were a couple reasons I chose this particular game:

  • Simple and straightforward mechanics. I had limited time and didn’t want to be bogged down by too many design choices
  • Visually, it doesn’t need to be perfect, because it’s 2-D and more casual

More Reasons

Game-specific models already exist, as well as attempts to make games with AI assistance. For image models, there are statically generated RPG assets, like treasure chests or gems. For development, there are some videos (like here and here) of making Mario-like platform games.

I wanted something a bit different.

The game has its own set of little challenges. The visual style is not quite typical, and I wasn’t aware of any common models that could fit well.

For coding, there’s the fact it’s a mobile app and not an RPG or platformer. I wasn’t sure if free AI had the necessary training data to help here. There’s some swipe action going on and a chibi must track multiple states.

Finally, I decided a chibi needed to have multiple facial expressions but a consistent look, which happens to not be as easy as it sounds for AI.

There are also reasons I settled on a game instead of a regular app:

  • Games make more widespread use of images and other assets compared to other apps, which allows experimenting with different AI
  • Images also let me show results more visually
  • As of early 2024, AI generative models struggle with consistency.

The consistency problem is a significant one. This means animation is a big no-no, unless you want to see an acid trip.

Facebook’s Make-A-Video back in 2022. Definitely impressive, but note the bottom-right bear gives you nightmares.

Chibi style has big, expressive emoji-like faces, so I wanted to use this to partly make up for the lack of animation.

Example chibi. Exaggerated expressions are expected.

For regular apps, I also wasn’t confident AI could provide visual components that match one another. If you think about apps like YouTube, DoorDash, Discord, etc., there’s a common style in terms of colors and shapes.

Plus, with many apps there’s not as much incentive to e.g. make AI-generated toast messages, buttons, or dashboards compared to using a regular old software library.

The Discord app. Note the consistent styling required for fonts, shapes, etc. I had doubts certain AI can be useful here, though it likely will be in the near future. There are also a lot of standard icons, which leaves less room for AI images.

Games often require more customized assets. Usually they need to look distinctive compared to other games.

Angry Birds. Lots more images, fonts, audio etc. and they are particular to the individual game.

The AI Landscape

By now you’ve probably figured out I was trying to think ahead and pick out appropriate tools for the app I wanted to make.

You’re correct.

A couple of terms: by “tools”, I simply mean AI models and the supporting software used to run them. A model, put simply, is just a file.

You can run models on your own computer or in the cloud.

To run models at their best and fastest, you ideally use specialized hardware: a GPU (Graphics Processing Unit), available on certain computers. It’s also possible to use a CPU (Central Processing Unit), just more slowly.

GPUs have the advantage because they can run repetitive calculations in parallel, which is good for AI models. Image from Heavy.Ai
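The parallelism point above can be sketched in plain Python, with NumPy’s vectorized math standing in for what a GPU does across thousands of cores (a toy illustration of my own, not anything from the project):

```python
import numpy as np

rng = np.random.default_rng(0)

# A neural network "layer" is mostly repetitive arithmetic: multiply and add.
weights = rng.random((256, 256))
inputs = rng.random(256)

# One multiply-add at a time, like a single slow core.
slow = np.zeros(256)
for i in range(256):
    for j in range(256):
        slow[i] += weights[i, j] * inputs[j]

# The whole matrix-vector product in one vectorized call.
# A GPU pushes this further, spreading the multiplies across thousands of cores.
fast = weights @ inputs

print(np.allclose(slow, fast))  # both methods agree; the second is far faster
```
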

The cloud becomes a sensible option when you need a lot of GPU computing, but choosing a cloud service depends on your app’s use case. Consider whether sensitive data may be passed through the service, or whether you’d save money on computing costs if you have hungry models, for example.

I don’t focus on the cloud here; I just used my own computer for most of the work.

There are also a ton of models out there already, many trained on large amounts of data for specific tasks: generating text, images, even video and audio.

This was back in 2022. Now, there’s even more choices. From a forum.

Even for a single task like text-based code generation, you can find at least 5 models with a quick search.

Knowing which to use is a headache in and of itself. I wanted to spend more time on implementation than research, so I made quick choices.

AI Tools

My rule for the AI tools was that they had to be free and publicly available in some form.

I decided on the following:

ChatGPT 3.5

Accessed through the chatbot’s website. You only need to make an OpenAI account.

ChatGPT came out in Nov. 2022, made by OpenAI (which now has heavy investment from Microsoft), and I assume you’ve heard of it.

I bet you’ve seen this logo a lot.

GPT stands for Generative Pre-trained Transformer, a transformer being a deep learning (i.e. deep neural network) architecture introduced by Google researchers. ChatGPT responds to text essentially by placing one word after another, predicting the most likely next word in the sequence.

A good explanation by Filip Vitek on The Mighty Data.
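To make “predicting the most likely next word” concrete, here’s a toy sketch of my own. A real transformer computes these probabilities with a neural network over a vocabulary of tens of thousands of tokens; here they’re just hard-coded:

```python
import random

# Toy "language model": for each word, the probability of each possible next word.
next_word_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.2, "ran": 0.8},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(start, length, rng=random.Random(0)):
    """Repeatedly sample the most-likely-next-word table to grow a sentence."""
    words = [start]
    for _ in range(length):
        probs = next_word_probs.get(words[-1])
        if probs is None:  # no known continuation, stop early
            break
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the", 3))  # e.g. "the cat sat down"
```
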

It happens to be able to write code and answer coding queries, so I decided to use it as my coding assistant. There are other coding assistants and free transformer-based models, but ChatGPT is the most famous, which factored into my decision to use it.

As of this writing, OpenAI offers two versions of ChatGPT, ChatGPT-3.5 and ChatGPT-4, but ChatGPT-4 requires a paid subscription (unless you use Bing).

So, I picked ChatGPT-3.5, though it may be less capable.

Stable Diffusion

Stable Diffusion, a generative model made by Stability AI, made huge waves when it first came out in Aug. 2022. It’s free, open-source, and able to generate a variety of images based on simple inputs.

Like other models, it’s embroiled in controversy because its training data was scraped from online sources without artists’ consent. I don’t want to focus on that here, only on its current abilities.

Image generated by latest version of Stable Diffusion

Stable Diffusion is a latent diffusion model: a generative model trained by adding random noise to training images and learning to remove it. At generation time, it starts from noise and repeatedly removes it (“denoising”), guided by your prompt, to get the output image.

Good explanation of how Stable Diffusion works by Tom Piechota on LinkedIn. Denoising done by a U-Net in the model, itself a type of neural network model.
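A heavily simplified sketch of that denoising loop, with a fake “denoiser” standing in for the U-Net (my own toy illustration; the real model works on compressed latent images and is guided by your text prompt):

```python
import numpy as np

rng = np.random.default_rng(0)

# The "image" we'd like to end up with (in reality, shaped by your prompt).
target = np.linspace(0.0, 1.0, 16)

# Start from pure random noise, like Stable Diffusion does.
image = rng.normal(size=16)

def fake_denoiser(current):
    # Stand-in for the U-Net: predicts the noise still present in the image.
    # The real network is trained to do this, guided by the prompt.
    return current - target

# Remove a fraction of the predicted noise at each step.
for step in range(50):
    predicted_noise = fake_denoiser(image)
    image = image - 0.1 * predicted_noise

print(np.abs(image - target).max())  # close to 0 after enough steps
```
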

There are a lot of versions of Stable Diffusion (SD): the original v1.4, improved v1.5, and the latest version as of writing called SDXL.

I chose SD v1.5 because its results are much nicer than v1.4’s, and I just haven’t had time to mess with SDXL yet.

You might wonder, how did I access Stable Diffusion? Well, that brings me to the next tool:

Automatic1111

To be clear, this is not an AI model itself. Rather, this is a web-based user interface that unlocks a ton of options for Stable Diffusion. It’s open-source and free to install.

Part of the confusion with AI models, I think, is the dizzying number of options you have for how exactly to get started.

Automatic1111 shows you the many choices for making images in all their glory, which results in a somewhat cluttered UI for a beginner.

Intimidating as a first-time user. Look at all the boxes. My instinct is over time we’ll see more streamlined UIs for the masses.

There are multiple types of prompts, not just text-based but image-based, and there are also extensions for generating specific types of images, which you can find on websites.

You can run Automatic1111 + Stable Diffusion either locally on your own computer or on the cloud. I used my own computer.

Side note: The cloud service Google Colab has banned the use of Automatic1111, so you will have to either find another cloud service or go local.
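Another side note for the curious: besides the web UI, Automatic1111 can be driven from code. When launched with the `--api` flag, it exposes an HTTP API. The sketch below is based on my understanding of that API; the endpoint and field names may vary between versions, and it assumes the UI is running locally on its default port:

```python
import base64
import json
import urllib.request

def build_txt2img_payload(prompt: str, steps: int = 20) -> dict:
    """JSON body for Automatic1111's /sdapi/v1/txt2img endpoint."""
    return {
        "prompt": prompt,
        "negative_prompt": "blurry, deformed",  # things to steer away from
        "steps": steps,      # number of denoising steps
        "width": 512,        # SD v1.5 works best around 512x512
        "height": 512,
        "cfg_scale": 7,      # how strongly the prompt guides denoising
    }

def txt2img(prompt: str, host: str = "http://127.0.0.1:7860") -> bytes:
    """POST to a locally running instance; return the first image as PNG bytes."""
    data = json.dumps(build_txt2img_payload(prompt)).encode()
    req = urllib.request.Request(
        host + "/sdapi/v1/txt2img",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return base64.b64decode(result["images"][0])

# Usage (needs the web UI running with --api):
# png = txt2img("chibi character, big expressive face, white background")
```
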

In a later post, I will go over Stable Diffusion with Automatic1111. There are other details I will explore, like companion models for Stable Diffusion called LoRAs. But for now, I’m just pointing out that this is what I used.

Tech Stack

Now for some non-AI parts. I think it’s good to give the whole picture of what’s going on, so you can figure out how your own work might differ from mine.

Xcode

I wanted to make the game as an iOS app. Xcode is the standard editor for this.

Xcode

SpriteKit

Typically, a game is made using a game engine (Unity, Unreal Engine, Godot, etc.). I decided not to use any of these and to use SpriteKit instead.

SpriteKit framework logo

This is probably not what you should do if you’re making a commercial game. Use a game engine.

Why did I stick with SpriteKit? A few reasons:

  • Code-focused: SpriteKit doesn’t need a specialized game editor to work. I did not want to spend time fiddling with e.g. the Unity editor for certain aspects. I wanted to focus on the code itself, to see how much useful code ChatGPT could produce for me.
  • Speed: I have some familiarity with iOS development. I wanted to quickly test out my app on my own device.

I was asking ChatGPT what I could use to make a game and it mentioned SpriteKit. After a quick search, I figured out SpriteKit was built into the iOS software library, made for 2-D games, and worked out of the box.

This would be a simple, 2-D game for my iPhone, so I was ok with the tradeoff.

Online Image Editor

It’s not part of the tech stack proper, but it’s essential when dealing with images. I used an online alternative to Photoshop, Photopea. I ended up spending quite a lot of time with it, so I’m mentioning it here for completeness’ sake.

Dev Background

When some guy on the Internet writes an article like this, I think it’s important to know that person’s background, to get a sense of their current knowledge and skills. They may know things you don’t, or you may know things beyond them. That way you can gauge how useful what they say is for your own situation.

There’s a person behind the product.
  • I have a programming background. This turned out to be quite helpful when I ran into errors.
  • I am not a professional game developer. If you are one, I will bluntly state I don’t make the best design choices in 14 days. That said, I have made a couple small games in the past as a hobby and for classes.
  • My current focus is on app development, not gaming specifically. In fact starting this project I had no knowledge of SpriteKit and some terms were new to me.
  • I can’t art. No photoshop skills. This turned out to be very, very unhelpful.
  • I have dabbled in using AI on my own a bit. I know about machine learning concepts, but I’m still learning how newer models work.
  • I’m still exploring how best to use models for a particular domain like game development or app development. I’ve got ideas, but I don’t have all the answers.

Next

It’s my 1st day. I go to ChatGPT-3.5 (henceforth just ChatGPT) and decide to focus on the game mechanics first.

I’ve tried ChatGPT before and already know it lies harder than Pinocchio. I need to be careful. It also can’t do anything too complex; it requires detailed input. So I decide to move in stages, and break each stage into distinct chunks.

My first step is to introduce a sort of “skeleton” of how I want my objects to interact, without worrying about looks yet: how to set up the game scene, show a basic shape, set up physics, and so on.

Immediately, disaster struck. Read next.


Michael Hsieh

California-based Mobile App Developer. Software Engineer. I like sharing what I've learned with others.