MADE IN AI

Cyrus James-Khan
14 min read · Aug 23, 2022

VOL. 1

Prompt: “A 2D platformer from the 1990’s where you play a robot character in a jungle environment”

Do Androids Dream of Game Development?

A study of integrating AI in games and development by Cyrus James Khan.

It is the year 2022: we have reusable rockets, self-driving cars, software capable of rendering photorealistic imagery, immutable digital assets made possible through blockchains, and even AI models that can create text and visuals for us via simple prompts. The world has just come out of a pandemic, and we are heading into a new chapter of human civilization with the increasing emergence of AI.

There is a lot of concern coming to the foreground as AI edges closer and closer to the singularity, displacing one job after the next along the way. But what are we mere mortals supposed to do in the meantime? How do we collaborate with, and leverage, this technology in our personal lives?

Do we let AI do all the work, or do we continue to push the boundaries and explore ways in which these new, powerful systems can complement or augment our creative process?

After spending time with OpenAI as part of their beta programs, and playing around with deepfake models since 2018, I have some discoveries I would like to share with you. I hope they will answer some of these questions and spark inspiration for applications in which an AI model can be used, with video games as a frame of reference.

Prompt: “A realistic painting of a robot playing a 1990’s video game at an arcade”

Chapter 1: GANimation

Before games like Doom and Quake changed the face of the gaming industry forever, the majority of games were created, perceived, and played in two dimensions. Limited to tiny amounts of memory, designers, composers, and programmers had to find very clever techniques to realize their digital dreams despite the shortcomings of the hardware. One of the most impressive hacks of all was the sprite.

A sprite is a two-dimensional, non-static element in a 2D game that moves separately from the background and is typically used for motion or interaction. Sprites often represent player characters, NPCs, props, and obstacles; in many cases, a sprite is the only thing the player controls on the screen. Sprites can be combined into what are called sprite sheets: a series of static frames displayed in succession to create the illusion of motion. This was advantageous back in the day because, instead of loading a new image every frame, which would be very memory-consuming, the hardware could load a single image and display only a portion of it at any given time.

Source: https://img.craftpix.net/2020/01/Free-3-Character-Sprite-Sheets-Pixel-Art2.jpg
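
To make that memory trick concrete, here is a minimal Python sketch (using Pillow) of how a single sheet can be sliced into individual frames at load time. The file name and the 64x64 frame size are hypothetical stand-ins for whatever your sheet actually uses.

```python
from PIL import Image  # pip install pillow

def slice_sprite_sheet(path: str, frame_width: int, frame_height: int):
    """Cut a horizontal strip of equally sized frames out of one sheet.

    Assumes the sheet width is a multiple of frame_width.
    """
    sheet = Image.open(path)
    frames = []
    for x in range(0, sheet.width, frame_width):
        frames.append(sheet.crop((x, 0, x + frame_width, frame_height)))
    return frames

# Hypothetical sheet: 8 frames of 64x64 laid out side by side.
frames = slice_sprite_sheet("robot_walk_cycle.png", 64, 64)
# At runtime an engine would draw frames[i % len(frames)] each tick
# instead of loading a separate image per frame.
```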

While sprites have for the most part been overshadowed by their polygonal 3D successors, the love for old-school 2D platformers has never faded, and the medium has been reimagined countless times over the years using today's technology. After all, there will always be a supportive fanbase for the genre, if only out of sheer nostalgia for those simpler days.

Now that we have some context, let's start with a simple 2D video game where every sprite visible to and interactable by the player is AI-generated.

We will be using DALL-E 2, a powerful artificial intelligence system from OpenAI that creates images from textual descriptions. Where the original DALL-E was a 12-billion-parameter version of the GPT-3 transformer, DALL-E 2 interprets natural-language inputs with CLIP embeddings and generates the corresponding images with a diffusion model.

A very interesting trick I have found for DALL-E is to include the words "sprite sheet" in the prompt. You will find it generally produces a series of variants of a subject inside a single image.
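
At the time of writing, DALL-E is used mostly through OpenAI's web interface, but if you have access to the Images endpoint the same prompt can be sent programmatically. The sketch below assumes the legacy openai Python package and a placeholder API key.

```python
import openai  # pip install openai (legacy, pre-1.0 interface assumed)

openai.api_key = "YOUR_API_KEY"

# Request a single 1024x1024 result for the sprite-sheet style prompt.
response = openai.Image.create(
    prompt="A sprite sheet of a robot walking animation",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])  # URL of the generated image
```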

In-game capture of our 2D Platformer

In this example, a sprite sheet has been generated and applied to a controllable character inside a game engine; the background is also a prompt result that has been alpha-cut and tiled.

Prompt: “A sprite sheet of a cartoon sun”

You can see that the walk-cycle animation is a bit jittery and imperfect; this is due to DALL-E's current limitations and is still something that has to be cleverly worked around. The elements that currently work best are more abstract, loopable visual effects such as fire, energy balls, and clouds.

In-game capture of our 2D Platformer

We can also generate static assets such as rocks, mushrooms, ferns, clouds, and background gradients. Descriptive prompts and exploration are necessary when trying to obtain consistent art-style results.

In-game capture of our 2D Platformer

The generated sprite sheets for this character have not followed the instructions for a walk cycle to the letter, yet they still provide very interesting variations of a single design.

This can almost be used as a DALL-E hack even if your final intent is not to generate a flipbook but simply to have variants of the exact same character or object. If we were to use the typical variation function, it would most likely produce alternative designs that stray too far from the original subject and art style.

Example of the DALL-E variant function
Prompt: “A sprite sheet of a robot walking animation”

Here are some of the sprite sheets for the other elements and visual effects in our game world, my favorites are definitely the fire and cloud results:

Prompt: “A sprite sheet texture of a fire animation”
Prompt: “cartoon fern bush with blue background”
Prompt: “a sprite sheet texture of a silver robot spider walk cycle animation, greenscreen background”
Prompt: “A sprite sheet texture of a cute blue flame character animation”
Prompt: “A sprite sheet texture of a cloud animation”

In some cases, I recommend specifying a solid chroma background to help extract the subject more seamlessly, since DALL-E unfortunately can't provide us with an alpha-keyed texture just yet.
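
As a rough illustration of that extraction step, here is a small Pillow/NumPy sketch that keys a solid green background out to transparency. The file names and the threshold value are hypothetical and would need tuning per image.

```python
import numpy as np
from PIL import Image

def chroma_key(path: str, out_path: str, threshold: int = 80) -> None:
    """Make pixels transparent where green clearly dominates red and blue."""
    rgba = np.array(Image.open(path).convert("RGBA")).astype(int)
    r, g, b = rgba[..., 0], rgba[..., 1], rgba[..., 2]
    mask = (g - np.maximum(r, b)) > threshold   # "green screen" pixels
    rgba[..., 3][mask] = 0                      # zero out their alpha
    Image.fromarray(rgba.astype(np.uint8)).save(out_path)

chroma_key("spider_greenscreen.png", "spider_alpha.png")
```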

A big portion of building a video game that limits many independent software engineers who aren't visually inclined is the art and design; this could alleviate that portion of their work and let them focus on the building. Artists could also benefit and scale up their production, especially once training a custom model becomes more abstracted and users are able to generate prompts from their own datasets, making the results far more fine-tuned than the multi-purpose models currently out there.

Though we have yet to see video generators perform to the extent of models such as DALL-E (despite very impressive progress), I hope this little "hack" gives some idea of what can be produced now, and of the new set of creative possibilities that will be available for motion generation in the not-too-distant future.

In-game capture of our 2D Platformer

Chapter 2: Story Generation

Prompt: “A robot writing a book next to a window in a cyberpunk city, realistic painting”

Creating video games usually requires a good story, whether it stays as subtext or not. Perhaps you have a story but it needs revision and proofreading, or perhaps you aren't as inclined toward literature as you are toward development. This is another area where language models can help us speed up the process or even come up with new creative ideas based on our initial configuration. While this is less of an issue for larger teams, it's something that could greatly benefit the indie space.

text-davinci-002 GPT-3 language model by OpenAI / Grey area is the AI response
text-davinci-002 GPT-3 language model by OpenAI / Grey area is the AI response

In this example, we will use OpenAI's language model to help us write a story and define characters for our video game. text-davinci-002 is one of the most capable language models out there and can help you write or rewrite stories, come up with formal emails, and even edit or write code and scripts.

Bear in mind that both prompts and completions consume tokens; a token is roughly four characters, or about three-quarters of a word. A simple way to approximate token count is therefore to divide the number of characters by 4. The more tokens in the prompt, the more expensive things get, and the same goes for completions, so we need to set a cap to keep them from going overboard (1K tokens costs about $0.06 with Davinci). We can also use stop sequences: strings that end the response generation when the model produces them.
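
As a back-of-the-envelope helper, the sketch below turns the rule of thumb above (roughly four characters per token, about $0.06 per 1K tokens with Davinci) into a quick cost estimate. It is only an approximation, not OpenAI's actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: about 4 characters per token.
    return max(1, len(text) // 4)

def estimate_cost_usd(prompt: str, max_completion_tokens: int,
                      price_per_1k: float = 0.06) -> float:
    # Both prompt tokens and completion tokens are billed.
    total = estimate_tokens(prompt) + max_completion_tokens
    return total / 1000 * price_per_1k

print(estimate_cost_usd("Write a short origin story for a jungle robot.", 256))
```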

We can create a detailed character biography from just a few contextual inputs: the more context we give the model, the more relevant the answer. Good formatting is crucial for getting focused and less costly responses.

text-davinci-002 GPT-3 language model by OpenAI / Grey area is the AI response
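
For reference, a completion like the ones in these screenshots boils down to a single API call. The sketch below is a hypothetical version of such a call using the legacy openai Python package; the setting, character name, and prompt wording are my own illustrative stand-ins, not the exact prompt shown above.

```python
import openai  # legacy, pre-1.0 interface assumed

openai.api_key = "YOUR_API_KEY"

# Hypothetical contextual inputs for the biography request.
prompt = (
    "Setting: a ruined jungle overtaken by machines.\n"
    "Character: PODZ-7, a scavenger robot.\n"
    "Task: write a short biography for this character.\n"
    "Biography:"
)

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=200,        # cap the completion so it can't run away
    temperature=0.7,
    stop=["\n\n"],         # stop sequence: end at the first blank line
)
print(response["choices"][0]["text"].strip())
```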

You can see how powerful this can become for fleshing out the themes, story beats, and character descriptions of our game, or for feeding them into image generators for concept design.

Prompt: “A robot performance of Romeo and Juliet, realistic painting”

Chapter 3: GPT-3 NPCs

An AI variant of a picture of a woman playing The Sumerian Game

This one might be for the more nerdy types, but I believe this will eventually become part of how we build and interact with NPCs in games.

Some of the earliest video games, created before we had complex graphics capabilities, were text-based adventures. Since those very first creations, we have continued to tell stories through dynamic characters and to let the player carve their own path. Let's take a look at today's AI and ponder how these deep learning systems could also be applied to virtual experiences.

In this use case, we will connect a GPT-3 language model to a 3D video game using OpenAI’s APIs. The game we will use for this study is PodzWorld, a title that is currently in its alpha stage as part of my Web3 project, CryptoPodz.

Below is an example of what our JSON body will look like in the HTTP POST request we will perform from inside the game environment.

Source: https://openai.com/api/
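
Blueprints aside, the same request can be expressed in a few lines of Python, which may be easier to read than a node graph. This is a minimal sketch of a POST to the completions endpoint with a body along the lines described above; the parameter values are illustrative defaults, not the exact ones used in PodzWorld.

```python
import requests

API_KEY = "YOUR_API_KEY"

def ask_npc(player_message: str) -> str:
    """Send the player's text to the completions endpoint and return the reply."""
    body = {
        "model": "text-davinci-002",
        "prompt": player_message,
        "max_tokens": 64,       # cap the completion length
        "temperature": 0.7,
        "stop": ["\n"],         # stop at the end of the first line
    }
    response = requests.post(
        "https://api.openai.com/v1/completions",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json=body,
        timeout=30,
    )
    return response.json()["choices"][0]["text"].strip()
```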

We can format our prompt to give the language model more context, so that it returns a more focused and relevant answer we can display and use in the game world. We will inform it of its name, history, and environment, as well as its different abilities.

Example of API prompt / Content-type JSON

The green text is the context we give the AI before asking our question. The blue area is a dynamic text input based on what the player types in the game. The yellow area is where the AI fits its answer to the player.
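
As a plain-text stand-in for that color-coded prompt, the sketch below shows how such a template could be assembled before each request. The context sentences and names here are hypothetical, not the actual PodzWorld prompt.

```python
# "Green" area: fixed context given to the model before every question.
NPC_CONTEXT = (
    "You are PODZ-7, a friendly robot guide living in PodzWorld.\n"
    "You know the history of the jungle ruins and can teleport the player,\n"
    "heal the player, or tell stories about the world.\n"
)

def build_prompt(player_input: str) -> str:
    # "Blue" area: whatever the player typed. The model's completion
    # fills the "yellow" area after "NPC:".
    return f"{NPC_CONTEXT}\nPlayer: {player_input}\nNPC:"

print(build_prompt("Who built these ruins?"))
```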

Example of prompt response

Above is one of the responses obtained from our prompt. The response text string will be extracted and displayed on a typewriter UI inside of PodzWorld.

Example of HTTP Request body inside an Unreal Engine blueprint. Input pin B is connected to the string we capture from the player’s typed input.

We can then detect certain keywords in the response to trigger specific events. For example, I have included a special keyword the AI can say in order to destroy the player: if the player asks the AI to terminate them, it will eventually respond with "HASTA LA VISTA BABY", which is detected by our script and sets the player's health to 0.

Example of basic capture text logic inside UE using blueprints.
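
The Blueprint above implements this keyword capture visually; expressed as a small Python sketch, the same logic might look like the following. The Player class and trigger table are simplified stand-ins for the in-game systems.

```python
class Player:
    """Minimal stand-in for the in-game player state."""
    def __init__(self) -> None:
        self.health = 100

# The special phrase from the text that our script listens for.
TRIGGER_PHRASES = {"HASTA LA VISTA BABY": "kill_player"}

def apply_triggers(npc_response: str, player: Player) -> None:
    """Scan the NPC's reply and fire any gameplay events it mentions."""
    for phrase, event in TRIGGER_PHRASES.items():
        if phrase in npc_response.upper() and event == "kill_player":
            player.health = 0  # same effect the Blueprint script performs

p = Player()
apply_triggers("Very well. HASTA LA VISTA BABY.", p)
print(p.health)  # 0
```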

In this example you can see the AI chatbot in action; it is being interacted with from within the game world via a text input box.

https://youtu.be/2weVS3HcaPk / Flyby: https://youtu.be/AXtpe6tle1Q

Of course, using a high-level base model such as text-davinci-002 would be a costly endeavor if implemented for players to use non-stop. Ideally, we would want to use a lower-tier but still performant model such as text-curie-001 or text-babbage-001. Though they are less sophisticated, their answers can be quite relevant to the initial prompt and are delivered at higher speed and much lower cost.

Postman interaction and testing with OpenAI API

The more ideal option is to fine-tune OpenAI's base models by training them on our custom datasets, as seen in the example above. This lets us create a more focused and relevant model, and it also lowers overall costs because far less context needs to be included in each prompt.

Instead of including the context in every prompt (greatly increasing token count), we train a custom fine-tuned model that already contains our required datasets. Our prompts can now be much cheaper, as they no longer add a whole paragraph of context every time the user talks to the chatbot, and the model is more focused on the task and environment at hand. In my scenario, where I only need a single-sentence response, I saved about 250–300 tokens per prompt by moving to a custom model.
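
For reference, fine-tuning with OpenAI at the time of writing works from a JSONL file of prompt/completion pairs. The sketch below prepares such a file; the example pairs are hypothetical, and the CLI command in the comment reflects the legacy fine-tuning workflow.

```python
import json

# Hypothetical training pairs: short in-world Q&A so the fine-tuned model
# no longer needs the long context paragraph in every prompt.
examples = [
    {"prompt": "Player: Who built these ruins?\nNPC:",
     "completion": " The ruins were built by the Podz ancestors. END"},
    {"prompt": "Player: Can you heal me?\nNPC:",
     "completion": " Of course, stand still for a moment. END"},
]

with open("podzworld_npc.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file can then be submitted with OpenAI's (legacy) fine-tuning CLI,
# e.g. `openai api fine_tunes.create -t podzworld_npc.jsonl -m curie`.
```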

You could also build an entirely custom GPT-3-style model from scratch, but it would take a long time to train on the same amount of base data that OpenAI's models have, not to mention it would require a cluster of computers to run. (A GPT-2 model can run on a single machine, but it's hard to roll back after seeing the amazing completions of newer models.)

Prompt: “A robot tinkering on another robot, realistic painting”

We can imagine a scenario where we have multiple GPT-3 models for different purposes, which we can switch between depending on the complexity of our needs. Such a system would be capable of perceiving, predicting, and interacting inside an environment. I recommend having a look at the five deep learning agents trained solely to play Dota 2, or the Video PreTraining (VPT) model playing Minecraft.

The basic AI controllers and behavior trees we have today are excellent for simple tasks, but imagine the interactions and conversations you could have in an RPG with this additional layer of intelligence added to NPC-player interaction. (I'm looking at you, Cyberpunk 2077.)

NPCs in games won't just be limited, scripted actors; they will become dynamic, vibrant characters capable of profound conversations with the player and a deep awareness and understanding of the game world.

Overview node-flow of our HTTP request inside Unreal Engine

Chapter 4: 3D Displacements and Terrain

For our final study, we will look at a more straightforward use case by coming back to DALL-E and using image generation to create textures and depth maps, which can be used in very interesting ways once we step into a 3D environment.

I absolutely love using image generators for textures that can be turned into materials.
In this example, I am searching for different variants of an abstract circuit texture, which I can then apply to a material inside Cinema 4D.

Prompt: “A circuit board texture”

As the output resolution is limited to 1024x1024 pixels, we can use some other tools such as Gigapixel AI to help scale up the resolution.

Pixel comparison in Gigapixel AI
A personal Cinema4D animation of an AI-generated displacement map.

We are not limited to color; we can also generate depth. Displacement maps are grayscale images used to add depth to a surface: the geometry is pushed outward according to the intensity of the grayscale values, with brighter areas displaced further than darker ones, adding detail to an otherwise flat plane.
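
As a minimal illustration of that idea, the sketch below reads a grayscale depth map and turns it into a grid of 3D points whose height follows pixel brightness, which is essentially what a displacement deformer does before rendering. The file name and maximum height are hypothetical.

```python
import numpy as np
from PIL import Image

def heightfield_from_depth_map(path: str, max_height: float = 10.0):
    """Turn a grayscale image into (x, y, z) points: brighter pixels sit higher."""
    gray = np.array(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    h, w = gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid coordinates
    zs = gray * max_height                            # brightness becomes height
    return np.stack([xs, ys, zs], axis=-1)            # shape (h, w, 3)

points = heightfield_from_depth_map("scifi_city_depthmap.png")
```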

I had a lot of fun discovering new patterns, from circuit boards to cities. In this next example, I am trying to obtain topographic depth maps for a science-fiction city layout. It took some fiddling, but adding or adjusting just a few words can yield great results.

“A depth map of a sci-fi city, top-down view”
“A depth map texture of a sci-fi city, top-down view, black and white”
A personal Cinema4D animation of an AI-generated displacement map.

An area that I personally love is landscape design. I have been using computers to generate fractal noises for the longest time, but there's definitely something different about getting a source pattern from a system like DALL-E. This opens up so many possibilities for gathering resources for 3D production.

“A photorealistic topographical photo of long canyons, top-down perspective, depth information, black and white, z-pass”
A personal Cinema4D animation of an AI-generated displacement map.

I found these methods to be a fun way of translating current generator outputs into usable 3D assets, though I suspect we are not far from a model capable of directly generating voxel shapes from any prompt. (There are already some attempts at generating 3D assets using deep learning, and some great successes using Neural Radiance Fields, or NeRFs.)

Who knows whether future metaverses will even contain 3D vectors at all, or whether everything will be generated from the neural network straight to the pixel at ludicrous speed, skipping over an actual three-axis digital space entirely.

Prompt “A matte white cyborg with a series of wires attached to his brain, realistic painting, cyberpunk”

I hope you enjoyed this summary of my latest explorations, and I hope to write about more discoveries in the near future. While these methods are still a little way from being production-ready, they provide a great sample and foundation to explore further. We are entering a fascinating and scary era in our society, and I hope we can find the right balance of curiosity, ambition, and skepticism moving forward. These technologies will keep evolving whether we like it or not, but they can become powerful and loyal friends and collaborators if we develop them correctly and understand them fundamentally.

With that, I would like to leave you with some words from our friend Davinci (text-davinci-002), in response to the prompt "Can you come up with an inspirational quote regarding AI and Art?":

“Art is the expression of the soul, and AI is the expression of the mind. Together, they can create something beautiful.”

Bio: My name is Cyrus James Khan. I'm a digital artist and tech enthusiast who has been learning computer graphics since I was 8 years old. I am a self-taught 3D generalist, meaning I work across everything from 3D modeling, rigging, and texturing to animation, lighting, and rendering.

The visual arts have been my passion for as long as I can remember, along with a deep appreciation of science fiction and technology, which has shaped my career path to this day. I have always used computers to push the boundaries of my imagination, and I strive to remain on the cutting edge of technology, including work in Virtual Reality, Game Development, Blockchain, and AI.

You can find more of my work and stay tuned for new developments via my Linktree: https://linktr.ee/cyrusjameskhan

A collage of 3D renders from 2020
