MoMALisa

Anna Midgley
6 min read · Dec 13, 2023


This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.

Table of Contents

  1. Introduction
  2. API
  3. Machine Learning Modelling
  4. Technical Architecture Overview
  5. Frontend
  6. Deployment
  7. Next steps
  8. Links

Introduction

MoMALisa allows users to walk between two artworks in the latent space. What does this mean? A user picks the start and end of their journey and MoMALisa generates images that are a fluid transition between the two. For example, one can transition between a giraffe and a battleship and see the output as an animation.

In this blog post we explain the inspiration for the project, the machine learning models behind it, and how we deploy it.

API

We use FastAPI, a Python framework for building REST APIs, to connect our frontend with the backend. A GET request to the API provides the model with two strings, from which it creates n intermediate images transitioning between them. The resulting images are stored in the saved_predictions GCP bucket. The frontend then calls the bucket through the Google Cloud SDK to stream the images, which are displayed on the page and can also be downloaded by the user.

Machine Learning Modelling

The animations are generated with Stable Diffusion. The user inputs two strings in our frontend. An embedding model maps these two prompts to latent space representations. We then draw a high-dimensional line between the two points and pick n new points along this line. For each of those n points we generate an artwork by passing the representation through the iterative U-Net stage of the Stable Diffusion architecture. This results in n images.
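Picking points along the line amounts to simple linear interpolation between the two embedding vectors. A toy sketch (with four dimensions standing in for the real, much higher-dimensional embeddings):

```python
import numpy as np

def latent_walk(start: np.ndarray, end: np.ndarray, n: int) -> np.ndarray:
    """Return n evenly spaced points on the straight line from start to end
    (endpoints included), one per generated frame."""
    ts = np.linspace(0.0, 1.0, n)  # interpolation weights in [0, 1]
    return np.stack([(1 - t) * start + t * end for t in ts])

# Two toy "embeddings" standing in for the real latent vectors.
a = np.zeros(4)
b = np.ones(4)
points = latent_walk(a, b, 5)
print(points[2])  # midpoint of the walk: [0.5 0.5 0.5 0.5]
```

Each row of `points` is then fed through the diffusion model to produce one frame of the animation.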

Generative Model Architecture

To ensure a smooth transition between the final images, we have to fix the noise added to the latent representations. Adding the same noise to all n points during image generation ensures that the final outputs create the illusion of continuity, which is essential for the animation.
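The idea can be illustrated with a seeded random generator: draw the noise once and reuse it for every frame, so each point on the walk is denoised under identical conditions. The tensor shapes below are illustrative, not the model's actual latent shape:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# Draw the noise once, with the same shape as a latent ...
fixed_noise = rng.standard_normal((4, 64, 64))

n_frames = 4
noisy_latents = []
for i in range(n_frames):
    # Stand-in for the i-th interpolated latent along the walk.
    latent = np.full((4, 64, 64), i / (n_frames - 1))
    # ... and add the identical noise to every frame's latent.
    noisy_latents.append(latent + fixed_noise)

# Consecutive frames now differ only by the interpolation step,
# not by independently sampled noise.
print(np.allclose(noisy_latents[1] - noisy_latents[0],
                  noisy_latents[2] - noisy_latents[1]))  # True
```

Had we resampled the noise per frame, the frame-to-frame differences would be dominated by random noise rather than by the controlled walk through latent space.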

Technical Architecture Overview

Before jumping into the technical details of the frontend and deployment, we show an overview of the technical architecture in the image below. It provides a high-level view from development to deployment, illustrating the interactions between components and containers, and serves as the blueprint of the system.

For source control, we use GitHub to keep track of our code and the changes to it. We use Vertex AI and GCP for model deployment, and GCR to host the needed container images. GCS buckets store the output GIFs. A GCE persistent volume stores any files that need to persist when container images are updated. We use a virtual machine instance on GCE to host a single instance with all the needed containers running on it.

Technical Solution

Frontend

The frontend is developed in HTML. It allows a user to input two objects, from which a GIF is generated that captures the latent space walk. The generated GIF is placed on the left side of the website and can be downloaded. The website additionally has a GIF gallery, where users can view previously generated GIFs for inspiration. Lastly, it has a small section discussing the project, a link to our source code, and the team members.

Frontend Website

Overall, we implemented a minimalist design that is intuitive and easy to use. Users can easily generate the GIFs they wish and download the output. The overall design is artistic, fitting the target market of users interested in modern art.

Please see a demo of the website at the end of the video, linked at the bottom of this blog post.

Deployment

The backend API service connects to our deployed model, which is hosted on Vertex AI. This allows us to make predictions, generating images along the latent space walk. When we call the model, the predictions are also written to a GCP bucket. We have two versions of deployment: the first uses a single GCP Virtual Machine instance, whilst the second utilizes Google Kubernetes Engine. The latter is more scalable and allows us to handle more requests.

Both deployments use Ansible to automate the process. This reduces deployment time and lets us easily deploy our application to multiple environments, since manual steps like creating the VM and installing dependencies are automated and can be run from the command line. Step-by-step deployment instructions can be found in the README of our GitHub repository. The following section provides a high-level overview of the deployment process.

Single GCP Virtual Machine Deployment:

  1. Build the deployment container: Executes the docker-shell.sh script to build the deployment container, containing the necessary dependencies and configurations for our application.
  2. Build & push Docker containers to GCR: Uses the ansible-playbook command with the deploy-docker-images.yml playbook to automate the build and push of Docker containers to Google Container Registry (GCR), ensuring that the latest application images are available for deployment. Two containers are built and pushed: one for the api-service and one for the frontend.
  3. Create a VM instance in GCP: Utilizes the ansible-playbook command with the deploy-create-instance.yml playbook to create a Virtual Machine (VM) instance in Google Cloud Platform (GCP), specifying the instance details through the inventory.yml file. This sets up the networking needed for the VM to communicate with the outside world, such as which firewall rules to use.
  4. Provision the VM instance: Executes the deploy-provision-instance.yml playbook to provision the created VM instance, configuring it with necessary settings and ensuring that it is in the desired state (specified by cluster_state=present). The necessary settings include Docker and the Ansible libraries, as well as mounting the persistent disk.
  5. Set up the containers: Uses the deploy-setup-containers.yml playbook to set up the Docker containers on the provisioned VM instance, ensuring they are ready to run the application.
  6. Deploy the webserver: Executes the deploy-setup-webserver.yml playbook to deploy the web server, which includes configuring the necessary settings to serve the application and make it accessible. We use NGINX for this step.
  7. Visit the website: Provides the external IP address of the VM for accessing the deployed application. Users can visit the website by navigating to http://<external_ip> in their web browser.

These steps collectively automate the deployment process, making it efficient and reproducible using Ansible.

Kubernetes Cluster Deployment:

The Kubernetes cluster is generated with a single command, by running the Ansible playbook deploy-k8s-cluster.yml. This playbook automates the creation of the Kubernetes cluster on Google Kubernetes Engine and the deployment of multiple containers within that cluster. The main steps executed in this file are listed below:

  1. Create the GKE cluster.
  2. Create a node pool within the cluster, with specific configurations such as machine type, image type, and autoscaling settings.
  3. Connect to the cluster, which updates the kubeconfig file to include the new GKE cluster, allowing subsequent commands to interact with it.
  4. Kubernetes setup: create the namespace and install NGINX Ingress into it using Helm.
  5. Deploy the necessary containers for the frontend and API service.
  6. Expose the frontend and API service as NodePort services.
  7. Set the NGINX Ingress IP.
  8. Create the Ingress resource, which routes traffic to the frontend and API service by setting up path-based routing.

Next steps

We note that currently we only generate 4 points in the latent space walk between the two user-inputted points. As a result, the walk is coarse and does not transition smoothly from one object to the next. We set the value so low because, for any higher number of points, calls to our model time out: a diffusion model is large and takes considerable time to predict. We would like to improve this in order to generate smoother GIFs. One option we would like to explore is deploying multiple models in parallel and splitting the predictions across them, so that none times out, while also reducing latency. This requires substantial compute, so it was not covered due to limitations on our Google Cloud credits.
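The parallel idea can be sketched as partitioning the frame indices across several model replicas; the replica count and the helper below are hypothetical, not part of our current deployment:

```python
def split_frames(n_frames: int, n_replicas: int) -> list[list[int]]:
    """Assign each frame index to a replica round-robin, so no single
    replica has to generate the whole walk (and risk a timeout)."""
    shards = [[] for _ in range(n_replicas)]
    for i in range(n_frames):
        shards[i % n_replicas].append(i)
    return shards

# e.g. 12 frames across 3 replicas: each replica generates 4 frames,
# roughly a third of the wall-clock time of a single model.
print(split_frames(12, 3))  # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```

Each replica would then generate only its shard, and the frontend would reassemble the frames in index order before building the GIF.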

Links

Project Github Repo — Link

Video — Video
