Ta2Studio — Using Stable Diffusion and Textual Inversion to Create Personalized Tattoo Designs

--

This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course within the Institute for Applied Computational Science.

Authors: Luis Henrique Simplicio Ribeiro, Emilia Mazzolenis, Isidora Diaz, Alison Yuhan Yao

All code can be found on GitHub. Please check out the demo video.

Photo by Kristian Angelo on Unsplash

Background

Recent data from the Pew Research Center has uncovered the appeal of tattoos in the United States, with 32% of Americans having at least one tattoo [1]. This inclination toward body art is not exclusive to North America: studies have shown a growing interest in tattoos in Latin American countries as well [2], underscoring the universal cultural significance of tattoos. Tattoos serve various purposes, from recreational self-expression to medical applications such as scar camouflage and reconstructive procedures like nipple-areola reconstruction [3]. As more people around the world embrace tattoos as a form of self-expression and art, there is a growing need to ensure that everyone, regardless of their socioeconomic status, has access to creative, high-quality, and meaningful tattoo designs.

Ta2Studio is an app aimed at helping people design their own tattoos. Tattoos are often deeply personal and lifelong commitments, and many individuals have regretted designs they did not love (if you don’t trust us, watch a few episodes of America’s Worst Tattoos or Bad Ink!). Our app uses Stable Diffusion and textual inversion to address this: through a simple, personalized interface, it lets users create tattoo designs based on their own ideas or existing images, putting creative control back into their hands.

Under the Hood: Tech Stack of Ta2Studio

1. Machine Learning Workflow

We used Stable Diffusion 2.0 [4], a conditional generative model that produces realistic images from textual descriptions, called prompts. Stable Diffusion captures the nuance present in prompts and generates images with an incredible amount of detail, which made it our choice of text-to-image generative model. However, using Stable Diffusion alone to generate tattoo templates would be nothing new; many apps already let users do that. Our main feature is the use of a technique called Textual Inversion [5] to let users generate tattoo designs with their own concepts. With this feature, users can teach the model custom concepts by, for example, uploading pictures of their pet, giving the concept a unique name, and selecting the category it best represents.

We did not fine-tune the model itself. However, our main data source consisted of the images uploaded by users (up to 10 at a time) for the model to learn concepts, which is done by learning an embedding for each concept. For every uploaded batch, we performed user-specific data versioning (storing each user’s data in folders keyed to the email they used to log in) via DVC to track the different stages of our data (raw, processed, output images, and embeddings). We then preprocessed the images to ensure they had the square shape required by our ML model, as sketched below. Finally, we used textual inversion to learn the concept.
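For illustration, here is a minimal sketch of that preprocessing step, assuming PIL and a 512x512 target resolution; the function name, file paths, and exact resolution are ours for the example, not necessarily what runs in the app:

```python
# Minimal sketch: center-crop an uploaded image to a square and resize it
# for the model. Paths and the 512x512 target are illustrative assumptions.
from PIL import Image

def make_square(path: str, size: int = 512) -> Image.Image:
    img = Image.open(path).convert("RGB")
    side = min(img.size)                                   # shorter edge
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))   # center square crop
    return img.resize((size, size), Image.LANCZOS)

# Example usage (hypothetical paths):
# make_square("raw/canai_01.jpg").save("processed/canai_01.jpg")
```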

Textual Inversion Training (Image by Author)

To learn concepts, we train the model on Vertex AI. We start by creating a new token in our vocabulary and initializing its embedding from the concept category selected by the user. We then use a set of custom prompt templates, such as “A photo of <concept>”, and pair them with the uploaded images. Finally, we run gradient descent on a reconstruction loss with respect to the embedding of the concept we are learning. All training is done on Vertex AI, and we rely heavily on the Hugging Face libraries diffusers and transformers, along with PyTorch. Given the computational and memory cost of training, we used a simple quantization technique, casting the model’s weights to float16 and roughly halving the model size. In this setting, training on a single Nvidia T4 GPU on Vertex AI for 500 epochs takes around 18 minutes (we also support multi-GPU training via the accelerate library).
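To make this concrete, here is a simplified sketch of the textual inversion setup, loosely following the Hugging Face diffusers textual inversion example. The placeholder token, initializer token, and model ID are illustrative, and the full training loop (prompt templates, the diffusion reconstruction loss, and masking gradients so only the new token’s row changes) is omitted:

```python
# Simplified textual-inversion setup (illustrative, not our exact training script).
import torch
from transformers import CLIPTokenizer, CLIPTextModel

MODEL_ID = "stabilityai/stable-diffusion-2"
PLACEHOLDER_TOKEN = "<canai>"   # new pseudo-word named by the user
INITIALIZER_TOKEN = "dog"       # category the user selected

tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(MODEL_ID, subfolder="text_encoder")

# 1. Add the new token to the vocabulary and grow the embedding table.
tokenizer.add_tokens(PLACEHOLDER_TOKEN)
text_encoder.resize_token_embeddings(len(tokenizer))

# 2. Initialize the new embedding from the category token's embedding.
new_id = tokenizer.convert_tokens_to_ids(PLACEHOLDER_TOKEN)
init_id = tokenizer.convert_tokens_to_ids(INITIALIZER_TOKEN)
embeddings = text_encoder.get_input_embeddings().weight.data
embeddings[new_id] = embeddings[init_id].clone()

# 3. Freeze everything except the token embeddings; the training loop then
#    runs gradient descent on the diffusion reconstruction loss and, in
#    practice, zeroes the gradients of every row except the new token's.
text_encoder.requires_grad_(False)
text_encoder.get_input_embeddings().weight.requires_grad_(True)
optimizer = torch.optim.AdamW(
    text_encoder.get_input_embeddings().parameters(), lr=5e-4)
```

The appeal of this setup is that only a single embedding vector is learned while the rest of Stable Diffusion stays frozen, which is what keeps per-user training cheap enough to finish in minutes on a single GPU.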

Our ML pipeline is the following:

Vertex AI Machine Learning Pipeline (Image by Author)

Side Note: A cool thing about our project is that we actually had to train our model for every new image/concept that users provided to the app… but privacy was a huge concern for us! If we uploaded some images of our beloved dog ‘Berta’, we would not want other people to be able to get tattoos of her. Thus, we meticulously stored the weights learned in training for every user in separate folders, so that no one could access tattoo designs of other people’s concepts. Then, when doing inference, all we had to do was provide the right path to the given user’s weights and use them!
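As a rough illustration, the inference step boils down to something like the sketch below, assuming the user’s learned embedding has already been copied from their private folder to a local path; the path, token name, and prompt are hypothetical:

```python
# Rough sketch of inference with a user's learned concept (paths are hypothetical).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

# Load the embedding learned for this user's concept from their private folder.
pipe.load_textual_inversion("users/user_at_example_com/learned_embeds.bin",
                            token="<canai>")

prompt = "a minimalist black-ink tattoo design of <canai>, clean line art"
image = pipe(prompt).images[0]
image.save("tattoo_design.png")
```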

2. Other Tools in Our Tech Stack

Besides our ML machinery, we relied on common MLOps and frontend tools. The project is heavily containerized: Docker containers back every component, including data versioning, data preprocessing, model training, model serving, and the frontend. We used Google Cloud Platform for data storage, Vertex AI to manage the ML workflow, and Vertex AI Workbench to run our notebooks. The frontend is built with React, and the app is deployed on Kubernetes.

User Case Study: How to use our app

If you are thinking about getting a tattoo but are afraid the design may not turn out the way you hoped, you can ease some of that nervousness by going to our website and trying out a few designs! By describing exactly what you want to see, you can keep iterating until you find the one design that you love.

You first land on our Home Screen, where you have the option to upload your images or go straight to getting some designs of common objects.

Home Page of Ta2Studio Webapp (Image by Author)

Let’s say you want a tattoo of a dog named Canai. All you have to do is click on ‘Upload your Images’ and you land on this page, where you have to complete the three main fields (uploading images, giving the concept a name, and selecting a category that best matches it).

Upload concept images screen (Image by Author)

However, if you have not signed in, you will not be able to proceed. As mentioned before, we really care about the privacy of our users, so signing in allows us to compartmentalize each user’s data into a separate directory and ensure no crossover of information. Also, since training takes around 18 minutes when you upload 5 images, we use your email address to send you a notification once the process is complete.

Upon clicking ‘Next’, the ML pipeline gets started, and when the training has finished, users get the following email:

Email sent to confirm that training was successful (Image by Author)

Back in the app, users can now describe how they want their tattoo to look, and upon clicking Next, the inference portion of our pipeline starts.

Tattoo generation screen (Image by Author)

After waiting for a few seconds, users get their tattoo designs! We also give them the option to run inference again with the same prompt by clicking ‘Try Again,’ to run it again with a new prompt, or to upload more images.

Generated tattoos screen (Image by Author)

Lastly, users can see the images they have uploaded in the Upload History Tab, which is also log-in protected.

User upload history screen (Image by Author)

Future work

While the app already provides the core functionality users need to create their own tattoo designs, additional work could bring the experience to the next level. Firstly, we are hoping to switch to Stable Diffusion XL, a bigger, more powerful model that has been trained on more images, has more parameters, and can therefore generate higher-resolution images. While this model would take longer to train, access to more powerful computational resources would help. Additionally, we are hoping to add functionality that lets users choose whether to share their concepts and images with other people. While some images may be too personal to share, other users may be excited to share their concepts, so we will consider adding a public library of learned concepts available to all users.

Overall experience

We had a lot of fun working on this project! Building an app while keeping users’ needs and privacy concerns front and center was challenging, but we learned a lot and gained skills that we can readily apply to future projects and our professional lives.

Acknowledgments

We would like to thank Dr. Pavlos Protopapas and the teaching staff of AC215, in particular our TA Boxiang Wang.

References

[1] Schaeffer, K., & Dinesh, S. (2023). 32% of Americans have a tattoo, including 22% who have more than one. Pew Research Center. https://www.pewresearch.org/short-reads/2023/08/15/32-of-americans-have-a-tattoo-including-22-who-have-more-than-one/

[2] Kluger, N. (2019). Insights into Worldwide Interest in Tattoos Using Google Trends. Dermatology, 235(3), 240–242. https://doi.org/10.1159/000496986

[3] Vassileva, S., & Hristakieva, E. (2007). Medical applications of tattooing. Clinics in Dermatology, 25(4), 367–374. https://doi.org/10.1016/j.clindermatol.2007.05.014

[4] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684–10695).

[5] Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A. H., Chechik, G., & Cohen-Or, D. (2022). An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618.

Thank you so much for reading!

All code can be found on GitHub. Please check out the demo video.
