Gemini, the stylist…

A personalized haircut recommendation application

Abirami Sukumaran
Google Cloud - Community
9 min read · Jan 9, 2024

Before I begin: The opinions, ideas, and examples in this blog are entirely my own, shaped by personal interests and experiences. They do not reflect the views or purposes of the products, my employer, any other company, or individuals. This experiment is purely driven by my curiosity and passion. The context and image samples provided are intended for educational purposes only and should not be considered within a commercial context. The background is curated based on my knowledge and gathered information about hairstyles, making the app’s responses highly subjective. As a result, this app has not been deployed, emphasizing its experimental nature.

Introduction

I was bored and needed a haircut. Decided to Gemini my way through the new look. Like me, are you an AI enthusiast who is tired of the same old hairstyle and looking for a fresh change in a fresh year? Imagine having an AI-powered stylist at your fingertips, ready to recommend the perfect haircut based on your face shape.

What are we building?

In this blog post, we’ll explore a Python program that leverages the power of Gemini Pro Vision and Imagen 2 to generate personalized haircut suggestions. From identifying face shapes to suggesting hairstyles, this approach combines the strengths of text and image generation models. Let’s dive into the technical details behind this AI-driven makeover!

How are we building?

The Gemini Pro Vision generative model! It is Google’s multimodal generative AI model that accepts text, images, and videos as input and generates text in response. We will use:

  1. Google Cloud’s Vertex AI Gemini API that provides a unified interface to interact with the Gemini models. In order to invoke the Vertex AI Gemini API, we will use the Vertex AI SDK for Python.
  2. Imagen 2 to generate images (of models sporting hairstyles) from the prompt that is generated as part of your recommendation.
  3. A knowledge base PDF that I created as context, based on publicly available articles on haircuts for face shapes, along with Imagen 2 generated images as sample face shape outlines. Sources are listed here with links to references.

Why Gemini?

Great question! I like that I can send multimodal input (text, images, and more text) all in the same request and get the response in a format that I can control programmatically, for instance JSON. Of the many things I like about Gemini, this one stood out during this experiment.

So let’s go?

Install the Vertex AI SDK for Python as shown in the doc and authenticate your account.

#Install 
!pip install --upgrade google-cloud-aiplatform
from google.cloud import aiplatform
import vertexai.preview

#Authenticate
from google.colab import auth
auth.authenticate_user()

#Restart the kernel so the upgraded SDK is picked up
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Install and import dependencies

from __future__ import annotations
from tenacity import retry, stop_after_attempt, wait_random_exponential
from google.api_core.exceptions import ResourceExhausted
from google.api_core.client_options import ClientOptions
from google.api_core.exceptions import AlreadyExists
import numpy as np
import glob
import os
from typing import Dict, List, Tuple
import pandas as pd
from logging import error
import re
import textwrap
import vertexai
from vertexai.language_models import TextEmbeddingModel, TextGenerationModel
from vertexai.preview.generative_models import GenerativeModel, Image
!pip install PyPDF2
from PyPDF2 import PdfReader

Set your Project ID and REGION variables:

#Set PROJECT_ID and REGION variables
region = "us-central1"
project_id = "abis-345004"

#Vertex AI Init
vertexai.init(project=project_id, location=region)

Let’s copy the PDF that contains the knowledge base / context to the current working directory. I have made this available in the repo, so you can store it wherever you can access it.

# Copying the files from the GCS bucket to local storage
!gsutil -m cp -r gs://------/Hairstyle/FaceShapeAndSuggestions.pdf .

Create a PdfReader object and extract the content to a string. Remember that for larger knowledge bases, you can use Document AI with chunking or another method to extract information from a file.

# creating a pdf reader object
reader = PdfReader('FaceShapeAndSuggestions.pdf')

# printing number of pages in pdf file
print(len(reader.pages))

# getting a specific page from the pdf file
page = reader.pages[0]

# extracting text from page
text = page.extract_text()
extracted_string = text
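
The snippet above reads only the first page. If your knowledge base PDF spans several pages, a small helper can join the text of all of them. This is a minimal sketch: `extract_all_text` is a hypothetical helper name, and it assumes only that each page object exposes `extract_text()`, as the pages of the reader above do.

```python
def extract_all_text(pages):
    """Join the text of every page, skipping pages that yield no text."""
    parts = []
    for page in pages:
        # The text may be empty for image-only pages; `or ""` also guards against None
        text = page.extract_text() or ""
        if text.strip():
            parts.append(text.strip())
    return "\n".join(parts)


# In the notebook this could be called as:
#   extracted_string = extract_all_text(reader.pages)
```

For a single-page PDF this gives the same text as the original snippet, minus leading and trailing whitespace.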

Instantiate the Vertex AI generative model object for gemini-pro-vision:

generation_model = GenerativeModel("gemini-pro-vision")

Load sample images that I created with Imagen 2 for face shape outlines. This is by choice and is an optional step. This is linked in the repo as well. In the below snippet I have used a curl command to download the file from Cloud Storage. Alternatively, you can access the sample files from your working directory as well.


!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/round_female1.JPG
image_female_round1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/heart_female1.JPG
image_female_heart1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/square_female1.JPG
image_female_square1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/oval_female1.JPG
image_female_oval1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/oblong_female1.JPG
image_female_oblong1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/triangle_female1.JPG
image_female_triangle1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/rectangle_female1.JPG
image_female_rectangle1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/diamond_female1.JPG
image_female_diamond1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/round_male1.JPG
image_male_round1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/heart_male1.JPG
image_male_heart1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/square_male1.JPG
image_male_square1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/oval_male1.JPG
image_male_oval1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/oblong_male1.JPG
image_male_oblong1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/triangle_male1.JPG
image_male_triangle1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/rectangle_male1.JPG
image_male_rectangle1 = Image.load_from_file("image.jpg")
!curl -o image.jpg --silent --show-error https://storage.googleapis.com/-----/Hairstyle/shapes/diamond_male1.JPG
image_male_diamond1 = Image.load_from_file("image.jpg")
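
The sixteen download-and-load pairs above follow one naming pattern, so they can also be generated in a loop. The sketch below only builds the URL for each sample; `sample_image_urls` and `base_url` are hypothetical names, and the bucket path is elided in this post, so you would pass in your own location.

```python
from itertools import product

SHAPES = ["round", "heart", "square", "oval", "oblong", "triangle", "rectangle", "diamond"]
GENDERS = ["female", "male"]


def sample_image_urls(base_url):
    """Map keys such as 'female_round' to the matching sample image URL."""
    return {
        f"{gender}_{shape}": f"{base_url}/{shape}_{gender}1.JPG"
        for gender, shape in product(GENDERS, SHAPES)
    }


# Possible notebook usage (assumes `Image` is imported as shown earlier):
#   import urllib.request
#   sample_images = {}
#   for key, url in sample_image_urls(BASE_URL).items():
#       urllib.request.urlretrieve(url, f"{key}.jpg")
#       sample_images[key] = Image.load_from_file(f"{key}.jpg")
```

Keeping the images in a dictionary instead of sixteen variables also makes it harder to pair a label with the wrong image later on.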

INPUT IMAGE: Load the image to personalize a recommendation for you.

image = Image.load_from_file("<<Replace it with your file.JPG>>")
image

Below is my image, since I want a personalized haircut recommendation for myself:

Input Image (my selfie)

INPUT TEXT: Assign the context to a variable and draft your prompt string.

context = extracted_string
question = "From the context and the sample images above, categorize the closest face shape of the face in the image below into one of the following categories: round, diamond, heart, pear, oblong, square, rectangle, triangle. Even if it is not exactly matching entirely, choose the shape only from the categories listed that is close enough. For this identified shape, from the context, suggest top 3 most suited haircuts in 2 categories (name them as male identifying and female identifying individuals) from the context provided. formulate a text prompt that would help generate 3 images for the 3 most suited styles identified in the response."

prompt = [
    "Here is the context: " + context,
    "Here are the sample images for the context: ",
    "Round shape image of a female identifying individual: ", image_female_round1,
    "Heart shape image of a female identifying individual: ", image_female_heart1,
    "Square shape image of a female identifying individual: ", image_female_square1,
    "Oval shape image of a male or female identifying individual: ", image_female_oval1,
    "Oblong shape image of a female identifying individual: ", image_female_oblong1,
    "Triangle shape image of a female identifying individual: ", image_female_triangle1,
    "Rectangle shape image of a female identifying individual: ", image_female_rectangle1,
    "Diamond shape image of a female identifying individual: ", image_female_diamond1,
    "Round shape image of a male identifying individual: ", image_male_round1,
    "Heart shape image of a male identifying individual: ", image_male_heart1,
    "Square shape image of a male identifying individual: ", image_male_square1,
    "Oblong shape image of a male identifying individual: ", image_male_oblong1,
    "Triangle shape image of a male identifying individual: ", image_male_triangle1,
    "Rectangle shape image of a male identifying individual: ", image_male_rectangle1,
    "Diamond shape image of a male identifying individual: ", image_male_diamond1,

    question,
    image,
    "Return the response in JSON format",
]
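
The list above interleaves a text label with each sample image by hand. If you keep the samples as (label, image) pairs, the same list can be assembled by a small function. This is a sketch only: `build_prompt_parts` is a hypothetical helper, and it treats the images as opaque objects, so any value works in their place.

```python
def build_prompt_parts(context, labeled_samples, question, input_image):
    """Assemble the multimodal prompt: context, labeled samples, question, input image."""
    parts = [
        "Here is the context: " + context,
        "Here are the sample images for the context: ",
    ]
    for label, sample_image in labeled_samples:
        # Each sample image is preceded by its text label, as in the hand-written list
        parts.append(f"{label}: ")
        parts.append(sample_image)
    parts += [question, input_image, "Return the response in JSON format"]
    return parts


# Possible notebook usage:
#   prompt = build_prompt_parts(
#       context,
#       [("Round shape image of a female identifying individual", image_female_round1)],
#       question,
#       image,
#   )
```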

Prompt

The prompt is a combination of multimodal inputs:

Context + Sample Face Shape Images + Below Text + Input Image + text that prompts the response to be in JSON format.

“From the context and the sample images above, categorize the closest face shape of the face in the image below into one of the following categories: round, diamond, heart, pear, oblong, square, rectangle, triangle. Even if it is not exactly matching entirely, choose the shape only from the categories listed that is close enough. For this identified shape, from the context, suggest top 3 most suited haircuts in 2 categories (name them as male identifying and female identifying individuals) from the context provided. formulate a text prompt that would help generate 3 images for the 3 most suited styles identified in the response.”

As you can notice above, I have provided context, added sample images and also added a line in the prompt to return the response in JSON format. The prompt is very detailed because:

  1. We need the personalized recommendation
  2. We want to get a prompt as part of the response so we can use it as input for the image generation model request
  3. We want the response in a specific format
  4. And it’s my haircut we are talking about. :P

Let’s run the prompt:

responses = generation_model.generate_content(
    prompt,
    generation_config={
        "max_output_tokens": 2048,
        "temperature": 0.1,
        "top_p": 1,
        "top_k": 32,
    },
    stream=False,
)
response_str = responses.text.replace("```json", "")
response_str = response_str.replace("```", "")
print(response_str)
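
The imports earlier pull in tenacity and ResourceExhausted, but the call above is made only once. If you run into quota errors, retrying with a random exponential wait is one option. Here is a standard-library sketch of that idea, not the tenacity API itself; `call_with_backoff` and all of its parameters are hypothetical names.

```python
import random
import time


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with a random exponential wait."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait a random time up to base_delay * 2^(attempt - 1), capped at max_delay
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper))


# Possible notebook usage:
#   responses = call_with_backoff(
#       lambda: generation_model.generate_content(prompt, stream=False),
#       retry_on=(ResourceExhausted,),
#   )
```

The `sleep` parameter is there only so the wait can be swapped out when trying the helper without real delays.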

Here is the printed response:

{
"face_shape": "oval",
"male_identifying_individuals": [
"Angular Fringe",
"The quif",
"The Side Part"
],
"female_identifying_individuals": [
"The quif",
"Side Part",
"Straight Long Hair"
],
"text_prompt": "A photo of a person with an oval face shape. The person should be smiling and have their hair styled in one of the following ways: Angular Fringe, The quif, The Side Part, Straight Long Hair."
}

As you can see, the response is in JSON format, as I had prompted. This lets me handle the output uniformly and use it as input to the next step, which is image generation for the recommended styles.
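
The model may wrap the JSON in a Markdown code fence or add a sentence around it, which is why the fence markers were stripped earlier. A slightly more tolerant version of that cleanup is sketched below; `parse_model_json` is a hypothetical helper, and it assumes the response contains a single top-level JSON object.

```python
import json
import re

# Matches a run of three backticks, optionally followed by the word "json"
_FENCE = re.compile(r"`{3}(?:json)?", re.IGNORECASE)


def parse_model_json(raw_text):
    """Strip code-fence markers and parse the outermost {...} span as JSON."""
    cleaned = _FENCE.sub("", raw_text).strip()
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    return json.loads(cleaned[start:end + 1])


# Possible notebook usage:
#   json_object = parse_model_json(responses.text)
```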

Let’s initialize a JSON object to store and process the text_prompt variable:

import json
#convert the cleaned response string to a Python object
json_object = json.loads(response_str)
imagen_string = json_object["text_prompt"]
imagen_string

In the above step, we extract the value of the text_prompt JSON field and assign it to the imagen_string variable, so we can use this string as the prompt input for image generation.
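
Since the field names come from the model and are not guaranteed, it can help to fall back to the style lists when text_prompt is missing or empty. The sketch below assumes the keys seen in the printed response; `get_image_prompt` and the wording of the fallback prompt are my own illustration.

```python
def get_image_prompt(result):
    """Return text_prompt if present, else build a prompt from the suggested styles."""
    prompt = str(result.get("text_prompt") or "").strip()
    if prompt:
        return prompt

    # Collect the suggested styles from both categories, without duplicates
    styles = []
    for key in ("male_identifying_individuals", "female_identifying_individuals"):
        for style in result.get(key, []):
            if style not in styles:
                styles.append(style)
    if not styles:
        raise ValueError("response has neither a text_prompt nor any suggested styles")

    shape = result.get("face_shape", "unspecified")
    return (
        f"A photo of a person whose face shape is {shape}, with their hair "
        f"styled in one of the following ways: {', '.join(styles)}."
    )


# Possible notebook usage:
#   imagen_string = get_image_prompt(json_object)
```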

For Image Generation, we are going to use Imagen 2. Let’s import dependencies and get started:

from vertexai.preview.vision_models import ImageGenerationModel

image_generation_model = ImageGenerationModel.from_pretrained("imagegeneration@005")

Let’s set the imagen_string variable value as the prompt and invoke the Imagen 2 API:

prompt = imagen_string

response = image_generation_model.generate_images(
    prompt=prompt,
    number_of_images=3,
    seed=1000,
)

# Show whatever images came back instead of assuming exactly three
try:
    for generated_image in response.images:
        generated_image.show()
except Exception as e:
    print(f"Could not display the generated images: {e}")

AI generated images for haircut samples

What I went with!

You know our stylists. They try to override AI. But I stood my ground (well Gemini’s ground) as much as I could and got a happy medium. Check out my final look below. Yes I really did it, not joking. :))

The hairstyle I actually got!

My AI-driven makeover isn’t just a cut, I guess it’s a statement now. Let this code be your guide to a hairstyle 💇‍♀️ that’s as unique as you are 😝. That aside, remember, this is only for fun, to learn the capabilities of Gemini and Imagen and to bring them together in generative AI applications like this one.

If you enjoyed this, why don’t you let me know, and register if you are excited about the upcoming Duet AI Road Show and Code Vipassana Duet AI season events?

Abirami Sukumaran
Google Cloud - Community

Developer Advocate, Google. With 16+ years in data and software dev leadership, I’m passionate about addressing real world opportunities with technology.