Real-Time Face Recognition Using Pinecone DB, OpenCV, and Face Recognition Library (Part 1): Image search in Pinecone using image vector embeddings
In this tutorial, we will build a real-time face recognition system using Pinecone DB to store image vectors, OpenCV for image processing, and the face_recognition
Python library for face detection and encoding. Below is a step-by-step guide to setting up the environment, uploading data, and testing face recognition.
Step 1: Import Libraries and Set Up Environment
First, we import the necessary libraries: os for handling directory paths, Pinecone and ServerlessSpec for interacting with Pinecone DB, tqdm for progress bars, cv2 for image processing with OpenCV, numpy for numerical operations, face_recognition for face detection and encoding, datetime for handling dates and times, and warnings to suppress warnings.
pip install pinecone tqdm face-recognition opencv-python numpy
import os
from pinecone import Pinecone
from pinecone import ServerlessSpec
from tqdm.auto import tqdm
import cv2
import numpy as np
import face_recognition
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
Step 2: Load and Prepare Images
To work with a dataset, download the LFW (Labeled Faces in the Wild) dataset from https://www.kaggle.com/datasets/jessicali9530/lfw-dataset and extract the zip file. This gives you a path (e.g. <your_path>/lfw-deepfunneled/lfw-deepfunneled) containing one folder per person, named after that person, with their face images inside. You can also add a folder named <your_name> with your own image inside, renamed to <your_name>_0001.jpg.
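Before loading anything, it can help to sanity-check that the extracted dataset has the expected folder-per-person layout. The sketch below builds a throwaway mock of that structure so it runs anywhere; with the real dataset you would point base at your extracted path instead:

```python
import os
import tempfile

# Build a tiny mock of the LFW layout so this check is runnable anywhere.
base = os.path.join(tempfile.mkdtemp(), 'lfw-deepfunneled')
for person in ['Aaron_Tippin', 'Your_Name']:
    os.makedirs(os.path.join(base, person))
    open(os.path.join(base, person, f'{person}_0001.jpg'), 'wb').close()

# The actual check: every folder should contain a <name>_0001.jpg file.
for person in sorted(os.listdir(base)):
    first = os.path.join(base, person, f'{person}_0001.jpg')
    print(person, os.path.exists(first))
```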
We load images from the specified directory. The curr variable holds the current directory, and path points to the dataset. We then read the first image in each person's folder, storing the images in the images list and the corresponding names in the names list.
curr = os.curdir
path = curr + '/lfw-deepfunneled/lfw-deepfunneled'
images = []
myList = os.listdir(path)
names = []
for cl in myList:
    curImg = cv2.imread(f'{path}/{cl}/{cl}_0001.jpg')
    if curImg is None:  # skip entries without a readable first image
        continue
    images.append(curImg)
    names.append(os.path.splitext(cl)[0])
print(len(names))
print(len(images))
Step 3: Initialize Pinecone and Create an Index
We initialize Pinecone by creating a Pinecone object with the API key from the environment. We name the index img. If an index with that name already exists, it is deleted. We then create a new index with 128 dimensions (the size of a face_recognition encoding) and the cosine metric.
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))
index_name = "img"
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)
pc.create_index(
    name=index_name,
    dimension=128,
    metric="cosine",
    spec=ServerlessSpec(cloud='aws', region='us-east-1')
)
index = pc.Index(index_name)
Step 4: Encode Faces and Prepare Data for Pinecone
Next, we define a function createData to encode the faces. It converts each image from BGR to RGB and encodes the face using face_recognition. It then builds a list of upsert dictionaries containing the ID, the encoding values, and metadata with the person's name.
def createData(images):
    upserts = []
    encodeList = []
    # Encode the faces (assumes each image contains at least one detectable face)
    for img in tqdm(images):
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        encode = face_recognition.face_encodings(img)[0]
        encodeList.append(encode)
    ids = [str(x) for x in range(len(names))]
    # Build the upsert dictionaries for the Pinecone index
    for _id, encoding, _name in zip(ids, encodeList, names):
        upserts.append({
            'id': _id,
            'values': encoding.tolist(),
            'metadata': {"name": _name}
        })
    return upserts
upserts = createData(images)
print(upserts[0])
Step 5: Upsert Data into Pinecone
We then upload the encoded face data to the Pinecone index by calling the upsert method with the list of dictionaries created in the previous step.
index.upsert(upserts)
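For larger datasets, a single upsert call can exceed request size limits, so it is common to upload in batches. A minimal sketch of a batching helper (the chunked function and the batch size of 100 are illustrative, not part of the tutorial code):

```python
def chunked(items, size=100):
    """Yield successive slices of `items` of length at most `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# With a live index you would loop over the batches:
# for batch in chunked(upserts, 100):
#     index.upsert(batch)

# Demonstration on dummy 128-dimensional vectors:
dummy = [{'id': str(i), 'values': [0.0] * 128} for i in range(250)]
batches = list(chunked(dummy, 100))
print([len(b) for b in batches])  # [100, 100, 50]
```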
Step 6: Test Face Recognition
Finally, we test the face recognition system. We load a test image, convert it to RGB, and encode the face. We then query Pinecone with the encoded face to find the closest match and print the result with its metadata. You can query with an image of any person from the dataset to test the recognition accuracy.
path = curr + '/lfw-deepfunneled/lfw-deepfunneled'
curImg = cv2.imread(f'{path}/Aaron_Tippin/Aaron_Tippin_0001.jpg')
img = cv2.cvtColor(curImg, cv2.COLOR_BGR2RGB)
encode_query = face_recognition.face_encodings(img)[0]
result = index.query(
    top_k=1,
    vector=encode_query.tolist(),
    include_metadata=True,
)
print(result['matches'][0])
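Since the index uses the cosine metric, a near-identical face scores close to 1.0. A simple way to reject faces that aren't in the database is to apply a score threshold to the top match; the 0.9 cutoff below is an illustrative value, not something the tutorial prescribes:

```python
def identify(match, threshold=0.9):
    """Return the matched name if the cosine score clears the threshold,
    otherwise report the face as unknown."""
    if match['score'] >= threshold:
        return match['metadata']['name']
    return 'Unknown'

# Example using the shape of a Pinecone match dictionary:
sample = {'id': '4', 'score': 1.0, 'metadata': {'name': 'Aaron_Tippin'}}
print(identify(sample))  # Aaron_Tippin
print(identify({'id': '9', 'score': 0.42, 'metadata': {'name': 'X'}}))  # Unknown
```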
Complete Code:
import os
from pinecone import Pinecone
from pinecone import ServerlessSpec
from tqdm.auto import tqdm
import cv2
import numpy as np
import face_recognition
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
curr = os.curdir
path = curr + '/lfw-deepfunneled/lfw-deepfunneled'
images = []
myList = os.listdir(path)
names = []
for cl in myList:
    curImg = cv2.imread(f'{path}/{cl}/{cl}_0001.jpg')
    if curImg is None:  # skip entries without a readable first image
        continue
    images.append(curImg)
    names.append(os.path.splitext(cl)[0])
print(len(names))
print(len(images))
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))
index_name = "img"
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)
# Create the new index with 128 dimensions and cosine metric
pc.create_index(
    name=index_name,
    dimension=128,
    metric="cosine",
    spec=ServerlessSpec(cloud='aws', region='us-east-1')
)
# Connect to the newly created index
index = pc.Index(index_name)
def createData(images):
    upserts = []
    encodeList = []
    # Encode the faces (assumes each image contains at least one detectable face)
    for img in tqdm(images):
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        encode = face_recognition.face_encodings(img)[0]
        encodeList.append(encode)
    ids = [str(x) for x in range(len(names))]
    # Build the upsert dictionaries for the Pinecone index
    for _id, encoding, _name in zip(ids, encodeList, names):
        upserts.append({
            'id': _id,
            'values': encoding.tolist(),
            'metadata': {"name": _name}
        })
    return upserts
upserts = createData(images)
print(upserts[0])
# Upsert data into Pinecone
index.upsert(upserts)
# Code to test if the images were ingested correctly
path = curr + '/lfw-deepfunneled/lfw-deepfunneled'
curImg = cv2.imread(f'{path}/Aaron_Tippin/Aaron_Tippin_0001.jpg')
img = cv2.cvtColor(curImg, cv2.COLOR_BGR2RGB)
encode_query = face_recognition.face_encodings(img)[0]
result = index.query(
    top_k=1,
    vector=encode_query.tolist(),
    include_metadata=True,
)
print(result['matches'][0])
Output:
{'id': '4',
'metadata': {'name': 'Aaron_Tippin'},
'score': 1.00000012,
'values': []}
This concludes Part 1 of the tutorial. In the next part, we will build on this example to create a real-time face recognition application using video capture from cv2 and face_locations from the face_recognition library.
Let me know about any issues or errors you come across in the comments and I’ll fix them. Stay tuned for Part 2!