Creating a Dataset of People Wearing Masks for Face Recognition Applications

Published in The Startup · 6 min read · Sep 18, 2020

Everything has changed because of COVID-19: the way we live, how we communicate and work, how we travel. Many aspects of our lives have been affected. One of them is wearing face masks: wherever you go outside, you must wear one. Wearing masks reduces the spread of COVID-19, so we protect ourselves and others. There are several points to consider when wearing a mask, for example: “Place the face mask over your nose and mouth and secure with ties or loops. Make sure the mask fits snugly, molded to your face and around your nose. Make sure the mask fully covers your nose, mouth, and chin.” The source of that quote has more suggestions on how to use a face mask safely.

Unfortunately, masks can affect some systems, especially those that process facial images, such as facial recognition. With a mask covering the nose and the mouth, some facial features are lost, so processing that depends on the full facial image can’t work properly. How can we deal with that? The first step is to get a dataset of people wearing masks.

Unluckily, we don’t have a dataset of specific, identified people wearing masks. We need a dataset in which the face images are properly labeled. As with typical labeled data, each input maps to a specific output; here the input is a face image and the output is a label for the person, which could be their name. So we need something like LFW (Labeled Faces in the Wild): folders of face images, one folder per person, each named after that person. Since we don’t have this for people wearing face masks, we have to build our own dataset for face recognition of people wearing masks.
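To make “properly labeled” concrete, here is a minimal sketch that reads an LFW-style layout and uses each folder name as the label. The function name list_labeled_images and the list of image extensions are illustrative assumptions of mine, not something LFW provides.

```python
import os

# Assumption: these are the image types we care about.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def list_labeled_images(dataset_root):
    """Return (image_path, label) pairs, using each folder name as the label."""
    samples = []
    for person in sorted(os.listdir(dataset_root)):
        person_dir = os.path.join(dataset_root, person)
        if not os.path.isdir(person_dir):
            continue  # skip stray files sitting next to the person folders
        for filename in sorted(os.listdir(person_dir)):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                samples.append((os.path.join(person_dir, filename), person))
    return samples
```

Calling list_labeled_images on the dataset root gives one pair per picture, which is the input/output structure a face recognition model would train on.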

Basically, our goal is to build a dataset like LFW, but with pictures of people both with and without masks.

Folders with face pictures; note that each person is identified by name in the folder’s name

Pictures that we wish to have of people wearing masks

You can find images of people wearing face masks, but none of them are labeled the way we need. The example below has a lot of pictures of people (artificially) wearing face masks, but there are no labels and only one picture per person.

https://www.pyimagesearch.com/2020/05/04/covid-19-face-mask-detector-with-opencv-keras-tensorflow-and-deep-learning/

Getting face images of people (not wearing masks) from a dataset

We are using the LFW dataset. It is available from many sources, but I’m gonna use the copy on Kaggle: https://www.kaggle.com/jessicali9530/lfw-dataset. Once you are there, just download the dataset, and let’s go!

You should have something like:

Extract the files:

Files in a folder downloaded from LFW dataset
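If you prefer to extract the download from a script, a minimal sketch with Python’s standard zipfile module could look like this. The helper name extract_archive is hypothetical, and the archive path depends on how your browser saved the file.

```python
import zipfile


def extract_archive(archive_path, target_dir):
    """Extract every file of a zip archive into target_dir.

    Returns the list of member names, so you can check what was unpacked.
    """
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call (both paths are placeholders):
# extract_archive('path/to/downloaded.zip', 'path/to/lfw')
```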

Go to the lfw-deepfunneled folder, where you will find folders named after people. Each folder holds face images of the person it is named after.

We are ignoring folders with fewer than 19 pictures.

Let’s see some examples:

Folder to be ignored: it contains only one image

Folder that can be included in our dataset: as you can see, it has at least 19 images

The folder above will be in our dataset.
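Picking the folders by hand gets tedious, so here is a minimal sketch of the same filter in Python. The names select_people and copy_people are my own, and the sketch assumes every file inside a person’s folder is an image, which is how the lfw-deepfunneled folder is described above.

```python
import os
import shutil

MIN_IMAGES = 19  # threshold used in this article


def select_people(lfw_root, min_images=MIN_IMAGES):
    """Return the names of person folders holding at least min_images files."""
    selected = []
    for person in sorted(os.listdir(lfw_root)):
        person_dir = os.path.join(lfw_root, person)
        if os.path.isdir(person_dir) and len(os.listdir(person_dir)) >= min_images:
            selected.append(person)
    return selected


def copy_people(lfw_root, output_root, people):
    """Copy the selected person folders into output_root."""
    for person in people:
        # copytree creates output_root if needed; it raises an error
        # if the destination folder for this person already exists.
        shutil.copytree(os.path.join(lfw_root, person),
                        os.path.join(output_root, person))
```

Running select_people on the lfw-deepfunneled folder and passing the result to copy_people leaves you with a smaller dataset folder that contains only the people we keep.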

Putting on masks artificially

Now that we have a dataset, let’s put on some masks! We are using the algorithm developed by https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset.

Run: git clone https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset

You will have:

Go to the wear_mask_to_face folder. We are gonna use the wearmask.py script, but we need to make some changes.

First of all, you need to change the shape_predictor path (line 34) to the file’s location on your computer. If you are struggling to download this file, just take it here: https://github.com/AKSHAYUBHAT/TensorFace/blob/master/openface/models/dlib/shape_predictor_68_face_landmarks.dat.

Now we are gonna make important changes to the wearmask.py code. As it stands, if we have an input image like the one below:

the output image looks like this:

That is good because we get the face image and maybe we don’t need any image processing to detect the face. However, if you want to use your own face detector, we need to adjust the code so that it puts on the mask and keeps the original image:

To do this, we just need a few adjustments. After inspecting the code, I could see that the with_mask_face matrix (line 127) holds the entire image, as in the figure above. So we just need to take that matrix and use it to write a new image.

So we just need to create two modes: the first gives the original output and the other keeps the entire image, as we wish.

        if found_face:
            # align
            src_faces = []
            src_face_num = 0
            with_mask_face = np.asarray(self._face_img)

            mode = 'all_image'  # two modes: 'face_image' and 'all_image'

            if mode == 'face_image':
                # original output: crop each detected face
                for (i, rect) in enumerate(face_locations):
                    src_face_num = src_face_num + 1
                    (x, y, w, h) = rect_to_bbox(rect)
                    detect_face = with_mask_face[y - 0:y + h + 0, x - 0:x + w + 0]
                    src_faces.append(detect_face)
                # align the faces and save them
                faces_aligned = face_alignment(src_faces)
                face_num = 0
                for faces in faces_aligned:
                    face_num = face_num + 1
                    faces = cv2.cvtColor(faces, cv2.COLOR_RGBA2BGR)
                    size = (int(128), int(128))
                    faces_after_resize = cv2.resize(faces, size, interpolation=cv2.INTER_AREA)
                    cv2.imwrite(self.save_path, faces_after_resize)

            if mode == 'all_image':
                # new output: keep the entire image, with the mask on
                face = cv2.cvtColor(with_mask_face, cv2.COLOR_RGBA2BGR)
                cv2.imwrite(self.save_path, face)

After that, we can generate images of people wearing masks without cropping the image down to just the face.

Now we are gonna choose some people and create a dataset of them wearing masks. Set the dataset paths to the correct directories on your computer (lines 227 and 228). I created two ways of making this dataset. The first one (alternate = False, line 239) keeps the first eight images of each folder unchanged and puts masks on the rest. The other (alternate = True) alternates between putting on a mask and keeping the original image, so you end up with close to 50% of pictures with masks and 50% without. Below are the changes needed in wearmask.py to make the dataset.

if __name__ == '__main__':
    dataset_path = '/home/diogenes/Desktop/create_face_mask/wear_mask_to_face/dataset'
    save_dataset_path = '/home/diogenes/Desktop/create_face_mask/wear_mask_to_face/save_face_masks'
    xor = False

    for root, dirs, files in os.walk(dataset_path, topdown=False):
        new_root = root.replace(dataset_path, save_dataset_path)
        if not os.path.exists(new_root):
            os.makedirs(new_root)

        for i, name in enumerate(files):
            alternate = False

            if alternate == False:
                if i >= 8:
                    new_root = root.replace(dataset_path, save_dataset_path)
                    imgpath = os.path.join(root, name)
                    save_imgpath = os.path.join(new_root, name)
                    cli(imgpath, save_imgpath)
                else:
                    aux = cv2.imread(root+'/'+name)
                    cv2.imwrite(new_root+'/'+name, aux)
            else:
                if i >= 0:
                    xor = not xor
                    if xor == True:
                        new_root = root.replace(dataset_path, save_dataset_path)
                        imgpath = os.path.join(root, name)
                        save_imgpath = os.path.join(new_root, name)
                        cli(imgpath, save_imgpath)
                    else:
                        aux = cv2.imread(root+'/'+name)
                        cv2.imwrite(new_root+'/'+name, aux)
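The rule that decides which pictures get a mask can also be written as a small standalone function, which makes the two modes easier to check before running the whole script. This is only a sketch: should_mask and first_masked_index are names I made up, and unlike the script above, where the xor flag carries over from one folder to the next, this version restarts the alternation in every folder.

```python
def should_mask(index, alternate, first_masked_index=8):
    """Decide whether the image at position index in a folder gets a mask.

    alternate=False: images 0..7 stay unchanged, image 8 onward is masked.
    alternate=True: every other image is masked, starting with the first.
    """
    if alternate:
        return index % 2 == 0
    return index >= first_masked_index
```

For a folder with 19 pictures, the first mode masks 11 of them and the alternating mode masks 10. Keep in mind that os.walk does not sort file names, so which pictures count as “the first eight” depends on the order the file system returns them; sort the list first if you need a reproducible split.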