Virtual Caregiver for teenagers with body dysmorphic disorder

David Peña
8 min read · Jan 7, 2019


This project was developed with the help of Serendeepia Research for Psicobotica. This post presents the results obtained by the students of the Artificial Intelligence master's degree at the International University of La Rioja (UNIR), along with the source code hosted on GitHub. The main aim of this virtual caregiver is to detect emotions in teenagers' facial expressions.

Emotion detection via facial gesture recognition is central to the caregiver because it allows negative and positive emotions to be detected in real time. The process generates a daily report on the patient's progress, which can help the doctor implement the appropriate therapies according to the patient's emotional state.

Introduction to Body dysmorphic disorder

This disorder is defined by Sandoval O., García-Huidobro R., & Pérez-Cotapos S. (2009) as "the excessive and disproportionate worry over a minimal or imaginary defect in physical appearance" (p. 1). The authors also argue that these patients need more social relationships and greater self-acceptance.

Along this line, it was proposed to use computer vision to monitor the patient's emotions in order to support their psychological care. Because patients tend to show negative emotions when they are in front of the mirror, these expressions can be detected to generate digital alerts.

Context

In this work we chose supervised deep learning, since it is among the best approaches for image processing. The problem has two parts to solve. The first is image classification: a model is trained on a dataset containing the pixels of each image and a label identifying its emotion. The second is training another model on a second dataset containing facial key points, so that the machine can find patterns between the 30 attributes of that dataset and its images.

Training dataset

To start, the procedure involves two datasets, one for Facial Expression Recognition and one for Facial Keypoints Detection; you can download them here:

facial expression recognition & facial key points detection

These datasets were obtained from Kaggle (more details here). Finding good datasets is one of the main tasks when training AI models: without large, relevant, and meaningful datasets it is very hard to make a machine learn.

Facial Expression Recognition

Once the dataset is selected and imported into the Python notebook, its content must be explored and understood. In this case the emotion dataset contains:

Figure 1. Data Exploration of the facial expression recognition dataset.

As observed in Figure 1, the dataset is completely clean and has labels identifying which records belong to the training, test, and private test splits. It also has seven emotion labels, of which "Disgust" has only 547 records; this is very little compared with the rest, and the machine may learn this class poorly. The dataset's format must also be changed: at present the information has dimensions incompatible with the model, so it must be resized so that each image has its real size, and the emotion column must be set aside to become the model's labels.
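The exploration step can be sketched as follows, assuming the Kaggle fer2013 layout (an `emotion` label from 0 to 6, a `pixels` string of 48×48 grayscale values, and a `Usage` split column); two toy rows stand in here for the real CSV:

```python
import pandas as pd

# Two toy rows in the fer2013 layout; in the notebook this would be
# df = pd.read_csv("fer2013.csv")
rows = [
    {"emotion": 0, "pixels": " ".join(["128"] * 48 * 48), "Usage": "Training"},
    {"emotion": 1, "pixels": " ".join(["64"] * 48 * 48), "Usage": "PublicTest"},
]
df = pd.DataFrame(rows)

print(df["Usage"].value_counts())    # how records split into train/test sets
print(df["emotion"].value_counts())  # per-class counts (this is what reveals the small "Disgust" class)
```

Counting records per label like this is what exposes the class imbalance mentioned above.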

Changing the dimensions for the model

Figure 2. Resizing the dataset for the model.

This step is very important: as observed in Figure 2, a matrix with dimensions compatible with the model was created. Without this step, the model cannot interpret the dataset's information.
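A minimal sketch of this resizing, again assuming the fer2013 layout: each row's pixel string is parsed into a 48×48 grayscale image, and the emotion column becomes a one-hot label vector for the 7 classes.

```python
import numpy as np

NUM_CLASSES = 7  # the seven emotion labels in the dataset

def row_to_image(pixels_string, size=48):
    """Parse a space-separated pixel string into a (size, size, 1) array."""
    pixels = np.array(pixels_string.split(), dtype="float32")
    return pixels.reshape(size, size, 1) / 255.0  # normalize to [0, 1]

def to_one_hot(emotion, num_classes=NUM_CLASSES):
    """Turn an integer emotion label into a one-hot vector."""
    vec = np.zeros(num_classes, dtype="float32")
    vec[emotion] = 1.0
    return vec

sample = " ".join(["255"] * 48 * 48)
x = row_to_image(sample)
y = to_one_hot(3)
print(x.shape)  # (48, 48, 1)
```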

The model

During the project's development the model improved with each delivery. The first advance was a neural network with three layers (an input layer, a dense layer with 48 neurons, and an output layer); the results were poor, with a loss of 1.6092 and an accuracy of 0.3616. Adding 4 hidden layers with 2000 neurons and regularization layers did not improve the model; the result was always the same. The second advance was a method to pre-process the images before training; the resulting image can be observed in the next figure:

Figure 3. Pre-processing the images before training.
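The post does not spell out which pre-processing was applied, so the sketch below is only an assumed illustration: a Sobel edge filter, a common way to discard flat regions and keep the facial contours that matter for classification.

```python
import numpy as np

# Horizontal Sobel kernel; its transpose gives the vertical response.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype="float32")

def filter2d(img, kernel):
    """Naive 'valid' cross-correlation (kernel slid without flipping)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype="float32")
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def edge_map(img):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_X.T)
    return np.sqrt(gx ** 2 + gy ** 2)

img = np.zeros((48, 48), dtype="float32")
img[:, 24:] = 1.0              # a hard vertical edge down the middle
edges = edge_map(img)
print(edges.shape)             # (46, 46): 'valid' filtering shrinks by 2
```

Only the pixels near the vertical boundary produce a response; flat regions map to zero, which is the effect described above.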

By removing superfluous image features and keeping only the important ones, as observed in Figure 3, training became faster and it was possible to implement a model with 5 layers: two dense layers with 1024 neurons each, two dropout layers, and the input layer. The results on the training dataset improved substantially, with a loss of 0.0065 and an accuracy of 0.9971, but on the test dataset the accuracy went up only 4 points with respect to the previous model. These results indicate that the model is now overfitting, the machine is not learning well, and a basic neural network cannot solve this problem.

Given these results, the artificial intelligence expert at Serendeepia advised that the best way to improve them was a convolutional neural network, because this technique allows applying more filters to extract more image features.

Figure 4. Visual representation of a convolutional neural network.

As observed in Figure 4, the new approach allows applying more advanced filters than those used in the previous delivery; likewise, this technique applies more filters layer by layer while managing space well. The new model took as its base a notebook by David Peña, a UNIR artificial intelligence student, that recognizes all the characters of the animated series The Simpsons with an accuracy of 0.9955 (more details here). On this base, the model is built with 6 convolutional layers, 3 MaxPooling2D layers, 3 Dense layers, 2 Dropout layers, and 1 Flatten layer.
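A Keras sketch matching those layer counts (6 convolutional, 3 MaxPooling2D, 3 Dense, 2 Dropout, 1 Flatten); the filter counts, kernel sizes, and dropout rates below are assumptions, since the post does not list them:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(48, 48, 1)),                       # 48x48 grayscale faces
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                                 # halve spatial size
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),                                      # to a feature vector
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),                                   # regularization
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),                 # one score per emotion
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Stacking two convolutions before each pooling step is what lets the network apply more filters "layer by layer" while keeping the spatial dimensions under control.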

Figure 5. Accuracy and loss plot.

To validate that the model converges well, the notebook plots the behavior of accuracy and loss, as observed in Figure 5. With this new configuration the accuracy goes up to 0.6495 and the loss goes down to 0.9475 on the test dataset. This allows us to observe that the model is suitable and shows no anomalous behavior.
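This kind of plot can be produced from a Keras `history.history` dictionary as sketched below; the numbers are placeholders, not the project's real curves:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the code also runs outside a notebook
import matplotlib.pyplot as plt

# Placeholder metrics standing in for history.history after model.fit(...)
history = {
    "accuracy":     [0.30, 0.45, 0.55, 0.62],
    "val_accuracy": [0.28, 0.42, 0.50, 0.55],
    "loss":         [1.80, 1.40, 1.10, 0.95],
    "val_loss":     [1.85, 1.50, 1.25, 1.10],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history["accuracy"], label="train")
ax1.plot(history["val_accuracy"], label="validation")
ax1.set_title("Accuracy")
ax1.legend()
ax2.plot(history["loss"], label="train")
ax2.plot(history["val_loss"], label="validation")
ax2.set_title("Loss")
ax2.legend()
fig.savefig("training_curves.png")
```

A widening gap between the train and validation curves is the visual signature of the overfitting discussed earlier.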

Figure 6. Confusion matrix and graphic results.

Now the confusion matrix is plotted to inspect the result visually: the machine classifies 65% of the dataset correctly, but this is not enough for the virtual caregiver, which needs more than 90% accuracy to function well. For this reason a second model was created using the facial key points detection dataset; the idea is to add more features and build an ensemble for the classification. The combination of the two models is initially based on averaging their results.
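The confusion matrix underlying Figure 6 can be built by hand as below (the notebook may well use scikit-learn's `confusion_matrix` instead; this just shows the idea):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels with 3 classes instead of the dataset's 7.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # the diagonal holds the correct classifications
print(cm)
print(f"accuracy = {accuracy:.2f}")  # 4 of 6 correct -> 0.67
```

Off-diagonal cells show which emotions get confused with which, which is more informative than the single accuracy number.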


Facial key points detection

In this part the machine is trained on the images' pixels to find patterns in the target variables, which contain the coordinates of the eyes, nose, and mouth. As a first step, the data is inspected to prepare the neural network environment:

Figure 7. Dataset counts.

Figure 7 shows the total count of each record. With this information we can verify that the facial key points detection dataset has 31 attributes: one of them holds the image pixels, which must be separated out and assigned to the training dataset (X), while the remaining 30 attributes are assigned to the target dataset.
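That split can be sketched as follows, assuming the Kaggle facial-keypoints layout of 30 coordinate columns plus an `Image` column of space-separated 96×96 pixel values; a single toy row with placeholder column names stands in for the real CSV:

```python
import numpy as np
import pandas as pd

# Placeholder names for the 30 coordinate targets (the real columns are
# names like left_eye_center_x, etc.).
coord_cols = [f"kp_{i}" for i in range(30)]
row = {c: float(i) for i, c in enumerate(coord_cols)}
row["Image"] = " ".join(["0"] * 96 * 96)
df = pd.DataFrame([row])

# Separate the pixel column (features) from the 30 coordinates (targets).
X = np.stack([np.array(s.split(), dtype="float32").reshape(96, 96, 1)
              for s in df["Image"]])
y = df[coord_cols].to_numpy(dtype="float32")
print(X.shape, y.shape)  # (1, 96, 96, 1) (1, 30)
```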

Building and inspecting the model

As in the previous method, a model is created to train the machine; the difference is that this model is smaller, because the target is a quantitative variable.

Figure 8. Model summary for the facial key points model.

Train the model

The model is trained for 200 epochs, since this training is faster than the previous one; the training and validation accuracy are recorded in the history object.
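A minimal sketch of that training step, assuming Keras; random arrays stand in for the real key-point data, a deliberately tiny dense network stands in for the real model, and 2 epochs stand in for the 200 used in the project:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Random stand-ins for the (96, 96, 1) images and 30 coordinate targets.
X = np.random.rand(16, 96, 96, 1).astype("float32")
y = np.random.rand(16, 30).astype("float32")

model = keras.Sequential([
    layers.Input(shape=(96, 96, 1)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(30),  # one linear output per coordinate (regression)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# validation_split holds out part of the data; metrics land in history.history
history = model.fit(X, y, epochs=2, validation_split=0.25, verbose=0)
print(sorted(history.history.keys()))
```

Note the regression setup: mean squared error loss and a linear output layer, in contrast to the softmax classifier used for the emotions.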

To validate this training, the accuracy and loss were plotted, showing that the model overfits against the test dataset; you can observe the result in Figure 9.

Figure 9. Accuracy and loss plot.

The model's hyperparameters were then modified by adding regularization layers, changing the optimizer's learning rate, and adding more dense layers, but the results did not improve. The resulting model was therefore used to generate predictions on its test dataset; you can observe the result in the next figure.

Figure 10. Predictions of the facial key points model.

Ensembling the two trained models (third phase)

Figure 11. Ensemble of the two models.

The last step is to ensemble the two models in order to add more features. Following the work team's guidance, a test dataset should be run through both models to obtain their predictions, and the results averaged. This last procedure could not be completed because the internship period ended, but it can be resumed to continue the development and evaluate the results.
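The pending averaging step can be sketched as below; the probability arrays are illustrative stand-ins for what `model_a.predict(X)` and `model_b.predict(X)` would return:

```python
import numpy as np

# Per-class probabilities from two hypothetical models, two samples each
# (3 toy classes instead of the project's 7 emotions).
probs_a = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.6, 0.3]])
probs_b = np.array([[0.5, 0.4, 0.1],
                    [0.2, 0.2, 0.6]])

ensemble = (probs_a + probs_b) / 2.0      # simple unweighted average
predictions = ensemble.argmax(axis=1)     # final class per sample
print(predictions)  # [0 2]
```

Averaging probabilities is the simplest ensemble; weighted averages or a small meta-model trained on both outputs are natural next steps if development resumes.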

Conclusions

Over the course of the project we gained an ever deeper understanding of how neural networks work, how to obtain datasets, and how to read evaluation metrics correctly. This virtual caregiver draws on all the knowledge acquired in the artificial intelligence master's degree, since it addresses topics such as machine learning, deep learning, data science, and computer vision. Among the lessons learned are:

- The error on the training set is very, very optimistic (it is measured on data the model has already seen).

- The error on the test set is very optimistic.

- The error on the validation set is the most realistic: it is the one that expresses the real capacity for generalization.

- Always start from models that already exist: "do not reinvent the wheel."

- Increase the complexity of the models until the expected results are obtained.

- Graph the results obtained to better understand the behavior of the network.

As for the Virtual Caregiver, it has great opportunities for growth and optimization. It is very important to steer the project toward testing with a webcam, because this gives it a lot of dynamism and fulfills the expectation of a caregiver that works in real time.


David Peña

Master's in Artificial Intelligence, Data Science, machine learning, artificial intelligence