Conclusion

J Rannap
Truth or lie?
Jan 13, 2020

Detecting deception has been a widespread task for many years, as it could be beneficial in many areas (e.g. police work, court proceedings). In this project, EEG data was used to predict deception. Different models were created to detect whether a person is lying or telling the truth: Generalized Linear (Mixed) Models, K-Nearest Neighbours, Linear/Quadratic Discriminant Analysis, Random Forest, and Polynomial and Linear Support Vector Machines.
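
For illustration, below is a minimal sketch of how such a set of candidate models could be defined, written in Python with scikit-learn (the original classification was carried out mainly in R, and the GLMM is approximated here by a plain logistic regression; the hyperparameters shown are placeholders, not the values used in the project):

```python
# Candidate model families compared in the project, sketched with scikit-learn.
# The EEG feature matrix X and binary labels y (1 = lie, 0 = truth) are assumed
# to be prepared elsewhere; preprocessing details are not shown in the post.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

models = {
    "GLM (logistic regression)": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=1),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (polynomial)": SVC(kernel="poly", degree=3),
}
```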

The analysis was performed on three different datasets: the whole dataset, a high-quality dataset, and a dataset containing only the means of every user's run. In all cases KNN obtained the best accuracies. Predictions of deception on the original data gave good results, with accuracy, precision and recall around 90% for KNN and Random Forest. For the higher-quality data, three models with good classification indicators emerged: KNN, Random Forest and GLMER (mixed models). A figure of the product of accuracy and F1 score is presented below. The F1 score was chosen because it is the harmonic mean of precision and recall and therefore takes into account false positives and false negatives; accuracy was chosen because it is the main goodness indicator under consideration. As both measures are on a 0 to 1 scale, their product also lies on that scale.

Accuracy times F1 score for different models
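
For reference, this is how the combined accuracy × F1 measure can be computed with scikit-learn (a minimal sketch; the label vectors below are hypothetical placeholders, not the project's results):

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical test labels and predictions (1 = lie, 0 = truth).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)   # proportion of correct predictions
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
combined = acc * f1                    # still lies on the 0..1 scale
print(f"accuracy={acc:.3f}, F1={f1:.3f}, accuracy*F1={combined:.3f}")
```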

The overall best classifier for our data was KNN with one neighbour on the high-quality data, achieving an accuracy of 96.7% (95% CI 95.9%…97.4%) and precision and recall of around 95%.
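
A sketch of how such a result could be evaluated on a held-out test set, using a one-neighbour KNN and a binomial confidence interval for accuracy (the synthetic data below is only a stand-in; the real EEG features and split are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from statsmodels.stats.proportion import proportion_confint

# Synthetic stand-in for the high-quality EEG feature set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = knn.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# 95% confidence interval for accuracy treated as a binomial proportion.
lo, hi = proportion_confint(count=int((y_pred == y_test).sum()),
                            nobs=len(y_test), method="wilson")
print(f"accuracy={acc:.3f} (95% CI {lo:.3f}..{hi:.3f}), "
      f"precision={precision_score(y_test, y_pred):.3f}, "
      f"recall={recall_score(y_test, y_pred):.3f}")
```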

When presented with EEG data, we would suggest using high-quality data with KNN, Random Forest or mixed models (with user and run treated as random effects, similar to a time-series analysis) for deception detection. Future work could include further parameter tuning and the implementation of different cross-validation methods.
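
One concrete direction for the cross-validation suggestion is to keep all observations from the same user together when splitting, which plays a role similar to treating user as a grouping factor. A minimal sketch with scikit-learn's GroupKFold (the data and group labels are hypothetical):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the EEG features; 'groups' is a hypothetical
# per-observation user id, so no user appears in both train and test folds.
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
groups = np.repeat(np.arange(50), 20)  # 50 users, 20 observations each

scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y,
                         cv=GroupKFold(n_splits=5), groups=groups,
                         scoring="accuracy")
print(f"group-wise CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```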

Division of sections:

Jürgen Rannap: Research of background information and methodology introduction. Classification, mainly with KNN, Random Forest and SVM in R. Overall help in writing other parts of the project, including implementation of the accuracy and classification figures.

Anne Ott: Overview of the results (interpretations, graphs, formulas). Implementation of the LDA and QDA classification methods. Overall help in writing other parts of the project.

Kristiina Uusna: Analyzed the data and summarized it in the Data Overview section. Implementation of logistic regression for binary classification. Implementation of the SVM algorithm in Python for comparison. Overall help in writing other parts of the project.
