LDA in Analyzing Modus Operandi

kabelt
Published in Super AI Engineer
Mar 27, 2021

In a serial killer movie, you would probably see a case where the criminal murders a number of people, targeting the same types of victims and using similar killing patterns. The method a criminal employs to commit a crime is called “modus operandi” (M.O.), a Latin phrase meaning “mode of operating”. It refers to a distinctive pattern of criminal behavior that sets crimes apart, allowing cases to be recognized as the work of the same person.

In the real world, situational and behavioral information describing a crime is often captured in the form of free-text narratives. This data is used in both administration and investigation. It also has the potential to improve our understanding of specific crimes, but the problem lies in its unstructured nature.

The length of these narrative documents can vary from a few pages to thousands, depending on the complexity of the case. Analyzing these records by hand is error-prone, resource-intensive, and highly impractical. This leads to an iceberg problem: the only information that gets used is the structured measurements, such as quantity, location, and time, while the really valuable data is left untouched.

Thanks to advances in computer technology, there have been attempts to solve this problem with AI. Recently, a group of researchers from the University of Leeds proposed a new way to automatically analyze free-text data and identify clusters of crime using unsupervised machine learning methods.

They analyzed roughly ten thousand narrative descriptions of residential burglaries occurring over a two-year period in a major area of the UK. After the data was pre-processed, a topic modeling algorithm, specifically latent Dirichlet allocation (LDA), was applied. The goal is to discover latent topics within the corpus that correspond to specific M.O.s. The most probable topic is then assigned to each report. As a result, the incidents are categorized into 21 topics, and each topic is then interpreted and labeled using domain knowledge.
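The pipeline described above (pre-process, fit LDA, assign each report its most probable topic) can be sketched in a few lines of Python. This is only an illustration, not the researchers' implementation: the article does not say which library, pre-processing steps, or hyperparameters they used. The sketch assumes a small collapsed Gibbs sampler written in plain Python, a made-up stopword list, three topics instead of 21, and six invented one-line reports loosely modeled on the kinds of burglary the study describes.

```python
import random
from collections import Counter

# Hypothetical toy reports, invented for illustration (not data from the study).
REPORTS = [
    "offender used mole grips to snap the lock on the front door",
    "mole grips used on rear door lock entry not gained",
    "suspect entered property and stole car keys then stole the car",
    "car keys taken from hallway and vehicle stolen from driveway",
    "entry gained via kitchen window nothing taken",
    "window forced open entry gained nothing stolen",
]
STOPWORDS = {"the", "to", "on", "and", "then", "from", "via", "not", "used"}  # assumed
NUM_TOPICS = 3  # the study ended up with 21; 3 is enough for a toy corpus


def tokenize(text):
    """Very simple pre-processing: lowercase, split, drop stopwords."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    Returns (theta, topic_word): theta[d][k] is the estimated share of
    topic k in document d, topic_word[k] counts the words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Repeatedly resample each word's topic given all the other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


docs = [tokenize(r) for r in REPORTS]
theta, topic_word = fit_lda(docs, NUM_TOPICS)

# Assign each report the topic with the highest estimated share.
most_probable = [max(range(NUM_TOPICS), key=row.__getitem__) for row in theta]

for k in range(NUM_TOPICS):
    print("topic", k, [w for w, _ in topic_word[k].most_common(3)])
for report, topic in zip(REPORTS, most_probable):
    print(topic, "|", report)
```

The top words printed for each topic are what an analyst would read in order to give the topic a human label. On a corpus this small the grouping is unstable and depends on the seed, so treat the output as a demonstration of the mechanics rather than a meaningful result.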

To illustrate, topic 5 describes burglaries where offenders used mole grips in an attempt to gain entry to a property through doors. Topic 9 describes incidents where suspects entered a property, stole the victim’s car keys, and then stole the victim’s car. In topic 18, entry was gained to the property via a window and nothing was taken.

To my knowledge, this research is the first demonstration of using topic modeling to identify latent topics in crime-related texts. Given that the unsupervised algorithms applied here incorporate no prior information about the nature of these problems or the domain in which they are applied, this approach has the potential to automate tasks that would otherwise require considerable crime-analyst resources and would likely be subject to unavoidable biases.

But does this mean that in the future we will no longer need crime analysts at all?

I think the answer depends on how far in the future we are talking about.

Honestly, I don’t have the answer for the long term.

And probably no one does.

But for the short term, while such analytics cannot replace human judgment, they can provide a means to efficiently pre-process large quantities of data, making it more tractable. This approach should enable officers to devote their own “human algorithm” to the more complex tasks of understanding specific problems and devising potential solutions for them.

22p26w0031-Kanet

Reference: Birks, D., Coleman, A. & Jackson, D. Unsupervised identification of crime problems from police free-text data. Crime Sci 9, 18 (2020).
