AGATHA: Open Sources Analysis Intelligent System

David Teixeira
Published in jeKnowledge
4 min read · Feb 23, 2022

Preventing organized crime is a constant concern because of the threat it poses to the well-being, security and trust of citizens and of states themselves.

The “Information Age” has produced a society dependent on electronic networks and information systems. Today, information and communication technologies are also tools and opportunities for criminal activity.

These threats extend to citizens, businesses, governments and critical infrastructure. Cybercrime, meaning criminal acts committed online through electronic communications networks and information systems, is increasingly complex and extensive.

The multiplicity and complexity of criminal operations carried out through cybercrime extend across borders.

Among the practices of greatest concern involving organized crime are the trafficking of goods and animals, the formation of complex and highly organized networks for trafficking in human beings, and the distribution of illegal pornography, including child pornography. More recently, acts of terrorism claimed by transnational organizations with recognized political, economic and religious influence have revealed structures that are complex and very difficult to control.

The constant complexity, novelty, breadth and severity of these attacks put even more pressure on investigation teams and crime-fighting structures.

For criminal investigation and intelligence agents, it is very difficult to keep pace with the agility with which members of these organizations adopt new tools and put them at the service of their purposes. It is therefore essential to provide the authorities responsible for controlling and preventing organized crime with adequate tools to deal with these new realities.

This is the purpose of the “AGATHA” project (Intelligent System for the Analysis of Open Information Sources for Crime Surveillance and Control): a platform aimed at criminal investigation police and intelligence services that facilitates the collection of evidence of criminal practices by automatically analyzing information available in open sources. These sources cover several areas: social networks, forums, images, the blogosphere and other information sources on the web, including audio and video. The platform can also ingest files collected or seized by the police or intelligence services, which then benefit from the same analytical capabilities.

The system can analyze large amounts of information and extract from it implicit relationships, patterns and stakeholders, among others. It does so through modules dedicated to the analysis of video and image, audio, and text in several languages, which combine crawling and data mining algorithms with artificial intelligence to collect content in a selective and targeted way.

Data collection, whether by web crawler or by import, creates copies of the content to be analyzed and processed and indexes them by format, source, address and so on, to optimize search. Data obtained through the crawler is stored in its original form (raw data) in a dedicated database/repository that guarantees its traceability, its searchability and the metadata associated with the content.
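The storage step described above can be pictured with a small sketch. This is not AGATHA's actual implementation, which the article does not detail; the names `RawRecord` and `RawRepository`, the in-memory dictionaries and the choice of a SHA-256 digest as a traceability key are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RawRecord:
    """One unmodified copy of collected content plus its provenance metadata."""
    content: bytes   # raw bytes, exactly as collected
    source: str      # hypothetical labels such as "forum" or "import"
    address: str     # URL or file path the content came from
    fmt: str         # hypothetical labels such as "text", "image", "video"
    fetched_at: str  # ISO 8601 UTC timestamp of collection
    sha256: str      # digest of the raw bytes, used as a traceability key


class RawRepository:
    """In-memory stand-in for a raw-data repository (illustrative only)."""

    def __init__(self):
        self._records = {}              # sha256 -> RawRecord
        self._index = defaultdict(set)  # (field, value) -> set of sha256

    def store(self, content, source, address, fmt):
        """Keep the raw copy untouched and index it by format, source and address."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._records:
            return digest  # identical content already stored; keep the first copy
        fetched_at = datetime.now(timezone.utc).isoformat()
        self._records[digest] = RawRecord(content, source, address, fmt, fetched_at, digest)
        for key in (("format", fmt), ("source", source), ("address", address)):
            self._index[key].add(digest)
        return digest

    def find(self, field_name, value):
        """Return all records whose indexed field equals the given value."""
        digests = self._index.get((field_name, value), ())
        return [self._records[d] for d in sorted(digests)]
```

For example, `repo.store(b"hello", "forum", "https://example.org/t/1", "text")` returns the digest of the stored bytes, and `repo.find("format", "text")` then returns the matching record with its original content and metadata intact.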

The solution developed by the project consortium supports collaborative multilingual analysis of audiovisual content and biometric information through the application of Visual Analytics methodologies and data mining technologies. It also integrates database technologies and ETL (Extract, Transform and Load) processes, semantic modeling and machine learning in order to explore the various data collected.
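As a rough illustration of what an ETL (Extract, Transform and Load) step looks like in general, the sketch below moves a few raw items into a SQLite table. The item fields (`address`, `lang`, `text`), the `documents` table and the function names are hypothetical and are not taken from the AGATHA platform.

```python
import sqlite3


def extract(raw_items):
    """Extract: yield raw items one by one (stand-in for reading a raw repository)."""
    yield from raw_items


def transform(item):
    """Transform: normalize one raw item into a flat (address, lang, text) row."""
    address = item["address"].strip()
    lang = item["lang"].lower()
    text = " ".join(item["text"].split())  # collapse runs of whitespace
    return (address, lang, text)


def load(conn, rows):
    """Load: write the normalized rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents "
        "(address TEXT PRIMARY KEY, lang TEXT, text TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO documents VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, raw_items):
    """Chain the three stages over an iterable of raw items."""
    load(conn, (transform(item) for item in extract(raw_items)))
```

Calling `run_etl(sqlite3.connect(":memory:"), items)` with a list of such dictionaries leaves one cleaned row per address in `documents`, ready for the search and analysis modules to query.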

A wide range of features for surveillance and crime control is available in the solution, such as:

Data acquisition — Gathering information from open sources, through crawling algorithms, data mining and ETL tools;

Audio and voice analysis — Ability to automatically obtain information from audio data sources collected by the data acquisition module;

Multilingual text analysis — Automatic translation, so that information in different languages can be used, as well as natural language processing (NLP) techniques to extract knowledge automatically.

Database and Repositories — Storage of all the information coming from the different processing modules (audio, video, image, text), properly indexed to facilitate its reference and/or correlation.

Video and image analysis — Video file characteristics extraction, automatic delimitation of moments and scenes, detection of patterns and their segmentation;

Biometric analysis — Extraction of high-quality 3D face models from low-quality video files to obtain 2D images of those faces for facial recognition applications, with a special focus on forensic use.

Classification and semantic segmentation — Segmentation and indexing of content, for easier navigation and cross-referencing of information between the different databases and repositories.

Management, organization and visualization — Rules for processing user information requests, retrieving information from the database and allowing visual analysis of large amounts of data.

With these tools, the work of investigation teams and organizations is simplified, reducing investigation time and easing the collection of evidence for further investigation.

The AGATHA project was supported by COMPETE 2020 under the Incentive System for Research and Business Technology Development in the co-promotion area. To implement the project, a consortium led by FUTURE COMPTA (CBS) was established, which also included Voiceinteraction S.A., Association C.C.G., and University of Évora | Center for Innovation in Information Technologies (CITI).
