OpenCTI platform performance

Julien Richard
Filigran Blog
6 min read · Mar 16, 2021


Explanation and metrics

Performance is one of the main concerns and questions we have to address on a daily basis. All users of the platform, at some point, wish to know how many inserts per second OpenCTI supports, or how much time it will take to ingest 1 billion cyber threat intelligence elements. And as with any other product, it turns out to be tricky to give a single, simple answer, despite our continuous efforts to monitor and enhance ingestion speed.

Well, it depends…

This is the most honest answer and obviously the most annoying one… The performance of OpenCTI depends on many factors.

Simplified OpenCTI ingestion workflow

As you can see in the diagram above, several components are involved in the OpenCTI ingestion workflow, and each of them can have some impact on the performance of the platform. You will notice, however, that all of these elements are highly scalable: multiple instances can be launched for RabbitMQ, the GraphQL API and the workers.

Sources and connectors

For sources and connectors, performance depends on the remote systems' APIs/feeds and on the connector implementation. Some connectors poll external service APIs on a regular basis and then convert the received data into STIX 2.1 JSON bundles; others consume real-time feeds with data already formatted using the STIX standard.
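To illustrate the first kind of connector, here is a minimal sketch of the conversion step, using only the standard library. The `event_to_stix_bundle` helper and the event fields are hypothetical; a real connector would typically rely on the pycti and stix2 libraries instead of building dictionaries by hand.

```python
import json
import uuid
from datetime import datetime, timezone

def event_to_stix_bundle(event: dict) -> dict:
    """Convert one (hypothetical) feed event into a minimal STIX 2.1 bundle."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": event["title"],
        "pattern_type": "stix",
        "pattern": f"[file:hashes.'SHA-256' = '{event['sha256']}']",
        "valid_from": now,
    }
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [indicator],
    }

bundle = event_to_stix_bundle({"title": "Malicious sample", "sha256": "aa" * 32})
print(json.dumps(bundle)[:60])
```

In a real connector, the resulting bundle would then be handed to the connector helper, which pushes it to RabbitMQ.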

So for this part, consumption speed depends on the capabilities of the remote services or third-party applications.

OpenCTI “Import” connectors ecosystem

Messaging system

After the data is converted to STIX 2.1 JSON bundles, connectors send them to the RabbitMQ messaging system. It's important to understand that the bundles are actually split into small chunks to enhance parallelism and ingestion speed on the worker side. So if you try to ingest 200 MISP events, you may end up with thousands of messages stored in the queues.
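The chunking step can be sketched as follows. This is a simplification: the real platform also has to respect dependencies between objects (a relationship must not be ingested before its source and target), which this naive version ignores.

```python
import uuid

def split_bundle(bundle: dict, chunk_size: int = 1) -> list[dict]:
    """Split a STIX bundle into smaller bundles of at most chunk_size
    objects each, so that workers can ingest them in parallel."""
    objects = bundle["objects"]
    return [
        {
            "type": "bundle",
            "id": f"bundle--{uuid.uuid4()}",
            "objects": objects[i:i + chunk_size],
        }
        for i in range(0, len(objects), chunk_size)
    ]

# A single bundle with 250 objects becomes 250 queue messages.
big_bundle = {
    "type": "bundle",
    "id": f"bundle--{uuid.uuid4()}",
    "objects": [
        {"type": "indicator", "id": f"indicator--{uuid.uuid4()}"}
        for _ in range(250)
    ],
}
chunks = split_bundle(big_bundle)
print(len(chunks))  # 250
```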

RabbitMQ is very scalable and written in Erlang, so its throughput can match the most demanding use cases. It should not be a bottleneck, but be careful to size it correctly according to the performance you observe on the OpenCTI side. Do not hesitate to take a look at its management console.

RabbitMQ Management Console

Workers

The workers are very basic stateless Python processes which consume STIX 2.1 messages (chunks) from the queues and create the corresponding elements in the OpenCTI platform through the GraphQL API. In other words, how efficiently the workers can write data is bound by how quickly OpenCTI can absorb it.

You also need to know that one STIX 2.1 message in the queue can lead to the creation of 1 to 10 elements in OpenCTI (authors, labels, external references, etc.).

So when you look at the messages/sec metric in the data integration screen, 10 messages/sec actually corresponds to something closer to 100 write operations per second.

Monitoring of the messaging queues in the OpenCTI interface

OpenCTI platform

On the OpenCTI side, performance may be impacted by two different aspects.

1. ElasticSearch database performance

OpenCTI relies on ElasticSearch to store the data. The most performance-demanding users in the community have no issues using OpenCTI to ingest thousands of elements per hour, but they have properly sized their ElasticSearch cluster. If you have specific expectations about ingestion speed, you should consider deploying a production-ready cluster with enough CPU/RAM and, if possible, SSD storage. This kind of setup will drastically improve the overall OpenCTI performance.
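As a starting point for sizing, a common ElasticSearch rule of thumb is to give the JVM heap roughly half of the available RAM, while staying below ~32 GB to keep compressed object pointers. This is a generic guideline, not an OpenCTI-specific recommendation:

```python
def recommended_es_heap_gb(total_ram_gb: int) -> int:
    """Rule of thumb for ElasticSearch heap sizing: about half of the
    machine's RAM, capped at 31 GB to stay under the compressed-oops
    threshold. The rest of the RAM is left to the filesystem cache."""
    return min(total_ram_gb // 2, 31)

for ram in (8, 64, 128):
    print(f"{ram} GB RAM -> ES_JAVA_OPTS: -Xms{recommended_es_heap_gb(ram)}G "
          f"-Xmx{recommended_es_heap_gb(ram)}G")
```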

Example of a production cluster on AWS

2. Data management

OpenCTI is not “just” a STIX 2.1 repository without intelligence. The platform performs a lot of processing on the ingested data: ensuring identifier consistency, deduplicating relationships, merging elements on the fly (such as file hashes), etc. For this reason, performance will also depend on the data sources and on the quality of the knowledge itself (relationships, tags, etc.).

File observable automatically merged by the VirusTotal connector

This data management is one of the main aspects that can impact performance. If you are using a lot of sources with few relationships between elements, ingestion will be incredibly fast. If you use only one source where every piece of data is related to the others, that is the worst case.

Performance monitoring and daily metrics

Better performance has been a top priority since the first OpenCTI release. But designing better architectures and writing more efficient source code is only possible if you proactively monitor the performance of the product in realistic conditions. To do so, we have implemented a performance agent which executes an ingestion scenario every night with 3 different profiles and graphs the results in a Kibana dashboard.

The Kibana dashboard is publicly accessible so don’t hesitate to consult it!

Evolution of the scenario (profile 1) ingestion time over the last 50 versions of OpenCTI

The chosen scenario is simple and should answer a basic question: how much time is needed to ingest the data from the OpenCTI datasets connector and the MITRE ATT&CK connector into OpenCTI? As this scenario runs, the agent collects as many metrics as possible along the way: reads/second, writes/second, memory usage, average time by type of query, etc.

The 3 different profiles simply vary the number of workers (1, 2 and 5), and they are all executed in the same environment.

Bare Metal server: CPU 12 cores - RAM 64GB - SSD
Runner: Drone.io, all elements deployed in docker

Most of the OpenCTI dependencies are started with the default options. We have just:

  • slightly increased the default memory allocated to ElasticSearch;
  • forced the garbage collector of OpenCTI to run more often (to ease the detection of potential memory issues).
Elasticsearch: single-node ES_JAVA_OPTS: -Xms6G -Xmx6G
OpenCTI: --optimize_for_size --always_compact

Some insights on the 3 profiles' metrics

Based on our explanation in the “Well, it depends…” section, it is now easier to understand our results for the different profiles:

Performances on 3 different profiles

This picture represents the total duration of the scenario for the 3 different profiles. You can see the limitation related to data management/dependencies in these results. Indeed, moving from 1 to 2 workers with the same amount of data doubled the performance, but moving from 1 to 5 workers only increased the performance by a factor of 3.
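These diminishing returns can be expressed as speedup and parallel efficiency. The durations below are purely illustrative numbers chosen to match the observed 2x and 3x speedups; the real measurements live in the public Kibana dashboard.

```python
def speedup(duration_1_worker: float, duration_n_workers: float) -> float:
    """How many times faster the run is compared to a single worker."""
    return duration_1_worker / duration_n_workers

def efficiency(duration_1_worker: float, duration_n_workers: float, n: int) -> float:
    """Speedup divided by worker count: 1.0 means perfect scaling."""
    return speedup(duration_1_worker, duration_n_workers) / n

# Illustrative scenario durations in minutes for 1, 2 and 5 workers.
d1, d2, d5 = 60.0, 30.0, 20.0
print(speedup(d1, d2), efficiency(d1, d2, 2))  # 2.0 1.0
print(speedup(d1, d5), efficiency(d1, d5, 5))  # 3.0 0.6
```

The dropping efficiency at 5 workers is consistent with the data-management explanation above: deduplication and dependency resolution serialize part of the work, so it cannot be parallelized away by adding workers.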

OpenCTI memory usage

To finish this article: we have recently added the memory profile of the application to our performance dashboard. This information is now available and will be tracked for every future release.

Memory profile of the application over the last releases

As you can see above, the platform uses approximately 80 MB of memory with great stability. But obviously, memory consumption will also depend on the number of workers, the nature of your data, etc.

Conclusion

Performance is an important aspect of every piece of software, and we will continue to improve it with each release. We hope this article has given you a better understanding of the platform and explained a bit why we always answer “it depends”.
