Your Cyber Threat Intelligence Knowledge in a Magic Box
The time has come to forget the old ways of storing, organizing and sharing knowledge about cyber threats, indicators of compromise and field observables. As a non-profit organization and as cybersecurity professionals, we are proud to finally reveal OpenCTI version 4, after 8 months of tremendous collective work by the core development team. The demonstration instance has been migrated as well.
Why this version is a major breakthrough
When we released the first version of OpenCTI more than a year ago, we were convinced that the CTI community lacked an effective tool to organize not only technical knowledge of cyber threats but also TTPs, victimology, contextual data, etc. One year on, this has proven true, but we have also learned a lot from the community about the needs and expectations of CTI/SOC/DFIR teams.
Breaking compatibility and rewriting the schema was a difficult decision (data from version 3 must be migrated to version 4 through a dedicated script), especially because only a few resources are currently available to work on the core platform. But this is definitely the right path to achieve our goals and to solve the challenges our users are currently facing (this will not happen again in the foreseeable future).
In OpenCTI version 4, we have tackled a lot of them:
- Understandability and ease of use of the data schema/model (including migration to STIX 2.1 and better compatibility).
- Full modeling of STIX Cyber Observables for the representation of artifacts/technical elements.
- Scalability to billions of entities and relationships.
- Performance of ingestion process & connectors consumption.
- Synchronization in real time between multiple OpenCTI platforms to enable easier knowledge sharing.
- Unified handling of uniqueness/duplicates with proper management of STIX IDs (and automatic merging / deduplication).
- Ingestion process monitoring to better understand what’s happening on the platform.
- A lot more connectors, for knowledge import or data consumption.
Major enhancements (among others)
OpenCTI version 4 source code is a completely rewritten application (more than 70% of the source code has been changed, whether in the API, the Frontend or the Python library).
Our stack has also changed. We decided to temporarily switch to Elasticsearch as our main storage instead of Grakn, but we remain committed to Grakn and believe it is the long-term storage engine for OpenCTI’s roadmap. For this new version, we require the maturity and performance that Grakn 2.0 will bring, and since it is still in development, we found it best to switch to Elasticsearch in the meantime. We’re still working very closely with Grakn and our teams are heavily aligned.
New data schema for seamless integration with STIX 2.1
To be able to work properly, we designed a new schema for OpenCTI data, as close as we could to the STIX 2.1 standard. Only a few attributes/entities are different. The real goal was to store “pure” STIX 2.1 and prefix all deviations with “x_opencti_”.
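To make the prefix convention concrete, here is a minimal Python sketch that separates standard STIX 2.1 properties from OpenCTI-specific ones. The sample object, the property name "x_opencti_score" and the helper split_properties are illustrative assumptions, not an extract of the actual schema or of the platform code.

```python
OPENCTI_PREFIX = "x_opencti_"

# Hypothetical indicator: every property is plain STIX 2.1 except the last one,
# which stands in for a platform-specific deviation (name chosen for illustration).
stix_object = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": "indicator--8e2e2d2b-17d4-4cbf-938f-98ee46b3cd3f",
    "created": "2020-11-01T00:00:00.000Z",
    "modified": "2020-11-01T00:00:00.000Z",
    "pattern": "[ipv4-addr:value = '198.51.100.7']",
    "pattern_type": "stix",
    "valid_from": "2020-11-01T00:00:00.000Z",
    "x_opencti_score": 80,
}


def split_properties(obj):
    """Return (standard, extensions): pure STIX properties and prefixed deviations."""
    standard = {k: v for k, v in obj.items() if not k.startswith(OPENCTI_PREFIX)}
    extensions = {k: v for k, v in obj.items() if k.startswith(OPENCTI_PREFIX)}
    return standard, extensions


standard, extensions = split_properties(stix_object)
print(sorted(extensions))  # only the prefixed properties
```

With such a convention, a consumer that only understands STIX 2.1 can work with the first dictionary and ignore the second one.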
Full management of STIX Cyber Observables
Another major change in OpenCTI version 4 is the full modeling of STIX Cyber Observables, with all the attributes proposed by the STIX 2.1 standard.
This modeling allows us to better manage their uniqueness. We now fully handle file entities with different hashes and automatically merge similar files together (attributes, relationships, hashes, etc.) when they are enriched or modified.
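The idea of merging files that turn out to be the same can be sketched as follows. This is a simplified illustration under our own assumptions (two files are considered identical as soon as one hash matches, and existing attributes win over incoming ones); the function merge_files is hypothetical and does not reproduce the platform's actual merge logic.

```python
def merge_files(existing, incoming):
    """Merge two STIX-like file observables that share at least one hash value."""
    existing_hashes = existing.get("hashes", {})
    incoming_hashes = incoming.get("hashes", {})
    shared = {
        algo for algo, value in incoming_hashes.items()
        if existing_hashes.get(algo) == value
    }
    if not shared:
        raise ValueError("files share no hash and are treated as distinct")
    merged = dict(existing)
    # Keep the union of all known hashes for the file.
    merged["hashes"] = {**existing_hashes, **incoming_hashes}
    # Add attributes that only the incoming observable carries.
    for key, value in incoming.items():
        if key != "hashes" and key not in merged:
            merged[key] = value
    return merged


known = {
    "type": "file",
    "name": "sample.bin",
    "hashes": {"MD5": "d41d8cd98f00b204e9800998ecf8427e"},
}
enriched = {
    "type": "file",
    "size": 0,
    "hashes": {
        "MD5": "d41d8cd98f00b204e9800998ecf8427e",
        "SHA-1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
    },
}
merged = merge_files(known, enriched)
print(sorted(merged["hashes"]))
```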
Scalability and performance of the ingestion process
One of the most trying challenges our users faced in previous versions of the platform was the performance of the ingestion process, especially when a lot of different import connectors were launched.
We reworked almost all parts of the ingestion process, starting with the management of STIX bundles (now with a fully recursive method to split and order bundles), then with a rewrite of the PyCTI library itself to avoid useless checks before creation, and finally by handling creation on the API side in one single transaction (for all potential marking definitions, external references, etc.).
Removing Grakn from the stack has drastically improved performance. To benchmark the OpenCTI platform, every night we run a full OpenCTI stack in our performance test environment (with only 1 worker) and ingest the same dataset against the latest build of the platform.
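As an illustration of what recursively ordering a bundle can look like, the sketch below places every object after the objects it references, so that a relationship is never created before its source and target. It is a generic depth-first ordering written for this post, not the code used in PyCTI; the helper names and the sample IDs are assumptions.

```python
def find_refs(obj):
    """Collect the IDs an object points to through *_ref / *_refs properties."""
    refs = []
    for key, value in obj.items():
        if key.endswith("_ref") and isinstance(value, str):
            refs.append(value)
        elif key.endswith("_refs") and isinstance(value, list):
            refs.extend(value)
    return refs


def order_bundle(objects):
    """Return the objects ordered so that dependencies come first."""
    by_id = {obj["id"]: obj for obj in objects}
    ordered, visited = [], set()

    def visit(obj):
        if obj["id"] in visited:  # also protects against reference cycles
            return
        visited.add(obj["id"])
        for ref in find_refs(obj):
            if ref in by_id:  # references outside the bundle are ignored
                visit(by_id[ref])
        ordered.append(obj)

    for obj in objects:
        visit(obj)
    return ordered


bundle_objects = [
    {
        "type": "relationship",
        "id": "relationship--11111111-1111-4111-8111-111111111111",
        "relationship_type": "uses",
        "source_ref": "intrusion-set--22222222-2222-4222-8222-222222222222",
        "target_ref": "malware--33333333-3333-4333-8333-333333333333",
    },
    {"type": "malware", "id": "malware--33333333-3333-4333-8333-333333333333"},
    {"type": "intrusion-set", "id": "intrusion-set--22222222-2222-4222-8222-222222222222"},
]
print([obj["type"] for obj in order_bundle(bundle_objects)])
```

Once the objects are ordered, each one could be sent as its own small bundle, which is one possible way to split a large import into independent units of work.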
Events stream & synchronization
OpenCTI is a knowledge platform, meaning that the stored data changes regularly because of connectors or manual actions. We decided to implement a “STIX 2.1” events stream in Redis, to be used by a new type of connector. Multiple new connectors now use the Redis Stream to consume events and act on them:
- The history connector, which writes the logs of entities & relationships.
- The synchronizer connector, which consumes a remote OpenCTI stream to synchronize with a local OpenCTI instance.
- The Tanium connector, which synchronizes indicators & observables to Tanium Threat Response Detect Engine.
And so many more to come (SIEM sync, SOAR integration, alerting, etc.)… OpenCTI is the first TIP to provide such real-time event feeds, with a modern implementation of Server-Sent Events (SSE).
Give the demonstration instance a try: open the link to the SSE endpoint and, in a new tab, just do something (creating, updating or deleting knowledge).
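For readers who have never consumed an SSE feed, here is a small Python sketch of how the wire format is parsed: fields such as "id", "event" and "data" arrive line by line, and a blank line closes each event. The parser follows the generic SSE format; the event names and the JSON payloads in the sample are assumptions made for the example, not the exact messages emitted by the platform.

```python
import json


def parse_sse(lines):
    """Yield (event_id, event_type, data) tuples from Server-Sent Events lines."""
    event_id, event_type, data_lines = None, "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_id, event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical stream content, as it could be read line by line from an HTTP response.
sample_stream = [
    "id: 1\n",
    "event: create\n",
    'data: {"type": "malware", "name": "Example"}\n',
    "\n",
    ": keep-alive\n",
    "id: 2\n",
    "event: delete\n",
    'data: {"type": "malware", "name": "Example"}\n',
    "\n",
]

for event_id, event_type, data in parse_sse(sample_stream):
    print(event_id, event_type, json.loads(data)["type"])
```

A real consumer would feed the parser with the lines of a long-lived HTTP response and keep the last event ID to resume after a disconnection.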
Management of duplicate entities & automatic merging
One of the major challenges we face is maintaining consistent data as the number of knowledge sources increases. We decided to completely rethink our management of STIX IDs, using predictable IDs based on transparent rules. This allows us to remove all the code that tried to avoid duplicates before creating entities, updating existing entities, etc.
When an entity or a relationship is created and appears to already exist based on the rules we defined (the STIX 2.1 standard only provides rules for STIX Cyber Observables, not for all possible entity types), the data is updated or merged automatically, depending on the configuration, to keep a genuinely useful, context-oriented and consistent knowledge base.
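The principle of predictable IDs can be sketched in a few lines of Python: the ID is a UUIDv5 computed from the properties that define the identity of the object, so two sources describing the same thing end up with the same ID. The namespace below is the one the STIX 2.1 specification recommends for observables; using the normalized name as the only contributing property, and the helpers predictable_id and upsert, are assumptions for this example rather than the actual OpenCTI rules.

```python
import json
import uuid

# Namespace recommended by the STIX 2.1 specification for UUIDv5-based identifiers.
STIX_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def predictable_id(entity_type, contributing):
    """Build a deterministic STIX ID from the ID-contributing properties."""
    # Sorted keys and no extra whitespace give a stable serialization.
    data = json.dumps(contributing, sort_keys=True, separators=(",", ":"))
    return f"{entity_type}--{uuid.uuid5(STIX_NAMESPACE, data)}"


def upsert(store, entity_type, name, **attributes):
    """Create the entity, or update it in place if its predictable ID already exists."""
    # Assumption: the name, trimmed and lowercased, is what identifies the entity.
    stix_id = predictable_id(entity_type, {"name": name.strip().lower()})
    entity = store.setdefault(stix_id, {"id": stix_id, "type": entity_type, "name": name})
    entity.update(attributes)
    return entity


store = {}
upsert(store, "intrusion-set", "Example Group", description="From source A")
upsert(store, "intrusion-set", " example group ", first_seen="2020-01-01T00:00:00.000Z")
print(len(store))  # both calls resolve to the same entity
```

Because the ID can be recomputed by anyone from the data itself, no lookup is needed before creation: the write either creates the entity or lands on the existing one.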
Ingestion process monitoring
One other very important challenge we had to tackle was the ingestion process. We implemented the following changes:
- Dramatic improvement of ingestion performance.
- Support for multiple workers without any errors or race conditions.
- Monitoring of import/ingestion jobs from connectors or manual imports.
You can now follow the ingestion process of each connector, for each of the works it has initiated.
The monitoring of connector jobs spans the whole infrastructure: a “work” is divided into multiple jobs, and the status and potential errors are then reported for each of them.
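The relation between a work and its jobs can be pictured with the following minimal Python sketch. The class name, its fields and the status values "progress" and "complete" are assumptions chosen for the illustration; they do not describe the actual data model of the platform.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """A unit of import initiated by a connector, divided into several jobs."""
    connector: str
    expected_jobs: int
    completed: int = 0
    errors: list = field(default_factory=list)

    def report(self, job_id, error=None):
        """Record that a job has finished, with its error message if it failed."""
        self.completed += 1
        if error is not None:
            self.errors.append((job_id, error))

    @property
    def status(self):
        """The work stays in progress until every expected job has reported."""
        return "complete" if self.completed >= self.expected_jobs else "progress"


work = Work(connector="example-import-connector", expected_jobs=3)
work.report("job-1")
work.report("job-2", error="missing reference")
print(work.status, len(work.errors))
work.report("job-3")
print(work.status, len(work.errors))
```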
OpenCTI version 4 is just the beginning of the first full-featured Open Source Threat Intelligence Platform. Our next priorities are:
- properly integrate OpenCTI with SOC and SIEM infrastructures;
- allow analysts to follow / engage on the platform directly;
- develop dashboards & advanced data visualization capabilities;
- enhance graph & investigation features;
- introduce data science & machine learning algorithms.
Please join us on our #Slack channels if you need any more information on OpenCTI version 4. We wish you many fruitful investigations & hunts!