Integrating MISP into a malware analysis and collaboration CI/CD pipeline.

Juha Kälkäinen
OUSPG
Jul 4, 2019
Our document pipeline at work analyzing a possibly malicious Office document. Check it out here!

The motivation behind this blog post is both to document and to present how I approached the problem of further integrating the Malware Information Sharing Platform (MISP) into our existing malware analysis pipelines. Technically this is part 2 of an older blog post, the first of which covered how to install MISP, but you don't need it to follow this write-up.

Sharing reliable information and threat indicators within the information security community has become a key element of incident response and threat analysis. A popular platform for doing this inside an organization is MISP, and as such many existing malware analysis frameworks and tools have been integrated with it, both consuming data from and producing usable data for the MISP environment. We at the Cincan project have been building our own CI/CD pipelines to automate parts of incident response work and to make sharing the results easier. Since our goals somewhat overlap with MISP use cases, we were interested both in gaining a deeper understanding of how MISP works and in finding out how we could integrate our pipelines with it.

For starters I decided to add steps to an existing pipeline that make it possible for a user to input relevant parameters to fetch an attachment from MISP, analyze it and upload the results to a MISP event as attributes. Sounds easy enough, but this is where I instantly ran into a problem. As it turns out, Concourse, the “container-based continuous thing-doer” we’re using to build our CI pipelines, to the best of my knowledge currently doesn’t support triggering jobs with custom parameters (though there is an existing feature request for it here). After haphazardly reading through the relevant parts of the Concourse documentation and exploring how other people have solved this problem, I ended up with two different solutions.

The “correct” way to do this would be to either create my own custom resource that accepts and passes parameters to jobs or use an existing one. The resources we’ve mainly used in our pipelines so far have been Git repositories, but making the user upload parameters into a repository, either by hand or via a script, every time they want to run an analysis on an attachment seemed rather clumsy.

I also played around with building a single task that accepts parameters via the execute command and would then trigger the relevant pipeline, but this approach started to feel like trying to use a hammer to split wood. Sure, you could do it, but it would probably be easier to just use an axe.

Concourse does, however, support passing parameters to pipelines and jobs when you first create them with the fly set-pipeline command. This feature is intended for passing static variables such as API keys or credentials to a job, but it could probably also be used for my use case.
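To give a feel for how that looks in practice, here is a rough sketch of setting a pipeline with per-run parameters, wrapped in Python so it can be called from other scripts. The target, pipeline name, config path and variable names are made up for illustration; only the fly flags themselves (-t, -p, -c, -v) are standard Concourse CLI options.

```python
#!/usr/bin/env python3
"""Sketch: set a Concourse pipeline with per-run parameters via fly.

The target, pipeline name, config path and variable names are placeholders.
"""
import subprocess


def set_pipeline_with_params(attachment_id: str, event_id: str) -> None:
    # fly interpolates the -v values into ((attachment_id)) / ((event_id))
    # placeholders in the pipeline definition when the pipeline is set.
    subprocess.run(
        [
            "fly", "-t", "cincan",
            "set-pipeline",
            "-p", "document-pipeline",
            "-c", "pipelines/document-pipeline.yml",
            "-v", f"attachment_id={attachment_id}",
            "-v", f"event_id={event_id}",
        ],
        check=True,  # fly will still ask for interactive confirmation
    )


if __name__ == "__main__":
    set_pipeline_with_params("12345", "678")
```

Once the pipeline has been set (and unpaused), the analysis job can then be kicked off with fly trigger-job.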

This solution seemed to work well enough, so I proceeded to write a couple of Python scripts that use the PyMISP API to communicate with MISP. Insert those into jobs, combine the jobs with an existing pipeline, and what we end up with is a pipeline that accepts a MISP attachment ID and an event ID as parameters and then uses them appropriately.
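The scripts themselves aren’t very long. Below is a stripped-down sketch of the two PyMISP calls that matter here: fetching an attachment from an event and adding a result attribute back. The URL, API key and paths are placeholders, and the way the raw attachment bytes are exposed is the part most likely to differ between PyMISP versions, so treat that bit as an assumption rather than the exact code we run.

```python
"""Sketch: fetch a MISP attachment and push results back as an attribute."""
from pymisp import PyMISP, MISPAttribute

# URL and key are placeholders for the real instance and API key.
misp = PyMISP("https://misp.example.com", "YOUR_API_KEY", ssl=True)


def fetch_attachment(event_id: str, attachment_id: str, out_path: str) -> None:
    """Save the attachment attribute's file content to out_path."""
    event = misp.get_event(event_id, pythonify=True)
    for attr in event.attributes:
        if attr.type == "attachment" and str(attr.id) == str(attachment_id):
            # Assumption: the attribute's data field holds the decoded bytes.
            # Older PyMISP versions use a separate download call instead.
            with open(out_path, "wb") as f:
                f.write(attr.data.getbuffer())
            return
    raise ValueError(f"Attachment {attachment_id} not found in event {event_id}")


def add_result_link(event_id: str, result_url: str) -> None:
    """Attach a link to the analysis results to the originating event."""
    attr = MISPAttribute()
    attr.type = "link"
    attr.value = result_url
    attr.comment = "Analysis results from the document pipeline"
    misp.add_attribute(event_id, attr)
```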

Does it work? Well, kinda. The user needs to use fly commands to pass the parameters, and each time they change you get a prompt asking if the changes are okay. That makes scripting a tad harder, and I’m getting this nagging feeling that I’m doing something incorrectly. It’s also not very automatic, since you have to pass the parameters by hand.

My second plan of attack for the integration was to use ZeroMQ, an asynchronous messaging library, to build a script that uses the publish–subscribe pattern to trigger a pipeline whenever a relevant attachment appears on the feed. ZeroMQ happens to exist as a plugin in MISP, and their Git repository had a couple of code examples using the extension, so my work was already half done.
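The subscriber side only takes a handful of lines with pyzmq. The sketch below is modeled loosely on the examples in the MISP repository; the host, the port (50000 should be the plugin’s default) and the exact shape of the published JSON may need adjusting for your instance.

```python
"""Sketch: listen to MISP's ZeroMQ feed and react to new attachments."""
import json

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://127.0.0.1:50000")               # ZMQ plugin's default port
socket.setsockopt_string(zmq.SUBSCRIBE, "misp_json")  # only the MISP JSON topics

while True:
    # Messages arrive as "<topic> <json payload>".
    topic, _, payload = socket.recv_string().partition(" ")
    data = json.loads(payload)
    event = data.get("Event")
    if not event:
        continue
    for attr in event.get("Attribute", []):
        if attr.get("type") == "attachment":
            # Here the real script fetches the file and pushes it into the
            # pipeline's Git repository, which in turn triggers the analysis.
            print(f"New attachment {attr.get('value')} in event {event.get('id')}")
```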

Data-flow diagram

In this case the easiest way to trigger a pipeline was clearly to automatically upload any relevant attachments into its Git repository, which I could then set as the trigger for the pipeline to start the analysis. For handling Git repositories with Python I used a handy library called GitPython. Right now the script checks for any attachments an event might have and uploads them for the pipeline to use. Since the first step of the document pipeline is to sort files into PDF and DOC files, we could fairly easily extend that part to check for other relevant file types as well and pass them, based on that output, into the appropriate pipelines. Finally, I added a job to the end of the document pipeline that adds an attribute containing the URL of the repository to the event once the analysis has concluded. Initially the job enriched the event with multiple attributes containing all the outputs of the analysis, but this quickly became unfeasible, as the document pipeline alone produces several logs with hundreds of lines of data.
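For reference, the GitPython part of the script boils down to something like this. It’s a simplified sketch: the repository path, directory layout and commit message are placeholders, and error handling is left out.

```python
"""Sketch: drop a fetched attachment into the pipeline's Git repo to trigger it."""
import os
import shutil

from git import Repo


def push_attachment(repo_path: str, attachment_path: str, filename: str) -> None:
    repo = Repo(repo_path)
    repo.remotes.origin.pull()                     # make sure we're up to date
    os.makedirs(os.path.join(repo_path, "samples"), exist_ok=True)
    rel_path = os.path.join("samples", filename)   # hypothetical repo layout
    shutil.copy(attachment_path, os.path.join(repo_path, rel_path))
    repo.index.add([rel_path])
    repo.index.commit(f"Add sample {filename} for analysis")
    repo.remotes.origin.push()                     # this push triggers the pipeline
```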

Script output to MISP

Overall I feel that further work is definitely needed on both approaches. For example, one important step to add would be to always check the TLP of an event and handle it appropriately. You can check out the code here.
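As a rough idea of what that check could look like: MISP expresses TLP as event-level tags (tlp:white, tlp:green, tlp:amber, tlp:red), so before pushing anything to a repository the script could compare the event’s tag against a configured maximum. A hedged sketch, assuming the event was fetched with PyMISP and pythonified:

```python
# Assumption: `event` is a pythonified MISPEvent whose tags carry the TLP level.
TLP_ORDER = ["tlp:white", "tlp:green", "tlp:amber", "tlp:red"]


def tlp_allows_sharing(event, max_allowed: str = "tlp:green") -> bool:
    """Return False if the event carries a TLP tag stricter than max_allowed."""
    for tag in getattr(event, "tags", []):
        name = tag.name.lower()
        if name in TLP_ORDER and TLP_ORDER.index(name) > TLP_ORDER.index(max_allowed):
            return False
    return True
```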

If you read this far, thanks! Should you have any comments or questions about the article, our work or really any feedback at all, don’t hesitate to either contact me directly or drop a comment down below.

One of the goals of the Cincan project is to build shareable, repeatable & history-preserving malware analysis pipelines using your favorite tools + CI + git + containers.

For more information about this project see our homepage.
