Introducing OpenUBA: an open source user behavior analytics platform powered by the scientific computing ecosystem

Jovonni L. Pharr
Jun 3 · 4 min read

Visit www.openuba.org for more detail. I would love your feedback on this work, and all offers to help are welcome ❤️

I have had a chance to witness countless UBA tools hit the market, raise funding, serve advertisements, and present at conferences across the country.

However, one thing remains unchanged: UBA vendors keep their models black-boxed. I have yet to see a truly “open model” UBA platform built on the countless programmer hours invested in the popular scientific computing ecosystem. UBA vendors constantly market advanced AI capabilities applied to the security data their customers ingest. As a data scientist, I have long wondered about these “models” running behind the scenes. I have even had several occasions to peek under the hood of many UBA products, although even a brief peek is cumbersome and limited.

Even the open source behavior analytics projects emphasize security, integrations, etc., and place far less emphasis on data science and modeling.

It’s as if open source UBA tools want to perfect everything except the actual modeling.

It was clear to us before, but now we keep hearing the same thing from developers who want to use OpenUBA, so we assume the problem exists for many others.

Problem statement

UBA tools expose only “black-box” modeling capabilities, which hinders meaningful, advanced model development. These tools also do not build on the work of the scientific computing community, and therefore miss out on substantial user adoption for the sake of protecting their “IP”.

Background

In 2018, after realizing that UBA tools were not going to magically become more open, I began ideating a purely open source UBA platform built on real data science disciplines.

I started my corporate career in 2015, working in Research & Development at a Fortune 50 company. In R&D, we specialized in building bleeding-edge proofs of concept on emerging technologies. This is where I was introduced to several new technologies (blockchain, IoT, DevOps, modern embedded systems, ML/DL, etc.). Before joining corporate America, my colleagues and I ran a successful technology development firm in downtown Atlanta, GA. During those ~4 years, we developed innovative systems for brands and celebrities, small and large. Prior to that, I was a solo engineer developing my skills, starting in 2008–09. Bottom line: my entire technology career has been focused on innovation and on advancing how things are done.

I took it upon myself to develop this for the community as a whole, and easily found other security and data people who were interested in the same thing.

Architecture

How does OpenUBA work? For starters, here is our current technology stack.

[Image: OpenUBA technology stack]

“You can’t patent math.” Since UBA vendors attempt to protect their model development approaches, and the model logic itself, we can counter this by building on existing, well-understood data science research as it unfolds in the public arena.

While working in a SOC on data science related solutions, I could clearly see the connection between what my employer wanted and the tools required to develop it. However, working in a large corporation doesn’t always allow us to innovate every second of the day. Plus, I knew this couldn’t be developed in a weekend. The following are two very high-level ideas driving our work on OpenUBA.

Compute Engines

We are purposely keeping the compute engines very focused by using Spark and Elasticsearch, two of the most performant and most widely adopted compute engines available. Simply by using these two platforms, we have the means to feed data into the system from a variety of sources, all at scale. The rest of the solution consists of well-thought-out abstractions around data, models, anomalies, etc.
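
To make the idea of "abstractions around data, models, anomalies" a little more concrete, here is a minimal sketch in plain Python. It is not OpenUBA's actual code: the names `Event`, `DataSource`, `InMemorySource`, and `flag_anomalies` are all hypothetical, and the in-memory source simply stands in for an adapter that would read from Spark or Elasticsearch. The model is a deliberately simple per-user z-score, chosen only because its logic is fully visible.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One normalized activity record (hypothetical schema)."""
    user: str
    day: str
    count: int  # e.g. number of logins on that day


class DataSource(ABC):
    """Hypothetical engine-agnostic interface. A Spark- or
    Elasticsearch-backed adapter would implement the same method."""

    @abstractmethod
    def events(self) -> Iterator[Event]:
        ...


class InMemorySource(DataSource):
    """Stand-in source so the sketch runs without a cluster."""

    def __init__(self, rows: Iterable[tuple[str, str, int]]) -> None:
        self._rows = list(rows)

    def events(self) -> Iterator[Event]:
        for user, day, count in self._rows:
            yield Event(user, day, count)


def flag_anomalies(source: DataSource, threshold: float = 2.0) -> list[Event]:
    """Flag events whose count is more than `threshold` population
    standard deviations away from that user's own mean."""
    by_user: dict[str, list[Event]] = {}
    for ev in source.events():
        by_user.setdefault(ev.user, []).append(ev)

    flagged: list[Event] = []
    for evs in by_user.values():
        counts = [e.count for e in evs]
        mu, sigma = mean(counts), pstdev(counts)
        if sigma == 0:
            continue  # no variation for this user, nothing to flag
        flagged.extend(e for e in evs if abs(e.count - mu) / sigma > threshold)
    return flagged
```

Because the model only depends on the `DataSource` interface, the same function could in principle run over any engine that can yield events in this shape. That separation is the design choice being illustrated, under the assumptions stated above.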

Model Library/Registry

To foster a community, we have developed a model library. It works similarly to Docker Hub. Here, developers and security analysts can simply search a registry of ready-to-use models covering security gaps. This enables model developers to share useful models with the ecosystem, whether for free or for compensation.
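
As a rough illustration of the publish, search, and pull workflow that a Docker Hub-style model library implies, here is a small in-memory sketch. Everything in it is an assumption made for illustration: `ModelEntry`, `ModelRegistry`, and their methods are hypothetical names, not OpenUBA's real API, and a real registry would sit behind a network service rather than a Python dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelEntry:
    """Metadata for one published model (hypothetical fields)."""
    name: str
    version: str
    description: str
    tags: frozenset[str] = field(default_factory=frozenset)


class ModelRegistry:
    """In-memory stand-in for a searchable model registry."""

    def __init__(self) -> None:
        # Keyed by (name, version), similar in spirit to image:tag.
        self._entries: dict[tuple[str, str], ModelEntry] = {}

    def publish(self, entry: ModelEntry) -> None:
        key = (entry.name, entry.version)
        if key in self._entries:
            raise ValueError(f"{entry.name}:{entry.version} is already published")
        self._entries[key] = entry

    def search(self, term: str) -> list[ModelEntry]:
        """Case-insensitive substring match on name, description, and tags."""
        needle = term.lower()
        hits = [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]
        return sorted(hits, key=lambda e: (e.name, e.version))

    def pull(self, name: str, version: str) -> ModelEntry:
        """Return the entry for an exact name and version (KeyError if absent)."""
        return self._entries[(name, version)]


registry = ModelRegistry()
registry.publish(ModelEntry("login-zscore", "0.1.0",
                            "Per-user z-score on daily login counts",
                            frozenset({"authentication"})))
registry.publish(ModelEntry("dns-entropy", "0.2.0",
                            "Entropy of queried domain names",
                            frozenset({"network", "exfiltration"})))
print([e.name for e in registry.search("login")])
```

Refusing to overwrite an existing name and version is one possible design choice; it keeps a pulled model reproducible, in the same way a pinned image tag is expected to be.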

There are a lot of moving pieces in this ecosystem, but we are aiming to keep it simple, useful, and transparent.

We will be posting more in-depth looks at the features in OpenUBA. Subscribe to our Medium publication to stay updated.

Visit www.openuba.org to find our documentation, repository, and white paper. We are building an awesome team of passionate hobbyists, and we welcome anyone interested. Join our Discord: https://discord.gg/Ps9p9Wy


Georgia Cyber Warfare Range

Enabling future cyber warriors through education, resources, and community
