Introducing OpenUBA: an open source user behavior analytics platform powered by the scientific computing ecosystem
Visit www.openuba.org for more detail. I would love your feedback on this work, and anyone who wants to help is welcome ❤️
I have had a chance to witness countless UBA tools hit the market, raise funding, serve advertisements, and present at conferences across the country.
However, one thing remains unchanged: UBA vendors keep their models black-boxed. I have yet to see a truly “open model” UBA platform that builds on the countless programmer hours put into the popular scientific computing ecosystem. UBA vendors constantly market advanced AI capabilities that run on the security data ingested from their customers. As a data scientist, I have wondered a great deal about the “models” running behind the scenes. I have even had several occasions to peek under the hood of many UBA products, although even a brief peek is cumbersome and limited.
Even if you look at the open source behavior analytics projects, they all emphasize security, integrations, etc., and place far less emphasis on data science and modeling.
It’s like open source UBA tools want to perfect everything except the actual modeling.
It was clear to us before, but now we keep hearing this from developers wanting to use OpenUBA — so, we assume the problem exists for many others.
Problem statement
UBA tools expose “black-box” modeling capabilities and actually hinder meaningful, advanced model development. These tools also do not build on the work of the scientific computing community, and therefore miss out on substantial user adoption for the sake of protecting their “IP”.
Background
In 2018, after realizing UBA tools were not going to magically become more open, I began ideating a purely open source UBA platform built on real data science disciplines.
I started my corporate career in 2015 working in Research & Development at a Fortune 50 company. In R&D, we specialized in building bleeding-edge proofs of concept on emerging technologies. This is where I was introduced to several new technologies (blockchain, IoT, DevOps, modern embedded systems, ML/DL, etc.). Before joining corporate America, my colleagues and I ran a successful technology development firm in downtown Atlanta, GA. During those ~4 years, we developed innovative systems for brands and celebrities, small and large. Prior to that, I was a solo engineer developing my skills since 2008–09. Bottom line: my entire technology career has been focused on innovation and on furthering how things are done.
I took it upon myself to develop this for the community as a whole, and easily found other security and data people who were interested in the same thing.
Architecture
How does OpenUBA work? For starters, here is our current technology stack.
“You can’t patent math.” Since UBA vendors attempt to protect their model development approaches and the model logic itself, we can counter this by building on the already existing, well-understood data science research as it unfolds in the public arena.
While working in a SOC on data science related solutions, I could clearly see the connection between what my employer wanted and the tools required to build it. However, working in a large corporation doesn’t always allow us to innovate every second of the day. Plus, I knew this couldn’t be developed in a weekend. The following are two very high-level ideas driving our work on OpenUBA.
Compute Engines
We are purposely keeping the compute engines very focused by using Spark and Elasticsearch. This lets us feed data into the system through two of the most performant and most widely adopted compute engines available. Simply by using these two platforms, we have the means to feed data into the system from a variety of sources, all at scale. The rest of the solution is a set of well-thought-out abstractions around data, models, anomalies, etc.
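To make the idea of abstractions around data, models, and anomalies a bit more concrete, here is a minimal sketch in plain Python. It is not OpenUBA's actual code: the names `Event`, `Anomaly`, `DataSource`, `InMemorySource`, and `EventVolumeModel` are hypothetical, the in-memory source only stands in for a Spark or Elasticsearch reader, and the z-score rule is just one simple, fully transparent model chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Event:
    """One observed user action (hypothetical shape, for illustration only)."""
    user: str
    action: str


@dataclass(frozen=True)
class Anomaly:
    """A model finding: which user, how unusual, and why."""
    user: str
    score: float
    reason: str


class DataSource(Protocol):
    """Anything that yields events. A Spark or Elasticsearch reader
    could sit behind this interface (assumption, not OpenUBA's API)."""

    def events(self) -> Iterable[Event]: ...


class InMemorySource:
    """Stand-in data source so the sketch runs without a cluster."""

    def __init__(self, rows: Iterable[Event]) -> None:
        self._rows = list(rows)

    def events(self) -> Iterable[Event]:
        return iter(self._rows)


class EventVolumeModel:
    """Flags users whose event count has a z-score above `threshold`."""

    def __init__(self, threshold: float = 2.0) -> None:
        self.threshold = threshold

    def run(self, source: DataSource) -> List[Anomaly]:
        counts = Counter(event.user for event in source.events())
        if not counts:
            return []
        values = list(counts.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            return []  # every user behaves identically, nothing stands out
        anomalies = []
        for user, count in counts.items():
            z = (count - mu) / sigma
            if z > self.threshold:
                reason = f"{count} events vs. population mean {mu:.1f}"
                anomalies.append(Anomaly(user, z, reason))
        return anomalies
```

The point of the sketch is that the model only sees the `DataSource` interface, so the same open, readable logic could run regardless of which engine feeds it.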
Model Library/Registry
To foster a community, we have developed a model library. It works similarly to Docker Hub. Here, developers and security analysts can simply search a registry of ready-to-use models covering security gaps. This enables model developers to share useful models with the ecosystem, whether for free or for compensation.
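As a rough illustration of the publish, search, and pull flow described above, here is a small in-memory sketch. Everything in it is an assumption made for the example: `ModelEntry`, `ModelRegistry`, and their methods are hypothetical names rather than OpenUBA's real interface, and the SHA-256 digest check is borrowed from how container registries identify content, not something this post specifies.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ModelEntry:
    """A shareable model listing (hypothetical fields, for illustration)."""
    name: str
    description: str
    tags: Tuple[str, ...]
    payload: bytes      # serialized model code or weights
    free: bool = True   # models may be shared for free or for compensation

    @property
    def digest(self) -> str:
        """Content hash of the payload, used to verify what was pulled."""
        return hashlib.sha256(self.payload).hexdigest()


class ModelRegistry:
    """In-memory stand-in for a hosted model registry."""

    def __init__(self) -> None:
        self._entries: Dict[str, ModelEntry] = {}

    def publish(self, entry: ModelEntry) -> str:
        """Store the entry under its name and return its digest."""
        self._entries[entry.name] = entry
        return entry.digest

    def search(self, term: str) -> List[ModelEntry]:
        """Case-insensitive match on name, description, or exact tag."""
        needle = term.lower()
        return [
            entry
            for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.description.lower()
            or any(needle == tag.lower() for tag in entry.tags)
        ]

    def pull(self, name: str, expected_digest: str) -> bytes:
        """Return the payload only if it matches the digest the caller expects."""
        entry = self._entries[name]
        if entry.digest != expected_digest:
            raise ValueError(f"digest mismatch for model {name!r}")
        return entry.payload
```

An analyst would search for a term such as "login", pick an entry, and pull it by name and digest, so the model they run is exactly the one that was published and reviewed in the open.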
There are a lot of moving pieces with this ecosystem, but we are aiming to keep it simple, useful, and transparent.
We want to take the very complicated space of Security Analytics, and disrupt it — in a free, and open way.
We will be posting more in-depth looks at the features in OpenUBA. Subscribe to our Medium publication to stay updated.
Visit www.openuba.org to find our documentation, repository, and white paper. We are building an awesome team of passionate hobbyists, and we welcome anyone interested. Join our Discord: https://discord.gg/Ps9p9Wy