Introducing Brushable Histogram
The idea behind this component came up during the development of Genome, our new dynamic visualization engine. We needed a way to display how the events generated in the visualization were distributed over time. That sounds quite simple, but it turned out to be trickier to get working for all our use cases.
A customer might have a regular habit of making a couple of purchases a month for a few years. Fraudsters, however, may deploy a bot attack that makes hundreds of purchases in just a few minutes. Supporting both cases required a flexible binning approach that can group events by the minute or by the month, depending on what’s needed.
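To make the binning idea concrete, here is a minimal TypeScript sketch. It is not the component's actual code: the function name `binEvents`, the use of epoch milliseconds, and the fixed-width bins are all assumptions made for illustration. Calendar months are not fixed-width, so a real implementation would need calendar-aware intervals for monthly bins.

```typescript
// Hypothetical helper (not the component's API): counts events per
// fixed-width time bin. Timestamps are assumed to be epoch milliseconds.
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

function binEvents(timestamps: number[], binSizeMs: number): Map<number, number> {
  const bins = new Map<number, number>();
  for (const t of timestamps) {
    // Snap each timestamp down to the start of the bin that contains it.
    const binStart = Math.floor(t / binSizeMs) * binSizeMs;
    bins.set(binStart, (bins.get(binStart) ?? 0) + 1);
  }
  return bins;
}

// The same events, viewed at two granularities.
const sampleEvents = [0, 30_000, 59_999, 60_000, 125_000];
const byMinute = binEvents(sampleEvents, MINUTE_MS); // three separate bins
const byDay = binEvents(sampleEvents, DAY_MS); // one bin holding all five events
```

Only the bin width changes between the two calls, which is what lets one histogram serve both the slow customer and the bot attack.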
Now consider that we are looking at a graph which involves both cases. We should be able to zoom in on interesting time ranges while continuously adjusting the time granularity of the binning, and pan around it to uncover the story. So here we discovered two fundamental actions which our histogram should support: pan and continuous zoom. To speed up navigation between different time ranges, a slider was also necessary.
But when you are looking at events spanning a two-day period (binned by hour), how do you know you need to zoom in on a specific 5-minute interval in which a bot attack took place?
This led us to add a bit of flair to our slider and turn it into a strip plot of the full time period under investigation. Each strip represents an event, so an area with a high density of strips indicates a high frequency of events. This gives a more granular view of event velocity, which allows us to uncover that bot attack and many other fraud patterns.
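One way to picture the granularity adjusting as you zoom is to pick the bin width from the visible time range. The sketch below is an assumption about how this could work, not a description of the component: the candidate widths, the `maxBins` default and the name `pickBinSize` are all hypothetical, and 30 days stands in for a month.

```typescript
// Hypothetical sketch: choose the finest bin width that keeps the number
// of bins in the visible range at or below `maxBins`.
const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;
const APPROX_MONTH = 30 * DAY; // fixed-width stand-in for a calendar month

// Candidate widths, ordered from finest to coarsest.
const CANDIDATE_BIN_SIZES = [SECOND, MINUTE, 5 * MINUTE, HOUR, DAY, APPROX_MONTH];

function pickBinSize(rangeStartMs: number, rangeEndMs: number, maxBins = 60): number {
  const span = Math.max(rangeEndMs - rangeStartMs, 0);
  for (const size of CANDIDATE_BIN_SIZES) {
    if (span / size <= maxBins) {
      return size;
    }
  }
  // Very long ranges fall back to the coarsest candidate.
  return APPROX_MONTH;
}

const twoDayView = pickBinSize(0, 2 * DAY); // resolves to HOUR
const fiveMinuteView = pickBinSize(0, 5 * MINUTE); // resolves to MINUTE
const threeYearView = pickBinSize(0, 3 * 365 * DAY); // resolves to APPROX_MONTH
```

Calling a function like this on every zoom or pan event, and re-binning with the result, would give the continuous adjustment described above.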
We kept searching for a histogram component that could give us the flexibility to do all of this, but we couldn’t find one. So, we set out to create a new one! Beatriz Malveiro, one of our data visualization engineers, came up with the original concept and built the first prototype, and Victor Fernandes made several improvements to that first version.
We improved the histogram a bit while developing Genome (e.g., we added the play button) and after a while, we started to seriously think about open sourcing it. As far as we knew, it was a new type of component, and it was already stable and working well.
The open-sourcing process
When I was finally able to allocate some time to work on this project, the first steps were to:
- Move the code of the histogram that we had on the Genome repository to a new internal repository just for the component.
- Publish the histogram to our internal npm registry.
- Change the Genome code to depend on that internal package.
We had to do all this quickly because, during the migration, we couldn’t make any changes to the histogram code in Genome.
Fortunately, we had an internal boilerplate for creating UI components. It came with tests, a Storybook, ESLint and other tooling already configured. Another thing for us to open source one day!
Using the boilerplate, and after a few iterations, we got things working: Genome was using the histogram from the new repository.
With the code in a new repository, I started working on making several key improvements before we could move the code to GitHub:
- Improve unit test coverage
- Remove unwanted dependencies
- Improve code quality and modularity
- Improve documentation
- Add Storybook stories
- Run some performance tests
Doing all this did take a bit of time, but it was worth it! Now the code was in much better shape and ready to be public.
Before moving on to the next steps, I will make a quick note on the performance tests. These were the results we got:
- Initial render is relatively fast with 100k data points
- Tooltip highlight works smoothly with 300k data points
- Brushing works with 70k data points, and smoothly with 25k data points
These were qualitative tests to see how the component behaved with large amounts of data and to find the main low-hanging fruit that could be hurting performance. In this case, we made a couple of optimizations to avoid unnecessary re-renders.
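The post does not say which optimizations were made, so the following is only a generic illustration of one common way to avoid repeated work: caching the last result of an expensive computation and reusing it while the inputs are unchanged. The helper `memoizeLast` and the example function are hypothetical.

```typescript
// Generic sketch (not the component's code): remember the most recent call
// and skip recomputation when every argument is identical to last time.
function memoizeLast<A extends unknown[], R>(fn: (...args: A) => R): (...args: A) => R {
  let lastArgs: A | undefined;
  let lastResult: R | undefined;
  return (...args: A): R => {
    const prev = lastArgs;
    const unchanged =
      prev !== undefined &&
      prev.length === args.length &&
      args.every((arg, i) => Object.is(arg, prev[i]));
    if (unchanged) {
      return lastResult as R;
    }
    lastResult = fn(...args);
    lastArgs = args;
    return lastResult;
  };
}

// Example: an "expensive" computation that we only want to run when needed.
let computeCalls = 0;
const scaledCount = memoizeLast((points: number[], factor: number): number => {
  computeCalls += 1;
  return points.length * factor;
});

const points = [1, 2, 3];
const first = scaledCount(points, 2); // computed
const second = scaledCount(points, 2); // same array and factor: reused
const third = scaledCount(points, 3); // factor changed: computed again
```

Because arguments are compared by identity, passing a new array with the same contents still triggers a recomputation, which is why this pattern works best when the data is treated as immutable.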
Once the code was ready, the final steps were:
- Create the GitHub repository
- Move the histogram code there
- Publish the histogram on the public npm registry
- Adapt Genome to use the public version
- Deprecate the internal repository
- Configure the CI pipeline, coverage and static page on GitHub
At this point, our code was public, but we weren’t ready to announce it. We wanted to allow some time for any issues to surface, and they did! So we are glad we didn’t announce it right away.
During that period, the histogram was even reused in an internal proof of concept, so we were already getting value from the work we did!
As for next steps, the main feature we are missing is support for other types of scales on the x-axis (right now we only support a time scale).
This was our first open source frontend component. Although the component itself is not extremely complicated, the process of open sourcing wasn’t easy. My hope is that this “first” for a frontend component opens up the road for more frontend components to be open sourced at Feedzai!