Plotting the first point of the Feedzai Charting Library.

João Palmeiro
Jan 22, 2020

{ “context” :

Between July and August, I had the opportunity to join Feedzai’s 2019 summer internships as an Engineering Intern. I worked with the Data Visualization folks at the kickoff of the Feedzai Charting Library.

},
{ “motivation” :

Feedzai Charting Library: I think you already have a glimpse of what this project is all about just by the name. Data is at the core of Feedzai, and visualizing it provides an effective way of communicating… but why would Feedzai care about creating its own charting library? Let’s look at some of the big-picture reasons, which have more to do with the word “Library” than with the word “Charting”:

  • There are currently no standards at Feedzai for formatting and designing visualizations. This kickoff can also be the beginning of something like a “Data Visualization style guide” that could be integrated into Feedzai’s design system.
  • Different technologies/libraries are used for creating charts in Feedzai’s products. This Charting Library would enable us to homogenize everything under the same technology/library.
  • Since two charts with the same name may still differ, reproducibility is also an important aspect. Thus, the possibility of obtaining the same chart on different platforms, in a way that is as platform-agnostic as possible, is quite tempting. For example, it would be possible for Data Scientists to use it in Python, Front-End Engineers in web applications and even Business Intelligence Developers in Kibana!

Although briefly presented, these points summarize the motivation for this modest venture quite well. They are a bit “general”, but in a way, they were born purely from our context — if you have a few different or complementary thoughts about creating a library, in this case, a charting library, feel free to share them in the comments!

},
{ “feedzaiChartingLibrary” :

From this concept, many questions arise regarding the scope of the library: what charts to include, what level of customization to allow, what technologies to use, and more. We have a jungle full of possible paths and traps. And when those traps are activated, they unleash an avalanche of rocks that threaten to bury us. However, instead of rocks, we have pie charts… in 3D. So which way should we go?

We thought a good starting point would be to expand the React UI Component library used across Feedzai products with components dedicated to different types of charts, and to define these charts in Vega.

At first glance, it seems we have somewhat blurred platform-agnosticism. However, Vega, as we will talk about later, is very flexible and the idea of a repository with visualizations that everyone can contribute to and use will not be forgotten. For now, let’s dig into the first steps.

Since the start of this project is intrinsically associated with the internship, it becomes interesting to talk about both at the same time. First things first: the first week of the internship was devoted to onboarding, getting to know Feedzai from the inside (and in terms of Data Visualization), and learning about Vega. It was crucial to get to know Feedzai better before I started writing any code whatsoever because this library must be created and adopted smoothly. It was like visualizing before the visualization (mind = blown?). The next (approximately sequential) steps involved the following actions:

  1. Become (did you see what I did there?) and design some sample charts.
  2. Learn about using React and Vega together, and how React-specific Data Visualization libraries work in general (like react-vis, Victory, Recharts, nivo, and vx). The second part was just to get some inspiration. I didn’t copy any code, I swear (😛).
  3. Integrate React and Vega into a demo application with some charts and interaction.
  4. Write the JSON specification for a simple histogram.
  5. Create a new branch from the React UI Component library and add a new component, in this case, a Histogram component that could generate simple histograms as well as overlaid histograms.
  6. Use the new Histogram component in another Feedzai library to evaluate the possibility of integrating this new component in the future.
  7. Learn about unit testing (and some theory).
  8. Write a test suite for the Histogram component.
  9. Write a blogpost up to this point and then from here to the end.

Of course, the internship didn’t follow this list so linearly. Overall, things proceeded at a good pace, even with some winding moments. During these two months, I followed what I call a “Directed Absorption, Application, Assessment Graph” (don’t confuse it with a DAG, Directed Acyclic Graph, please), which summarizes the methodology I adopted.

Starting with “Absorption”, I tried to learn as many things as possible to gain the minimum confidence needed to apply (“Application”) my learnings and start the work required to meet the goals for this internship.

After completing small chunks, the “Assessment” node would arrive and, with more calm and constructive criticism, I tried to understand if what I did was good, or if I should implement some changes. This way, I wasn’t so stuck on how to implement something, I just let it happen. With the gained knowledge and experience, my critical eye became a little sharper, and the next steps became a little clearer, mitigating the impact of a bad idea and ensuring that every day I made at least a little bit of progress on the roadmap. However, as might be expected, this route was not fully sequential. Over time, it was necessary to jump between nodes in this graph, as I had to check the Vega documentation to learn something new or implement a quick change during the evaluation process to be able to quickly test a new idea, for example.

You might wonder why we decided to start with the histogram as the first visualization for our library. Good question! There were two reasons:

  • First, histograms (in particular a variation of them) are one of the most commonly used chart types within Feedzai and across its products.
  • Second, introducing and integrating a React Component with a ready-to-use histogram (rather than another kind of data visualization) was a smaller challenge within Feedzai, both in terms of complexity and time.

We’ll get back to these points later, but for now, let’s look at the technologies used besides our old friends “pen and paper”.

},
{ “technologies” :

Regarding the stack used during the internship, the following technologies and libraries stand out:

First of all, React is… well, React is React. It is one of the most widely used and known JavaScript libraries for building user interfaces today.

I’ve mentioned Vega a few times already, but I wasn’t referring to the brightest star in the constellation of Lyra. I’ve been referring instead to a visualization grammar of the same name. Vega is a declarative language for describing and creating interactive data visualizations. The description (known as a specification) of both the visual appearance and the interactive behaviour of a visualization is written in a JSON format. After that, the specification is parsed by Vega’s JavaScript runtime and we can generate static images or web-based views (using Canvas or SVG). This way, Vega provides a broad set of building blocks that allows you to plan and build flexible visualizations. Some examples of these building blocks are: data loading and transformation, scales, map projections, axes, legends, and graphical marks such as rectangles and lines, among others. Interactions, in turn, can be specified using reactive signals that dynamically adjust a visualization in response to input event streams. For more information, you can check the documentation. I’d suggest reading this page as it provides an interesting overview of the foundations.
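To make the idea of reactive signals a bit more concrete, here is a small sketch of a signal and a mark that reads from it, written as JavaScript objects for convenience. The signal name `hovered` and the `count` field are illustrative assumptions, not something taken from our actual specification.

```javascript
// A signal that tracks the datum under the mouse pointer.
// "hovered" and the "count" field are hypothetical names for this sketch.
const hoveredSignal = {
  name: "hovered",
  value: {},
  on: [
    // When the pointer enters a rect mark, store that mark's datum.
    { events: "rect:mouseover", update: "datum" },
    // When the pointer leaves, reset the signal to an empty object.
    { events: "rect:mouseout", update: "{}" }
  ]
};

// A text mark can read from the signal to display the hovered count.
const hoverLabel = {
  type: "text",
  encode: {
    update: {
      x: { value: 5 },
      y: { value: 5 },
      baseline: { value: "top" },
      text: { signal: "hovered.count" }
    }
  }
};

// Fragments like these would live under the "signals" and "marks" keys
// of a complete specification.
const specFragment = { signals: [hoveredSignal], marks: [hoverLabel] };
```

Nothing here is imperative: the specification only declares which events update the signal, and the Vega runtime takes care of re-evaluating whatever depends on it.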

Spoiler!

To make it easy to load a Vega specification into a React Component, we took advantage of Vega-Embed, a helpful package that comes with three fantastic extras: a button with action links such as “Save as PNG/SVG”, a tooltip plugin (Vega Tooltip), and some predefined themes (Vega Themes).

To ensure the created component worked as intended, we paired it with a test suite. With the help of Jest and Enzyme, we defined a set of unit and snapshot tests.

Jest and Enzyme are different (but complementary) testing tools. While Jest acts as a test runner, assertion library, and mocking library, Enzyme adds some additional utility methods for rendering a component, finding elements, and interacting with them. Both are widely used in React, although Jest, like a good joke, works very well elsewhere (TypeScript, Angular, Vue, etc.).

Finally, it’s worth mentioning the use of Lodash, a JavaScript utility library with lots of methods to solve certain needs in a productive way (like performing a deep comparison between two objects to determine if they are equivalent), and Storybook. Storybook provides an excellent sandbox to mock different components (and different states and/or props, for example) and expose them in a pleasant way for the stakeholders. Basically, we end up with a kind of wiki that we can easily browse and check the different available components.

In short, I think the stack used can be summarized as “JS²”.

},
{ “theDirt” :

It is finally time to look at the “dirt”! In this part, I will share some of the details of what we managed to achieve, the problems we faced, and the doubts that remain. I’ll also share some ideas on possible vectors worth exploring in the future.

First, the Vega specification, that is, the JSON file, contains a basic histogram with some artificial data. Thus, it is ready to be used in any of the supported platforms, and we allow anyone interested in testing the Vega specification to see a simple, out-of-the-box example. Therefore, one of the responsibilities of the React Histogram component is to adapt and customize this Vega specification.

The first few lines of a possible Vega specification.
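As a rough illustration (this is not our production file), a specification for a pre-binned histogram could start along these lines. It is written as a JavaScript object so that it can be handed directly to a Vega runtime; the field names `bin0`, `bin1`, and `count`, the data values, and the colors are all made up for this example.

```javascript
// Sketch of a basic, pre-binned histogram as a Vega specification.
// Field names and values are artificial and chosen only for illustration.
const histogramSpec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  width: 400,
  height: 200,
  padding: 5,
  // Artificial data: one row per bin, already aggregated.
  data: [
    {
      name: "bins",
      values: [
        { bin0: 0, bin1: 10, count: 4 },
        { bin0: 10, bin1: 20, count: 9 },
        { bin0: 20, bin1: 30, count: 2 }
      ]
    }
  ],
  scales: [
    {
      name: "x",
      type: "linear",
      domain: { data: "bins", fields: ["bin0", "bin1"] },
      range: "width"
    },
    {
      name: "y",
      type: "linear",
      domain: { data: "bins", field: "count" },
      range: "height",
      nice: true,
      zero: true
    }
  ],
  axes: [
    { orient: "bottom", scale: "x" },
    { orient: "left", scale: "y" }
  ],
  // One rectangle per bin, spanning [bin0, bin1] horizontally.
  marks: [
    {
      type: "rect",
      from: { data: "bins" },
      encode: {
        update: {
          x: { scale: "x", field: "bin0" },
          x2: { scale: "x", field: "bin1" },
          y: { scale: "y", field: "count" },
          y2: { scale: "y", value: 0 },
          fill: { value: "steelblue" }
        }
      }
    }
  ]
};
```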

Before delving into some of the details of the solution’s design, I think it is useful to first give an overview of the Histogram component. This component is a class component, with multiple props, that leverages some lifecycle methods. It is important to note that we used refs to facilitate the identification of the container (<div>) in which the histogram is embedded. This allows us to capitalize on the “commit phase” (as we can work with the DOM) to “mount” and “update” the chart (according to the original and new props, respectively). Note that in the “render phase”, we just add the empty container that will later contain the chart. A new container, as well as a new chart, is rendered every time there is an update.

These are the main features of the Histogram component:

  • Basic histogram for a single distribution.
  • Overlaid/Overlapping histogram for two distributions.
  • Legend with customizable labels.
  • Vertical axis with a linear or logarithmic scale (semi-log plot).
  • Tooltip to show some details associated with each of the bars.
  • In the case of the overlaid histogram, an interactive legend that allows viewing only one of two available distributions.
  • A dynamic number of labels depending on the number of bins to allow a good strategy for overlapping information on the horizontal axis.
  • Button with action links to save the chart in two different image formats.
  • A threshold marker to specify a reference value.
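To give an idea of how a component could adapt a base specification to features like these, here is a minimal sketch covering two of them: limiting the number of axis labels and adding a threshold marker. The function name `customizeSpec`, its options, and the assumption that the base specification has a scale named `x` and a bottom axis are all hypothetical; our actual component is organized differently.

```javascript
// Returns a customized copy of a base Vega specification.
// All names here (customizeSpec, binCount, maxLabels, threshold) are
// hypothetical and only serve this sketch.
function customizeSpec(baseSpec, { binCount, maxLabels = 10, threshold = null }) {
  // Deep-copy the JSON specification so the base is never mutated.
  const spec = JSON.parse(JSON.stringify(baseSpec));

  // Ask the horizontal axis for fewer ticks when there are many bins.
  // In Vega, tickCount is a desired number of ticks, not a strict one.
  spec.axes = (spec.axes || []).map((axis) =>
    axis.orient === "bottom"
      ? { ...axis, tickCount: Math.min(binCount, maxLabels) }
      : axis
  );

  // Add a vertical rule at the threshold value, assuming a scale named "x".
  if (threshold !== null) {
    spec.marks = [
      ...(spec.marks || []),
      {
        type: "rule",
        name: "threshold",
        encode: {
          update: {
            x: { scale: "x", value: threshold },
            y: { value: 0 },
            y2: { signal: "height" },
            stroke: { value: "firebrick" }
          }
        }
      }
    ];
  }

  return spec;
}
```

Working on a copy keeps the base specification usable as a standalone, out-of-the-box example.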

Although a histogram is one of the most widely used and well-known chart types, our component needed to handle challenges beyond styling a simple “set of bars”:

  • Given the complexity and size of the data Feedzai handles daily, all the calculations needed to transform data for visualization are done outside of the Histogram component.
  • We combined React and Vega through an approach called “Lifecycle Methods Wrapping”. The lifecycle methods wrap the creation, updating, and removal of Vega charts, establishing a kind of boundary between React and Vega code.

Lifecycle Methods Wrapping basic skeleton.
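A skeleton of this wrapping approach might look like the following. It is a simplified sketch, not the code of our component: the factory function `createHistogram` is a hypothetical device that receives `React` and an `embed` function (in an application, the `react` module and the default export of `vega-embed`) as arguments, which keeps the example free of hard dependencies. The `spec` prop is also an assumption.

```javascript
// Builds a Histogram class component around injected dependencies.
// createHistogram is a hypothetical helper used only for this sketch.
function createHistogram(React, embed) {
  return class Histogram extends React.Component {
    constructor(props) {
      super(props);
      // The ref identifies the <div> in which the chart is embedded.
      this.container = React.createRef();
      this.view = null;
    }

    // Commit phase: the DOM exists, so the chart can be created.
    componentDidMount() {
      this.draw();
    }

    // Commit phase: props changed, so discard the old chart and draw anew.
    componentDidUpdate() {
      this.clear();
      this.draw();
    }

    // Release the Vega view when the component goes away.
    componentWillUnmount() {
      this.clear();
    }

    draw() {
      // embed resolves to a result whose "view" is the Vega View instance.
      return embed(this.container.current, this.props.spec, { actions: true })
        .then((result) => {
          this.view = result.view;
        });
    }

    clear() {
      if (this.view) {
        // finalize() removes the view's event listeners and timers.
        this.view.finalize();
        this.view = null;
      }
    }

    // Render phase: only an empty container; Vega fills it in later.
    render() {
      return React.createElement("div", { ref: this.container });
    }
  };
}
```

React never looks inside the container, and Vega never touches anything outside it. A production version would also need to handle the case in which `embed` resolves after the component has already been unmounted.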

On the other hand, while it was not difficult to integrate React and Vega, we were walking in uncharted territory. There aren’t many examples or use cases that combine React and Vega, and little discussion on possible approaches (as well as advantages and disadvantages). The best (in my opinion) blogpost found, written by Peter Beshai, dates back to 2016 and is just a simple proof of concept. It doesn’t cover component creation to integrate into a UI library used in production.

To narrow this gap a bit, and given that Vega uses D3 heavily within its implementation, the research also focused on how to combine React and D3, as there are significantly more resources on that topic. Given this scenario, it was evident that to get the most out of React and Vega, individually and together, we needed plenty of room for iteration before settling on a final solution.

One of the ideas that did not go forward was the use of react-vega. This package is intended to convert Vega specifications into React class components. However, we saw no clear advantage in using it, and improving React support is still an item on the Vega (and Vega-Lite) roadmap. So, we chose not to use this additional third-party package. We only studied its code to complement our research, as it contains important details worth considering.

Looking at the Vega roadmap, there are two features that we are really looking forward to: animations and native support for labelling.

Animations would be useful to show small differences if the distribution in the histogram changes. Native support for labelling would allow us to reduce the number of JSON lines currently required to introduce the threshold marker (fortunately, there is already a nice prototype called Vega-Label!).

Regarding data, Vega has a well-defined data model which uses tabular data. This should be kept in mind, as the data that is passed to our component is already transformed and may no longer be in its original shape. Therefore, we may have to reshape it. This happened when we tested the new Histogram component in another Feedzai library to evaluate the possibility of integrating it in the future. Since the data feeding the histogram did not fit a tabular format, it was necessary to make some changes to the calculations and objects to ensure that the histogram received the data as expected.

Data model used by Vega.
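As an illustration of this kind of reshaping, suppose the histogram data had been pre-computed as two parallel arrays: bin edges and counts. That input shape, the function name `toTabular`, and the field names `bin0`, `bin1`, and `count` are assumptions made for this sketch, not the format used in our libraries.

```javascript
// Converts pre-binned data from parallel arrays into tabular rows,
// i.e. an array of objects with one object per bin.
// The input shape { edges, counts } is hypothetical.
function toTabular({ edges, counts }) {
  // N bins are delimited by N + 1 edges.
  if (edges.length !== counts.length + 1) {
    throw new Error("Expected exactly one more edge than counts");
  }
  return counts.map((count, i) => ({
    bin0: edges[i],
    bin1: edges[i + 1],
    count
  }));
}

const rows = toTabular({ edges: [0, 10, 20], counts: [3, 5] });
```

Each row then looks like a line in a spreadsheet, which is the shape a Vega data set expects for its `values`.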

Finally, I would like to share some backlog ideas in the hope that they give you a glimpse of possible future steps.

There’s still a lot to consider regarding the React + Vega relationship, as it was one of the main pivots during the internship. It’s inevitable to wonder if there is a better way to explore the relationship between Vega and D3. Strategies for integrating React and D3 have already been widely discussed and proven to work in a way that is consistent with React: D3 is responsible for the mathematical part of creating charts, and control of the DOM belongs to React. In short, D3 performs all the calculations for SVG paths, scales, layouts, and any transformations that take some data, and then React takes the results of these calculations and draws the chart. This (not vectorized) path is the most React-friendly one.

It is tempting to consider whether there is a similar way of bringing React and Vega closer together by creating a kind of framework that allows the efficient construction of specifications and components for new types of data visualization. However, Vega is not D3, even if the former depends heavily on the latter. D3 is intentionally a lower-level library, a kind of “visualization kernel”, whereas Vega provides a higher-level visualization specification language. By design, D3 maintains an “expressivity advantage” (in some cases it will be better suited for novel charts, for example) and Vega a “convenience advantage” (for a wide range of common, yet customizable visualizations, for example). Therefore, it is important to remember the philosophy of Vega and stick to it.

Unfortunately, during the internship, I could not come up with a substantially different strategy from the one implemented, but I strongly believe it exists. Consider these paragraphs mainly as food for thought.

Since only unit (and snapshot) tests were implemented, it would be important to consider other types of tests in the future (as well as assess their possible interaction with React and Vega), such as integration tests and end-to-end tests, but particularly some visual regression testing (unit tests operate at a very fine granularity and don’t see the bigger picture).

Briefly, in visual regression testing, we render a page, take screenshots of specific elements, and then compare the new screenshots with the ones taken earlier. If there are differences, the test fails and a comparison image is created so that a developer can look for any problems in the code. As you can imagine, it is critical that the Feedzai Charting Library charts show exactly what we expect them to show because small differences in what we see can lead to big differences in what we think. This type of testing is a useful complement to standard testing, especially when it detects changes that are very difficult for the human eye to spot and that may misrepresent what we hoped to show. If you are interested in learning more, this website is a great aggregator of articles and tools on visual regression testing.
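The comparison step at the heart of this technique can be sketched in a few lines. This is a simplified illustration and not the way any particular tool works: it assumes both screenshots are already available as flat RGBA byte arrays of the same dimensions, and the function name `countDifferentPixels` and the `tolerance` parameter are hypothetical.

```javascript
// Counts the pixels that differ between two screenshots.
// Both inputs are flat RGBA arrays (4 bytes per pixel).
// countDifferentPixels and tolerance are hypothetical names for this sketch.
function countDifferentPixels(baseline, current, tolerance = 0) {
  if (baseline.length !== current.length) {
    throw new Error("Screenshots must have the same dimensions");
  }
  let different = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // A pixel counts as different if any channel exceeds the tolerance.
    for (let channel = 0; channel < 4; channel += 1) {
      if (Math.abs(baseline[i + channel] - current[i + channel]) > tolerance) {
        different += 1;
        break;
      }
    }
  }
  return different;
}
```

A test would then fail when the count (or its share of all pixels) goes above an agreed limit. Existing visual regression tools add the parts left out here, such as taking the screenshots and producing the comparison image.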

},
{ “takeaways” :

Just two months of work and so much to cover! I hope the previous sections have been clear (and fun) and that you now have a “mind chart” full of ideas for the future.

These are perhaps the most important things I carry with me from my internship:

  • During this internship, I understood the importance and advantages of making sure that I clearly knew the core. In other words, learning Feedzai at my own pace (from the inside, along with its Data Visualization needs) was key to understanding what my job and responsibilities would be, but above all, why and how this project could help Feedzai.
  • It may sound cliché to talk about the importance of teamwork, but there are two specific things that I have experienced and would like to repeat. The weekly syncs, with three people who were always willing to listen and help me, as well as the challenge of presenting the Feedzai Charting Library several times and in distinct contexts, were fundamental to strengthen my learning process and to receive useful feedback. In particular, the weekly syncs, because of their regularity, allowed me to organize my work and quickly tackle any blocks that came up. Regardless of the block, I knew there would be an unblock.
  • Not much to add after I’ve spent so much text talking about Vega. Vega (and Altair) is an excellent option that I will always keep in mind whenever I need to visualize data. Distinct from more straightforward libraries, such as Seaborn, and also from D3, it has its own characteristics that can be useful.
  • Feedzai is like the DOM: a logical tree where each of us is a node with a unique set of characteristics. Leaving the geek puns aside, the thing that hooked me the most was the freedom and trust they had in me from the first day. It translated into something beautiful where I was able to actively engage. This formula, although intimidating at first, proved to be gratifying, so I truly thank Feedzai for this opportunity. Using my wildcard quote for this blogpost, and looking inside Feedzai, I’m reminded of Steve Jobs’s quote: “It doesn’t make sense to hire smart people and tell them what to do; we hire smart people so they can tell us what to do.”
  • Creating a visualization is like cooking steak with white rice; while not being worthy of a Michelin star, it ensures subsistence and provides satisfaction. However, sometimes, you need to show off how great your cooking skills are. The same goes for Data Visualization. What I’m trying to say is that, in some contexts, a “bland” chart won’t do, and it is critical to work towards custom (and customizable) visualizations.
  • A dash of design, a bit of data science, a touch of front-end development… and we ended up with a very tasty mix! In Data Visualization, we have one of those crazy Venn diagrams with lots of subtopics that eventually come together in this discipline. Since I really like daily variety in my job, I was drawn to Data Visualization. If you like mixing different areas of knowledge, then maybe Data Visualization is for you too.
  • I ate an average of 2 bananas a day (this sample was taken from a uniform distribution). Maybe I should try other fruits next time.

},
{ “demo” :

Before closing this blogpost, there is one last thing I want to show you. You can find a small demo of a React + Vega application with a basic histogram and a button that allows you to change the size of the bins here. Also, if you want to see other examples, check out the examples section of the Vega website.

},
{ “acknowledgements” :

This is the easiest part to write. First, I want to thank Feedzai, as an organization, for this amazing opportunity and all the support. It was two months of extremely gratifying work. From the first contact, I felt a warm atmosphere (which I believe will continue as long as I do not commit any fraud). In particular, I would like to truly thank the three people who supported me most during this period, namely Beatriz Jorge, Liliana Fernandes, and David Polido. You kicked off the idea of the Feedzai Charting Library and, since my interview, you have been present not only to help me and contribute to the project, but also to talk, discuss Data Visualization, and provide me with other complementary opportunities during the internship. It was a really good experience and it allowed me to greatly enrich my “dataset” for the future. Thank you very much for everything! Finally, I would also like to thank Krisztina Nagy and Indigo Wilmann for the careful feedback that allowed me to reach the best possible version of this blogpost. On my own, this would not have been possible.

}

Feedzai Techblog
