Acing Nubank’s technical exercise — part 1

Caio Oliveira
Building Nubank

--

Note: Although we only accept solutions in some languages, we decided to write code examples in Python since it’s a more widely adopted language, the syntax is fairly simple and using it to write functional code is straightforward. If you’d like to see the solution in a language we accept on our process, you can check this Clojure solution.

As we discussed in our prior post, Nubank has enjoyed significant benefits from embracing practical functional programming concepts to build our business. These concepts play an important role in the clarity and consistency of our codebase, and we invest to help all new engineers that start with us learn and understand them. For this reason, we find it helpful to expose engineers considering a position at Nubank to a challenge that helps us assess coding style, and gives candidates a perspective on how it feels to build technology at Nubank.

Nubank’s São Paulo office building

Now to the cool stuff! Solving our technical exercise

Let’s solve a technical exercises that we’ve used in the past. This way we can see all those small concepts combined in a real world-like project.

For this exercise, let’s use a language more widely used than Clojure — in this case, Python. Python doesn’t have persistent data built into the core language, but we’ll use a cool library called pyrsistent, which implements persistent collections for us. You can probably find a similar library in your favorite language as well.

Here’s the problem’s description:

A credit card company has many customers, and we need to defend each of them from fraud every day. One approach to detect fraud is to look at collision networks.

A collision is simple: if some customer A provides the same information that a customer B provided in our Know Your Customer process, we say A and B have a collision. For this exercise, a network of collisions is defined as a connected graph where each edge represents a collision between two nodes.

For instance, if A and B have a collision and B and C also have a collision, then, A, B, and C belong to a collision network.

We will write a small set of functions for reading a file and answering if two nodes are in the same collision network. The file has one edge in each line. An edge is denoted as two node identifiers separated by a single space. Here’s what it looks like:

Let’s start with two imports:

Now, let’s define a couple of functions to parse the file content and turn it into a list of edges:

Simple enough, right?

Now, for the solution: we’ll use a simple recursion*: for A and B to be in the same network, either A is directly connected to B or A is connected to a node V that is in the same network of B. We’ll have to keep track of the visited vertices so we don’t get stuck in loops. If you take a closer look, this is just a slight modification of the depth first search algorithm.

  • Note: Python doesn’t have tail call optimization, so for big graphs this could lead to a stack overflow, but it’s enough to illustrate some important concepts. Also, if you’re interested, there is a python module that implements tail call optimization in Python as well.

We need an efficient way to find all neighbor vertices from a starting vertex. To do that, we’ll need to transform our edges list into an [adjacency list)[https://en.wikipedia.org/wiki/Adjacency_list). So let’s do that:

Notice we created three functions for this:

  1. One just receives a map of new connections (which is just another adjacency list) and adds them to the adjacency list,
  2. Another one adds a bidirected edge to the adjacency list, and
  3. The last one turns a bunch of edges into an adjacency list.

Notice how each function is built on top of the former: this may seem weird at first, but we’ll see that working with small functions is a great way of

a) making the code clearer, and

b) increasing the composability and reusability of our code.

Now let’s implement the actual solution:

And that’s it. If you take a closer look, you’ll notice all we did until now are pure functions. We didn’t even need to declare variables inside the functions! And we did it using persistent datastructures. Now, the impure part of our code (reading user input, getting file contents and printing the output) should be really easy to build, as we already have the building blocks ready.

Notice that, although we had to use intermediate values to make the code clearer because we don’t have something like Clojure’s threading macros in python, all we did was transform data and return the result.

We can see some benefits of using functional programming here: imagine the adajcency list was a shared variable. While we’re running our is_same_network function, we don't have to worry about another thread modifying it. The functions are all small and each have a specific usage, so they can be more easily reused.

In the next post, we’ll cover some things we think are really cool about functional programming when dealing with shared state. It will also be a continuation for this exercise, so make sure to check it out!

--

--