Using Erlang Performance Lab with a real project

Motivation

Over the last few weeks I’ve been working on ErlangPL as my Google Summer of Code project. My part of the project focused on analysing the performance of ETS (Erlang Term Storage) tables. As the project was coming to an end, I became curious whether the newly implemented ETS views would be helpful for developers working on real projects. During development I had only been adding new features and testing their technical correctness, checking that they did what they were supposed to do. There had been no opportunity to test the ETS features with a real project yet, so I created one myself.

Finding the project

The first thing to do was to find an appropriate project to run EPL (ErlangPL) on. EPL is aimed at projects that are still in the development phase, so that was one of the criteria. The other was that the project uses ETS tables. sdn_epc, an Elixir application I had been using in my bachelor thesis, seemed to be a perfect candidate for testing the ETS views because it met both requirements. Great!

sdn_epc

One of the main differences between traditional computer networking and software-defined networking (SDN) is that the control plane and the data plane are separated. In short, an SDN switch only forwards incoming packets according to the rules it has received from an SDN controller. Every decision about where to send a packet is made by the SDN controller, not by the switch.

sdn_epc basic work scheme

Let’s consider the scheme above: when any of the PCs sends a datagram to the SDN switch with a destination address the switch doesn’t know, the switch asks the SDN controller where to send the datagram. In practice the switch sends a PACKET IN message to the controller, and the controller responds with a PACKET OUT message so the switch knows where to send the datagram.

Imagine a situation in which a lot of datagrams with random destination addresses arrive at the switch. Because the addresses are random, the switch doesn’t know where to forward the datagrams, so it sends a lot of PACKET IN messages to the controller. If many switches are asking one controller what to do (by sending PACKET IN messages), the controller may be unable to handle so many messages and go down.
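To make the flood scenario concrete, here is a minimal Python sketch of a switch that forwards on a flow-table hit and emits a PACKET IN on a miss. The `Switch` class, the `random_mac` helper and the MAC addresses are all hypothetical and only illustrate the idea; they are not taken from sdn_epc or from any OpenFlow library.

```python
from dataclasses import dataclass, field
import random


@dataclass
class Switch:
    """Toy SDN switch: forwards on a flow-table hit, asks the controller on a miss."""
    flow_table: dict = field(default_factory=dict)  # destination MAC -> output port
    packet_ins: int = 0                              # PACKET IN messages sent so far

    def handle_frame(self, dst_mac: str) -> str:
        if dst_mac in self.flow_table:
            return f"forward to port {self.flow_table[dst_mac]}"
        # Table miss: the switch has no rule, so it consults the controller.
        self.packet_ins += 1
        return "PACKET IN"


def random_mac(rng: random.Random) -> str:
    """Random MAC with a fixed first octet (02), so it never equals the known host."""
    return "02:" + ":".join(f"{rng.randrange(256):02x}" for _ in range(5))


switch = Switch(flow_table={"aa:aa:aa:aa:aa:01": 1})
known_result = switch.handle_frame("aa:aa:aa:aa:aa:01")  # hit, no PACKET IN

rng = random.Random(0)
for _ in range(1000):
    switch.handle_frame(random_mac(rng))  # every random destination misses
```

In this toy model each of the 1000 frames with a random destination produces one PACKET IN, while the frame to the known host produces none. That is the kind of load the controller has to absorb.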

SDN Elixir pseudo controller (sdn_epc) is an Elixir application whose purpose is to protect an SDN controller against DDoS attacks. If all the traffic between the switch and the controller (PACKET IN messages in this case) goes through sdn_epc (see the scheme above), it can detect when things start getting out of hand and take action, e.g. drop PACKET IN messages so that the controller stays alive.
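The text above does not say how sdn_epc decides that things are getting out of hand. Purely as an illustration, the sketch below assumes the simplest possible policy: a fixed-window counter that drops PACKET IN messages once a per-window limit is exceeded. The `PacketInGuard` class, its parameters and the numbers are hypothetical and written in Python, not taken from sdn_epc.

```python
class PacketInGuard:
    """Hypothetical guard: allows at most `limit` PACKET IN messages per time window."""

    def __init__(self, limit: int, window_s: float = 1.0):
        self.limit = limit
        self.window_s = window_s
        self.window_start = 0.0
        self.seen = 0

    def allow(self, now: float) -> bool:
        # Start a new counting window once the current one has elapsed.
        if now - self.window_start >= self.window_s:
            self.window_start = now
            self.seen = 0
        self.seen += 1
        # Messages beyond the limit are dropped instead of reaching the controller.
        return self.seen <= self.limit


guard = PacketInGuard(limit=100)
# 500 messages arriving within the same window: only the first 100 get through.
decisions = [guard.allow(now=0.5) for _ in range(500)]
forwarded = sum(decisions)
dropped = len(decisions) - forwarded
```

A real implementation would probably need something smoother than a hard window (for example a token bucket), but the principle is the same: the component in front of the controller absorbs the burst.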

Designing the experiment

The test infrastructure used for testing ErlangPL ETS views looks as follows:

ErlangPL and sdn_epc test infrastructure

I was interested in what happens in the observed system, in terms of ETS performance, when PACKET IN messages flow through the sdn_epc component. To force the switch to send messages to the controller, I used macof from the dsniff tools, which can generate network frames with random destination addresses. This matters because the controller used in the test implemented learning switch logic: frames addressed to destinations it has not learned yet are the ones that make the switch ask the controller.

Are ETS tables involved?

I sent 10 frames from PC A to the switch and observed the ETS node view. This is what I saw:

ETS node view

In the red rectangle above you can see the ETS tables that are involved in processing the traffic. As the traffic flows through the system, it causes lookups in the marked tables. The system definitely uses ETS tables. Great!

Finding culprits

When I started to work on sdn_epc, I had to familiarise myself with the of_protocol library, which allows an Erlang system to exchange OpenFlow messages with an SDN controller. I remembered that the library uses an ETS table named ofp_channel_N (where N is the number of a channel between the Erlang system and a controller), but I didn’t know exactly what role the table played in traffic handling, so I decided to investigate.

The ETS node view (the previous screenshot) told me that processing every PACKET IN message causes 2 lookups in the ofp_channel_1 table. I wanted to know which process or processes are responsible for those lookups. One double click on the row representing the table in the ETS node view, and I knew the PIDs.

ETS details view

There are two processes that read from the table when the traffic flows through the system. Good to know!
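Conceptually, the details view answers the question "who touched this table?". The Python sketch below shows that kind of aggregation over a list of trace-like events. It is not how ErlangPL is implemented; the event format, the PIDs and the assumption that each process performs one lookup per PACKET IN message are all invented for the example.

```python
from collections import Counter

# Hypothetical trace events: (table, pid, operation). The PIDs are made up.
# They model two PACKET IN messages, each causing one lookup per process.
events = [
    ("ofp_channel_1", "<0.201.0>", "lookup"),
    ("ofp_channel_1", "<0.202.0>", "lookup"),
    ("ofp_channel_1", "<0.201.0>", "lookup"),
    ("ofp_channel_1", "<0.202.0>", "lookup"),
    ("other_table", "<0.150.0>", "insert"),
]


def lookups_by_process(events, table):
    """Count lookup events per PID for a single table."""
    return Counter(
        pid for tab, pid, op in events if tab == table and op == "lookup"
    )


per_pid = lookups_by_process(events, "ofp_channel_1")
```

Here `per_pid` maps each of the two made-up PIDs to its number of lookups, which is the per-table, per-process breakdown that pointed me to the two readers.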

Exploring the processes

To learn more about the aforementioned processes, I searched for them in the Supervision Tree view.

Supervision Tree view

It turned out that they are part of the sdn_epc application and that both of them are gen_servers. Moreover, the Supervision Tree view showed me the names of the processes’ callback modules. Nice!

Because the processes are gen_servers, they do their work in handle_something callbacks (handle_call, handle_cast, handle_info): they receive a message and then act on it. ErlangPL has a Timeline view that lets you trace the messages arriving at a particular process, along with the process state right after each message is handled, so I put the gen_servers’ PIDs into Timeline and saw this:

Timeline view

Now I knew which messages the processes receive when the traffic flows through the system. In other words, I knew which gen_server clauses are involved in handling the traffic. Excellent!

Summary

ErlangPL definitely gave me a lot of interesting and valuable information. Using the ETS node view, I found out which ETS tables are involved in traffic handling. When inspecting a particular ETS table, ofp_channel_1 in this case, I could see which processes communicate with it. Knowing the processes’ PIDs, I searched for them in the Supervision Tree view and learned that those processes are gen_servers. I also learned the names of their callback modules and the name of the application they belong to. The Timeline view showed me which messages the processes receive when the traffic flows through the system, so I found out which gen_server handle clauses are responsible for the lookups.

Putting together everything I learned from ErlangPL, I was able to locate precisely the code responsible for reading data (lookups) from the ofp_channel_1 table. What is more, going through all the ErlangPL views mentioned above helped me better understand what is going on in the system. It was definitely easier and more convenient to get this information using Erlang Performance Lab than by using the Erlang shell and browsing source files!