dnsjit performance++

With the release of dnsjit v0.9.3 the first milestone of the project Performance and Responses has been completed. The focus of this milestone has shifted somewhat since it was defined: a lot of the work has gone into other parts of dnsjit to increase performance, because threads are not always the solution.

The project is funded by the Comcast Innovation Fund for the continued development of drool, which has been rewritten in Lua on top of dnsjit.

dnsjit v0.9.3 packages can be found in the pre-release channels for Debian, Ubuntu, CentOS, RHEL, SLE and openSUSE.

Input improvements

For the input of packets I’ve added 3 more modules:

  • fpcap: use fopen() to read a PCAP and parse it without libpcap
  • mmpcap: use mmap() to read a PCAP and parse it without libpcap
  • pcap: use libpcap
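
To make “parse it without libpcap” concrete, here is a minimal plain-Lua sketch of the kind of work fpcap and mmpcap do in C. This is not dnsjit code, just the PCAP file format itself: a 24-byte global header, then a 16-byte record header plus the captured bytes for each packet (little-endian files only, no error handling).

    -- Plain-Lua sketch of an fopen()-style PCAP reader: skip the global
    -- header, then loop over record headers followed by the captured bytes.
    local function u32le(s, i)              -- little-endian 32-bit field
      local a, b, c, d = s:byte(i, i + 3)
      return a + b * 256 + c * 65536 + d * 16777216
    end

    local fp = assert(io.open(arg[1], "rb"))
    assert(fp:read(24), "missing PCAP global header")

    local packets = 0
    while true do
      local rec = fp:read(16)               -- per-packet record header
      if not rec or #rec < 16 then break end
      local caplen = u32le(rec, 9)          -- bytes actually captured
      fp:read(caplen)                       -- the packet data itself
      packets = packets + 1
    end
    print(packets .. " packets")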

On my laptop (Dell XPS 13 Developer 9350) they perform as follows:

  • pcap: 6.38Mpps
  • fpcap: 7.21Mpps +13%
  • mmpcap: 23.2Mpps +264%

Parsing network packets

For parsing the packet stack, which pcap-thread has built in, I’ve added filter.layer, reusing code from pcap-thread. With that parsing enabled, the inputs perform as follows:

  • pcapthread: 3.60Mpps
  • pcap: 3.96Mpps +10%
  • fpcap: 4.36Mpps +21%
  • mmpcap: 7.97Mpps +121%
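
To give an idea of what parsing the packet stack involves, here is a toy plain-Lua version that peels the Ethernet, IPv4 and UDP headers off a raw frame to reach the DNS payload; filter.layer does this kind of work in C and handles far more (IPv6, VLANs, length and error checks) than this sketch.

    -- Toy packet-stack parser: Ethernet -> IPv4 -> UDP -> payload.
    -- Illustration only: no length checks, no IPv6/VLAN/fragments.
    local function u16be(s, i)              -- big-endian 16-bit field
      local a, b = s:byte(i, i + 1)
      return a * 256 + b
    end

    local function udp_payload(frame)
      if u16be(frame, 13) ~= 0x0800 then return nil end -- EtherType: IPv4 only
      local ip = 15                         -- IPv4 starts after the 14-byte Ethernet header
      local ihl = frame:byte(ip) % 16       -- IP header length in 32-bit words
      if frame:byte(ip + 9) ~= 17 then return nil end   -- protocol 17 = UDP
      local udp = ip + ihl * 4              -- UDP header follows the IP header
      return frame:sub(udp + 8),            -- payload begins after the 8-byte UDP header
             u16be(frame, udp + 2)          -- destination port (53 for DNS queries)
    end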

Threads

As for throughput to threads, despite not using atomics yet, there has been a good gain and some simplification of the code. My laptop can pass empty static objects between threads at around 11M/sec, while the SLLQ code used in drool v1.1.0 does a bit over 5M/sec. As a note, doing the same in a single thread runs at 200–250M/sec, so depending on the workload it might not be wise to use threads.

Sending queries

For sending queries I’ve added a really simple UDP DNS client (output.udpcli) that basically just takes the full UDP payload of a query and sends it using sendto(). Using the mmpcap input, the new thread code, filter.layer and output.udpcli, my laptop maxed out at 450Kqps to localhost; drool v1.1.0 maxed out at around 260Kqps.
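
The technique is simple enough to sketch in a few lines. This illustration uses LuaSocket rather than dnsjit’s own module, and the hypothetical send_query() helper just takes a wire-format DNS message and hands it to sendto():

    -- Minimal "fire and forget" UDP sender, roughly what output.udpcli
    -- boils down to (LuaSocket here; dnsjit does the equivalent in C).
    local socket = require("socket")

    local udp = assert(socket.udp())
    local function send_query(payload)      -- payload: DNS message in wire format
      udp:sendto(payload, "127.0.0.1", 53)  -- no retries, no reply handling
    end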

Custom Lua

There has also been work done on the design with regard to adding custom Lua code in between modules. I’ve added a module that uses Lua’s coroutines, which doubles the performance and makes it really easy to get data out of the processing (example); the previous way had its own Lua state, so no data could be shared with the main state.
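
The coroutine trick itself is plain Lua and can be shown outside dnsjit: the filter below is resumed once per object, keeps its own loop state and writes into a table that the main script reads directly, because everything lives in one Lua state.

    -- A coroutine as an in-chain filter: resumed once per object,
    -- sharing the stats table with the main state.
    local stats = { udp = 0, other = 0 }

    local filter = coroutine.wrap(function(obj)
      while obj do
        if obj.protocol == "udp" then
          stats.udp = stats.udp + 1
        else
          stats.other = stats.other + 1
        end
        obj = coroutine.yield()             -- hand control back until the next object
      end
    end)

    -- stand-in for a real input module feeding objects through the filter
    for _, o in ipairs({ { protocol = "udp" }, { protocol = "tcp" }, { protocol = "udp" } }) do
      filter(o)
    end
    print(stats.udp, stats.other)           --> 2   1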

More work is underway here to “reverse” the process chain: instead of building a chain of receivers (input->receiver->receiver->output), you would build a chain of producers (input<-produce<-produce) that fetches objects instead of receiving them. This makes it possible to build chains that use both and have custom Lua in the middle (producer<-lua->receiver), and it performs really well (empty static objects):

  • zero:receiver() -> null:receive(): 210M/sec
  • zero:produce() <- null:producer(): 253M/sec
  • zero:produce() <- lua -> null:receive(): 216M/sec
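
To make the push versus pull difference concrete, here is a toy plain-Lua version with no dnsjit API involved: “zero” hands out empty objects, “null” counts them, and an ordinary Lua function sits between a producer and a receiver.

    -- Toy chains: push (receiver) vs. pull (producer), plus Lua in the middle.
    local null = { n = 0 }
    function null.receive(obj) null.n = null.n + 1 end

    -- receiver chain: the input pushes every object downstream
    local function zero_push(count, receiver)
      for _ = 1, count do receiver({}) end
    end

    -- producer chain: the consumer pulls objects from upstream
    local function zero_pull()
      return {}                             -- always has another empty object
    end

    -- producer <- lua -> receiver: pull, touch the object in Lua, push it on
    local function run(producer, receiver, count)
      for _ = 1, count do
        local obj = producer()
        obj.seen = true                     -- the custom Lua step
        receiver(obj)
      end
    end

    zero_push(3, null.receive)
    run(zero_pull, null.receive, 3)
    print(null.n)                           --> 6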

This will be the ultimate way of processing PCAPs!

Cheers,
Jerry
