Ancient City Ruby 2016 & Snake Case

Earlier this month I had the opportunity to attend Ancient City Ruby 2016 in St. Augustine, FL. (No, Florida will not become my second home.) The talks were varied and the opportunity for geek socializing were well thought out. In fact, the conference itself was a well-oiled machine and I look forward to visiting again in the future.

During the conference, a programming challenge was presented called snake_case. While I didn’t have a solution during the conference, I’ve had the opportunity to review a number of solutions that others have presented. For instance, Ray Hightower (who presented a talk about Parallela) wrote on his blog about a brute force solution in Ruby, C, and Go. Another presenter, Ross Kaffenberger, wrote up a number of solutions and compared their speed and readability.

Since I’ve been studying Elixir, my first thought was to compare performance. Surely Elixir would not only provide similar performance, but maybe better (go BEAM!). Let’s start with the brute force approach.

All results are done on a 2012 Macbook Pro, 2.5Ghz Intel Core i5 with 16GB RAM.

Brute Force (Halo Style)

Here is the Ruby version presented by Jack Christensen during the conference. Check out Ray’s blog post for an in-depth explanation.

puts (0..(2**20)).count { |n| n.to_s(2).chars.count("1") == 10 }

Wow. Lots of code, eh? Let’s move this over to Elixir:

IO.puts Enum.count 0..round(:math.pow(2, 20)), fn x -> Integer.to_string(x, 2) |> String.replace(“0”, “”) |> String.length == 10 end

And putting a time on execution gets us…

elixir snake_case.ex 8.88s user 0.46s system 99% cpu 9.391 total

Almost 9 seconds to do the work Ruby did in less than 5 seconds. Ouch. Well, maybe since Elixir is meant to be multiprocess, we can split up the work and help things out!

There’s a lot going on here. First off, in order to share state between processes in Elixir we need something to store that state and make it accessible to the processes. Elixir calls these things Agents. Give an Agent something, tell it what to do when you update the Agent, and bam. State in a functional programming scenario.

So we start up an Agent to store this state, naming it after the current module for ease of access. Next, in order to actually let multiple processes handle chunks of the numbers to process, Enum.chunk breaks up the huge list. I’m using 262144 to divide the processing into four parts.

But wait, how do we actually set up four processes and block the main process until all the children are done? Enter the Task. Once we set up the individual tasks (easy), then we just feed Task.await the PIDs of the tasks and the main process waits for completion. Hooray! From there, it’s easy to retrieve the final count from the Agent and we’re done.

Performance, though. Did this help?

elixir snake_case_multi.ex 13.89s user 0.67s system 312% cpu 4.665 total

Hey, it did! Only 4.665 seconds this time. Nice! But still slow. That’s four cores (with Hyper-Threading!) beating away at the same problem Ruby did with a single core. Not too great, and definitely a disappointment. With sadness in my heart, I turned to the Elixir Slack channel and threw this challenge out there. In no time, minds sharper than mine were churning bits and writing code.

Behold, recursion and pattern matching! Utkarsh Kukreti was incredibly helpful and came up with this solution to convert an integer into a bitstring and then recursively count up the number of ones in the string. Zeroes are totally skipped, removing the whole string processing used in previous examples. Performance time!

single: elixir snake_case.ex 1.21s user 0.26s system 106% cpu 1.382 total
multi: elixir snake_case.ex 2.03s user 0.39s system 181% cpu 1.336 total

Hmm. Performance is much better, under 2 seconds, but the multi-process approach is only slightly faster, less than a tenth of a second. Perhaps now is a good time to dig into some of the benchmarking tools available in Elixir.

Benchmarking with timer

As we all know, Elixir can access everything in Erlang. That’s great news for this benchmarking pursuit because Erlang comes with a ton of tools, including the timer module. Of interest in particular here is the function tc which allows us to measure the time a function takes to execute and returns the execution time along with the result of the function. Note that the time returned is in microseconds, or millionths of a second. Here’s an example:

Total paths for brute force, single process: 184756
{769587, :ok} # Result from :timer.tc

Internally, then, execution time took approximately 0.769 seconds. Using the system timer, the same function took 1.363 seconds. I think it’s safe to assume that the 0.594 seconds of difference are due to compilation and loading BEAM. Let’s try this with the multi-process variant also:

Total paths for brute force, single process: 184756
{724477, :ok}
Total paths for brute force, multi-process: 184756
{698528, :ok}
elixir snake_case.ex 2.70s user 0.43s system 151% cpu 2.064 total

Multi-process then is still faster, but only by a little. This also points out the fact that these tests aren’t averaged. Maybe somebody out there wrote a tool for us to do something like this, but with averaging and other stuff…

Benchmarking with Benchfella

Alexei Sholik came to the rescue many months ago when he released Benchfella. Benchmarks are simple to define and easy to run. An example from the docs:

# bench/basic_bench.exs
defmodule BasicBench do
use Benchfella

@list Enum.to_list(1..1000)

bench "hello list" do
Enum.reverse @list
end
end

Running this with mix bench gives us some fancy output!

$ mix bench
Settings:
duration: 1.0 s

## BasicBench
[13:23:58] 0/1: hello list

Finished in 3.15 seconds

## BasicBench
hello list 500000 5.14 µs/op

For a total duration of 1.0 seconds per test, Benchfella ran the function and averaged it all out for us. Very nice! Mad props, Alexei!

Putting it all together

Multiple types and kinds of solutions? Check.
Benchmarking? Check.
Mix project? Check! https://github.com/GimliLongBow/snake_case

I’ve taken the Gist above, cleaned it up, and added more solutions (thank you, Ross!). There are also two types of benchmarks: :timer.tc and Benchfella. Testing is in place. Now let’s hit up that output!

Note: Benchfella defaults to 1 second of duration. Since several of the tests take more than 1 second to complete, I increased the duration to 18 seconds to get at least two passes of each solution.

mix bench -d 18
Settings:
duration: 18.0 s
## SnakeBench
[21:15:44] 1/7: Pascal’s Triangle
[21:16:38] 2/7: Iterative
[21:17:34] 3/7: Factorial
[21:18:17] 4/7: Brute Force single process
[21:19:10] 5/7: Brute Force optimized single process
[21:19:56] 6/7: Brute Force optimized multi-process
[21:20:34] 7/7: Brute Force multi-process
Finished in 335.95 seconds
## SnakeBench
Factorial 10000000 3.92 µs/op
Pascal’s Triangle 2000000 17.34 µs/op
Iterative 5000 9155.85 µs/op
Brute Force optimized multi-process 50 628871.18 µs/op
Brute Force optimized single process 50 754899.36 µs/op
Brute Force multi-process 10 4085647.50 µs/op
Brute Force single process 5 8842539.00 µs/op

The factorial solution is king, taking an average of 4 microseconds. Yay for Enum.reduce!

Back to Ruby

Elixir is supposed to be faster than Ruby, right? After all, it’s compiled into bytecode and executed in a VM instead of being interpreted. So let’s compare! Taking Ross’ Ruby code and executing it on my machine give the following results:

Comparison:
snake case factorial: 312024.2 i/s
snake case iterative: 33941.2 i/s — 9.19x slower
snake case enumerative: 32034.3 i/s — 9.74x slower
snake case recursive: 33.4 i/s — 9355.68x slower
snake case brute force: 0.2 i/s — 1403140.68x slower

Converting Benchfella’s numbers into this format:

Comparison:
snake case factorial: 255102.0 i/s
snake case pascal’s: 57670.1 i/s
snake case iterative: 109 i/s
snake case brute force: 1.59 i/s
Ross: Iterative == Me: Pascal's
Ross: Recursive == Me: Iterative

So Ruby is still faster! Here are some potential conclusions:

  1. I’m a beginning programmer and these could be optimized.
  2. Elixir isn’t great at math.
  3. Use a systems language when speed is required.
  4. Optimize for developer happiness over raw speed; otherwise C.
  5. Naming is hard and I named my functions late at night.

Asides, Caveats, and Learning

Lots to learn here! First, an issue came up during testing that an experienced Elixir dev probably saw in my Gist above. Every time I ran the test suite, the second multi-process solution returned an extremely high number and it took me a few minutes to figure it out. Take a look at the Gist again if you haven’t figured it out already:

Did you see it? I never stopped or reset the Agent after completing the test, and since the name of the Agent is the module, the collision was inevitable. Easy solution:

Agent.stop(__MODULE__)

Otherwise, Agents and Tasks worked great! They are both simple to implement solutions to common problems, particularly in a multi-process functional world.

I guess I should say: I’m a little disappointed at the performance comparison between Elixir and Ruby. I thought Elixir would beat it hands-down, and it lost instead. I guess handling 2 million websockets is a little different from doing a ton of math. There is no perfect language, I guess!

By the way, I’d love to hear your thoughts on all this stuff! You can either comment below, on Github, or ping me on the Elixir Slack channel. Thanks for making it all the way here, and enjoy this parting gift!

Oh, I actually meant GIF. Those errant Ts!