[Back to basics] Counting sort

Published in

Programming basics

5 min readMar 12, 2020

Last time I told you a bit about bubble sort. Bubble sort is cool, however, you will never use it. This time I am going to show you a sorting algorithm faster than the one implemented in your language — counting sort. True story.

Performance

The most popular sorting algorithm — quicksort has a varying complexity between O(log(N) * N) and O(N²). Counting sort works for O(N + M) where M is the difference between the smallest and the biggest elements in your data. Quicksort is a bit more performant on smaller data sets with huge value difference (for example, a hundred elements with values between zero and a billion) but in general counting sort is more performant due to linear complexity.

Quite cool, eh? You may wonder, why is it not being used in standard libraries of programming languages. Well, the problem with counting sort is that it’s a niche. In fact, it can only work with integers (not even floats) or something that can be mapped back and forth to integers. For example, database records usually can be mapped back and forth to integers by their ID, hence, the algorithm is well-suited for sorting data by integer IDs (keeping in mind the ranges, of course).

But let’s take a look at the algorithm already.

Algorithm

In counting sort instead of comparing and swapping elements a program counts all the values (e.g. in) and then fills the array starting from the smallest value.

For example:

Count values in the array [1, 4, 4, 2, 6, 1, 1]
We have: three 1s, two 4s, one 2, one 6
Put them back:
- Put 1s first: [1, 1, 1]
- Next, 2s: [1, 1, 1, 2]
- There's no 3s.
- Now, 4s: [1, 1, 1, 2, 4, 4]
- No 5s.
- One six: [1, 1, 1, 2, 4, 4, 6]

Usually, one would want to use another array to store counters. Hashmaps can also be used, by they are much slower.

Because we need to store counters, the algorithm requires additional memory O(M) which sometimes can be crucial.

[Ruby] Classic implementation

This is the simplest implementation that assumes array contains int elements ≥0.
This is important because negative values are not accounted for when indexes used. There’re ways to work with negative integers as well, but we will see them later. This is for demonstrative purposes.

I tried to do some benchmarking and, to my surprise, ruby implementation terribly failed to Array#sort from standard library. This is due to all object-oriented overhead, I suppose (#sort uses native C code implementing quicksort). I tried to optimize it but still couldn’t get close to the library function.

This is why I decided to implement it in C just for the sake of benchmarking.

[C] Implementation working with any integers

To work with negative numbers we need to find the smallest number in the array as well. After that we will use the difference between them them and adjust our indexes accordingly. The index formula is: i = x — min
For example, in the array [-20, 3, 15, 7, 2] we need 36 counters (-20..15) and the index for 8 is: i = 8 — (-20) = 28.

Other than that, the solution works the same way as the one in Ruby.

Benchmarks

I tested C implementation against the qsort from stdlib.h on my Mac using gcc. I used two types of tests: random data with M<100, and a reversed range N…0 (in this case M = N). The data sizes were: 1 million, 100 million and 500 million. Here are the results:

Random data

            1m       100m     500m
Counting:   7        680      7366
Stdlib:     67       4797     24226

Reversed range

            1m       100m     500m
Counting:   13       1367     13321
Stdlib:     32       3009     21788

Keep in mind that I didn’t do a lot of runs so this comparison is not even close to being definitive. A thorough comparison would require a lot of time and different types of tests and is worth whole research (probably an interesting idea for computer science students).

However, you can already see how linear complexity of counting sort outperforms quicksort in these cases.

[Kotlin] overengineered OOP solution

I decided to have some fun and design overly complicated OOP solution. Initially, I picked Java for that but decided to use Kotlin just because it’s more modern.

For the sake of the post size, I will not show you the full code (you can find the link to the repo in the end). I will show you a simple class diagram I made and a core class.

Class diagram

This is not UML diagram or something, I just used an online diagram tool and made that. Arrows show dependencies. Arrows with <<abstraction>> show implementing interfaces.

I tried to keep abstractions making sense (not to introduce classes just for the sake of numbers). Probably, could’ve gone even further.

Because the solution is so over-engineered, it is has a high level of abstraction. This allows us to use counting sort with any type that has OneToOneRelation<T, Int> . In this case, I implemented only integers so the relation is simple: f(x) = x

Class containing the main algorithm

The core class for our counting sort is app.strategies.countingSortingStrategy.AscendingIterator<T> .

Here it is:

Iterator is like a one-off stream of data. So this iterator receives a collection of elements and streams them back, ordered by the integer relation.

I haven’t done any benchmarks for the Kotlin solution, because I wasn’t interested in performance. Just in the design. Also, it uses hashmap for counting instead of an array.

Conclusion

Counting sort is a handy algorithm to know because in some cases you can use that instead of the standard sort to speed-up your program. This is a great niche algorithm.

Because it’s a niche, depending on the case, you can provide lots of small optimisations. For example, counting sort is often used in competitive programming and a common trick is to allocate memory for counters during compilation time (so the time is not wasted on malloc); for that, you need to know the range of values that can be passed to the program (for example, if values are from 0 to 1000, you could allocate memory right away in C via definingint counters[1000] as a global variable).

Also, the Ruby example showed that even a good algorithm can be ruined by the limitations of a language. In order to get value in ruby, one would need to write a C extension for it. And that can still be worth doing if your program is heavy on sorting.

You can find the code here: https://github.com/Nondv/basics/tree/master/sort/counting_sort