What do we want?? All t3h Arduino randomz!!

Mark C.
5 min readNov 18, 2017

--

…and now you can have them.

Tl;Dr — It is possible to generate all the random numbers an arduino under standard configuration in a matter of minutes. Thus, if you are using the arduino PRNG for anything serious (e.g. anything cryptographic, or handling sensitive data), you’re making a huge mistake.

What should happen now? Well, developers should now be certain to not use AVR microcontrollers for any serious purpose without significant code refactoring/additional controls (e.g. regular re-seeding of the PRNG). Ideally, Microchip (owners of Atmel) should push to implement some sort of entropy seeding with regular reseeding into their core libs, but this is unlikely to be effective. In short, these chips should carry a warning (as I have said in previous articles); “Not suitable for Cryptography — no CSPRNG”.

Background

Recall my other posts on randomness on the arduino? (link, link) Well, some didn’t see why regular reseeding with decent entropy was important. This article shows you why.

I suggest you read the other articles to get the lowdown on why randomness is important in our modern world. Got that knowledge? Cool! :D Read on, traveller…

Arduino PRNG — the Linear Congruential Generator (LCG)

The Atmel AVR SDK uses the GCC LCG implementation (see here). Put simply, these are random-like functions that cover a very large ‘space’ of numbers. 2³¹ -1 to be precise. Whilst this isn’t the whole space of an unsigned long (which is 2³², so it’s roughly half the total available space) it is quite a lot of numbers.

The way that the code works is roughly as follows:

  1. Take a seed. If there’s no seed, use 1. If the seed is zero, use 123459876.
  2. Use the seed to compute the first random number.
  3. If you need another random number, use the last random number to get the next one following this algorithm (this is the LCG algo — see here for more information if you’re mathematically curious):
hi = x / 127773L; 
lo = x % 127773L;
x = 16807L * lo - 2836L * hi;
if (x < 0)
x += 0x7fffffffL;

4. Keep looping like this for as many random numbers as you need.

That’s the essence of the way that most random numbers are generated in the world. This is the skeleton of a PRNG.

So what? What’s wrong?

Well, the issue is that the seed is never updated.

Think of this whole PRNG as a giant merry-go-round. If you never reseed it, you’ll just keep going round and round. That means, your ‘random’ numbers are predictable.

Reseeding in this analogy is akin to stopping the merry-go-round, moving it to another position, and starting it off again. Note that even if you do reseed it, you’re just moving to a different place on the merry-go-round. How do you know how far to move it? Well, that is where your ‘entropy pool’ comes in.

The core idea is that your entropy pool generates comparatively slow, but very random numbers. Your LCG is very fast, but very predictable. Thus, you use the entropy pool to ‘spin the LCG around’. Doing so, and doing it often enough and regularly, you get the speed of the LCG, and yet the randomness of your entropy pool — at least this is the theory.

In actuality, it comes down to the number of ‘bits of entropy’ you have access to. This is related to the ‘bits of security’ used to rate algorithms like Salsa20 or AES256, say.

Important takeaway — the LCG in the arduino has well below the required number of bits of entropy for modern cryptographic purposes — which is nominally 128bits. The PoC below should hopefully demonstrate this.

The issue with AVR’s

The problem is that AVR’s don’t do this entropy pool-to-seed updating step. In fact, if a developer doesn’t manually reseed then it’ll just go round and round ad infinitum.

“But this is just a $1 IC? What did you expect?”

Well, I expected literally this! The issue is not with the way AVR’s are, but that we are seeing a greater emphasis placed on encryption and protecting IoT devices, embedded systems, etc.

Being cheap IC’s, they are very attractive to developers for use in a variety of projects. With new standards, like LoRaWAN, having cryptography turned on by default means that having a distinct lack of random numbers and randomness is a potential bag of angry wasps.

PoC||GTFO

My fave mantra from the ezines of the same name (well worth reading if this kinda stuff is your kinda shindig) here is a link to my test code gist on github for you to play with:

[NB — it has been raised that I’m not using the RAND_MAX in the avr-libc. This is because Arduino IDE isn’t using it. Compare the output directly to the same code on an arduino, which I’m happy to share, and the outputs are the same. This is a separate bug from the one I’m concerned with (one which, if action is taken, won’t matter anyway…)]

Compile this with the following line:

$> gcc -o test test.c

Run it like this:

$> ./test 1

Using this example, you can see two main things:

  1. On a non-AVR x86 based CPU in a fairly standard laptop, going through the entire PRNG space takes minutes (as opposed to 2 days on a real arduino… which I have actually done to check how long it took!)
  2. Breaking the arduino PRNG is trivially easy, even with reseeding.

PoC Explanation:

This PoC (proof of concept, if you’re wondering) generates all the values of the LCG in order, and filters out the values ending in ‘0000’, as this is easier to see what is going on.

To show that the whole system is cyclic, here is a screenshot of a simple bash script to show that the same numbers ending in ‘0000’ appear on any input seed, and also is followed by the same output ending in ‘0000’.

Showing the cyclic (merry-go-round) nature of the LCG

Here are some approx. timings on my x86 machine:

It doesn’t take long to go through the whole PRNG space on a ‘real’ computer…

As such, it is likely clear that the libc LCG in play is very flawed, and should be replaced.

Remediation Proposal

Following LibSodium’s lead, I would suggest an implementation of Salsa20, or as used in LibHydrogen, the Gimli permutation generator, with appropriate entropy seeding from a mixture of sources (see my other article here).

This should give us our desired 128bit security with decent seed entropy pool.

Conclusions

If you take away only a few things from this, here are the salient points:

  1. Don’t use cheap IC’s for anything serious (e.g. Personal data, control/decision critical data, etc. — i.e. anything you might want to add crypto to) Unless you really know what you’re doing.
  2. Seeding with randomSeed() once is not enough. It doesn’t go any way towards remediation.
  3. Choose your IC’s carefully — adding in decent randomness or crypto later is very hard. As with all development if something is hard to apply or fix later, like crypto, then make the right decisions earlier!

I hope this has been interesting. :-D

--

--

Mark C.

Postgrad. Reseacher/Consultant; Maths/infosec. Formerly a violinist & magician. Jazz thinker w/out portfolio. Weakly inaccessible... Views are solely my own.