Thanks for the response! Could you clarify where you see the independence violations coming from?

Suppose we have a dataset *D = {d_i | i ∈ I}*, where *I = {0, …, N}* is an index set. If we permute *D* by taking uniform random samples *s* without replacement from *I*, we get a new index set *I_p = {i_{p_1}, i_{p_2}, …, i_{p_N}}*, where *p* is a uniformly random permutation of *I*.

Elements of *D* taken sequentially with indexes from *I_p*:

- Have the same distribution as *D*, since we have not changed the dataset in any way
- Are independent, because each permuted index in *I_p* was drawn uniformly at random
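A quick empirical check of both claims (a minimal sketch; the synthetic `data` array and its size are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
data = rng.normal(size=N)     # the dataset D

perm = rng.permutation(N)     # the permuted index set I_p
shuffled = data[perm]         # elements of D taken with indexes from I_p

# Same distribution: permutation leaves the empirical moments unchanged.
assert np.isclose(shuffled.mean(), data.mean())
assert np.isclose(shuffled.std(), data.std())

# Sampling without replacement: every index appears exactly once.
assert np.array_equal(np.sort(perm), np.arange(N))
```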

If we shuffle the array again each time before sampling, this won't make the samples any more random or independent than they already are. Almost all of the speedup comes from generating the pseudorandom sequence of indices beforehand and not regenerating it when that's unnecessary.
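A small Monte Carlo sketch of that point (assumed setup, not from the original): the first index drawn after one shuffle and after a "double" shuffle are both approximately uniform, so the extra shuffle buys nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 5, 20_000
counts_once = np.zeros(N, dtype=int)
counts_reshuffle = np.zeros(N, dtype=int)

for _ in range(trials):
    # Shuffle once, take the first index.
    counts_once[rng.permutation(N)[0]] += 1
    # "Extra" shuffling: compose two random permutations, take the first index.
    counts_reshuffle[rng.permutation(N)[rng.permutation(N)[0]]] += 1

# Both index distributions are approximately uniform: re-shuffling adds no
# extra randomness on top of a single random permutation.
expected = trials / N
assert np.allclose(counts_once, expected, rtol=0.1)
assert np.allclose(counts_reshuffle, expected, rtol=0.1)
```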

Under the hood NumPy uses the function `rk_interval` when shuffling an array: it just returns a pseudorandom number in a given interval. This means you can also simply get the next element index for sampling from *D* as *i ~ Uniform(0, N)*. That approach is less efficient than shuffling the whole array in memory in advance.
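The per-draw alternative looks like this (a sketch; `Generator.integers` is the modern Python-level API, not the legacy `rk_interval` C function, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(7)
data = np.array([10, 20, 30, 40, 50])
N = len(data)

# Per-draw indexing: i ~ Uniform(0, N), one RNG call per sample.
samples = [data[rng.integers(0, N)] for _ in range(8)]

# Precomputed alternative: one bulk shuffle of the indices up front.
order = rng.permutation(N)
epoch = data[order]
```

Note one behavioral difference: the per-draw version samples with replacement, so it can repeat elements within *N* draws, while a precomputed permutation visits each element exactly once per pass.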