Reservoir Sampling

Esha Wang
Human in a Machine World
1 min readMar 17, 2016

Suppose we are given a set S of n items, where n is either very large or inputed as a constant stream of items. The goal of reservoir sampling is to randomly choose k items from the set S, so that the probability that some item i in S is in the “reservoir” of k items is given by k/n. Each item in S can appear only once in k.

Method 1: Quadratic Solution

The simple solution is to randomly pick an item from S, and check if that item is already in the reservoir k. If not, add that item to k. Since checking if an item is already in k takes linear time, and there are k items to add to the reservoir, this results in a time complexity of O(k^2). As all quadratic algorithms go, this method can be computationally cumbersome, especially as k becomes very large.

Method 2: Linear Solution

Luckily, there exists a much faster solution with a time complexity of O(n). The idea is as follows:

  • Initialize the reservoir k to be the first k items in S.
  • Iterate through the remaining items in S, keeping track of the current index, X.
  • Generate a random index x in [0, X]. If x is in [0, k-1], then set k[x] = S[X].

Thanks for reading! If you find an error in this post, please feel free to comment below and I will do my best to address any issues. Many thanks to GeeksForGeeks for giving me the inspiration to write a post on this topic.

--

--