The Angry Birds — Chuck

In spite of being one of the angry birds, Chuck proved its substantial speed.

Omar Elgabry
OmarElgabry's Blog
9 min read · Jan 19, 2017


Chuck— angrybirds

This series of articles is inspired by the Algorithms, Part I course from Princeton University and Algorithms, 4th Edition. The full series of articles can be found here.

Quicksort

Quicksort falls under the divide-and-conquer algorithms. It works by recursively dividing the array into two parts by partitioning: choose an element v = a[j] (usually the first element), and reorder the array so that all values less than or equal to a[j] come before it, while all values greater than or equal to a[j] come after it. Then sort each part independently.

Quicksort Overview — algs4.cs.princeton.edu

Implementation

— sort

It does three operations recursively: partition the array, sort the left half a[lo…j-1], then sort the right half a[j+1…hi].

An array of 1 element is considered sorted.

public void sort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return; // base condition: 0 or 1 element

    int j = partition(a, lo, hi);

    sort(a, lo, j - 1);
    sort(a, j + 1, hi);
}

Initially, you need to shuffle the array before sorting, then pass it to the sort method along with the first and last indexes.

Later on, you’ll see why shuffling is required to guarantee performance.

public static void sort(Comparable[] a) {
    shuffleArray(a);
    sort(a, 0, a.length - 1);
}

— partition

Given an array, partition it using the first element as the partitioning element. The partition method does the following:

  1. Start with three pointers: v = a[lo], i = lo, and j = hi + 1. The element v is the partitioning element.
  2. Move pointer i to the right as long as a[i] is less than v, then move pointer j to the left as long as a[j] is greater than v.
  3. If pointers i and j cross (or are equal), we are done partitioning the array. If not, a[i] and a[j] aren't in their correct positions in the final partitioned array, so swap them.
  4. Repeat steps 2 through 3 until the condition in step 3 is true.
  5. Finally, swap the partitioning item v at index lo with a[j] and return j. Now, a[lo .. j-1] <= a[j] <= a[j+1 .. hi].
Quicksort Partitioning Overview — algs4.cs.princeton.edu
public int partition(Comparable[] a, int lo, int hi) {
    int i = lo;
    int j = hi + 1;
    Comparable v = a[lo];

    while (true) {
        // scan from left until we find an item >= v
        while (less(a[++i], v))
            if (i == hi) break;

        // scan from right until we find an item <= v
        while (less(v, a[--j]))
            if (j == lo) break;

        // check if pointers cross
        if (i >= j) break;

        swap(a, i, j);
    }

    // put partitioning item v at a[j]
    swap(a, lo, j);

    // now, a[lo .. j-1] <= a[j] <= a[j+1 .. hi]
    return j;
}
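The snippets above rely on less(), swap(), and shuffleArray() helpers that the article never shows. Here is a minimal sketch of what they might look like (the class name and bodies are my assumptions; only the method names come from the article):

```java
import java.util.Random;

public class QuickHelpers {

    // true if v compares strictly less than w
    public static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;
    }

    // exchange a[i] and a[j]
    public static void swap(Comparable[] a, int i, int j) {
        Comparable tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Fisher-Yates shuffle: every permutation is equally likely
    public static void shuffleArray(Comparable[] a) {
        Random rnd = new Random();
        for (int i = a.length - 1; i > 0; i--) {
            swap(a, i, rnd.nextInt(i + 1)); // pick a random index in [0, i]
        }
    }
}
```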

Implementation details

There are some details in the implementation worth mentioning.

  1. In the loop of pointer i (and j), we increment i (or decrement j) first, then do the comparison.
  2. Whenever a[i] (or a[j]) equals v, we terminate the loop.
  3. In the loop of pointer i (or j), we check whether it went out of bounds (i.e. i == hi, or j == lo). But the condition if (j == lo) is redundant. Why? Because it can never fire: by the time --j reaches lo, the loop condition less(v, a[lo]) is already false (v == a[lo]), so the loop exits first.
while (less(v, a[--j]))
    if (j == lo) break; // redundant: when j reaches lo, v == a[lo], so less(v, a[j]) is false

Trace

Beyond the code implementation, it's important to visualize how Quicksort works and understand the partitioning.

Trace Quicksort — quora

Quicksort Vs Mergesort Implementation

The difference is: Quicksort does the work (partitioning) first, then sorts the left and right halves around the partitioning item, while Mergesort sorts the two halves first, then merges them into one sorted array.

Quicksort Vs Mergesort — ezyang

Analysis

Time → It's more complicated to analyze than Mergesort because it depends on the size of the subarrays produced by partitioning.

  • It takes ~ (1/2 N²) compares in the worst case, when partitioning produces the most unbalanced subarrays possible. This happens when the array is not shuffled (e.g. an already sorted array).
  • It takes ~ (N lg N) compares for N distinct values in the best case, when every partition produces balanced subarrays that differ in size by at most 1.
  • It takes ~ (2N ln N) compares and ~ (1/3 N ln N) exchanges of N distinct values in the average case. That's more compares than Mergesort, but Quicksort is faster in practice because of the data movement in Mergesort (copying values in and out of the aux[] array).

Shuffling is needed before running Quicksort to give a performance guarantee. The random shuffle is the key to performance: it makes the worst case extremely unlikely. How? With a random shuffle, it is very improbable that every partition picks the smallest or largest remaining item, so partitioning tends to produce reasonably balanced subarrays.

Here is a proof of the expected (probabilistic) running time of Quicksort on a randomly shuffled array.

Quicksort Worst-case Vs Best-case Analysis — khanacademy

Memory → The sort is done in-place, unlike Mergesort.

  • Partitioning: Constant extra space.
  • Depth of recursion: It can be guaranteed to be O(log N), even in the worst case, by recursing on the smaller subarray first, then the larger one. But this is not necessary as long as you've done the random shuffle.
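The "smaller subarray first" trick can be sketched as follows: recurse only into the smaller half and loop on the larger one, so at most ~lg N frames are ever on the stack. This is a self-contained sketch (the class name is mine; partition() and the helpers are restated from the article so the block compiles on its own):

```java
public class QuickDepth {

    static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    static void swap(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // standard 2-way partition from the article
    static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    // recurse into the smaller half, loop on the larger one:
    // guarantees O(log N) stack depth even on worst-case input
    public static void sort(Comparable[] a, int lo, int hi) {
        while (lo < hi) {
            int j = partition(a, lo, hi);
            if (j - lo < hi - j) {
                sort(a, lo, j - 1);  // smaller half: recurse
                lo = j + 1;          // larger half: iterate
            } else {
                sort(a, j + 1, hi);
                hi = j - 1;
            }
        }
    }

    public static void main(String[] args) {
        Integer[] a = {3, 5, 1, 4, 2};
        sort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a));  // → [1, 2, 3, 4, 5]
    }
}
```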

Improvements

Could we improve the algorithm we've seen for Quicksort even further? The answer is yes.

1. Use Insertion Sort For Small Arrays

As with Mergesort, Quicksort has too much overhead for tiny arrays or small subarrays because of the recursive calls. So, you can use Insertion sort whenever the subarray size is small (≤ ~10).

if (hi <= lo + CUTOFF - 1) {    // is the subarray size <= CUTOFF (~10)?
    insertionSort(a, lo, hi);
    return;
}
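The insertionSort() helper used above isn't shown in the article; a minimal sketch for sorting just the subarray a[lo..hi] might look like this (class name is mine):

```java
public class InsertionCutoff {

    // sort a[lo..hi] by insertion: move each item left
    // until it is no smaller than its left neighbor
    public static void insertionSort(Comparable[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            for (int j = i; j > lo && a[j].compareTo(a[j - 1]) < 0; j--) {
                Comparable t = a[j];
                a[j] = a[j - 1];
                a[j - 1] = t;
            }
        }
    }
}
```

Note that it only touches indexes lo through hi, so elements outside the subarray are left untouched.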

2. Median-of-three Partitioning

Use the median of a small sample of items as the partitioning item (commonly 3 items; for large subarrays, ~40 or more, a larger sample is sometimes used). Doing so gives a slightly better partition, but you pay the cost of computing the median.

So, estimate the median before calling the partition() method, then swap it with the first element, so it becomes the next chosen partitioning element.

// before calling partition(a, lo, hi):
int m = medianOf3(a, lo, lo + (hi - lo) / 2, hi);
swap(a, lo, m);   // the median becomes the partitioning element

It slightly decreases the number of compares and slightly increases the number of exchanges (more exchanges are required when the partitioning element lands in the middle), but improves the overall running time.

Quicksort (3-way partitioning)

The Quicksort algorithm we've seen is called "2-way partitioning". There is another variant called "3-way partitioning", which is better suited for arrays with many duplicates.

In Mergesort, it doesn't matter what the keys look like, while it does matter in Quicksort.

Solving The Problem of Duplicates

  1. Bad: Put all equal items on one side of the partitioning item v. It takes ~ (1/2 N²) compares when all the keys are equal.
public int partition(Comparable[] a, int lo, int hi) {
    // ...
    while (true) {
        // scan from left until we find an item > v (equal items are skipped over)
        while (lessOrEqual(a[++i], v))
            if (i == hi) break;

        // scan from right until we find an item <= v
        while (less(v, a[--j]))
            if (j == lo) break;

        // ...
    }

    // ...
    return j;
}

2. Good: Stop scans on equal keys.

Stop whenever a[i] or a[j] equals v (see "Implementation details" above). It takes ~ (N lg N) compares even if all the keys are equal. This is the recommended approach, but there is a better solution (see next).

3. Better: Put all equal keys in one place.

The problem of duplicates can be solved by putting all equal items in one place, so equal values end up next to each other. This is what Quicksort's "3-way partitioning" does.

Solving The Problem of Duplicates — algs4.cs.princeton.edu

Implementation

— sort

It does three operations recursively: partition the array so that the left part is less than the partitioning item, the middle part is equal to it, and the right part is greater than it. Then sort the left part and the right part, ignoring the middle part.

Partitioning maintains three pointers, i, lt, and gt, such that a[lo…lt-1] is less than v, a[gt+1…hi] is greater than v, a[lt…i-1] is equal to v, and a[i…gt] isn't examined yet.

Quicksort 3-way partitioning — algs4.cs.princeton.edu
public void sort(Comparable[] a, int lo, int hi) {
    if (hi <= lo) return;

    // partition
    int lt = lo, gt = hi;
    Comparable v = a[lo];
    int i = lo;
    while (i <= gt) {
        int cmp = a[i].compareTo(v);
        if (cmp < 0) swap(a, lt++, i++);
        else if (cmp > 0) swap(a, i, gt--);
        else i++;
    }

    // a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi]
    sort(a, lo, lt - 1);
    sort(a, gt + 1, hi);
}

Analysis

Time → The complexity depends on the existence of duplicates.

  • It takes linear time O(N) in the best case, but only if there is a constant number of distinct keys in the array.
array = [A,B,B,A,A,A,C,C,A,B,A,A];   // only 3 distinct characters
  • Otherwise, it behaves the same way as the "2-way partitioning" Quicksort.

The bottom line is: the 2-way partitioning algorithm will sort arrays with duplicates correctly, but 3-way partitioning is faster when there are many duplicates.
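As a quick sanity check, the 3-way sort above can be packaged as a runnable class and run on the duplicate-heavy array from the analysis (class and helper names are mine, the sort body is the article's):

```java
import java.util.Arrays;

public class Quick3Way {

    static void swap(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;

        // 3-way partition around v = a[lo]
        int lt = lo, gt = hi, i = lo;
        Comparable v = a[lo];
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) swap(a, lt++, i++);
            else if (cmp > 0) swap(a, i, gt--);
            else              i++;
        }

        sort(a, lo, lt - 1);   // keys < v
        sort(a, gt + 1, hi);   // keys > v; a[lt..gt] are already in place
    }

    public static void main(String[] args) {
        String[] a = {"A", "B", "B", "A", "A", "A", "C", "C", "A", "B", "A", "A"};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
        // → [A, A, A, A, A, A, A, B, B, B, C, C]
    }
}
```

Note that the equal keys never take part in the recursive calls, which is where the linear-time behavior on few distinct keys comes from.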

Quickselect

Given an array, finding the smallest or largest value takes linear time.

What if we want the k-th smallest element (the value at a[k] when the array is sorted), but the array is not sorted? The question is: can we get the value at a[k] without fully sorting the array?

There is an algorithm, Quickselect, to find the k-th smallest element in an array. It's based on the idea of partitioning in Quicksort.

Implementation

We partition the array (the same way we did with Quicksort), keep one of its halves, and ignore the other.

public Comparable select(Comparable[] a, int k) {
    // shuffle so the worst case is extremely unlikely
    shuffleArray(a);

    // find the kth smallest element
    int lo = 0, hi = a.length - 1;
    while (hi > lo) {
        int j = partition(a, lo, hi);
        if (j < k) lo = j + 1;      // ignore left half
        else if (j > k) hi = j - 1; // ignore right half
        else return a[k];           // found!
    }
    return a[k];
}

Analysis

Time → Like Quicksort, Quickselect is sensitive to the partitioning item that's chosen: how much does each partitioning item shrink the search set?

  • It takes linear time in the best and average cases, when the partitioning item divides the array into two roughly equal halves; we keep one of them and ignore the other, so total cost = N + N/2 + N/4 + ... + 1 ~ 2N.
  • It takes ~ (1/2 N²) compares in the worst case, if bad partitioning items are consistently chosen, e.g. ones that shrink the search set only by 1. This happens when the array is not shuffled; with a random shuffle, this worst case is extremely unlikely.

Memory → The selection is done in-place; no extra array is needed.
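For example, the median of an array of N keys is just the element at k = N/2. Here is a hedged, self-contained usage sketch of select() (class name is mine; partition and the helpers are restated so the block compiles on its own):

```java
import java.util.Random;

public class QuickSelect {

    static boolean less(Comparable v, Comparable w) { return v.compareTo(w) < 0; }

    static void swap(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // Fisher-Yates shuffle, so the worst case is extremely unlikely
    static void shuffleArray(Comparable[] a) {
        Random rnd = new Random();
        for (int i = a.length - 1; i > 0; i--) swap(a, i, rnd.nextInt(i + 1));
    }

    // standard 2-way partition from the Quicksort section
    static int partition(Comparable[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    // k-th smallest (0-indexed): partition, then search only the half containing k
    public static Comparable select(Comparable[] a, int k) {
        shuffleArray(a);
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;
            else if (j > k) hi = j - 1;
            else return a[k];
        }
        return a[k];
    }

    public static void main(String[] args) {
        Integer[] a = {7, 10, 4, 3, 20, 15};
        // 4th smallest of 6 keys → k = 3 (0-indexed)
        System.out.println(select(a, a.length / 2));  // → 10
    }
}
```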
