Solving Algorithmic Problems: Find a Duplicate in an Array

This post is part of a series on how to solve algorithmic problems. From my personal experience, I found that most of the resources were just detailing solutions. Yet, it was not very common to actually understand the underlying line of thought allowing to reach an efficient solution. Thereby, this is the goal of this series: describing potential processes of reflection to solve problems from scratch.


Problem

  • Find a duplicate in an array
Given an array of n + 1 integers between 1 and n, find one of the duplicates.
If there are multiple possible answers, return one of the duplicates.
If there is no duplicate, return -1.
Example:
Input: [1, 2, 2, 3]
Output: 2
Input: [3, 4, 1, 4, 1]
Output: 4 or 1

Category: Arrays

Solving Process

Before to see the solutions, let’s talk a bit about the problem. We get an array of n + 1 element with integers between 1 and n.

For example, with an array of 5 integers, it implies that each integer will have a value between 1 and 4 (included). This means there will automatically be at least one duplicate.

The only exception is with an array of size 1. This is the only case where we should return a -1.

Brute Force

The brute force solution is to implement two nested loops this way:

for i = 0; i < size(a); i++ {
for j = i+1; j < size(a); j++ {
if(a[i] == a[j]) return a[i]
}
}

O(n²) time complexity and O(1) space complexity.

Count Iterations

Another solution is to have a data structure to count the number of iterations of each integer. This solution could be implemented either with an array or a hash table.

An implementation in Java:

The value of index i represent the number of iterations of i+1.

This solution is O(n) time but O(n) space as we need an extra structure.

Sorted Array

If we apply the simplification technique, we could try to find a solution with a sorted array.

In this case, we would just have to compare each element with its right neighbor.

In Java:

O(1) in space but O(n log(n)) in time as we need to sort the collection up front.

Sum of the Elements

A direction we may think about is to sum the elements of the array and to compare it with 1 + 2 + … + n.

Let’s see an example:

Input: [1, 4, 3, 3, 2, 5]
Sum = 18
As in this example, we have n = 5:
Sum of 1 to 5 = 1 + 2 + 3 + 4 + 5 = 15
=> 18 - 15 = 3 so 3 is the duplicate

In this example, we are able to find the solution in O(n) time and O(1) space. Yet, it is working solely in the case where we have one duplicate.

A counter-example:

Input: [1, 2, 3, 2, 3, 4]
Sum = 15
As in this example we have n = 5,
Sum of 1 to 5 = 1 + 2 + 3 + 4 + 5 = 15
/!\ Not working

This direction is going nowhere. But sometimes we need to try stuff to find the most optimal solution.

Marker

There is something interesting to mention. So far, our solutions are not really making use of the fact that each integer is between 1 and n. Due to this interesting constraint, each value has its own corresponding index in the array.

The solution is to consider this particular array as a sort of linked list. Any index is pointing to the value of that index.

Example with [2, 3, 3, 1]

We iterate over each element and mark its corresponding index by setting its sign to minus. If we already marked it as negative, it means its index is a duplicate.

Let’s see a concrete step by step example:

Input: [2, 3, 3, 1]
* Iteration 0:
Absolute value = 2
Put a minus sign to a[2] => [2, 3, -3, 1]
* Iteration 1:
Absolute value = 3
Put a minus sign to a[3] => [2, 3, -3, -1]
* Iteration 2:
Absolute value = 3
As a[3] is already negative, it means 3 is a duplicate

In Java:

This solution is O(n) time and O(1) space. Yet, it requires to mutate the input list.

Runner Technique

There is another solution which also considers the given array as a sort of linked list (again, this is possible because of the constraint that each integer is between 1 and n).

Let’s analyze the example [1, 2, 3, 4, 2]:

Example with [1, 2, 3, 4, 2]

With this representation, we can simply say that a duplicate exists when a cycle does exist. Moreover, the duplicate is the entry point of the cycle (in this case, the second element).

If we take inspiration from Floyd’s cycle-finding algorithm, we can derive the following algorithm:

  • Initiate two pointers slow and fast
  • For each step, slow is moving one step at a time with slow = a[slow], whereas fast is moving two steps at a time with fast = a[a[fast]]
  • When slow == fast, we are in a cycle

Is the algorithm completed? Not yet. The entry point of this cycle will be the duplicate. We have to reset slow and move both pointers step by step until they are equals again.

A possible implementation in Java:

This solution is O(n) time and O(1) space and does not require to mutate the input list.


Further reading