Algorithms Exercise: Find the First Duplicate in an Array

Today’s algorithm comes from code fights. It’s a challenge called firstDuplicate and it goes like this:

Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.

We’re also given the constraints that our solution should be O(n) time and O(1) space.

A first attempt

Every statement in a problem like this typically has some meaning. So the knowledge that our input array a “contains only numbers in the range from 1 to a.length” is a clue. Positive integers, that can’t exceed the size of the array. Clearly, we’re expected to use the array values as array indexes somehow.

One solution would be to walk the array, keeping track of whether we’ve seen each value in a seperate hash or array. That solution would look something like this:

def first_duplicate(a)
counts = []
a.each do |value|
return value if counts[value]
counts[value] = true
end
-1
end

The good news is that this solution 1) works and 2) is O(n) runtime (we only traverse the array once). Unfortunately, it’s also O(n) space so we know it could be more efficient. Let’s try a slightly different approach.

Refining the algorithm

Up until now, we’ve assumed that knowledge about whether a value has been “seen” by our pointer has to be stored in some other variable. We need the left most duplicate, so we can’t sort the array first which might make things easier.

Coming back to our first clue in the description of the challenge, we know that all our input numbers are positive. So, we actually have an extra piece of data in the original array we can play with: it’s sign. We can walk the array, looking up the value at a[value] for each one, and changing it to it’s negative counterpart. If, when we examine a value, we find it’s already negative, we know that we’ve already seen it and it’s therefore a duplicate.

So, let’s try again:

def first_duplicate(a)
a.each do |value|
return value.abs if a[value.abs - 1] < 0
a[value.abs - 1] = -a[value.abs - 1]
end
-1
end

Performance and memory optimized!

Sprinkle some design on it

In real life, if we needed this algorithm we’d probably want a better interface. A more Ruby way of implementing this might be to monkey patch the Array class itself:

class Array
def first_duplicate
# implementation
end
end

Then we can call a.first_duplicate instead. Note that if you’re going to do this, you’ll want to use something like the first implementation of the code, since the more memory efficient version is destructive.

But monkey patching in this way is potentially dangerous. A different approach we could take is to create a module and include that in any classes we want to be able to use it.

module Duplicable
def first_duplicate
# implementation
end
end
# now include it in other classes
class Array
include Duplicable
end

This approach has the advantage of 1) containing our logic in its own module and 2) making it reusable. We could include this module in as many classes as we want, as long as they quack each and [].

Any other approaches?

How would you solve this? Let me know if I missed anything or how you might take it in a different direction.