Remove duplicates from sorted array

Suyash Namdeo · Published in EnjoyAlgorithms · May 16, 2021 · 6 min read

Let’s understand the problem

We are given a sorted array. Our task is to remove the duplicate elements such that there is a single occurrence of each element in the array.

  • After moving the unique elements to the front, we need to return the new length, i.e. the count of unique elements. In other words, there is no need to delete the leftover elements at the end after removing the duplicates; it doesn't matter what we leave beyond the new length.
  • The input array is passed by reference, which means any modification to it will be visible to the caller.
  • The expectation is to solve this problem in-place.
Input: X[] = [2, 2, 2, 2, 2, 3, 3, 3]
Output: 2, X[] = [2, 3]
Explanation: Our function should return length = 2, with the
first two elements of X[] being 2 and 3 respectively.
It doesn't matter what we leave beyond the returned length.
Input: X[] = [1, 2, 2, 3, 4, 4, 4, 5, 5]
Output: 5, X[] = [1, 2, 3, 4, 5]
Explanation: Our function should return length = 5, with the
first five elements of X[] being 1, 2, 3, 4 and 5 respectively.
It doesn't matter what we leave beyond the returned length.
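To make this in-place contract concrete, here is a minimal sketch in C of how a caller might check the result on the first example. Note that removeDuplicates is only declared here; any of the implementations developed below can serve as its definition.

#include <stdio.h>

// Declaration only; use one of the implementations from the approaches below.
int removeDuplicates(int X[], int n);

int main(void)
{
    int X[] = {2, 2, 2, 2, 2, 3, 3, 3};
    int n = sizeof(X) / sizeof(X[0]);

    int newLength = removeDuplicates(X, n);   // expected: 2

    // Only the first newLength positions matter after the call.
    printf("length = %d, front elements:", newLength);
    for (int i = 0; i < newLength; i = i + 1)
        printf(" %d", X[i]);                  // expected: 2 3
    printf("\n");
    return 0;
}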

Important Note: Before moving to the solutions, we recommend learners solve this problem on their own. If solved, then well done! We would like to hear your ideas in the comments. Otherwise, no problem, this is an opportunity to learn a new pattern in problem-solving. Think and Enjoy :)

Discussed Solution Approaches

  1. Brute force approach using extra memory
  2. Efficient approach using two pointers
  3. Efficient approach by counting the duplicate elements

1. Brute force approach using extra memory

Algorithm Idea and Steps

One idea would be to store the unique elements in extra memory and then copy them back to the original array. By the end of this process, we return the new length of the input array.

  1. We declare an extra array unique[n] to store the unique elements.
  2. The first element is always unique, so we copy X[0] to unique[0]. Then we scan the rest of the input array and copy every unique element of X[] to unique[], keeping track of the unique element count using the variable count.
  3. Now we copy the unique elements from unique[] back to X[].
  4. Finally, we return the value of count.

Programming Pseudo Code

int removeDuplicates(int X[], int n)
{
    if (n == 0 || n == 1)
        return n
    int unique[n]
    int count = 0
    unique[count] = X[0]
    count = count + 1
    for (int i = 1; i < n; i = i + 1)
    {
        if (X[i] != X[i-1])
        {
            unique[count] = X[i]
            count = count + 1
        }
    }
    for (int i = 0; i < count; i = i + 1)
        X[i] = unique[i]
    return count
}

Time and Space Complexity Analysis

We are linearly traversing the array twice. So Time Complexity = Traversing the array to store the unique elements + Traversing the array to copy the unique elements back = O(n) + O(count) = O(n). The value of count is equal to n in the worst case. (Think!)

Space Complexity = O(n) for the extra array unique[n].
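For reference, here is one possible C translation of the above pseudo code. It is only a sketch: it uses malloc instead of a variable-length array, but the logic is unchanged.

#include <stdlib.h>

int removeDuplicates(int X[], int n)
{
    if (n == 0 || n == 1)
        return n;

    int *unique = malloc(n * sizeof(int));    // extra O(n) memory
    int count = 0;

    unique[count] = X[0];                     // the first element is always unique
    count = count + 1;
    for (int i = 1; i < n; i = i + 1)
    {
        if (X[i] != X[i - 1])                 // a new value starts at index i
        {
            unique[count] = X[i];
            count = count + 1;
        }
    }

    for (int i = 0; i < count; i = i + 1)     // copy the unique values back
        X[i] = unique[i];

    free(unique);
    return count;
}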

2. Efficient approach using two pointers

Algorithm Idea

Can we use the sorted array property to solve this problem in-place? Let’s think.

In a sorted array, all duplicate elements are placed adjacent to each other. So we can easily track the unique elements and place them at the front of the array. For this purpose, we use a fast pointer to scan the array and a slow pointer to track the position of the unique elements.

Algorithm Steps

  • We initialise the slow and fast pointers, i.e. slow = 0, fast = 1.
  • Now we scan the input array while fast < n. Whenever we find a duplicate, i.e. X[fast] == X[slow], we skip it and increment fast by 1. But when we encounter a unique element, i.e. X[fast] != X[slow], we increment slow by 1, copy X[fast] to X[slow], and then increment fast by 1.
  • We repeat the above process until fast reaches the end of the array. Finally, we return the count of unique elements, which is slow + 1. (Think!)

Programming Pseudo Code

int removeDuplicates(int X[], int n)
{
    if (n == 0)
        return 0
    int slow = 0
    int fast = 1
    while (fast < n)
    {
        if (X[fast] != X[slow])
        {
            slow = slow + 1
            X[slow] = X[fast]
        }
        fast = fast + 1
    }
    return slow + 1
}

Time and Space Complexity Analysis

We are doing a single scan of the array, so time complexity = O(n). We are just using two extra variables, so space complexity = O(1).
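Here is one possible C translation of the two pointers pseudo code, together with a small check on the second sample input; the assert-based test below is just an illustration of how the result could be verified.

#include <assert.h>

int removeDuplicates(int X[], int n)
{
    if (n == 0)
        return 0;

    int slow = 0;                        // index of the last unique element found so far
    for (int fast = 1; fast < n; fast = fast + 1)
    {
        if (X[fast] != X[slow])          // a new unique value appears at fast
        {
            slow = slow + 1;
            X[slow] = X[fast];           // move it just after the previous unique value
        }
    }
    return slow + 1;                     // count of unique elements
}

int main(void)
{
    int X[] = {1, 2, 2, 3, 4, 4, 4, 5, 5};
    int len = removeDuplicates(X, 9);
    assert(len == 5);
    for (int i = 0; i < len; i = i + 1)
        assert(X[i] == i + 1);           // the front should be 1 2 3 4 5
    return 0;
}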

3. Efficient approach by counting the duplicate elements

Algorithm Idea and Steps

Here is another idea to solve the problem by counting the duplicates in the array.

  • We scan the array and track the duplicate element count using the variable duplicate_count.
  • When we find a duplicate, i.e. X[i] == X[i-1], we increase duplicate_count by one.
  • Otherwise, X[i] is the first appearance of a unique element, and we place it at the front, i.e. at index i - duplicate_count.
  • Finally, we return the count of unique elements, i.e. n - duplicate_count. (Think!)

Programming Pseudo Code

int removeDuplicates(int X[], int n)
{
    int duplicate_count = 0
    for (int i = 1; i < n; i = i + 1)
    {
        if (X[i] == X[i-1])
            duplicate_count = duplicate_count + 1
        else
            X[i - duplicate_count] = X[i]
    }
    return n - duplicate_count
}
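To see why the unique element X[i] lands exactly at index i - duplicate_count, here is a dry run of the above pseudo code on the second sample input X[] = [1, 2, 2, 3, 4, 4, 4, 5, 5]:

i = 1: X[1] = 2 != X[0] = 1, copy to index 1 - 0 = 1 -> [1, 2, 2, 3, 4, 4, 4, 5, 5]
i = 2: X[2] = 2 == X[1] = 2, duplicate_count = 1
i = 3: X[3] = 3 != X[2] = 2, copy to index 3 - 1 = 2 -> [1, 2, 3, 3, 4, 4, 4, 5, 5]
i = 4: X[4] = 4 != X[3] = 3, copy to index 4 - 1 = 3 -> [1, 2, 3, 4, 4, 4, 4, 5, 5]
i = 5: X[5] = 4 == X[4] = 4, duplicate_count = 2
i = 6: X[6] = 4 == X[5] = 4, duplicate_count = 3
i = 7: X[7] = 5 != X[6] = 4, copy to index 7 - 3 = 4 -> [1, 2, 3, 4, 5, 4, 4, 5, 5]
i = 8: X[8] = 5 == X[7] = 5, duplicate_count = 4
Return 9 - 4 = 5, and the first five elements are [1, 2, 3, 4, 5].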

Important Note: We recommend learners transform the above pseudo codes into their favorite programming language (C, C++, Java, Python, etc.) and verify all the test cases. Please let us know if you find any errors or bugs; we would be highly grateful. Enjoy programming!

Time and Space Complexity Analysis

We are doing a single scan of the array, so time complexity = O(n). Space complexity = O(1), since we are just using one extra variable.

Possible questions by the Interviewer: Ideas to Think!

  • What would be the best and worst case scenario of all three approaches?
  • Will the above solutions work when there are no duplicate elements at all?
  • Inside the while loop of the 2nd approach, when X[fast] != X[slow], why do we increment the slow pointer before copying the value? Similarly, in the 3rd approach, when X[i] != X[i-1], why do we copy X[i] to X[i - duplicate_count]?
  • Can we solve this problem using BST or other data structures?
  • To solve this problem, can we use an idea similar to quicksort partition process?
  • Why does the two pointers approach work perfectly in the case of a sorted array?
  • How would we approach this problem if the input array is unsorted?

Comparisons of Time and Space Complexities

  • Brute force approach: Time Complexity = O(n), Space Complexity = O(n)
  • Efficient approach using two pointers: Time Complexity = O(n), Space Complexity = O(1)
  • Efficient approach by counting the duplicates: Time Complexity = O(n), Space Complexity = O(1)

Key takeaway from this coding problem

  • A good coding problem to learn in-place problem solving using two pointers.
  • We can use a similar approach of fast and slow pointer to solve other coding problems.
  • A good coding problem for beginners to start problem solving.

Similar coding questions to practice

You can find some of these problems on LeetCode or other coding platforms. Explore and Practice!

  • Sort an array of 0s, 1s and 2s
  • Move zeroes to the end
  • Sort an array in a wave form
  • Find duplicates in an Array with values 1 to N
  • Valid Mountain in an array
  • Find the frequencies of all duplicate elements in the array
  • Remove duplicates such that each element appears at most twice

Reviewer: Shubham Gautam (https://shubhamsuper30.medium.com)

If you have any ideas/queries/doubts/feedback, please comment below or write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy coding, Enjoy algorithms!
