Python Algorithms: Remove Duplicates

Recently, I decided to join my University’s coding team. Frankly, I haven’t coded since working for Intel on Venture team and I needed a refresher.

Since your here lets go along for the coding ride. The problem of the day is: Remove duplicate numbers from a list.

def removeDups(lst):
newLst = []
for i in lst:
if i not in newLst:
return newLst

This code works but it is slow as fuck. The time complexity is O(n²) due to us going through two lists. On the for i in lst line that is one O(n) which has another loop nested inside on the if i not in newLst which is another O(n). This brings the complexity to O(n²).

def removeDups(lst):
S = set()
for i in lst:
if i not in S:
return S

This solution however has the time complexity of O(n). The major reason for that is instead of using a new List for the removal of duplicate numbers we used a Set. A set uses a hash-table as its underlying data structure. This explains the O(1) membership checking, since looking up an item in a hash-table is an O(1) operation, on average. So now we only have the O(n) from the for i in lst line.

Congratulations we now know how to remove duplicate numbers from a list!