Day 54: Longest Unique Sequence
Today I’d like to implement an algorithm and keep a track of the process. It may seem to be out of scope of the series, but implementation of algorithm is an incremental process, and often a long one.
A good showcase is somewhat typical task you may receive at a job interview.
In a sequence find the longest unique consecutive subsequence, i.e. sequence consisting of unique items. Make the algorithm run in O(n) time in worst case.
The text often offers a hints in form of restrictions that tell you what to do.
- O(1) time — there’s a pattern; take pen a pencil and find it; solution can be found in a constant number of steps; no cycles needed
- O(log n) time — there’s a recursive pattern; you can eliminate half of input at each step; use single cycle
- O(n) time — you have to cycle through; constants number of cycles can be used, but no nested cycles are allowed (not entirely true, there are exceptions, but rare)
- O(n.log n) time — efficient solution requires sorting or sorted auxiliary structure
- O(n²) time — an auxiliary table has to be built
- O(1) memory — no structures, use as many variables as you wish, but any complex structure must be limited by constant
- O(log n) memory — there is a pattern in bit representation of input that can be captured and applied
- O(n) memory — auxiliary array is needed, maybe table; if it is a table all but one dimension must be constant
- O(n²) memory — auxiliary table is required; time can never be below O(n²) in this case (and you rarely see this one on an interview)
version #1
Somewhat obvious solution is to cycle through the sequence and iteratively shrink and expand sliding window over unique sequences. I will keep track of the items using set
to comply O(n) time.
def longest_unique_sequence(sequence):
i, j, k = 0, 0, set()
bi, bj = 0, 0
while j < len(sequence):
if sequence[j] in k:
k.remove(sequence[i])
i += 1
else:
k.add(sequence[j])
j += 1 if j - i > bj - bi:
bi, bj = i, j return bi, bj
That’s a pretty solid solution … if you are Java or C# developer. In Python I would consider it poor at best.
version #2
OMG, why did I use set
? Because I was expected to use it? The linearity is disputable and I fell into trap of premature optimization. Get rid of it.
def longest_unique_sequence(sequence):
i, j = 0, 0
bi, bj = 0, 0
while j < len(sequence):
if sequence[j] in sequence[i:j]:
i += 1
else:
j += 1
if j - i > bj - bi:
bi, bj = i, j
return bi, bj
Slightly shorter, not pythonic though. Python can be highly expressive and I’m not using it.
version #3
def longest_unique_sequence(sequence):
i, j = 0, 0
b = 0, 0, 0
while j < len(sequence):
k = sequence[j] in sequence[i:j]
i, j = i + k, j + 1 - k
b = max(b, (j - i, i, j)) return slice(b[1], b[2])
That’s … very expressive. Never mind, implementation is a process, any step [even in a wrong direction] is still a good step giving a lesson.
version #4
Let’s return to version #2 and ask if we need all the pointers and all the conditions?
def longest_unique_sequence(sequence):
i, b = 0, ''
while i < len(sequence):
if sequence[i] in sequence[:i]:
i -= 1
sequence = sequence[1:]
else:
i += 1
b = max(b, sequence[:i], key=len) return b
See where I am heading? No? Me neither. Just don’t give up.
version #5
Here’s an idea. What I particularly don’t like about all the previous attempts is searching for the best solution. We shouldn’t be reinventing the wheel. If it’s in Python, do not implement it.
def longest_unique_sequence(sequence):
i, j = 0, 0 while j < len(sequence):
if sequence[j] in sequence[i:j]:
i += 1
else:
j += 1
yield sequence[i:j]
# let python do the job
max(longest_unique_sequence(text), key=len)
And I like what I see. Why should I be implementing another max
if it’s already there?
version #6
Getting back to the version #1 and applying gathered experience.
def longest_unique_sequence(sequence):
i, k = 0, {} for j, x in enumerate(sequence):
i = max(i, k.get(x, 0))
k[x] = j + 1
yield sequence[i:j + 1]
# let python do the job
max(longest_unique_sequence(text), key=len)
This might be a pretty solid pythonic solution. The lower index is kept in dict
and higher index is enumerated. This is an idiomatic way.
version #7
Anyways, I still don’t like using set
or dict
and don’t think I need one. Once again, I am going to get rid of it.
def longest_unique_sequence(sequence, best=''):
for x in sequence:
best = best[best.find(x) + 1:] + x
yield best
# let python do the job
max(longest_unique_sequence(text), key=len)
This version is (a) idiomatic, (b) the only one that works with iterators. And I love it.
But what about O(n) worst case requirement? Well, that’s a lie.
Version #6 looks like O(n) and almost always runs in O(n), but its worst case is O(n*a) where a
is size of alphabet (all items that can occur). It can run at worst in O(n) if you use non-deterministic hashing. Java and C# do not have it and I assume Python neither.
Version #7 runs in O(n*a) and unless you expect troubles, you really don’t need advanced structures. Do not make the best solution. Do the solution you need.
Is your solution better than mine? Feel free to share with me.