Solve String Problems on LeetCode

Li Yin
Algorithms and Coding Interviews
9 min readMar 2, 2018

Steps:

  1. Do we need pre-processing?
  2. Is it a difficult question? Yes, Can be re-construct it to another type of question?
  3. How to solve it with brute force? Not good enough? Can try minor improvement, early stop or trim the searching condition.
  4. Still not good, use better and more clever algorithms.

Note: for easy problem, use your intuition. You may find it is especially helpful. First,allow your brain to search for known methods. If it fits, check how specially make it work for this problem. If it does not fit the problem. Then, go loose on your intuition.

1. One String

The first type is to do operations that meet certain requirements on a single string. Including palindrome ( sequence of characters read the same forward and backward), anagram ( An anagram is a word or phrase formed by rearranging the letters of a different word or phrase), valid number (difficult with a of of cases to consider), Valid Parentheses

e.g. 125. Valid Palindrome, 65. Valid Number, 20. Valid Parentheses (use a stack to save left parenthe), e.g. 214. Shortest Palindrome (KMP lookup table), 5. Longest Palindromic Substring

For hard problem, reconstruct the problem to another so that it can be resolved by an algorithm that you know.

E.g. 214 Shortest Palindrome , KMP lookup table,

for example s=abba, constructed S = abba#abba),

Possible methods: two pointers, one loop+two pointers

2. Two Strings

The problems can be generalized to find pattern in a string, you would be given two strings. (1) If we do not care the order of the letters in the pattern, then it is the best to use Sliding Window; (2) If we care the order matters (identical to pattern), we use KMP.

2.1 Pattern Searching Algorithms

(0) Brute force Pattern Matching

Time complexity O(M*N)

Brute Force
KMP use next array
using lookup table, j=lps[j-1]

The difference of brute force and KMP is, when the matching failed, KMP would keep the i in the string in the same position(brute force, go back i-j+1). The j pointer in the pattern would not go back to start position 0, and it goes to next[j]

(a) Knuth Morris Pratt (Exact Pattern Matching): The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of next window. We take advantage of this information to avoid matching the characters that we know will anyway match. Let us consider below example to understand this. The core is to find the failure lookup table. The lookup table saves the prefix is the same as the suffix.

Left: the process to generate the failure look up table; Right: How to do the minimum length of palindrome and how to find the pattern.
# Generating lookup table for string S using Python
f=[0]*n
for i in xrange(1,n):
t=f[i-1]
while t>0 and S[i]!=S[t]:
t=f[t-1]
if S[i]==S[t]:
t+=1
f[i]=t

(b) Rabin-Karp algorithm(Exact or anagram Pattern Matching): used to find the exact pattern, because different anagram of string would have different hash value.

e.g. 686. Repeated String Match

(c) Sliding window algorithm(Exact or anagram Pattern Matching): used to find any anagram of the pattern inside the string. Here is a summary that sliding window algorithm can solve pretty much all the string pattern matching problems.

(d) longest common substring: Suppose we have string A and B, each with m and n chars. If we use brute force, then in A, there could be M² substring, and to locate these substring in B, we spend O(n) for each substring, which makes the total complexity to be O(n*M²). Note: it is the same if its two arrays

But now, if we use the LCS method, the time and space complexity will be O(n*m) . LCS is DP method that we can use bottom-up with memorization.

Pseudo-code of LCS
#init a[M+1][N+1] two dimensional matrix with [0]
a =[[0 for _ in xrange(N+1)] for _ in xrange(M+1)]
for i in [0,M):
for j in [0,N):
if A[i]==A[j]:
a[i+1][j+1]=a[i][j]+1
result = max(result,a[i+1][j+1])
else:
a[i+1][j+1] = 0

3. Preprocessing

map, filter, reduce, any

Using filter to get rid of irrelevant character.

filtered = map(lambda x: x.lower(), filter(lambda x: x.isalnum(),s)) #get only alpha and numeric and convert it to lower case
filtered=filter(lambda x:x.isalnum() or x=='.' or x=='-' or x==' ' or x=='+', s).lstrip().rstrip().lstrip('-').lstrip('+');
#any
a = any(x<0 for x in lst) #return True or False

4. Examples

Anagrams

438. Find All Anagrams in a String

Example 1:

Input:
s: "cbaebabacd" p: "abc"
Output:
[0, 6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Python code with sliding window:

def findAnagrams(self, s, p):
"""
:type s: str
:type p: str
:rtype: List[int]
"""
if len(s)<len(p) or not s:
return []
#frequency table
table = {}
for c in p:
table[c]= table.get(c,0)+1

begin,end = 0,0
r = []
counter = len(table)
while end<len(s):
end_char = s[end]
if end_char in table.keys():
table[end_char]-=1
if table[end_char]==0:
counter-=1

#go to longer string, from A, AD,
end+=1

while counter==0: #we have the same char in the window, start to trim it
#save the best result, just to save the beigining
if end-begin == len(p):
r.append(begin)
#move the window forward
start_char = s[begin]
if start_char in table: #reverse the count
table[start_char]+=1
if table[start_char]==1: #only increase when it is 1
counter+=1

begin+=1
return r

Palindrome

LeetCode中关于回文串的其他的题目有 Palindrome Number 验证回文数字 Validate Palindrome 验证回文字符串 Palindrome Partitioning 拆分回文串Palindrome Partitioning II 拆分回文串之二 Longest Palindromic Substring 最长回文串。题目让我们在给定字符串s的前面加上最少个字符,使之变成回文串,那么我们来看题目中给的两个例子,最坏的情况下是s中没有相同的字符,那么最小需要添加字符的个数为s.size() — 1个,第一个例子的字符串包含一个回文串,只需再在前面添加一个字符即可,还有一点需要注意的是,前面添加的字符串都是从s的末尾开始,一位一位往前添加的,那么我们只需要知道从s末尾开始需要添加到前面的个数。这道题如果用brute force无法通过OJ,所以我们需要用一些比较巧妙的方法来解。这里我们用到了KMP算法,KMP算法是一种专门用来匹配字符串的高效的算法,具体方法可以参见这篇博文从头到尾彻底理解KMP

动态规划Dynamic Programming来解,根Palindrome Partitioning II 拆分回文串之二的解法很类似,我们维护一个二维数组dp,其中dp[i][j]表示字符串区间[i, j]是否为回文串,当i = j时,只有一个字符,肯定是回文串,如果i = j + 1,说明是相邻字符,此时需要判断s[i]是否等于s[j],如果i和j不相邻,即i — j >= 2时,除了判断s[i]和s[j]相等之外,dp[j + 1][i — 1]若为真,就是回文串,通过以上分析,可以写出递推式如下:

dp[i, j] = 1 if i == j

= s[i] == s[j] if j = i + 1

= s[i] == s[j] && dp[i + 1][j — 1] if j > i + 1

Manacher’s Algorithm 马拉车算法

125. Valid Palindrome

Given a string, determine if it is a palindrome, considering only alphanumeric characters and ignoring cases.

For example,
"A man, a plan, a canal: Panama" is a palindrome.
"race a car" is not a palindrome.

Note:
Have you consider that the string might be empty? This is a good question to ask during an interview.

For the purpose of this problem, we define empty string as valid palindrome.

Solution: use two pointers

def isPalindrome(self, s):
l, r = 0, len(s)-1
while l < r:
while l < r and not s[l].isalnum():
l += 1
while l <r and not s[r].isalnum():
r -= 1
if s[l].lower() != s[r].lower():
return False
l +=1; r -= 1
return True

680. Valid Palindrome II

Given a non-empty string s, you may delete at most one character. Judge whether you can make it a palindrome.

Example 1:

Input: "aba"
Output: True

Example 2:

Input: "abca"
Output: True
Explanation: You could delete the character 'c'.

Solution: use two pointers, one in the front, one at the end, if s[i]!=s[j], then we check if s[i+1:j] or s[i:j-1] is a palindrome or not.

def validPalindrome(self, s):
"""
:type s: str
:rtype: bool
"""
if not s:
return True

def is_pali_range(i, j):
return all(s[k] == s[j-k+i] for k in range(i, j))

i, j=0, len(s)-1 #like a binary search
while i<=j:
if s[i]==s[j]:
i+=1
j-=1
else:
return is_pali_range(i+1, j) or is_pali_range(i, j-1)
return True

214. Shortest Palindrome

Given a string S, you are allowed to convert it to a palindrome by adding characters in front of it. Find and return the shortest palindrome you can find by performing this transformation.

For example:

Given "aacecaaa", return "aaacecaaa".

Given "abcd", return "dcbabcd".

Solution: reform the problem to be s+’#’+s[::-1], we find the longest prefex sufrex table, result = s[::-1][len(s)-lps[-1]]+s

def shortestPalindrome(self, s):
"""
:type s: str
:rtype: str
"""
r=s[::-1]
s=s+'#'+r
lps=[0]*len(s)
for i in range(1,len(s)):
t=lps[i-1]
if t>0 and s[i]!=s[t]:
t=lps[t-1]
if s[i]==s[t]:
t+=1
lps[i]=t
return r[:len(r)-lps[-1]]+s[:len(r)]

409. Longest Palindrome

Given a string which consists of lowercase or uppercase letters, find the length of the longest palindromes that can be built with those letters.

This is case sensitive, for example "Aa" is not considered a palindrome here.

Note:
Assume the length of given string will not exceed 1,010.

Example:

Input:
"abccccdd"
Output:
7
Explanation:
One longest palindrome that can be built is "dccaccd", whose length is 7.

Solution: we can take maximum one single string in the middle, the others need to be even.

from collections import defaultdict
class Solution:
def longestPalindrome(self, s):
"""
:type s: str
:rtype: int
"""
dict=defaultdict(int)
for c in s:
dict[c]+=1

length = 0
bSingle = False

for v in dict.values():
if not bSingle and (v==1 or v%2==1):
length+=1
bSingle=True
length+=(v//2)*2
return length

The following counting palindrome we can use DP+iterative.

647. Palindromic Substrings

Given a string, your task is to count how many palindromic substrings in this string.

The substrings with different start indexes or end indexes are counted as different substrings even they consist of same characters.

Example 1:

Input: "abc"
Output: 3
Explanation: Three palindromic strings: "a", "b", "c".

Example 2:

Input: "aaa"
Output: 6
Explanation: Six palindromic strings: "a", "a", "a", "aa", "aa", "aaa".

Solution: O(N²). We search for each position, expand the odd length and the even length to see if it is a palindrome. For example, aaa, for i =0–2: i=0, a(odd, l==r=0), aa(r=l+1), i=1, a, aaa, aa, i=2, a, so totally 6.

def countSubstrings(self, s):
"""
:type s: str
:rtype: int
"""
def count(l, r):
ans = 0
while l>=0 and r<len(s): #expand to the left and right
if s[l]==s[r]:
ans+=1
l-=1
r+=1
else:
break
return ans

ans=0
for i in range(len(s)):
ans+=count(i,i) #for odd length
ans+=count(i,i+1) #for even length
return ans

Use dp:

def countSubstrings(self, s):
"""
:type s: str
:rtype: int
"""
n=len(s)
dp= [[0 for _ in range(n)] for _ in range(n)]
print(dp)
res=0
for i in range(n-1,-1,-1):
for j in range(i,n):

if j-i>2:
dp[i][j]=(s[i]==s[j] and dp[i+1][j-1])
else:
dp[i][j] =(s[i]==s[j])
print(i,j, dp[i][j])
if dp[i][j]:
res+=1
return res

5. Longest Palindromic Substring

Given a string s, find the longest palindromic substring in s. You may assume that the maximum length of s is 1000.

Example:

Input: "babad"Output: "bab"Note: "aba" is also a valid answer.

Example:

Input: "cbbd"Output: "bb"

Solution: we can either use dp or the greedy expansion:

When we use dp, we have dp[i,j], if i==j or abs(i-j)==1, then dp[i,j]==(s[i]==s[j]), if the distance is three, id depends on the value of dp[i+1,j-1]

To program iteratvely, i need to from big to small, j neeed to from small to big

def longestPalindrome(self, s):
"""
:type s: str
:rtype: str
"""
n=len(s)
dp= [[0 for _ in range(n)] for _ in range(n)]
res=0
max_str = ''
for i in range(n-1,-1,-1):
for j in range(i,n):
if j-i>2:
dp[i][j]=(s[i]==s[j] and dp[i+1][j-1])
else:
dp[i][j] =(s[i]==s[j])
if dp[i][j] and j-i+1 >len(max_str):
max_str = s[i:j+1]
return max_str

5. Appendix

Python language

String

A*i # repeat A for i times
lest_repeat_times = (len(B)-1)//len(A)+1
if B in A: #check if B is in A
ord('A')# Given a string of length one, return an integer representing the Unicode code point of the character when the argument is a unicode object,

If we want to put string in set, it should be like this:

Compare also the difference between {} and set() with a single word argument.>>> a = set('aardvark') 
>>>
{'d', 'v', 'a', 'r', 'k'}
>>> b = {'aardvark'}# or set(['aardvark']), convert a list of strings to set
>>> b
{'aardvark'}
or put a tuple in the set
a =set([tuple]) or {(tuple)}

--

--