How to strip (certain characters from a string)

Some practice with string manipulation

Jack Holland
Understanding computer science
6 min read · Feb 20, 2014


This is an ongoing series. Please check out the collection for the rest of the articles.

Have you ever heard of a lipogram? It’s a piece of writing that excludes one or more letters. Once upon a time, Ernest Vincent Wright wrote a 50,000-word novel without once using the letter ‘e’. That’s impressive. In this post, we’re going to do something a little more basic: write code that turns any piece of writing into a lipogram. How? By removing every instance of whichever character you specify.

Wright’s novel, Gadsby

Let’s create a sample string to test our code on:

I am writing to you because she said you listen and understand and didn’t try to sleep with that person at that party even though you could have.

It’s the opening line from The Perks of Being a Wallflower, a superb coming-of-age novel (no, I’m not being paid to say this; I just love the book). If you read the quotation again, you’ll notice there are a lot of instances of the letter ‘e’. Let’s get rid of them. We’ll call the instruction remove_e, since that’s what it does. This is a great situation for recursion (assume the string is named wallflower):

The remove_e instruction

I tried to make this code as readable as possible, but its meaning is probably not as intuitively obvious as I’m picturing it. With that in mind, let’s examine it closely. First, note that I snuck in some new syntax, else if:

  • if (condition) then (code) else if (another_condition) then (more_code) else (even_more_code)

This does what you’d expect: if condition is true, then code is computed. If condition is false, then another_condition is checked; if it is true, then more_code is computed. If another_condition is also false, then even_more_code is computed. It works just like a regular if then else, except that it checks a second condition before giving up and computing the else code.
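You may want to experiment with the same shape in a language you can run today. Python spells else if as elif. The sketch below is only an analogy (Python, not Cake), and the function name sign is made up for the example:

```python
def sign(n: int) -> str:
    """Classify n using an if / elif / else chain."""
    if n < 0:
        # condition was true
        return "negative"
    elif n == 0:
        # checked only because n < 0 was false
        return "zero"
    else:
        # reached only when both conditions were false
        return "positive"


print(sign(-3), sign(0), sign(7))  # negative zero positive
```

As in Cake, only one branch ever runs, and the conditions are checked from top to bottom.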

New syntax aside, some questions still remain. Look at the first condition:

The first condition that remove_e checks

What does this mean? Well, it’s equivalent to this:

This is equivalent to the line of code above

The only difference is that the first version is easier to read. Either way, it checks whether the string is empty: whether it has no characters and is just a pair of quotation marks with nothing between them. This is the base case for the recursive instruction. After every character is processed, only an empty string will remain, and that empty string should be returned:

The code that is run if the first condition returns true

You could also write

Since wallflower is “”, you could write this instead

since wallflower equals “”. Choose whichever you think is easier to understand. The upshot is that when remove_e encounters an empty string, it just returns it.
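The two Cake versions of the emptiness check aren’t reproduced in this text. Here instead are a few common ways to write the same test in Python. The function names are hypothetical, and this sketch doesn’t claim that any of them matches a particular Cake version:

```python
def is_empty_literal(s: str) -> bool:
    # Compare against a pair of quotation marks with nothing between them.
    return s == ""


def is_empty_length(s: str) -> bool:
    # An empty string contains zero characters.
    return len(s) == 0


def is_empty_truthy(s: str) -> bool:
    # Empty strings are "falsy" in Python, so `not s` is True only for "".
    return not s


for sample in ["", "a", " "]:
    print(repr(sample), is_empty_literal(sample), is_empty_length(sample), is_empty_truthy(sample))
```

All three return True for “” and False for anything else, including “ ” (a string holding a single space). The choice between returning wallflower and returning “” in the base case doesn’t matter either, because at that point they are the same value.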

Now to the else if part. If the string is not empty, remove_e checks this condition:

The next condition that remove_e checks (if the first one is false)

This checks if the first (0th) character is the letter ‘e’. So if wallflower were “bullion” then this condition would return false; if wallflower were “ebullient” then this would return true. What happens if this returns true? This:

The code that is run if the second condition returns true

Yes, it’s a lot of parentheses, but the logic is actually straightforward. If the 0th character is an ‘e’, then we don’t want it included in the final string. Thus, we should return whatever remove_e returns for the rest of wallflower. For instance, if wallflower is “energy” then we want to drop the first ‘e’ and call remove_e on the rest, “nergy”. After every character in the string has been evaluated, every ‘e’ will have been removed.
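Here are the first-character check and the “rest of the string” step in Python, as a rough analogy. The sketch assumes two things: that Cake’s wallflower[0] works like Python indexing, and that the rest of the string corresponds to the slice [1:]. starts_with_e is a made-up helper name:

```python
def starts_with_e(s: str) -> bool:
    # s[0] is the first (0th) character. This assumes s is non-empty,
    # which remove_e guarantees by checking the empty case first.
    return s[0] == "e"


print(starts_with_e("bullion"))    # False
print(starts_with_e("ebullient"))  # True

# Slicing from index 1 gives "the rest" of the string,
# which the recursive call then works on.
print("energy"[1:])                # nergy

# The built-in str.startswith does the same check and is safe on "".
print("".startswith("e"))          # False
```

The ordering matters in the direct-indexing version. Python’s ""[0] raises an IndexError, which is one reason the empty-string case has to be checked before the first character is inspected.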

But what if the 0th character isn’t an ‘e’? Then we want to keep it! So let’s return a new string that starts with whatever the 0th character is and ends with whatever remove_e returns for the rest of wallflower. If wallflower is “Oregon” then we want “O” ++ remove_e(“regon”). That can be accomplished with the final line of the instruction:

The code that is run if neither condition returns true

On the left side of ++,

The left side of the else expression

we take the 0th character of wallflower, which is wallflower[0], and convert it to a string. Why do we need to convert it? Short answer: types, as discussed in the last post. Longer answer: wallflower[0] is a character, and ++ works with two strings, not with a character and a string. How do we convert a single character into a string? Call the string instruction, of course!

  • string(char): converts char into a 1-character string, “char”

Thus, string(‘a’) returns “a” and string(‘*’) returns “*”. So the left side of ++ converts the 0th character into its string equivalent, allowing it to be used with ++. Now let’s examine the right side of ++:

The right side of the else expression

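This mirrors the previous then code: it calls remove_e on the rest of wallflower. For example, if wallflower equals “book” then the else expression computes

string(‘b’) ++ remove_e(“ook”)

Each recursive call keeps its first character in the same way until only the empty string is left, so this simplifies to

“b” ++ “o” ++ “o” ++ “k” ++ remove_e(“”)

which in turn simplifies to

“b” ++ “o” ++ “o” ++ “k” ++ “”

which equals “book”. This is expected: “book” has no letter ‘e’ in it, so remove_e(“book”) should just return “book”.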
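Python handles the type question a little differently, which makes a useful contrast. The sketch below uses made-up helper names and assumes Cake’s ++ corresponds to Python’s + on strings. Python has no separate character type, so indexing a string already gives a 1-character string. However, + still refuses to mix a string with a number, and str() plays roughly the role the string instruction plays here. The last lines spell out the nested shape of the “book” example once every recursive call has returned:

```python
def first_as_string(s: str) -> str:
    # Indexing a Python str returns another str of length 1, because Python
    # has no separate character type. No conversion is needed.
    return s[0]


def label(prefix: str, n: int) -> str:
    # prefix + n would raise TypeError (str + int). Converting with str()
    # first is similar in spirit to Cake's string instruction.
    return prefix + str(n)


word = "Oregon"
print(first_as_string(word) + word[1:])  # Oregon
print(label("O", 5))                     # O5

try:
    bad = "O" + 5  # raises TypeError before bad is ever assigned
except TypeError:
    print("str + int is a TypeError")

# remove_e("book") unwinds into this nested expression, innermost first.
book = "b" + ("o" + ("o" + ("k" + "")))
print(book)  # book
```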

Let’s return to the actual value of wallflower,

I am writing to you because she said you listen and understand and didn’t try to sleep with that person at that party even though you could have.

What does remove_e(wallflower) return? The instruction begins by checking if wallflower is an empty string. It isn’t, so Cake moves to the next condition, whether or not wallflower starts with ‘e’. It doesn’t, so Cake moves to the else expression,

The else expression introduced above

This produces “I” ++ remove_e(the rest of the string). What is remove_e(the rest of the string)? Well, the first character of the rest of the string is ‘ ’, which means it returns “ ” ++ remove_e(the rest of the string). This process repeats until an ‘e’ is found (in the word “because”). Then, the ‘e’ is left out and the rest of the string is evaluated with remove_e. Here is the final result (note that the original wallflower is never changed; remove_e returns a new string each time it is called):

I am writing to you bcaus sh said you listn and undrstand and didn’t try to slp with that prson at that party vn though you could hav.
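If you want something you can run, here is a rough Python translation of remove_e. It is a sketch, not the article’s Cake code. It assumes three correspondences:

- Cake’s wallflower[0] behaves like Python indexing.
- “The rest of wallflower” is the slice wallflower[1:].
- ++ is Python’s + on strings.

The sample sentence uses a plain apostrophe, and the final check compares the result with Python’s built-in str.replace:

```python
def remove_e(wallflower: str) -> str:
    """Return a copy of wallflower with every lowercase 'e' removed."""
    # Base case: nothing left to process.
    if wallflower == "":
        return wallflower
    # The first character is an 'e': leave it out and process the rest.
    elif wallflower[0] == "e":
        return remove_e(wallflower[1:])
    # Otherwise keep the first character and process the rest.
    else:
        return wallflower[0] + remove_e(wallflower[1:])


wallflower = ("I am writing to you because she said you listen and understand "
              "and didn't try to sleep with that person at that party even "
              "though you could have.")

result = remove_e(wallflower)
print(result)

# Python strings are immutable, so the original sentence is unchanged.
assert wallflower.startswith("I am writing to you because")

# The built-in str.replace produces the same result without recursion.
assert result == wallflower.replace("e", "")
```

One caveat is specific to Python. CPython caps recursion depth (see sys.getrecursionlimit(), which is 1000 by default), and every slice copies the remaining string. This recursive version is therefore only practical for short inputs, whereas str.replace has neither limitation. The recursive form is kept here because it mirrors the structure of remove_e.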

The result may not be as eloquent, but it’s still pretty readable. As a final note, here are two practice problems to work on; I’ll post and discuss their answers next time:

  1. Write a more general version of remove_e called remove_char that removes char from string.
  2. Write a more general version of remove_char called remove_chars that removes each character in the array chars from string.
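Once you’ve tried the practice problems yourself, here is one possible Python sketch to compare against. The problems don’t fix an argument order, so the signatures are assumptions, and text is a made-up parameter name for the string being processed:

```python
from typing import Sequence


def remove_char(char: str, text: str) -> str:
    """Return a copy of text with every occurrence of char removed."""
    if text == "":
        return text
    elif text[0] == char:
        # Same shape as remove_e, with 'e' replaced by the char parameter.
        return remove_char(char, text[1:])
    else:
        return text[0] + remove_char(char, text[1:])


def remove_chars(chars: Sequence[str], text: str) -> str:
    """Return a copy of text with every character in chars removed."""
    if len(chars) == 0:
        # No characters left to remove.
        return text
    # Remove the first character everywhere, then handle the remaining ones.
    return remove_chars(chars[1:], remove_char(chars[0], text))


print(remove_char("e", "energy"))                # nrgy
print(remove_chars(["a", "e"], "banana bread"))  # bnn brd
```

Here remove_chars is built on top of remove_char, the same way remove_char generalizes remove_e. Other designs, such as checking each character against the whole collection in a single pass, would work just as well.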

Image credit: Wright’s Gadsby
