Mass stripping

Extending last post’s solution

Jack Holland
Understanding computer science
6 min read · Mar 2, 2014


This is an ongoing series. Please check out the collection for the rest of the articles.

Imagine if every song required a separate computer program to play. If you wanted to hear some Jefferson Airplane, you would need one program to play “White Rabbit” and another program to play “Today”. This would obviously be a ridiculous situation. Media players are useful because they can play any (properly formatted) song you give them. It doesn’t matter what the song is; the player plays it.

The instruction we developed last post, remove_e, is kind of like a media player that plays only one song. We don’t want an instruction that returns the given string without any ‘e’ characters. Rather, we want an instruction that returns the given string without any x characters, where x is whatever character we choose. Otherwise we would need to write an instruction for every character as needed: remove_a, remove_b, remove_c, etc. This would be impractical to say the least. A general remove_char instruction is not only a more elegant solution but a more practical one as well.

But how do we go about writing it? Here’s the remove_e instruction again, to help refresh your memory:
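For readers following along in a real language, here is a minimal Python sketch of what remove_e could look like. It assumes the recursive shape used in this series: handle the empty string, then look at the first character and recurse on the rest. The parameter is named text rather than str because str is a built-in name in Python, and the details may differ from the original listing.

```python
def remove_e(text: str) -> str:
    """Return text with every 'e' removed (recursive sketch)."""
    if text == "":
        # Nothing left to examine.
        return ""
    if text[0] == "e":
        # The only letter-specific test: drop a leading 'e'.
        return remove_e(text[1:])
    # Keep the first character and process the rest.
    return text[0] + remove_e(text[1:])


print(remove_e("some string"))  # som string
```

Notice that the comparison against "e" is the only place the letter appears in the sketch.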

Last time str was wallflower, but that doesn’t affect anything

Here str is the string to examine. If you look carefully, you can see that the only part of the code that relies on removing ‘e’ specifically is the third line:

If we made a new instruction called remove_char in which we replace ‘e’ with a general char variable, this new instruction would work for any value given to char. Instead of remove_e(“some string”) we can now write remove_char(“some string”, ‘e’) where ‘e’ is the character to remove. These two instruction calls return the same result: “som string”. In other words, they are logically equivalent. The benefit of the new instruction is that we can also write remove_char(“some string”, ‘i’) and the code returns “some strng”.
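As a rough Python sketch of that change (assuming the same recursive structure as remove_e, with text standing in for str), the hard-coded letter simply becomes a second parameter:

```python
def remove_char(text: str, char: str) -> str:
    """Return text with every occurrence of char removed (recursive sketch)."""
    if text == "":
        return ""
    if text[0] == char:
        # Skip the first character when it matches the one to remove.
        return remove_char(text[1:], char)
    # Otherwise keep it and process the rest.
    return text[0] + remove_char(text[1:], char)


print(remove_char("some string", "e"))  # som string
print(remove_char("some string", "i"))  # some strng
```

Nothing else in the body had to change, which is a good sign that the generalization is a natural one.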

remove_char is more flexible without being more cumbersome, which is exactly what we set out to accomplish. But not all generalizations work out as nicely; media players should play any song they’re given, but giving them the ability to also open images, videos, 3D models, and other types of data isn’t necessarily a good thing. The program may have so many features that it becomes difficult to use any of them. I’m sure you’ve encountered programs that fall prey to this kind of problem.

Now let’s take a look at another generalization, remove_chars, which removes from the given string every instance of every character in the array of characters you give it. That’s a mouthful, so here’s an example: remove_chars(“some string”, [‘s’, ‘i’]) returns “ome trng” since it removes every ‘s’ and every ‘i’ from the string you give it. Since arrays of characters are just strings, you could also write the above as remove_chars(“some string”, “si”) and it will return the same result, “ome trng”.

Let’s see what remove_chars looks like, explain how it works, and then discuss its use for a bit:
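In Python, a sketch of the same idea might look like the following. It assumes the recursive structure of the earlier instructions and uses Python's own in operator for the membership test; the names text and chars are illustrative.

```python
from typing import Sequence


def remove_chars(text: str, chars: Sequence[str]) -> str:
    """Return text without any character that appears in chars (recursive sketch)."""
    if text == "":
        return ""
    if text[0] in chars:
        # The first character is one of the characters to remove: skip it.
        return remove_chars(text[1:], chars)
    # Otherwise keep it and process the rest.
    return text[0] + remove_chars(text[1:], chars)


print(remove_chars("some string", ["s", "i"]))  # ome trng
print(remove_chars("some string", "si"))        # ome trng
```

Both calls give the same result because a list of characters and a string both support the membership test.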

The remove_chars instruction

This looks very much like remove_e and remove_char; the only difference is in the third line, which checks whether the first character is in chars.

in is a new instruction that is hopefully fairly intuitive. It’s defined like this:

  • element in array: returns true if element is an element in array

So 4 in [3, 4, 5] returns true because [3, 4, 5] includes a 4. On the other hand, 5 in [1, 2, 3] returns false because [1, 2, 3] doesn’t have a 5.
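Python happens to have an in operator with this behavior, so the two examples can be tried directly. The variable names below are only illustrative.

```python
# Membership tests on lists of numbers.
has_four = 4 in [3, 4, 5]
has_five = 5 in [1, 2, 3]

# The same test works on a string, treated as a sequence of characters.
has_s = "s" in "si"

print(has_four)  # True
print(has_five)  # False
print(has_s)     # True
```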

So the third line of remove_chars checks if the first character in the string is in chars, the array of characters-to-remove; if it is, then it’s removed from the final answer; otherwise, it’s included. (This is the same structure as remove_e and remove_char; please reread the previous post if you’re confused.)

Should we replace remove_char with remove_chars? On one hand, remove_chars encompasses everything that remove_char can do; if you want to remove a single character, simply pass an array with one character (such as [‘e’]) into remove_chars and you’ll get the same result as you would if you had directly used remove_char. So from a minimalist perspective, the more specific remove_char instruction isn’t necessary if we include remove_chars.
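To make the overlap concrete, here is a small Python sketch in which remove_char is written as a one-line wrapper around remove_chars. This is an illustration rather than the approach taken in this series: for brevity, remove_chars is written here with a compact non-recursive filter instead of the recursive structure used elsewhere.

```python
from typing import Sequence


def remove_chars(text: str, chars: Sequence[str]) -> str:
    """Compact, non-recursive variant: keep only characters not in chars."""
    return "".join(c for c in text if c not in chars)


def remove_char(text: str, char: str) -> str:
    """Single-character case, expressed through the general instruction."""
    return remove_chars(text, [char])


print(remove_chars("some string", ["e"]))  # som string
print(remove_char("some string", "e"))     # som string
```

Both calls return the same string, which is what it means for remove_chars to encompass remove_char.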

But including both remove_char and remove_chars isn’t sloppy or bloated. Why? Because using remove_chars with an array of one character isn’t always clear. If, while reading unfamiliar code, you see

I don’t know why an actual program would include something so specific and arbitrary

it isn’t crystal clear why the instruction name suggests removing multiple characters if the code is passing only one character. This isn’t sloppy code or bad practice or anything that serious, but it’s a case in which a more specific instruction could help the clarity. To see what I mean, observe the equivalent code using remove_char instead:

A more readable version of the code above

The meaning of this is clearer, if only by a little. And that’s the tricky aspect of generalization; there’s no easy, cut-and-dried answer to how much to generalize. There are justifications for and against keeping both instructions, and whether or not both are kept ultimately comes down to a subjective design decision. However, please don’t take the subjectivity of the matter as an excuse not to find an adequate solution; while the decision includes subjective components, objective considerations come into play as well.

We witnessed this above when deciding to consolidate remove_a, remove_b, etc. into one general remove_char; while a language doesn’t have to consolidate like this, keeping each individual remove_a, remove_b, etc. instruction makes for inflexible code, which is a demonstrably bad idea in most circumstances (as previously discussed, you need a goal/context/circumstance/constraint before deciding on a “best” solution).

This post is intended to serve two purposes:

  1. Practice string manipulation
  2. Introduce concepts and problems with generalization

If all went well, both of these have been accomplished. If you’re still not comfortable with string manipulation, I encourage you to reread previous posts (sometimes that’s the only way to learn something).

If you’re not quite sure what all the fuss about generalization is, consider again the issue of a media player. How much functionality should a media player have? Does the wide array of functionality that iTunes provides make for a better media player or should it restrict its functionality to avoid bloat? Should strings be considered their own type or should they be considered a specific kind of array that doesn’t need special treatment? We’re not going to solve these problems right away, but these discussions will hone our mental models and intuitions of these problems so that we’re ready to tackle them more adeptly down the road.
