Removing Characters From a String

The best way to remove characters from a string with Ruby

Rebecca Hickson
The Startup
4 min read · Nov 14, 2020


Photo by Félix Prado on Unsplash

Sometimes in life, you’ve got to take what you can get. This is certainly true when it comes to dealing with APIs, particularly if you are like me and many other fledgling developers who chose a freely available API for one of your first projects. I leveraged the TVmaze API to provide data for my CLI app, Telly-Ho. My goal was to help someone find their next television show by filtering a collection of shows by genre and displaying each show’s summary to the user. Overall, I found this API to be very user-friendly and easy to work with. But as I’ve quickly discovered, in computer programming there’s always going to be some little “gotcha!” In this case, the challenge was dealing with HTML tags nested in the strings, as pictured below:

I immediately hopped on Google to see what I could find to help sanitize the data. I came across three different string methods for deleting characters: delete, tr, and gsub. The only one that I had used before was delete, and it seemed so straightforward that I was sure I could get it to work… Well, I spent far longer than I should have trying to force it to cooperate for this project. I’ll save you the trouble I encountered: here’s a quick guide to walk you through the pros and cons of these three methods.

Delete - (.delete)

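Delete is probably the most familiar of these Ruby methods, but it does not quite do what you might think. Rather than deleting a sub-string, it treats its argument as a set of characters: it will search the whole string and remove every character that appears anywhere in that set, regardless of order or position.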

"<p>Some text</p><br />".delete("</p><br />")
=> "Sometext"
"<p>Some text</p><br />".delete("</p><br/>")
=> "Some text "

The downside of delete is that it can only be used with strings, not RegExs. As you can see in the example above, although I could remove all of the HTML tags, delete’s rigidity can wind up warping your text, and it adds the unnecessary complexity of having to explicitly include every HTML tag that might be used.
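
To see why the text gets warped, it helps to remember that delete treats its argument as a set of characters rather than as a sub-string. Here is a small sketch of that behavior; the sample string is made up for illustration and is not real TVmaze data:

```ruby
# Illustrative input (an assumption for this sketch, not real API data).
html = "<p>A proper paragraph</p>"

# delete removes every '<', '/', 'p' and '>' found anywhere in the string,
# including the letter 'p' inside ordinary words.
stripped = html.delete("</p>")

puts stripped # prints "A roer aragrah"
```

The tags are gone, but so is every "p" in the sentence, which is why delete only works for tag removal when the surrounding text happens not to contain any of the tag's characters.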

Translate - (.tr)

Tr was a new method for me. It accepts two string arguments and replaces each character found in the first with the character at the same index in the second. In this example, every ‘t’ is replaced with a ‘*’ and every ‘e’ is replaced with a ‘_’.

'test string'.tr('te', '*_')
=> "*_s* s*ring"

I could use this to delete specific characters, replacing them with an empty string, but again, it is fairly rigid because it does not accept RegEx.

'<p>Some text</p><br />'.tr('<p><br />', '')
=> "Sometext"
'<p>Some text</p><br />'.tr('</p><br/>', '')
=> "Some text "

I ran into the same warping problems here as in delete, so it’s not a great option either.
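
For comparison, here is a short sketch of what tr seems better suited to: mapping one set of characters onto another. The input strings are my own illustrative choices. It shows three behaviors: a first argument may contain a range such as a-y, a shorter second argument is padded with its last character, and an empty second argument deletes the characters, which is why the results above mirror delete.

```ruby
# Each character in the first argument maps to the character at the
# same position in the second argument.
swapped = "hello".tr("el", "ip")    # => "hippo"

# Ranges work too: every letter a..y is shifted to the next letter.
shifted = "hello".tr("a-y", "b-z")  # => "ifmmp"

# A shorter second argument is padded with its last character,
# so every vowel becomes '*'.
masked = "hello".tr("aeiou", "*")   # => "h*ll*"

# With an empty second argument, tr deletes the characters,
# giving the same result as delete.
same_as_delete = "hello".tr("l", "") == "hello".delete("l") # => true

puts swapped, shifted, masked, same_as_delete
```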

Global Substitute - (.gsub)

I had never heard of the global substitute (gsub) method, but I’m so happy I found it! gsub is the only one of these options that accepts a RegEx. It accepts two arguments, a pattern and a replacement, and replaces every match of the pattern with the replacement.

'<p>Some text</p><br />'.gsub(%r{<\w*>|<\/\w*>|<\w*\s\/>}, '')
=> "Some text"

In the above example, gsub will apply the RegEx to the entire string and replace each match with an empty string, essentially deleting it. The big advantage here is that I don’t have to explicitly input every HTML element that exists, and I have full control over what part of the string gets modified.
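
If the same cleanup is needed in several places, the gsub call can be wrapped in a small helper. The sketch below is one possible approach, not the code from Telly-Ho: strip_tags is a hypothetical method name, and the pattern is a looser one that I am assuming is acceptable for simple summaries, since it also matches tags with attributes. A regular expression is not a full HTML parser, so text containing literal "<" and ">" characters could be affected.

```ruby
# strip_tags is a hypothetical helper name, not part of Ruby or the TVmaze API.
def strip_tags(text)
  # Match '<', an optional '/', one or more characters that are not '>',
  # and then the closing '>'. Each match is replaced with an empty string.
  text.gsub(%r{</?[^>]+>}, "")
end

puts strip_tags('<p>Some <b>text</b></p><br />') # prints "Some text"
puts strip_tags('<a href="#">A link</a>')        # prints "A link"
```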

Why not use gsub for everything?

gsub, although extremely useful, is not the best option for all circumstances. I used Ruby’s Benchmark module to measure how long it took these methods to perform similar operations.

require 'benchmark'

Benchmark.bm do |x|
  input = "Test String, let's practice removing letters."
  # Each report runs the same removal 100,000 times.
  x.report("Delete:") { 100000.times { input.delete('e') } }
  x.report("Translate:") { 100000.times { input.tr('e', '') } }
  x.report("Global Substitute:") { 100000.times { input.gsub(/e/, '') } }
end

                        user     system      total        real
Delete:             0.106322   0.001181   0.107503 (  0.110769)
Translate:          0.115754   0.000814   0.116568 (  0.120289)
Global Substitute:  0.449712   0.006773   0.456485 (  0.463564)

The “real” time is the elapsed time it took to run the code: in this case, the time it took to remove all the “e”s from the string 100,000 times with each method. As you can see, gsub took the longest by far, roughly four times as long as delete or tr. Although it would hardly matter for the narrow scope of my project, getting into the habit of using gsub for everything could slow things down significantly over the course of a larger project.
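
Before swapping gsub for a cheaper method, it is worth checking that the two really produce the same result for the case at hand. Here is a minimal sketch using the same input string as the benchmark and Benchmark.realtime from Ruby's standard library; the iteration count is an arbitrary choice of mine, and timings will vary by machine and Ruby version, so no expected numbers are shown.

```ruby
require 'benchmark'

input = "Test String, let's practice removing letters."
iterations = 10_000 # arbitrary count, chosen only for this sketch

# For a single fixed character, delete and gsub give the same string.
same_result = input.delete('e') == input.gsub(/e/, '')

# Benchmark.realtime returns the elapsed seconds as a Float.
delete_time = Benchmark.realtime { iterations.times { input.delete('e') } }
gsub_time   = Benchmark.realtime { iterations.times { input.gsub(/e/, '') } }

puts "Same result: #{same_result}"
puts format("delete: %.4fs, gsub: %.4fs", delete_time, gsub_time)
```

When the characters to remove are fixed and known in advance, delete is a reasonable default; gsub earns its cost when the thing to remove is a pattern.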

gsub was definitely the most useful method for my project, but it is not always the best choice for every situation. It is important to be aware of all your options and to pick the right tool for each job. I hope this quick overview will help you with your next project. If you’d like to see my CLI app in action, you can check out Telly-Ho on GitHub. Please feel free to leave a comment or reach out to me if you have any questions!
