
The Evolution of Ruby Strings from 1.8 to 3.2

An overview of the String class since Ruby 1.8

Published in Tech - RubyCademy, Sep 10, 2019

In Ruby, a string is represented as an instance of the String class. This class has evolved considerably between Ruby 1.8 and Ruby 3.2.

The purpose of this article is to detail the main changes introduced in each major release.

1.8 to 1.9

Let’s look at the main differences in the String class between 1.8 and 1.9.

The first difference is that the String class includes the Enumerable module in Ruby 1.8 but no longer does in Ruby 1.9.
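Because Ruby 1.8 strings got methods such as map and select from Enumerable, code written for 1.8 has to pick an explicit iterator from 1.9 onward. A minimal sketch, assuming Ruby 1.9 or newer (the variable names are only illustrative):

```ruby
# Ruby >= 1.9: String no longer mixes in Enumerable, so there is no String#each.
text = "one\ntwo"

string_is_enumerable = String.include?(Enumerable)  # false
has_each = text.respond_to?(:each)                   # false

# Use an explicit iterator instead; without a block each one returns an Enumerator.
chars = text.each_char.to_a  # ["o", "n", "e", "\n", "t", "w", "o"]
lines = text.each_line.to_a  # ["one\n", "two"]
bytes = text.each_byte.to_a  # [111, 110, 101, 10, 116, 119, 111]

p string_is_enumerable, has_each
p chars, lines, bytes
```

Choosing each_char, each_line or each_byte explicitly also makes it clear which unit of the string you are iterating over.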

Ruby 1.9 also adds a set of new instance methods to the String class, such as encode, encoding, valid_encoding? and force_encoding.

But the most important change is that Ruby 1.8 treats a string as a sequence of bytes, whereas Ruby 1.9 treats it as a sequence of characters (codepoints).
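The difference shows up as soon as a string contains a non-ASCII character. In this sketch (Ruby 1.9 or newer; the \u escape is used so the literal is UTF-8 whatever the source file's encoding), "héllo" is 5 characters long but occupies 6 bytes, where a byte-oriented 1.8 string would have reported a length of 6:

```ruby
word = "h\u00e9llo"  # "héllo"; the \u escape makes this literal UTF-8

puts word.encoding       # UTF-8
puts word.length         # 5: counted in characters since Ruby 1.9
puts word.bytesize       # 6: "é" occupies two bytes in UTF-8
puts word[1] == "\u00e9" # true: indexing is by character, not by byte
```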

A sequence of codepoints, paired with a specific encoding, is what lets Ruby handle encodings natively.

Under the hood, a string is still stored as a sequence of bytes.

An encoding simply specifies how to decode those bytes into codepoints.
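The split between stored bytes and decoded codepoints can be observed directly. A minimal sketch, assuming Ruby 1.9 or newer: force_encoding changes only the encoding label, so the same two bytes decode to one UTF-8 codepoint or to two Latin-1 codepoints.

```ruby
e_acute = "\u00e9"  # "é"

p e_acute.bytes       # => [195, 169]: the stored UTF-8 bytes
p e_acute.codepoints  # => [233]: U+00E9 once the bytes are decoded as UTF-8

# Same bytes, different encoding: force_encoding only relabels them.
latin1 = e_acute.dup.force_encoding(Encoding::ISO_8859_1)
p latin1.bytes        # => [195, 169]: unchanged
p latin1.codepoints   # => [195, 169]: now read as two Latin-1 characters
p latin1.length       # => 2
```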

As a result, starting with Ruby 1.9, Ruby handles string encodings natively, whereas Ruby 1.8 required the iconv library to do this job.
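As a rough before/after, a conversion that 1.8 code delegated to Iconv.conv(to, from, string) becomes a call to String#encode. A minimal sketch, assuming Ruby 1.9 or newer; the iconv call is left as a comment because that library is no longer bundled with modern Ruby:

```ruby
# Ruby 1.8 style (iconv):
#   require "iconv"
#   Iconv.conv("ISO-8859-1", "UTF-8", utf8_text)
# Ruby 1.9+ style: String#encode transcodes natively.

utf8_text = "caf\u00e9"                       # "café" in UTF-8
latin1_text = utf8_text.encode("ISO-8859-1")

p utf8_text.bytes                             # => [99, 97, 102, 195, 169]
p latin1_text.bytes                           # => [99, 97, 102, 233]
p latin1_text.encode("UTF-8") == utf8_text    # => true: lossless round trip
```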

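Note that in Ruby 1.9, a string literal takes the encoding of its source file, which defaults to US-ASCII unless a magic comment such as # encoding: utf-8 says otherwise. Binary (ASCII-8BIT), which reads a string as a plain sequence of bytes, is used for raw data.

Finally, the iconv library is deprecated in Ruby 1.9 in favor of the new native encoding support.

1.9 to 2.0

In Ruby 2.0, UTF-8 becomes the default source encoding, so every string literal of a running program is UTF-8 unless a magic comment says otherwise, whereas in Ruby 1.9 it was US-ASCII.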
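To check these defaults on your own interpreter, here is a short sketch, assuming Ruby 2.0 or newer and a file run without a magic comment (when code is passed with ruby -e, the source encoding may follow the locale instead). It also shows that Binary is still around for raw bytes:

```ruby
# Run as a file without a magic comment on Ruby >= 2.0, the first two lines print UTF-8.
puts __ENCODING__        # the source encoding of this file
puts "plain".encoding    # plain literals take the source encoding

# Binary (ASCII-8BIT, aliased as Encoding::BINARY) is still used for raw bytes.
raw = String.new                        # no argument: an empty ASCII-8BIT string
copy = "caf\u00e9".b                    # String#b (Ruby 2.0+) returns a binary copy
puts raw.encoding == Encoding::BINARY   # true
puts copy.encoding == Encoding::BINARY  # true
puts copy.bytesize                      # 5: "é" is two raw bytes
```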

This is loosely comparable to Java, whose strings are sequences of UTF-16 code units.

Note that from Ruby 2.0, the iconv library is no longer part of the standard library; it is distributed as a separate gem.

2.0 to 2.1

In Ruby 2.0, encoding a string into the encoding it already has (UTF-8 to UTF-8, for example) is a no-op.

So in Ruby 2.0, explicitly encoding a UTF-8 string to UTF-8 returns the string unchanged, without replacing its invalid byte sequences: the invalid: :replace option is simply ignored.

In Ruby 2.1, the invalid: :replace option is honored: each invalid byte sequence is replaced with the default replacement character (U+FFFD for Unicode encodings, "?" otherwise).
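The behavior change can be reproduced with a string holding a byte that is never valid in UTF-8. A minimal sketch, assuming Ruby 2.1 or newer for the checks that run (the 2.0 result is only described in a comment); it also uses String#scrub, which arrived in the same release and does this cleanup directly:

```ruby
# A UTF-8 string containing an invalid byte (0xFF never appears in UTF-8).
bad = "abc\xFF".dup.force_encoding(Encoding::UTF_8)
p bad.valid_encoding?           # => false

# Same source and target encoding, with invalid: :replace.
fixed = bad.encode(Encoding::UTF_8, invalid: :replace)
# Ruby 2.0: the call is a no-op and fixed is still "abc\xFF".
# Ruby 2.1+: the invalid byte becomes U+FFFD, the default replacement for Unicode.
p fixed == "abc\uFFFD"          # => true on Ruby 2.1+

# String#scrub (Ruby 2.1+) replaces invalid bytes without a transcoding call.
p bad.scrub == "abc\uFFFD"      # => true
p bad.scrub("?")                # => "abc?"
```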

2.1 to 3.2

Since Ruby 2.1, in addition to many performance improvements, the String class has gained two main features:

The frozen_string_literal: true magic comment (since Ruby 2.3)

Case conversion for non-ASCII strings (since Ruby 2.4)
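Both features fit in one short file. A minimal sketch, assuming Ruby 2.5 or newer (FrozenError was introduced in 2.5; earlier versions raise a RuntimeError), with the magic comment on the very first line where Ruby expects it:

```ruby
# frozen_string_literal: true

# 1. With the magic comment above (Ruby 2.3+), every string literal is frozen.
greeting = "hello"
literal_frozen = greeting.frozen?          # true

begin
  greeting << " world"
rescue FrozenError => e
  mutation_error = e.class                 # FrozenError
end

# Unary plus (also Ruby 2.3+) returns an unfrozen copy when mutation is needed.
buffer = +"hello"
buffer << " world"                         # "hello world"

# 2. Since Ruby 2.4, case conversion follows Unicode rules, not just ASCII.
upper_accents = "\u00e9t\u00e9".upcase         # "été" -> "ÉTÉ"
upper_eszett  = "stra\u00dfe".upcase           # "straße" -> "STRASSE"
ascii_only    = "\u00e9t\u00e9".upcase(:ascii) # :ascii keeps the old behavior: "éTé"

p literal_frozen, mutation_error, buffer
p upper_accents == "\u00c9T\u00c9", upper_eszett
```

The :ascii option is handy when a protocol or legacy format expects only ASCII letters to change case.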

Benchmark string allocations

The string allocation benchmark was written with the benchmark-ips gem and run on each version of Ruby.

The results show that string allocation in Ruby 2.5 is about 4 times faster than in Ruby 1.8.
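The original benchmark code is not reproduced here. As a rough stand-in that needs no gem, the sketch below times string allocation with Ruby's monotonic clock (Process.clock_gettime, Ruby 2.1+); the helper name iterations_per_second and the iteration count are illustrative assumptions, and unlike benchmark-ips it does no warm-up and reports no error margins:

```ruby
# Times a block with the monotonic clock and returns iterations per second.
# Hypothetical helper: a crude stand-in for benchmark-ips (no warm-up, no statistics).
def iterations_per_second(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

N = 200_000

# Without the frozen_string_literal comment, each evaluation allocates a new string.
literal_rate = iterations_per_second(N) { "a string literal" }
new_rate     = iterations_per_second(N) { String.new("a string") }

puts format("literal:    %.0f i/s", literal_rate)
puts format("String.new: %.0f i/s", new_rate)
```

With the benchmark-ips gem installed, the same comparison would be written as a Benchmark.ips block containing one x.report per case, followed by x.compare!.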

