The issue of types

Are strings and arrays the same type of thing?

Jack Holland
Understanding computer science
6 min read · Feb 15, 2014

This is an ongoing series. Please check out the collection for the rest of the articles.

Last time, I confidently declared that strings are just special arrays and that fundamentally, they’re the same. This is true, but only from a certain point of view (no, I’m not pulling an Obi-Wan; what I said is actually true, but there are subtleties to deal with). Strings are just arrays of characters, but strings and arrays are not the same type of data. What is a data type? It’s a label that classifies how a datum should be treated. More specifically, a datum’s type determines:

  • what kinds of values the datum can take on
  • how it’s stored in the computer’s memory
  • how it mixes with other types of data
  • what it should look like when it’s viewed or printed to the screen

This is not meant to be a comprehensive list but more of an intuitive guide to what a data type signifies. I discussed the types of data that Cake has in the last post, but let’s briefly revisit the topic. First, we have the number type. A number can be any real number, where “real” means any whole number, fraction, decimal, etc. as long as the number isn’t imaginary. Basically, real numbers are exactly what you think of when you think about regular numbers.

Next, we have the Boolean type, which has just two values: true and false. While true and false can be represented by numbers and, like everything else, are stored as numbers, they are not considered numbers. What does this mean? It means that in Cake, you can’t write true + 1. This has no meaning in Cake and is regarded as an error. This is because + is an instruction that accepts two numbers, not a number and a Boolean. In fact, every instruction has requirements about the types it accepts. +, -, *, /, and % work only for two numbers. Likewise, the instructions and and or work only for two Booleans, while not works only for a single Boolean. You can’t write 4 or 3. What would that mean, anyway?
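To make the idea of type requirements concrete, here is a minimal sketch in Python. Cake isn't a real language, so the names cake_add and cake_or are hypothetical stand-ins for its + and or instructions. Python itself is lenient here (it evaluates True + 1 to 2 and 4 or 3 to 4), so the sketch has to check the types by hand to mimic Cake's stricter rules.

```python
def cake_add(a, b):
    """Hypothetical stand-in for Cake's + instruction: accepts two numbers only."""
    for value in (a, b):
        # In Python, bool is a subclass of int, so Booleans are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"+ expects two numbers, got {value!r}")
    return a + b


def cake_or(a, b):
    """Hypothetical stand-in for Cake's or instruction: accepts two Booleans only."""
    for value in (a, b):
        if not isinstance(value, bool):
            raise TypeError(f"or expects two Booleans, got {value!r}")
    return a or b


print(cake_add(2, 1))        # 3
print(cake_or(True, False))  # True

# Both of these break the type requirements and are reported as errors.
for instruction, values in ((cake_add, (True, 1)), (cake_or, (4, 3))):
    try:
        instruction(*values)
    except TypeError as error:
        print("error:", error)
```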

Are you beginning to understand what a type means? To help solidify your mental model of it, let’s look at an analogy. Natural languages also have types of a sort; they’re called parts of speech. Nouns represent things, objects, concepts, and so on. Verbs represent actions and relationships. You can’t replace a verb with a noun and expect the sentence to make sense! (Actually, verbification is notoriously prevalent and accepted in English, but you know what I mean.) Data types work in much the same way; each type has a role and you can’t change its role willy-nilly.

This doesn’t mean that you can’t convert one type to another — we’ll see plenty of that in the future — but you can’t write 3 and true and then expect Cake to know what you’re talking about. First you have to convert 3 to a Boolean in some way or another and then you can use it with the and instruction. Of course, some instructions require two different types:

  • slice(array, start, end): returns an array made of a slice of array from the start index to the end index

So slice returns a slice of the given array, like a slice of pizza from the whole pie. You give slice the index to start slicing from and the index to stop slicing at, and it returns that portion, or slice, of the array. As you can see, slice accepts three values: an array and two numbers. This means that to use slice, you must give it an array and two numbers. Giving it an array and two strings doesn’t make sense.
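Here is a rough Python sketch of what slice might look like. The name cake_slice is made up for illustration, and since the description above doesn't say whether the element at the end index is included, the sketch assumes it is not (which is how Python's own slicing behaves). It only checks that the two indices are numbers.

```python
def cake_slice(array, start, end):
    """Hypothetical sketch of Cake's slice instruction."""
    for index in (start, end):
        # Indices must be whole numbers; Booleans and strings are rejected.
        if isinstance(index, bool) or not isinstance(index, int):
            raise TypeError(f"slice expects two numbers, got {index!r}")
    # Assumption: the element at `end` itself is not part of the slice.
    return array[start:end]


pie = ["cheese", "pepperoni", "mushroom", "olive", "plain"]
print(cake_slice(pie, 1, 3))  # ['pepperoni', 'mushroom']

try:
    cake_slice(pie, "1", "3")  # an array and two strings
except TypeError as error:
    print("error:", error)
```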

Instead of giving slice an array and two numbers, what about giving it a string and two numbers? This question gets at the core of what data types mean. On one hand, a string is an array of characters. On the other hand, a string is not the right type. If you want a clear and easy answer to this conundrum, I’ve got some bad news for you: there isn’t one. There are many solutions and different programming languages take different approaches.

On one extreme, a language could insist that if an instruction requests a certain type of data then that request must be satisfied. In other words, you couldn’t give slice a string because even if it’s made of the same stuff as an array, it’s not labeled as such. This may sound overly stringent, but it helps avoid mistakes.

On the other extreme, a language can declare that if something walks, swims, and quacks like a duck, then it’s considered a duck. If strings and arrays are made of the same stuff, then they can both be used when an instruction requires an array. With this model, you could give slice a string because it’s made of the same stuff as an array. This is convenient but a bit dangerous; if there is a typo, the program may unknowingly run incorrectly because it is so forgiving about the types of data that it accepts.
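The two extremes can be sketched side by side in Python. Both function names are hypothetical, and a Python list stands in for Cake's array type. The strict version insists on the label; the lenient version accepts anything that can be sliced.

```python
def strict_slice(array, start, end):
    # Strict extreme: the value must actually be labeled as an array (a list).
    if not isinstance(array, list):
        raise TypeError("slice expects an array")
    return array[start:end]


def duck_slice(array, start, end):
    # Lenient extreme: if it can be sliced like an array, it is accepted.
    return array[start:end]


print(duck_slice("house", 0, 2))  # ho

try:
    strict_slice("house", 0, 2)
except TypeError as error:
    print("error:", error)

# The danger of leniency: suppose a list of words was intended,
# but a plain string was passed by mistake. No error is raised.
print(duck_slice("red green", 0, 1))  # r
```

In the last line the program keeps running with "r" instead of ["red"], which is exactly the kind of silent mistake the strict approach would have caught.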

There are many hybrid approaches that don’t fall on either extreme of the spectrum. Some languages dictate that if one type can be automatically converted to another type, then you can use it like that type. Since strings can be used just like arrays, they would be considered arrays; but since numbers can’t be used like arrays, they would not be considered arrays. Many languages develop hierarchies of types so that if two types share behavior — they both have a certain property — they are interchangeable in situations that require that property and nothing else.

As an example, let’s say that there’s another type called an infinite array that is made of elements just like an array, but has no end. Let’s not worry about the practicalities of this (it can be done; some languages can construct infinite lists). Rather, focus on the similarities and differences between a regular array and an infinite one. Accessing elements is the same; if the infinite array is named inf then instructions like inf[3] and inf[1006] should work as they would for finite arrays. So if an instruction requires an array whose elements can be accessed, an infinite array fits right in. With this model, instead of giving the instruction a regular array, you could give it an infinite one. But what if an instruction involves taking the size of an array? Then an infinite array becomes problematic — how can you compute the size of a never-ending array? For this instruction, you couldn’t replace a regular array with an infinite one because the infinite one doesn’t share the right properties with the finite one.
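As a small illustration, here is one way an infinite array could be sketched in Python. The InfiniteArray class and the rule that computes each element are assumptions made up for this example; the point is only that it supports element access but deliberately has no size.

```python
class InfiniteArray:
    """Hypothetical infinite array: each element is computed on demand."""

    def __init__(self, rule):
        self.rule = rule  # function mapping an index to the element stored there

    def __getitem__(self, index):
        return self.rule(index)

    # No __len__ is defined: a never-ending array has no size to report.


def element_at(array, index):
    # Requires only that elements can be accessed.
    return array[index]


def size(array):
    # Requires that the array has a size.
    return len(array)


inf = InfiniteArray(lambda i: i * 2)  # assumed rule: element i is 2 * i
regular = [0, 2, 4, 6]

print(element_at(regular, 3), element_at(inf, 3))  # 6 6
print(inf[1006])                                   # 2012
print(size(regular))                               # 4

try:
    size(inf)
except TypeError as error:
    print("error:", error)
```

Both kinds of array work with element_at because they share the one property it needs, but only the regular array can stand in where size is required.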

I’ve only scratched the surface of types. Classifying types of data is an enormous enterprise, studied through advanced fields like type theory and category theory that take a long time to understand. I’m not trying to intimidate you; actually, I hope that you get excited over the prospect of eventually studying fields like these. Rather, I’m trying to give you a bit of perspective on the issue and let you know that if you’re not exactly sure how types fit into programming, that’s OK. We’ll discuss them more when we learn real languages, since then we’ll have tangible type models to work with. Finally, as with most complex issues, there is never one right answer. Which model you pick depends on your context and goals (which may be uncertain or subjective).

For Cake, we’ll take a lenient perspective on the issue. If it makes sense to give a string to an array instruction, we’ll allow it. So, size("house") returns 5 since "house" has five characters. We can take this approach because Cake has only a handful of array instructions, all of which make sense with strings.
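A lenient size could be sketched in Python like this. The function is a hypothetical stand-in for Cake's instruction; it accepts arrays and strings alike but still rejects values, such as numbers, for which a size makes no sense.

```python
def size(array):
    """Hypothetical sketch of Cake's lenient size instruction."""
    # Strings are allowed wherever arrays are; other types are not.
    if not isinstance(array, (list, str)):
        raise TypeError("size expects an array or a string")
    return len(array)


print(size("house"))     # 5
print(size([1, 2, 3]))   # 3
```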

To wrap up the discussion of Cake’s types, I want to mention that along with numbers, Booleans, arrays, and strings, there is another type: characters. A character is, as we’ve already seen, any letter, digit, or symbol enclosed in apostrophes like this: ‘A’, ‘5’, ‘#’, ‘ ’. Each character has a corresponding number, which is what is stored in the computer’s memory when we use characters. But, importantly, characters are not numbers, just as strings are not arrays, and should not be considered the same type. However, as with strings, Cake shall take a lenient approach and allow characters to be used as numbers for the sake of convenience. Please note that not all languages are this lenient and forgiving (some are, though!).
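Python happens to be one of the less forgiving languages on this point, which makes it a handy illustration. It has no separate character type (a character is just a one-character string), and it refuses to mix characters with numbers unless you convert explicitly with its built-in ord and chr. The char_plus function below is a hypothetical sketch of what Cake's leniency might amount to.

```python
def char_plus(character, number):
    """Hypothetical sketch: treat a character as its number, as Cake allows."""
    return ord(character) + number


print(ord("A"))            # 65, the number stored for the character A
print(char_plus("A", 1))   # 66
print(chr(66))             # B, converting the number back to a character

try:
    "A" + 1  # Python does not convert the character automatically
except TypeError as error:
    print("error:", error)
```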
