Genetics and CS, Part 1

Jason Oswald
Learning Computer Science with Swift
4 min read · Oct 2, 2017

Borrowing from the Rosalind project, we will be building a set of tools that will enable us to work with genetic information. Along the way we will learn about some general programming structures and algorithms.

  • Defining our own types with enums
  • Creating our own types with structs
  • Adding functionality to existing types with extensions
  • Defensive programming with guard and thrown errors

Nucleotides and Enums: Defining Our First Type

As you may know, there are two nucleic acids that store genetic information: DNA and RNA. Between them, those nucleic acids are built from five nucleotides: adenine, cytosine, guanine, thymine (found in DNA), and uracil (found in RNA). We eventually want to create a type that represents a strand of genetic material, but before we get there, we want a way to say whether a nucleic acid is DNA or RNA, and we want a way to describe an individual nucleotide. Because each is a distinct type with a small, fixed set of possible values, defining an enum for each is appropriate.
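A minimal sketch of what those two enums might look like. The type names NucleicAcid and Nucleotide and the case names DNA and RNA are assumptions made for illustration; the single-letter nucleotide cases follow the .A and .U spelling used later in this post.

```swift
// Which kind of nucleic acid we are dealing with (names are illustrative).
enum NucleicAcid {
    case DNA
    case RNA
}

// The five nucleotides, one case per letter.
enum Nucleotide {
    case A, C, G, T, U
}

// Example usage: create one value of each type.
let kind = NucleicAcid.DNA
let base: Nucleotide = .G
print(kind, base) // prints "DNA G"
```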

The example usage in this code isn’t very useful.

To get a feel for how this will be useful, let’s think about how we might want to represent a strand of genetic material. A strand is typically represented as a list of characters, like so: “GATTACA.” It makes sense for us to represent a strand of genetic material as an array of Nucleotide enums.
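As a quick illustration, here is “GATTACA” written as an array of enum values. The Nucleotide enum is redefined here so the snippet runs on its own, and the constant name gattaca is only an example.

```swift
// Redefined here so the snippet is self-contained.
enum Nucleotide {
    case A, C, G, T, U
}

// "GATTACA" as an array of Nucleotide values.
let gattaca: [Nucleotide] = [.G, .A, .T, .T, .A, .C, .A]

print(gattaca.count) // prints "7"
print(gattaca[0])    // prints "G"
```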

Creating a Genetic Sequence Struct

Resources

  • The Swift documentation on Classes and Structs. Try to focus exclusively on the struct aspects, though, by and large, there isn’t much difference between the two for our purposes.

A struct is another way to define our own types. When we create a struct (or a class), we want to think about the properties of our type and what we want our types to be able to do. It helps to initially think about these things abstractly and not in terms of code. As we develop our ideas, we can get closer to writing code.

For us, we want a type that represents a genetic sequence. We want to be able to interpret that sequence as either DNA or RNA and we want to be able to represent a sequence of nucleotides of any length. We also want to come up with good names for our properties.

It also makes sense for us to define our enums inside of this struct, since the ideas are so closely coupled.
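One way this could look is sketched below. The struct name GeneticSequence and the property names nucleicAcid and strand are assumptions chosen for this example, not necessarily the names used in the original project.

```swift
struct GeneticSequence {
    // The enums live inside the struct because the ideas are closely coupled.
    enum NucleicAcid {
        case DNA, RNA
    }

    enum Nucleotide {
        case A, C, G, T, U
    }

    // How the sequence should be interpreted.
    var nucleicAcid: NucleicAcid
    // The nucleotides themselves, in order; any length is allowed.
    var strand: [Nucleotide]
}

// Swift generates a memberwise initializer from the two properties.
var gattaca = GeneticSequence(nucleicAcid: .DNA, strand: [])

// Building the strand by hand, one nucleotide at a time.
gattaca.strand.append(.G)
gattaca.strand.append(.A)
gattaca.strand.append(.T)
gattaca.strand.append(.T)
gattaca.strand.append(.A)
gattaca.strand.append(.C)
gattaca.strand.append(.A)

print(gattaca.strand.count) // prints "7"
```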

You’ll note that we don’t directly write the initializer for this struct. One is automatically created for us based on the properties we create. You’ll also notice that this code is a little inconvenient in that we have to append each nucleotide to the strand by hand. We’d much rather have the computer do that. Let’s create an initializer that accepts a String and creates the appropriate genetic strand.
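A rough sketch of such an initializer, using a chain of if…else if tests. The names GeneticSequence, nucleicAcid, strand, and the from: label are assumptions for this example, and sending every unrecognized character to .U is an arbitrary choice made only to keep the sketch short.

```swift
struct GeneticSequence {
    enum NucleicAcid { case DNA, RNA }
    enum Nucleotide { case A, C, G, T, U }

    var nucleicAcid: NucleicAcid
    var strand: [Nucleotide] = []

    // Builds the strand by examining the text one character at a time.
    init(nucleicAcid: NucleicAcid, from text: String) {
        self.nucleicAcid = nucleicAcid
        for char in text {
            if char == "A" {
                strand.append(.A)
            } else if char == "C" {
                strand.append(.C)
            } else if char == "G" {
                strand.append(.G)
            } else if char == "T" {
                strand.append(.T)
            } else {
                // Arbitrary fallback: everything else is treated as uracil.
                strand.append(.U)
            }
        }
    }
}

let gattaca = GeneticSequence(nucleicAcid: .DNA, from: "GATTACA")
print(gattaca.strand.count) // prints "7"
```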

This works, but it is a little clunky in a couple of different ways. The first is that it shouldn’t really be the struct’s job to convert a character from a String into a Nucleotide; that should be the job of the Nucleotide enum. The second is the clunky use of if…else if… chains. This is the perfect time to introduce the switch statement, along with a couple of other elements.

Take note of the use of the new keyword, static, to define our function in our enum. Also note that looping over a String gives us Character values, so we have to be careful with our parameter types.
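Here is a sketch of how that might look. The function name from(_:) is a hypothetical choice for this example, as are the struct and property names; the default case returns .U for every unrecognized character.

```swift
struct GeneticSequence {
    enum NucleicAcid { case DNA, RNA }

    enum Nucleotide {
        case A, C, G, T, U

        // A static function belongs to the type itself, so we call it as
        // Nucleotide.from(...) without needing an existing Nucleotide value.
        static func from(_ char: Character) -> Nucleotide {
            switch char {
            case "A": return .A
            case "C": return .C
            case "G": return .G
            case "T": return .T
            default:  return .U // everything else becomes uracil
            }
        }
    }

    var nucleicAcid: NucleicAcid
    var strand: [Nucleotide] = []

    init(nucleicAcid: NucleicAcid, from text: String) {
        self.nucleicAcid = nucleicAcid
        // Looping over a String yields Character values.
        for char in text {
            strand.append(Nucleotide.from(char))
        }
    }
}

let gattaca = GeneticSequence(nucleicAcid: .DNA, from: "GATTACA")
print(gattaca.strand.count) // prints "7"
```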

The switch construct is, essentially, a prettier if…else if…else. As you can see, char is what is being tested. Instead of writing if char == “A” we simply define the cases we are looking for and what to do in each case. default specifies what to do in the else case. It is not optional here: as the compiler will tell you, a switch must be exhaustive, and our cases alone do not cover every possible Character.

Our code continues to be a bit problematic, however. Take a moment to think about what sort of problems could arise from our switch statement. For instance, the choice of returning .U for all other cases is a bit arbitrary and can lead to problems. Write some ideas here before continuing.

Throwing Errors

There are certainly ways of dealing with some of the problems that might emerge. For instance, if we receive an “a” instead of an “A,” could we not just return .A? Similarly, we could create a new nucleotide case called “error” or something like that to deal with letters that aren’t A, C, G, T, or U. There are several other effective ways of dealing with the problem, but fundamentally, the issue at hand is the passing of incorrect data into our functions. A String that looks like this, “G@tTAçA,” is wrong but perhaps salvageable, whereas this one, “MR. OSWALD,” has no hope. In both cases, instead of trying to deal with what could be wrong, we would rather just reject the input outright. We do this using Swift’s error handling mechanisms.

Take note of the functions labeled with throws and the use of the keyword try before attempting anything that might be dangerous.
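A minimal sketch of that approach follows. The error type NucleotideError and its invalidCharacter case are hypothetical names introduced for this example, as are from(_:), GeneticSequence, and the property names.

```swift
// A custom error describing what went wrong (hypothetical name).
enum NucleotideError: Error {
    case invalidCharacter(Character)
}

struct GeneticSequence {
    enum NucleicAcid { case DNA, RNA }

    enum Nucleotide {
        case A, C, G, T, U

        // "throws" marks a function that may reject its input.
        static func from(_ char: Character) throws -> Nucleotide {
            switch char {
            case "A": return .A
            case "C": return .C
            case "G": return .G
            case "T": return .T
            case "U": return .U
            default:  throw NucleotideError.invalidCharacter(char)
            }
        }
    }

    var nucleicAcid: NucleicAcid
    var strand: [Nucleotide] = []

    // The initializer throws too, because it calls a throwing function.
    init(nucleicAcid: NucleicAcid, from text: String) throws {
        self.nucleicAcid = nucleicAcid
        for char in text {
            strand.append(try Nucleotide.from(char))
        }
    }
}

do {
    let good = try GeneticSequence(nucleicAcid: .DNA, from: "GATTACA")
    print(good.strand.count) // prints "7"

    // This input contains characters that are not nucleotides, so the
    // initializer throws and control jumps to the catch block.
    let bad = try GeneticSequence(nucleicAcid: .DNA, from: "MR. OSWALD")
    print(bad.strand.count)
} catch {
    print("Rejected input: \(error)")
}
```

Rejecting bad input at the point where it enters keeps the rest of the struct free to assume that every element of the strand is a real nucleotide.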

Our First Problem

Now that we have our struct in place, we can take a look at the first problem on the Rosalind site, counting DNA nucleotides. The problem asks us to say how many instances of each nucleotide are in a given genetic sequence, in the order A, C, G, T. For instance, for the sequence “GATTACA” we would want to say “3 1 1 2.” Let’s start by writing a function that takes a nucleotide as a parameter and returns how many instances of that nucleotide are in the sequence.

Edit the occurencesOf function to return the correct number of occurrences for the given nucleotide. Add additional calls to the occurencesOf function so that your print statement is in the correct format.
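If you get stuck, here is one possible solution; try the exercise on your own first. The signature of occurencesOf shown here is an assumption, and the struct is trimmed down to just the strand so the snippet runs on its own.

```swift
struct GeneticSequence {
    enum Nucleotide { case A, C, G, T, U }

    var strand: [Nucleotide]

    // Counts how many times the given nucleotide appears in the strand.
    func occurencesOf(_ nucleotide: Nucleotide) -> Int {
        var count = 0
        for current in strand where current == nucleotide {
            count += 1
        }
        return count
    }
}

let gattaca = GeneticSequence(strand: [.G, .A, .T, .T, .A, .C, .A])

// Rosalind expects the counts in the order A, C, G, T.
print(gattaca.occurencesOf(.A),
      gattaca.occurencesOf(.C),
      gattaca.occurencesOf(.G),
      gattaca.occurencesOf(.T)) // prints "3 1 1 2"
```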
