Genetics and CS, Part 2

Jason Oswald
Learning Computer Science with Swift
4 min read · Oct 4, 2017

In Part 1, we left off with a good structure for solving more problems from Project Rosalind. Now we will keep developing our tools so we can solve a few more of them.

Transcribing DNA into RNA

Resources

Project Rosalind

This problem deals with how RNA is created (transcribed from DNA). It is, in essence, simply replacing Ts with Us. Your challenge is to create a new RNA GeneticSequence from an existing DNA GeneticSequence:

Please note that I’ve omitted bits of the code that we don’t need to see at the moment. You should work from the current version of your code.

You’ll notice a couple of changes in the code. First, there is a new error. Think about why this was necessary to add. Next, you’ll note a new keyword, guard. Guard acts as a gatekeeper for code. In practice, it isn’t much different from using an if…else construct. It does, however, allow you to clearly separate code that is defensive from code that is logical. This is a good use of the guard statement: we only want to transcribe DNA.
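To make the comparison concrete, here is a minimal sketch of the same defensive check written both ways. The names TranscriptionError, describeWithIf, and describeWithGuard are hypothetical and exist only for this illustration; only the guard pattern itself comes from the text above.

```swift
// Hypothetical names for illustration; only the guard pattern comes from the article.
enum NucleicAcid {
    case DNA, RNA
}

enum TranscriptionError: Error {
    case notDNA
}

// Defensive check written with if...else: the "real" work sits inside a branch.
func describeWithIf(_ kind: NucleicAcid) throws -> String {
    if kind == .DNA {
        return "ready to transcribe"
    } else {
        throw TranscriptionError.notDNA
    }
}

// The same check written with guard: the failure case exits early,
// and the logic that follows is not nested inside a branch.
func describeWithGuard(_ kind: NucleicAcid) throws -> String {
    guard kind == .DNA else {
        throw TranscriptionError.notDNA
    }
    return "ready to transcribe"
}
```

Both functions behave the same. The guard version simply keeps the "get out if this is wrong" code visibly apart from the code that does the work.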

Next, you’ll see an algorithm for solving this problem. After reading it, you may notice a couple of assumptions being made. First of all, we have no way to create a GeneticSequence object without a string. It also isn’t obvious how we can change a GeneticSequence’s strand array.

Creating a New init

There are many approaches to this conundrum, but the recommended approach is to simply create the initializer that we need.

init( kind: NucleicAcid ) {
  self.kind = kind
  strand = [Nucleotide]()
}

Then, since this is basically the first two lines of our existing initializer, we can call this initializer from that one, and then proceed with the code particular to that initializer.

init( kind: NucleicAcid, fromString string: String ) throws {
  self.init( kind: kind )
  for c in string {
    try strand.append(Nucleotide.nucleotide(fromChar: c))
  }
}

init( kind: NucleicAcid ) {
  self.kind = kind
  strand = [Nucleotide]()
}

Having one initializer call another like this is pretty standard practice.

This now fills in some missing pieces of our transcription function.

func transcribeToRNA() throws -> GeneticSequence {
  guard self.kind == .DNA else { throw BioinformaticsException.cannotTranscribeRNAError }
  // create a GeneticSequence object representing the rna
  var RNA = GeneticSequence( kind: .RNA )
  // for each nucleotide in the dna strand
  //   if the nucleotide is .T
  //     append .U to the rna's strand
  //   else
  //     append the nucleotide to the rna's strand
  return RNA
}

Performing the Transcription

The only missing piece remaining is the knowledge that you can access the properties of a struct using dot notation:

func transcribeToRNA() throws -> GeneticSequence {
  guard self.kind == .DNA else { throw BioinformaticsException.cannotTranscribeRNAError }
  var RNA = GeneticSequence( kind: .RNA )
  for nucleotide in strand {
    if nucleotide == .T {
      RNA.strand.append( .U )
    }
    else {
      RNA.strand.append( nucleotide )
    }
  }
  return RNA
}
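If you want to try the function on its own, the following is a self-contained sketch that puts the pieces together. The Part 1 types are not shown in this article, so the versions here are stand-ins: the case lists, the body of nucleotide(fromChar:), and the error name invalidNucleotideError are assumptions, and your own code may differ.

```swift
// Minimal stand-ins for the Part 1 types. Anything not shown in the
// article (such as invalidNucleotideError) is an assumption.
enum BioinformaticsException: Error {
    case invalidNucleotideError        // hypothetical name
    case cannotTranscribeRNAError
}

enum NucleicAcid {
    case DNA, RNA
}

enum Nucleotide {
    case A, C, G, T, U

    // Assumed implementation: map one character to one nucleotide.
    static func nucleotide(fromChar c: Character) throws -> Nucleotide {
        switch c {
        case "A": return .A
        case "C": return .C
        case "G": return .G
        case "T": return .T
        case "U": return .U
        default: throw BioinformaticsException.invalidNucleotideError
        }
    }
}

struct GeneticSequence {
    var kind: NucleicAcid
    var strand: [Nucleotide]

    init(kind: NucleicAcid) {
        self.kind = kind
        strand = [Nucleotide]()
    }

    init(kind: NucleicAcid, fromString string: String) throws {
        self.init(kind: kind)
        for c in string {
            try strand.append(Nucleotide.nucleotide(fromChar: c))
        }
    }

    func transcribeToRNA() throws -> GeneticSequence {
        guard self.kind == .DNA else {
            throw BioinformaticsException.cannotTranscribeRNAError
        }
        var RNA = GeneticSequence(kind: .RNA)
        for nucleotide in strand {
            if nucleotide == .T {
                RNA.strand.append(.U)
            } else {
                RNA.strand.append(nucleotide)
            }
        }
        return RNA
    }
}

// try? turns a thrown error into nil, so no do/catch is needed here.
let myRNA = try? GeneticSequence(kind: .DNA, fromString: "GATTACA").transcribeToRNA()

// Transcribing something that is already RNA trips the guard, so this is nil.
let rejected = try? GeneticSequence(kind: .RNA, fromString: "GAUUACA").transcribeToRNA()
```

Here myRNA holds a strand in which every T of the DNA has become a U, while rejected is nil because the guard threw cannotTranscribeRNAError.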

Converting to String

The last part of this is that if we try printing myRNA, we get some ugly output. Let’s pretty that up a bit. What we’d really like is for print(myRNA) to print “GAUUACA.” In order to do this, we have to convert our Nucleotide values back into a string. The algorithm for doing this is pretty straightforward:

  1. start with an empty string
  2. loop over strand
  3. convert each Nucleotide into a String
  4. append it to string
  5. return the string

In order to do this right (being able to call print(myRNA)), we need to add a bit of voodoo to our code. The first bit is that we will change the struct declaration from struct GeneticSequence to struct GeneticSequence: CustomStringConvertible. Xcode will probably give you an error at this point saying something like, “GeneticSequence does not conform to CustomStringConvertible.” If we click on the error, there’s a fix button for this which adds “protocol stubs.” This creates a String variable called description and the error goes away, but we need to do a bit more to get it to work. This is where the second bit of voodoo comes into play. We are going to associate code with the variable we just created.

var description: String {
  // this is where the code goes for creating the string
}

This uses a feature of Swift called a computed property. As the term suggests, it lets you compute a property’s value each time it is read instead of simply storing it.
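To see the two bits of voodoo working together without giving away the GeneticSequence version, here is a small sketch on an unrelated type. Codon and its letters property are hypothetical and only serve the illustration. CustomStringConvertible and description are the real protocol and requirement.

```swift
// Codon is a hypothetical type used only for this illustration.
struct Codon: CustomStringConvertible {
    var letters: String

    // A computed property: no value is stored, the body runs on each read.
    var description: String {
        return "codon(" + letters + ")"
    }
}

var codon = Codon(letters: "GAU")
let first = String(describing: codon)   // built from description
codon.letters = "UAC"
let second = String(describing: codon)  // recomputed from the new letters
print(codon)                            // print also goes through description
```

Because description is computed, second reflects the changed letters even though nothing ever assigned to description.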

The last piece of this is converting a Nucleotide into a string. We could write a function that uses a switch statement to do this, but we can also take advantage of a neat trick with Swift’s enums. If we like, we can associate a “raw value” with an enum. We will associate a String with our Nucleotides and let Swift handle the conversions for us. We simply change the enum definition to enum Nucleotide: String and then use nucleotide.rawValue in the code for description. Give it a shot!
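As a quick sketch of what raw values give you, the snippet below declares the enum with a String raw type and reads values in both directions. The case list is an assumption based on the nucleotides used so far. Wiring rawValue into description is still left to you.

```swift
// With a String raw type and no explicit values,
// Swift uses each case's name as its raw value.
enum Nucleotide: String {
    case A, C, G, T, U
}

let letter = Nucleotide.G.rawValue        // the String "G"
let parsed = Nucleotide(rawValue: "U")    // an optional holding .U
let unknown = Nucleotide(rawValue: "X")   // nil: no case has this raw value
```

Note that init(rawValue:) returns an optional, since not every string corresponds to a case.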

Generating the Reverse Complement of a DNA Strand

Resources

Project Rosalind

This problem deals with creating the reverse complement of a DNA strand. Each nucleotide is paired with another nucleotide, and the two are each other’s complements: A is paired with T, and G is paired with C. The algorithm for this problem is relatively straightforward.

  1. create a new genetic strand of DNA
  2. read through our strand of DNA
  3. append the complement of the current nucleotide to the new DNA’s strand array
  4. reverse the strand array of the new DNA
  5. return the genetic strand
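The five steps can be sketched on a plain String before you translate them to GeneticSequence. This is only an illustration under that assumption: the free functions complement(of:) and reverseComplement(of:) below are hypothetical stand-ins, not the enum method and struct method you will write.

```swift
// Sketch only: works on a plain String instead of the article's types.
func complement(of base: Character) -> Character? {
    switch base {
    case "A": return "T"
    case "T": return "A"
    case "G": return "C"
    case "C": return "G"
    default: return nil                 // not a DNA base
    }
}

func reverseComplement(of dna: String) -> String? {
    var result = [Character]()          // step 1: a new, empty strand
    for base in dna {                   // step 2: read through the strand
        guard let paired = complement(of: base) else { return nil }
        result.append(paired)           // step 3: append the complement
    }
    result.reverse()                    // step 4: reverse the new strand
    return String(result)               // step 5: return it
}

let sample = reverseComplement(of: "GATTACA")
```

For "GATTACA" the complements are "CTAATGT", which reversed gives "TGTAATC". Returning nil for a non-DNA character is a choice made for this sketch; your version will more likely throw an error.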

Steps 1, 2, and 5 are all very familiar to us:

As before, note that the implementation of GeneticSequence is partial. I’ve only included code relevant to the problem at hand.

What we’d really like to be able to do is call

reverseComplement.strand.append( nucleotide.complement() )

Which we can do. Your job is to define this function as part of the Nucleotide enum (to be clear, the function will not be static) and to call it in the for loop of the reverseComplement function defined above.

Bonus Points

In the above solution, we’ve made use of a Swift array’s ability to reverse itself. Bonus points are available for your own code that reverses the array.

Counting Point Mutations

Resources

Project Rosalind

In this problem, we are trying to determine how different one strand of DNA is from another. We do this by counting the positions at which the nucleotides of the two strands differ. Write the following function:
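As a starting point, here is a hedged sketch of the counting idea on its own. The function name countPointMutations, its generic signature, and the choice to return nil for strands of different lengths are all assumptions made for this illustration, not the signature the exercise asks for.

```swift
// Hypothetical helper: counts the positions where two equally long
// sequences differ. Generic, so it works for [Character] or [Nucleotide].
func countPointMutations<T: Equatable>(_ first: [T], _ second: [T]) -> Int? {
    // Assumption for this sketch: strands of different lengths are rejected.
    guard first.count == second.count else { return nil }

    var count = 0
    for (a, b) in zip(first, second) {  // walk both strands in lockstep
        if a != b {
            count += 1
        }
    }
    return count
}

let mutations = countPointMutations(Array("GAGCCTACTAACGGGAT"),
                                    Array("CATCGTAATGACGGCCT"))
```

The two strands above differ in seven positions. In your own version you would compare the strand arrays of two GeneticSequence values instead of characters.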

Finding GC Content

Resources

Project Rosalind

In this problem, we find out what percentage of a sequence is made of G or C. Write the following function:
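The arithmetic is simple enough to sketch on a plain String first. The function name gcContent(of:) and the decision to report 0 for an empty sequence are assumptions for this illustration. Your version would work on a GeneticSequence’s strand instead.

```swift
// Hypothetical helper: percentage of characters that are G or C.
func gcContent(of dna: String) -> Double {
    // Assumption for this sketch: an empty sequence has 0% GC content.
    guard !dna.isEmpty else { return 0 }

    var gcCount = 0
    for base in dna where base == "G" || base == "C" {
        gcCount += 1
    }
    // Convert to Double before dividing so the result is not truncated.
    return Double(gcCount) / Double(dna.count) * 100
}

let percent = gcContent(of: "AGCTATAG")
```

"AGCTATAG" has three G or C bases out of eight, so percent is 37.5.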
