Creating bioactive molecules from online data

From a genome sequence to that protein in your hand is a well-established process

Until I started collaborating closely with biologists ten years ago, my impression of how easy it was to get a sample of a complex biomolecule like a protein was based on popular science stories. These described days or months of tedious purification of buckets of biomass (sea sponges, exotic fungi, cow bile, etc.) to end up with a single vial containing so little that it was invisible to the naked eye.

But in the last decade I have discovered that molecular biologists have an incredible armoury of techniques to convert the data of a DNA sequence into real-life protein, in almost any quantity.

Here’s an example from a project of ours.

This is an interesting organism, Monosiga brevicollis.

Phase contrast image of Monosiga brevicollis, by Stephen Fairclough

It has a really interesting enzyme in it which I thought might have some relevance in synthetic chemistry. However, it floats around in the sea like plankton, and I would have no idea how to catch it, let alone purify the tiny percentage of its body weight which is the enzyme. Luckily its genome had been sequenced and released onto publicly available databases for anyone with a browser to search. This is how I found out about it: it had a sequence which looked to me very much like a type of enzyme I study. So I wanted some of the enzyme…

Below I list the process by which I have ended up with a sample of this protein in my lab. This is a sample of protein which has never been anywhere near the sea, let alone the original organism.


Step 1:

Have a good look at the amino acid sequence encoded by the DNA sequence online. Does it look OK (gaps, weirdness: bad, the features that make it of interest to you: good)? If that’s a yes, move on…
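For the "gaps, weirdness" part of that look, a first pass can be automated. The sketch below is only an illustration, not the check we actually ran: the function name and the three rules it applies (starts with methionine, no internal stop, only the 20 standard one-letter amino acid codes) are my own assumptions about what counts as "weird". It cannot tell you whether the interesting features are there; that still needs a person.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_protein_sequence(seq):
    """Return a list of warnings for a protein sequence (hypothetical helper).

    An empty list means none of these simple checks found a problem.
    """
    seq = seq.strip().upper()
    if not seq:
        return ["sequence is empty"]

    warnings = []
    # Most full-length proteins begin with methionine.
    if not seq.startswith("M"):
        warnings.append("does not start with methionine (M)")
    # A '*' anywhere except the final position marks a premature stop.
    if "*" in seq[:-1]:
        warnings.append("internal stop codon (*)")
    # Gap characters, X, and other ambiguity codes all show up here.
    odd = sorted(set(seq) - STANDARD_AMINO_ACIDS - {"*"})
    if odd:
        warnings.append("non-standard characters: " + "".join(odd))
    return warnings
```

Calling `check_protein_sequence("MKX-AK")` flags the `X` and the `-`, while a clean sequence returns an empty list.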

Step 2:

Get a friendly molecular biologist to have a look at the sequence and suggest if it needs modifications. If the organism is a eukaryote then the gene may have introns, so those can be pruned out. The protein may have a section which anchors it to a specific organelle, which you won’t be needing when it is acting as an individual protein, so that can go too. This step leaves you with the minimal sequence you need.

Step 3:

The protein will be produced in a host cell which is not the original organism but one of a number of very well studied microbial systems like Escherichia coli. Whilst all of life uses the same coding pattern of DNA to amino acids, different organisms have different favourite codons, so it’s best to use a programme to comb through the DNA sequence to optimize it for the planned organism’s preferences.
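To make the idea concrete, here is a minimal sketch of the crudest possible version: read the DNA three letters at a time, work out the amino acid, and write back one "favourite" codon for it. The standard genetic code table is real. The preferred-codon table is an illustrative placeholder that I typed in by hand, not measured usage data, and the function names are made up for this example. Real optimization programmes use proper usage tables for the host and also worry about things like repeats and GC content.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ASSUMPTION: one illustrative "favourite" codon per amino acid.
# Replace with a real codon usage table for your chosen host.
ILLUSTRATIVE_PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna):
    """Swap every codon for the placeholder preferred codon of its amino acid."""
    return "".join(ILLUSTRATIVE_PREFERRED_CODON[aa] for aa in translate(dna))
```

The point to check after any recoding is that the protein has not changed: `translate(recode(dna))` should always equal `translate(dna)`.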

Step 4:

This is a magic bit… submit your optimized DNA sequence via a webpage to one of the many DNA synthesis companies who will use chemical methods to create your DNA request, and add in all the molecular biology machinery so that it is ready to go the minute it arrives in a DHL packet in a few days.

Step 5:

More molecular biology marvels, which can be summed up in a single phrase: express the DNA in your host system. Rather like the small print in TV adverts, that phrase comes with a “steps removed and sequence shortened” caveat. It isn’t a trivial undertaking, but it is something that a biotechnologist is quite happy doing.

Step 6:

Once the host organism has your sequence of interest integrated into its DNA, it is a case of growing it up and harvesting the enzyme. We tend to use large flasks which gently shake their liquid broth contents in an incubator over a period of hours or days until they are thick with the organism. Harvesting for us means destroying all the microbial life, separating off the liquid fraction, and then using only that fraction, where we hope the enzyme resides.

Step 7:

Do some chemistry with it.

Now, this is not to say this is a trivial procedure at any stage (and it can be the case that your enzyme doesn’t do what you want it to do or indeed do anything at all) but it is a procedure that is carried out in laboratories across the world and it is magic!

From bits to things, and a true marvel of science.