Library preparation for sequencing
This is the first article in a series about sequencing.
Have you ever had your genome sequenced? Lots of people are starting to see the benefits of getting their genome sequenced, or they are just doing it out of curiosity. But how does it work?
If you’ve had your genome sequenced, you probably had to send in a sample of your blood or saliva. How do you think these samples are used for sequencing?
Well, they just take the sample and put it into one of those sequencing machines, right?
Actually, you’re not right, but you’re not wrong either. The sample does eventually go into a sequencing machine, but there are a number of library preparation steps before that.
So what is library preparation?
It’s exactly what it sounds like: you’re preparing a sample library to be sequenced.
Library preparation generally has 4 main steps, though the details vary based on factors like what you’re trying to sequence (e.g. DNA, RNA, methylation) and what type of sample you use (e.g. blood, saliva, sputum, tumor):
- DNA Fragmentation/Target Selection
- Adapter Sequences
- Size selection
- Final Library Quantification and Quality Control
Depending on the type of sample, different steps are needed to break it down into purified DNA before moving on to DNA fragmentation.
A blood sample is often used for someone being investigated for a rare disease. In that case, blood is often taken from family members as well for comparison. Blood can also be used for general DNA profiling.
A saliva sample is used in much the same way as a blood sample, but it is more likely to be contaminated, since the methods and kits available to handle saliva aren’t as well established as those for blood.
A tumor sample is used to investigate the genome of tumor cells in diseases like cancer.
Hair can be used for DNA profiling. The hair strand itself contains very little usable nuclear DNA; most of it is in the follicle at the root of the hair.
A sputum sample can be used for diseases like tuberculosis, where the sputum contains the infectious bacteria (Mycobacterium tuberculosis). A feces sample is used in a similar way.
Once you have the purified DNA, it has to be assessed to see whether it can be used for sequencing. One of the main tools for this is a spectrophotometer, a device that measures how much light the DNA sample absorbs. From the absorbance you can work out how much DNA there is (the concentration) and how clean it is (the contamination level). After the assessment, the DNA can be adjusted to suit the sequencing method being used. All of this happens before DNA fragmentation.
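To make the spectrophotometer step more concrete, here is a minimal sketch of how absorbance readings are commonly turned into a concentration and a purity check. It assumes the widely used rule of thumb that an absorbance of 1.0 at 260 nm corresponds to roughly 50 ng/µL of double-stranded DNA, and that an A260/A280 ratio of about 1.8 is usually read as clean DNA. The function names and the readings are made up for illustration.

```python
def dsdna_concentration_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm.

    Assumes the rule of thumb that A260 = 1.0 is roughly 50 ng/uL of
    double-stranded DNA (1 cm path length).
    """
    return a260 * 50.0 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are usually read as clean DNA."""
    return a260 / a280


# Hypothetical readings, for illustration only.
a260, a280 = 0.50, 0.27
conc = dsdna_concentration_ng_per_ul(a260, dilution_factor=10)
ratio = purity_ratio(a260, a280)
print(f"concentration: {conc:.1f} ng/uL")
print(f"A260/A280: {ratio:.2f}")
```

A ratio well below 1.8 would often point to protein contamination, which is one reason a sample might need more cleanup before fragmentation.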
DNA Fragmentation/Target Selection
DNA fragmentation is exactly what it sounds like: breaking the DNA into smaller pieces.
This is needed because, to get the best results from the sequencing process, the DNA needs to be the optimum length for the sequencing machine, and that length varies between sequencing technologies. It usually ranges anywhere from 150 to 1000 base pairs, though 350–500 is the most common optimum length.
Fragmentation can be carried out using physical or enzymatic methods. For example, ultrasound (sonication) is a commonly used physical method. It shears the DNA into pieces of random lengths with untidy ends, a bit like snapping a branch over your leg.
These pieces then have to be repaired in the lab to clean up the untidy ends and to get the optimum size. You might prefer to use enzymes for fragmentation, since they allow for more control. For example, one enzymatic method is restriction digestion, which cuts the DNA at specific sequences using enzymes called restriction endonucleases.
The libraries produced through either physical or enzymatic methods are known as fragment libraries.
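As a rough illustration of restriction digestion, the sketch below cuts a DNA string wherever a recognition site appears. It uses EcoRI as the example, which recognizes GAATTC and cuts after the first G. Only one strand is modeled, and the `digest` function and the example sequence are made up for illustration.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a DNA string at every occurrence of a recognition site.

    cut_offset is the position inside the site where the cut falls;
    EcoRI cuts G^AATTC, i.e. offset 1. Only the top strand is modeled.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # whatever is left after the last cut
    return fragments


# Hypothetical sequence containing two EcoRI sites.
dna = "TTAGGAATTCCGATCGGAATTCAAT"
pieces = digest(dna)
print(pieces)
print([len(p) for p in pieces])
```

Unlike random shearing, the cut positions here are fully determined by the sequence, which is the extra control that enzymatic methods offer.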
PCR (polymerase chain reaction, a method used to make many copies of a DNA fragment) amplification can also be used if the sequences of specific DNA targets are known, producing DNA amplicons in the desired size range. Because the primers you choose determine where each copy starts and ends, you usually don’t need to worry about the size unless the primers were designed for the wrong size.
The libraries produced through PCR are known as amplicon libraries.
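To see why primer choice fixes the amplicon size, here is a small sketch that finds the region a primer pair would copy from a template. It is a simplification that requires exact matches and looks at one strand only; the `amplicon` function, the template, and the primers are all made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def amplicon(template, forward_primer, reverse_primer):
    """Return the region a primer pair would copy, or None if no match.

    The forward primer matches the template strand directly. The reverse
    primer binds the opposite strand, so we search for its reverse complement.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    rc = reverse_complement(reverse_primer)
    rc_pos = template.find(rc, start + len(forward_primer))
    if rc_pos == -1:
        return None
    return template[start:rc_pos + len(rc)]


# Hypothetical template and primers.
template = "GGCATGCAAGTCTTACGGATCCAGTTGACCTAGGCTAATCG"
product = amplicon(template, forward_primer="ATGCAAGT", reverse_primer="TAGCCTAG")
print(product, len(product))
```

Moving either primer along the template changes the length of the product, so the size range is chosen at design time rather than sorted out afterwards.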
Fragmentation and adapter addition can also be done simultaneously, using a transposase enzyme that cuts the DNA and attaches adapters in one step. This is called tagmentation.
Adapter Sequences
This step modifies the DNA fragments for the sequencing process by adding small pieces of DNA called adapters. Adapters can be added to DNA fragments for 3 different purposes.
- They indicate where the sequencing primer should bind. A sequencing primer indicates where the sequencing should begin.
- They anchor the DNA to a surface and hold it in place during the sequencing process.
- They contain a reference code, which works like a barcode. If 2 libraries from 2 different patients are sequenced together, the reference codes let the scientists tell which fragments came from which patient.
After the adapters are added to the fragments, the library is washed to remove any adapters that haven’t been used.
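The barcode idea can be sketched in a few lines: after sequencing, each read is assigned to a patient based on its reference code. This example assumes the code sits at the very start of each read, which is a simplification, and the barcodes, reads, and `demultiplex` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical barcodes for two patient libraries.
BARCODES = {"ACGT": "patient_1", "TGCA": "patient_2"}


def demultiplex(reads, barcodes, barcode_length=4):
    """Group reads by the barcode at their start and trim the barcode off."""
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_length], read[barcode_length:]
        # Reads with an unknown barcode are kept aside rather than guessed.
        groups[barcodes.get(tag, "unassigned")].append(insert)
    return dict(groups)


reads = ["ACGTTTGACCA", "TGCAGGATCCA", "ACGTCCATGGA", "NNNNAAAAAAA"]
result = demultiplex(reads, BARCODES)
for sample, inserts in result.items():
    print(sample, inserts)
```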
Size selection
We want to make sure that our fragments are all the right size. There are 2 common methods for size selection:
- Gel electrophoresis
- Bead based method
Gel electrophoresis:
Gel electrophoresis is a technique used to separate macromolecules like DNA fragments, RNA, and proteins based on their size and charge. You apply an electric field across a gel containing the fragments, and they travel through the gel at different speeds depending on their size and charge, which separates them. DNA is negatively charged, so all the fragments move toward the positive electrode. Since all DNA fragments carry the same charge for their size, they separate based on size alone: smaller fragments move through the gel faster and travel further. This lets us compare the fragment sizes and use only the fragments that are the right size.
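In practice, fragment sizes on a gel are read off by comparing them to a ladder of fragments with known sizes. A minimal sketch of that comparison is below. It assumes the common approximation that the distance travelled is roughly linear in the logarithm of the fragment size, and the ladder values and the `estimate_size` function are made up for illustration.

```python
import math

# Hypothetical ladder: (fragment size in bp, distance travelled in mm).
LADDER = [(1000, 20.0), (500, 30.0), (250, 40.0), (125, 50.0)]


def estimate_size(distance_mm, ladder=LADDER):
    """Estimate fragment size by interpolating log10(size) against distance."""
    points = sorted(ladder, key=lambda p: p[1])  # shortest distance first
    for (size_a, dist_a), (size_b, dist_b) in zip(points, points[1:]):
        if dist_a <= distance_mm <= dist_b:
            frac = (distance_mm - dist_a) / (dist_b - dist_a)
            log_a, log_b = math.log10(size_a), math.log10(size_b)
            return 10 ** (log_a + frac * (log_b - log_a))
    raise ValueError("distance is outside the ladder range")


# A band halfway between the 500 bp and 250 bp ladder bands.
print(round(estimate_size(35.0)), "bp")
```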
Bead based method:
In the bead-based method, magnetic beads are used with different concentrations of buffer to isolate different sizes of fragments. The fragments can then be sorted, and those of the right size can be used for the sequencing process.
Size selection is usually not necessary for amplicon libraries since you can use PCR to produce the amplicons in the right size range.
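Whichever method is used, the effect of size selection is a simple filter on fragment length. The sketch below shows that filter using the 350–500 base pair window mentioned earlier as the target; the fragment lengths and the `size_select` function are hypothetical.

```python
def size_select(fragment_lengths, min_bp=350, max_bp=500):
    """Keep only fragments whose length falls inside the target window."""
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]


# Hypothetical fragment lengths (bp) after random shearing.
lengths = [120, 360, 410, 498, 650, 900, 350, 501]
kept = size_select(lengths)
print(kept)
print(f"kept {len(kept)} of {len(lengths)} fragments")
```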
Final Library Quantification and Quality Control
This is the final step in library preparation. It makes sure that there are enough fragments/amplicons for sequencing, and applies quality control to any fragments/amplicons that still need it. By this point, though, most fragments have usually already gone through the proper quality control procedures.
The 3 most common methods for quantification are:
- Fluorescence spectroscopy
- Real-time PCR (qPCR)
- ddPCR
There are many variations of these methods, but we’ll just look at these 3 for now.
Fluorescence spectroscopy:
When a molecule absorbs a photon, one of its electrons is excited to a higher energy level. When the electron drops back to the ground state, it releases a photon of lower energy. This is fluorescence, and the molecules that do it are called fluorophores. Fluorophores can act as a label that lets us quantify the fragments: the more DNA there is, the stronger the fluorescent signal. The fluorophores used for quantification are usually extrinsic (e.g. fluorescent dyes or probes added to the sample), but there are also intrinsic ones, like certain amino acids. Intrinsic fluorophores are usually less expensive and do not require intervention.
Real Time PCR:
Real-time PCR, or qPCR (quantitative PCR), tracks the amount of DNA after each PCR cycle and uses the cycle number at which the signal crosses a threshold to work out the initial concentration. It uses a polymerase, dNTPs, and two primers designed to match sequences within the sample. If you design the 2 primers to match the adapters, the qPCR will only measure fragments that have adapters.
If you’re curious what polymerases, dNTPs, and primers are, here are some quick definitions.
Polymerases are enzymes that synthesize long chains of nucleic acids, like DNA or RNA.
dNTPs (deoxynucleoside triphosphates) are the individual building blocks that the polymerase joins together to build a DNA chain. They need to be supplied to the reaction.
A primer is a short nucleic acid sequence that serves as the starting point for the synthesis process.
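One common way to get from cycle numbers to a concentration in qPCR is a standard curve: samples of known concentration are run alongside the library, a line is fitted to cycle number against the logarithm of concentration, and the unknown is read off that line. The sketch below assumes this approach. The standards, cycle values, and function names are made up for illustration.

```python
import math


def fit_standard_curve(concentrations, cq_values):
    """Least-squares fit of Cq = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_cq(cq, slope, intercept):
    """Invert the standard curve to get the starting concentration."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical standards (pM) and the cycle at which each crossed the threshold.
standards = [10.0, 1.0, 0.1, 0.01]
cqs = [10.0, 13.3, 16.6, 19.9]
slope, intercept = fit_standard_curve(standards, cqs)
efficiency = 10 ** (-1 / slope) - 1  # 1.0 would mean perfect doubling per cycle
unknown = concentration_from_cq(15.0, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency:.0%}, unknown={unknown:.3f} pM")
```

The more DNA a sample starts with, the fewer cycles it needs to cross the threshold, which is why the slope of the fitted line is negative.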
ddPCR:
In ddPCR (droplet digital PCR), the sample is divided into about 20,000 nanoliter-sized droplets in a water-oil emulsion. PCR is then run in all of the droplets, and each droplet is checked for a signal. This sample partitioning allows for thousands of measurements from one sample. Droplet digital PCR is a type of digital PCR, and all digital PCR methods share the sample partitioning feature.
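To show how droplet counts become a concentration, here is a minimal sketch of the Poisson correction usually applied in digital PCR. Because some droplets hold more than one copy of the target, the fraction of positive droplets slightly undercounts the copies. The droplet volume of 0.85 nL is an assumption for illustration and depends on the instrument, and the counts and function name are hypothetical.

```python
import math


def copies_per_ul(positive_droplets, total_droplets, droplet_volume_nl=0.85):
    """Estimate target concentration from droplet counts.

    Some droplets hold more than one copy, so the fraction of positive
    droplets is corrected with Poisson statistics:
        copies per droplet = -ln(1 - positive / total)
    droplet_volume_nl is an assumed value; use the one for your instrument.
    """
    if positive_droplets >= total_droplets:
        raise ValueError("all droplets positive: sample is too concentrated")
    fraction_positive = positive_droplets / total_droplets
    copies_per_droplet = -math.log(1.0 - fraction_positive)
    return copies_per_droplet / (droplet_volume_nl * 1e-3)  # nL -> uL


# Hypothetical run: 5,000 of 20,000 droplets show a signal.
print(f"{copies_per_ul(5000, 20000):.0f} copies/uL")
```

Unlike qPCR, this calculation needs no standard curve, since each droplet is simply counted as positive or negative.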
Key Takeaways
- Library preparation is a very important step of sequencing
- There are 4 main steps in library preparation
1. DNA Fragmentation/Target Selection
2. Adapter Sequences
3. Size selection
4. Final Library Quantification and Quality Control
- Fragmentation can be done using physical or enzymatic methods, or PCR can be used to produce amplicons instead
- Adapters are small pieces of DNA added to the fragments to prepare them for the sequencing process
- Adapters have 3 main purposes
1. Indicate where the sequencing primer should bind
2. Anchor the DNA in place.
3. Contain a reference code.
- There are 2 common methods for size selection
1. Gel electrophoresis
2. Bead based method
- Quantification is to determine the concentration of the sample. There are 3 commonly used methods of quantification:
1. Fluorescence spectroscopy
2. Real Time PCR/qPCR
3. ddPCR