
Question about linked reads in smaller genomes

Posted By: james, on Sep 9, 2016 at 7:07 AM

In the BioRxiv paper the authors discuss how/why smaller genomes get lower linked read coverage: "For smaller genomes, assuming that the same DNA mass was loaded and that the library was sequenced to the same read-depth, the number of Linked-Reads (read pairs) per molecule would drop proportionally, which would reduce the power of the data type. For example, for a genome whose size is 1/10th the size of the human genome (320 Mb), the mean number of Linked-Reads per molecule would be about 6, and the distance between Linked-Reads would be about 8 kb, making it hard to anchor barcodes to short initial contigs."

 

I don't understand this, as my first assumption was that genome size would have no impact on linked read depth per molecule, but would significantly affect the amount of the genome present in a single droplet. As such, a smaller genome with DNA fragments of the same size should still have around 60 linked reads per DNA molecule, but a 10 Mb genome would mean ~5% of it was in each droplet, making the phasing much harder to determine.

 

Can someone please explain?

9 Replies

Re: Question about linked reads in smaller genomes

Posted By: joel, on Sep 12, 2016 at 2:34 AM

james wrote:

"For smaller genomes, assuming that the same DNA mass was loaded and that the library was sequenced to the same read­depth"

 

Perhaps they mean that the smaller genome is sequenced to the same coverage, i.e. 60X? Loading the same mass of DNA of the same fragment length means the same number of molecules per droplet. Sequencing the smaller genome to the same genomic coverage would then mean fewer reads per barcode (or linked reads per molecule). And as you write, the smaller the genome, the higher the probability that two molecules from the same locus get the same barcode, given that you load the same amount. Is the solution then to reduce the amount of DNA loaded to 1/10th?

Re: Question about linked reads in smaller genomes

Posted By: ash-10x, on Sep 12, 2016 at 8:00 AM

Hello James,

There are three different things going on here: the fixed number of partitions, the mass loaded into the system relative to genome size, and how Linked-Reads per Molecule (LPM) are affected by that mass.

The 10x Genomics Chromium Genome Kit protocol creates approximately 1 million Gel Bead-In-EMulsions (GEMs). No matter how much mass is loaded into the system, the same number of GEMs is created.

Our loading mass recommendations ensure a specific number of haploid copies of the genome is loaded into the system. The number of haploid copies of a genome present in a given aliquot is also known as Genome Equivalents (GE). The number of genome copies loaded will affect both the limiting dilution (essentially how many gDNA molecules are distributed per GEM) and downstream data analysis. To determine how much loading mass to use for your species of interest, you need to know the species' haploid genome size. From the genome size, you can then calculate the genome equivalents achieved for a given mass.

For human samples, 1-1.2 ng of the 3.2 Gb genome is ~300-360 GE. Consequently, only a small fraction of the genome ends up in each GEM (~400 kb, roughly 0.01% of the haploid human genome).
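If it helps to sanity-check the GE numbers, here is a rough back-of-the-envelope sketch of the mass-to-GE conversion (my own quick Python, assuming an average of ~660 g/mol per base pair; the exact values shift a little with the conversion constant you use, which is why this lands slightly under the ~300-360 GE quoted above):

```python
# Rough sketch: loading mass -> genome equivalents (GE).
# Assumes an average mass of ~660 g/mol per base pair of dsDNA.
AVOGADRO = 6.022e23
BP_MASS_G = 660 / AVOGADRO                     # ~1.1e-21 g per base pair

def genome_equivalents(mass_ng: float, genome_size_bp: float) -> float:
    """Number of haploid genome copies (GE) in a given DNA loading mass."""
    return (mass_ng * 1e-9) / (genome_size_bp * BP_MASS_G)

print(round(genome_equivalents(1.0, 3.2e9)))   # ~285 GE for 1 ng of human DNA
print(round(genome_equivalents(1.2, 3.2e9)))   # ~342 GE for 1.2 ng
```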

For that same human example, at 30X read coverage, ~35 LPM will end up sequenced from each 50 kb input molecule. If you were to double the mass from 1 ng to 2 ng, there would be double the mass per GEM. Because the total number of reads is fixed by the sequencing depth, those reads are now spread across twice as many molecules, so each individual molecule gets approximately half the number of Linked-Reads, resulting in an LPM of ~17.
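As a rough illustration of that scaling (this sketch treats LPM as individual reads per 50 kb molecule and assumes ~150 bp reads; the read and molecule lengths are assumptions for illustration, not exact protocol values):

```python
def linked_reads_per_molecule(coverage_x: float, ge: float,
                              molecule_len_bp: float = 50_000,
                              read_len_bp: float = 150) -> float:
    """Approximate reads landing on each input molecule.

    total_reads     = coverage * genome_size / read_len
    total_molecules = GE * genome_size / molecule_len
    Genome size cancels in the ratio, leaving the expression below.
    """
    return coverage_x * molecule_len_bp / (read_len_bp * ge)

print(linked_reads_per_molecule(30, 300))   # ~33 at 30X and ~300 GE (1 ng human)
print(linked_reads_per_molecule(30, 600))   # ~17 when the mass (GE) is doubled
```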

Pulling all of that together, for any species you want to load the mass that maintains a proper distribution of ~300 GE across all of the GEMs. The mass required for a given GE is proportional to genome size.

In your example, a 1 ng loading mass of a 10 Mb genome would contain ~91,000 GE. That many physical copies distributed equally across all the GEMs would result in less than 1 LPM.
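Plugging your 10 Mb example into the two helpers sketched above:

```python
# 1 ng of a 10 Mb genome at 30X coverage, reusing the helpers defined above.
ge_small = genome_equivalents(1.0, 10e6)
print(round(ge_small))                           # ~91,000 GE
print(linked_reads_per_molecule(30, ge_small))   # ~0.1 reads per molecule (<1 LPM)
```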

An additional consideration for running non-human genomes is the input range that can be loaded onto the Chromium Controller. Our system is optimal and supported between 0.6 ng and 2.4 ng. Genomes smaller than ~1.8 Gb would need less than 0.6 ng to reach ~300 GE, so they fall outside this range. In some cases, we recommend loading more than 300 GE if the extra copies will only minimally affect the limiting dilution and the number of fragments per GEM. There will be a point where the genome is simply too small for both loading and the Supernova de novo assembly software.
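A quick way to see where that ~1.8 Gb threshold comes from (again a sketch, using the same ~660 g/mol per base pair assumption):

```python
BP_MASS_G = 660 / 6.022e23   # grams per base pair, as in the sketch above

def mass_ng_for_ge(target_ge: float, genome_size_bp: float) -> float:
    """Loading mass (ng) needed to reach a target number of genome equivalents."""
    return target_ge * genome_size_bp * BP_MASS_G * 1e9

print(mass_ng_for_ge(300, 3.2e9))   # ~1.05 ng for human
print(mass_ng_for_ge(300, 1.8e9))   # ~0.59 ng -- right at the 0.6 ng floor
print(mass_ng_for_ge(300, 3.2e8))   # ~0.11 ng for a 320 Mb genome, well below it
```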

Application notes will soon be released describing the above calculations in more detail. Until then, please let me know if I can clarify any points further.

Thank you,
Ash

Re: Question about linked reads in smaller genomes

Posted By: ash-10x, on Sep 12, 2016 at 8:09 AM

Hello Joel,

 

You are absolutely correct: the smaller the genome, the less gDNA mass you would want to load into the system. A 320 Mb genome may be too small. There is a balance between staying within the optimal mass loading range of the Chromium Controller (0.6 ng - 2.4 ng) and maintaining a genome equivalence that does not negatively affect how gDNA molecules are distributed per GEM.

 

We have successfully loaded and assembled 1 Gb genomes.

 

Thank you,

Ash

Re: Question about linked reads in smaller genomes

Posted By: jafar, on Sep 13, 2016 at 3:06 PM

For smaller genomes where the mass needed for 300 GE is below 0.6 ng (for instance 0.2 ng), would it be possible to load 0.6 ng and increase the sequencing depth proportionally (3x the recommended depth) to get results similar to larger genomes?
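(To make the arithmetic behind my question concrete, reusing the helpers from Ash's sketches above; the ~600 Mb genome size is just an example that works out to roughly 300 GE at 0.2 ng.)

```python
# Hypothetical ~600 Mb genome: ~0.2 ng gives ~300 GE, so loading 0.6 ng triples the GE.
ge_loaded = genome_equivalents(0.6, 600e6)
print(round(ge_loaded))                           # ~900 GE instead of ~300
print(linked_reads_per_molecule(30, ge_loaded))   # ~11 LPM at the usual 30X
print(linked_reads_per_molecule(90, ge_loaded))   # ~33 LPM at 3x the depth
```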