Scaffold count of pseudohap2

Posted By: graham, on Sep 15, 2017 at 6:39 AM

I've noticed that the two assemblies from a pseudohap2 output of Supernova can contain similar, but not identical, scaffold counts. I'd previously presumed that the scaffold counts would be identical and that, if the two haplotypes were the same, a copy would be found in each of the two assemblies (perhaps this is the case). Could someone tell me under what circumstances a sequence would find its way into one of the pseudohap2 assemblies but not the other?

Many thanks,

Graham

 

 

Dr. Graham Etherington
The Earlham Institute, Norwich, UK
Twitter: @bioinformatiks

1 Reply

Re: Scaffold count of pseudohap2

Posted By: neil-10x, on Sep 15, 2017 at 9:15 AM

Hi Graham,

 

supernova mkoutput applies a default minimum size of 1000 bases to the records it writes, although you can change this via the --minsize option.

 

I think that what you're seeing are cases where scaffolds are above the threshold in one pseudohap and below the threshold in the other.

 

In Supernova 1.0, we numbered records consecutively in each file, which was problematic in cases such as this, because the absence of a homologous record from one file caused the numbering to get out of sync.

 

As of Supernova 1.1.0, corresponding records between the two files always share a common sequence id.  So I suspect that if you found the records that appear in one file, and not the other, based solely on the sequence id, then you'd find that those records are fairly close to the threshold that was used.  As a further sanity check, you could run with --minsize=0 and you should recover everything -- both files should have the same number of records and be consecutively numbered, with no "holes."
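
The check described above could be sketched in Python roughly as follows. This is only an illustration, not part of Supernova: the function names are made up for this example, and it assumes that the sequence id is the first whitespace-delimited token on each FASTA header line. It collects the length of every record in each file and then lists the records whose id appears in only one of the two.

```python
import gzip
from typing import Dict, Iterable, List, Tuple


def open_text(path: str):
    """Open a FASTA file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def record_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence id to its total sequence length.

    Assumption: the id is the first whitespace-delimited token of the
    header line (the text directly after '>').
    """
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def unmatched_records(
    a: Dict[str, int], b: Dict[str, int]
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (id, length) pairs present only in a, and only in b."""
    only_a = sorted((sid, n) for sid, n in a.items() if sid not in b)
    only_b = sorted((sid, n) for sid, n in b.items() if sid not in a)
    return only_a, only_b
```

To use it, pass each of the two pseudohap2 files through open_text and record_lengths, then hand both results to unmatched_records. If the explanation above holds, the lengths reported for the unmatched records should sit just above the --minsize value that was used, since their counterparts in the other file would have fallen just below it.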

 

 

Let us know if you find otherwise!

 

Best regards,

Neil