Long ranger basic output

Posted By: pirita, on Sep 19, 2016 at 3:27 AM

Could someone explain the FASTQ header produced by the Long Ranger basic pipeline? Am I right that AAACACCAGCTGCGAA-1 is the corrected barcode and AAACACCAGCTGCGAA is the uncorrected barcode? Sometimes the two are the same, and sometimes they differ slightly, usually by just one base pair. For FASTQ headers that don't have the barcode, does that mean the read has not been classified as having one? Are the reads in the output FASTQ sorted so that, once I find one barcode cluster, no reads with that barcode appear elsewhere in the file? I am working with a non-human organism, so I cannot use the built-in pipelines, and I am thinking about how best to use the data in its raw format. Thank you for any insights.

2 Replies

Re: Long ranger basic output

Posted By: shuoguo, on Nov 4, 2016 at 8:27 AM

I have not seen these two barcodes differ, except that BX carries the "-1" suffix.

There are three "types" of barcode:

1. The barcode is NOT in the 4M barcode list

2. The barcode has "N" as its first base, but can otherwise be found in the 4M list

3. The barcode is in the 4M list


For scenario 1, the read will have RX but not BX, since the barcode is unlikely (as I understand it) to be a real barcode.



For scenario 2&3, RX==BX.
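The three cases above can be sketched in Python as follows. This is only an illustration of the description in this post, not Long Ranger's actual logic: `classify_barcode`, `expect_bx`, and the tiny `whitelist` set are hypothetical names, and the real whitelist would be the 4M barcode list.

```python
def classify_barcode(raw, whitelist):
    """Return 1, 2, or 3, matching the three barcode types listed above.

    raw       -- the barcode as read from the sequencer (the RX value)
    whitelist -- a set of valid barcodes (stand-in for the 4M list)
    """
    if raw in whitelist:
        return 3  # exact match to a listed barcode
    # Type 2: first base is "N", but some listed barcode matches the rest.
    if raw.startswith("N") and any(b + raw[1:] in whitelist for b in "ACGT"):
        return 2
    return 1  # not found in the list


def expect_bx(raw, whitelist):
    """Per the post: type 1 reads carry RX only; types 2 and 3 also get BX."""
    return classify_barcode(raw, whitelist) != 1


# Hypothetical one-entry whitelist, for illustration only.
whitelist = {"AAACACCAGCTGCGAA"}
```

With this toy whitelist, `AAACACCAGCTGCGAA` falls under type 3, `NAACACCAGCTGCGAA` under type 2, and any barcode absent from the set under type 1.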


Again, I have not seen them differ. You are welcome to point me to an example.
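To look for such an example in your own data, a minimal Python sketch along these lines could flag reads whose RX and BX disagree. It assumes the tags appear in the FASTQ header as whitespace-separated SAM-style fields (for example `BX:Z:...-1` and `RX:Z:...`); the function names and the sample header are hypothetical, so check them against your actual output first.

```python
def parse_tags(header):
    """Collect SAM-style TAG:TYPE:VALUE fields from a FASTQ header line."""
    tags = {}
    # Skip the first field (the read name) and inspect the remaining ones.
    for field in header.split()[1:]:
        parts = field.split(":", 2)
        # SAM tag names are exactly two characters, e.g. BX or RX.
        if len(parts) == 3 and len(parts[0]) == 2:
            tags[parts[0]] = parts[2]
    return tags


def rx_bx_differ(header):
    """True if both tags are present and BX (minus its "-1" suffix) != RX."""
    tags = parse_tags(header)
    if "RX" not in tags or "BX" not in tags:
        return False  # nothing to compare, e.g. a read with RX only
    return tags["BX"].split("-")[0] != tags["RX"]


# Hypothetical header, made up for illustration.
example = "@read1 BX:Z:AAACACCAGCTGCGAA-1 RX:Z:AAACACCAGCTGCGAT"
print(rx_bx_differ(example))
```

Running `rx_bx_differ` over every header line of the FASTQ (every fourth line, starting with the first) would list any reads where the two barcodes differ.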

Re: Long ranger basic output

Posted By: RonaldNieuw, on Jun 10, 2018 at 3:26 AM

Though this is an old thread, I want to point out that:


Quote shuoguo, on Nov 4, 2016 at 8:27 AM

"For scenario 2&3, RX==BX"


is not true in the BAM output, and it contradicts what you said in point 2. Since the barcodes are error-corrected, RX can differ from BX. I have come across this one:




Just to clarify for future readers.