Convert Count to TPM for Secondary Analysis
I have read this article about data conversion: https://kb.10xgenomics.com/hc/en-us/articles/115003684783-How-to-calculate-TPM-RPKM-or-FPKM-instead-...
However, I want to use sigEMD (https://github.com/NabaviLab/SigEMD) for differentially expressed gene analysis, and it requires log2(TPM+1) values as input. I have also read some articles on converting counts to TPM, but they need the effective gene length for the calculation, and I don't know how to get that from the count matrix generated by cellranger.
Could you please inform me how to do this? Thank you
Re: Convert Count to TPM for Secondary Analysis
Thank you for your question.
We don't calculate TPM (transcripts per million), because the 10x single-cell gene expression assay is 3'- or 5'-based and there is no disproportionate read counting due to transcript length.
In traditional RNA-seq, complete transcripts are fragmented, followed by cDNA synthesis, end repair, and adapter ligation. In this workflow, the probability of sampling a fragment from a long transcript is larger than from a short one, so it makes sense to normalize read counts by transcript length. In the 10x single-cell 3' or 5' gene expression assay, however, this bias does not exist. We would therefore advise against normalizing read counts by gene length.
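As a rough illustration of the point above, here is a minimal Python sketch of a depth-only normalization: each cell's UMI counts are scaled to one million and then log2(x+1)-transformed, with no gene-length term. This is an assumption about what a length-free stand-in for log2(TPM+1) could look like, not a 10x-recommended procedure, and whether sigEMD accepts such values is a question for its developers. The function name `log2_cpm_plus_one` and the small example matrix are hypothetical.

```python
import numpy as np

def log2_cpm_plus_one(counts):
    """Scale each cell (column) to one million total counts, then apply log2(x + 1).

    counts: genes x cells array of UMI counts (hypothetical input layout).
    No gene-length correction is applied.
    """
    counts = np.asarray(counts, dtype=float)
    # Total UMI count per cell, kept 2-D so it broadcasts across genes.
    totals = counts.sum(axis=0, keepdims=True)
    # Guard against division by zero for cells with no counts.
    safe_totals = np.where(totals == 0, 1.0, totals)
    cpm = counts / safe_totals * 1e6
    return np.log2(cpm + 1.0)

# Toy matrix: 3 genes (rows) x 2 cells (columns).
example = np.array([[10, 0],
                    [30, 5],
                    [60, 15]])
normalized = log2_cpm_plus_one(example)
```

Undoing the log transform (2**normalized - 1) gives values that sum to one million per non-empty cell, which is the same property TPM has, only without dividing by length first.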
If you are interested in knowing more about our UMI counting algorithms, you can find more detail here: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/ove....
In addition, sigEMD is not developed by 10x and we cannot assist you with any issue you might experience using it. I would suggest contacting the developer of the tool for any questions. I apologize for any frustration caused by using this third-party tool and hope the developer can assist you in getting it to work.