mRNA Expression HT-Seq Normalization
RNA-Seq expression level read counts produced by HT-Seq are normalized using two similar methods: FPKM and FPKM-UQ. Normalized values should be used only within the context of the entire gene set. Users are encouraged to normalize raw read count values if a subset of genes is investigated.
FPKM
The Fragments per Kilobase of transcript per Million mapped reads (FPKM) calculation normalizes read count by dividing it by the gene length and the total number of reads mapped to protein-coding genes.
Upper Quartile FPKM
The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample.
- RCg: Number of reads mapped to the gene
- RCpc: Number of reads mapped to all protein-coding genes
- RCg75: The 75th percentile read count value for genes in the sample
- L: Length of the gene in base pairs, calculated as the sum of the lengths of all exons in the gene
Note: The read count is multiplied by a scalar (10^9) during normalization to account for the kilobase and ‘million mapped reads’ units.
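Using the symbols defined above, the two calculations can be written as follows. This is a reconstruction from the definitions and the 10^9 scaling factor, not a formula quoted from the pipeline itself:

```latex
\[
\mathrm{FPKM} = \frac{RC_g \times 10^{9}}{RC_{pc} \times L}
\qquad
\mathrm{FPKM\text{-}UQ} = \frac{RC_g \times 10^{9}}{RC_{g75} \times L}
\]
```

The only difference between the two is the sample-wide term in the denominator: RCpc for FPKM and RCg75 for FPKM-UQ.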
Sample 1: Gene A
- Gene length: 3,000 bp
- 1,000 reads mapped to Gene A
- 1,000,000 reads mapped to all protein-coding regions
- Read count in Sample 1 for 75th percentile gene: 2,000
FPKM for Gene A = (1,000)*(10^9)/[(3,000)*(1,000,000)] = 333.33
FPKM-UQ for Gene A = (1,000)*(10^9)/[(3,000)*(2,000)] = 166,666.67
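The arithmetic above can be reproduced with a minimal Python sketch. The function names `fpkm` and `fpkm_uq` and their parameter names are illustrative assumptions, not part of HT-Seq or of any published pipeline code. The sketch takes the sample-wide values (total protein-coding reads, 75th percentile read count) as given inputs rather than deriving them from a count table.

```python
def fpkm(rc_g, length_bp, rc_pc):
    """FPKM: gene read count scaled by gene length (bp) and by the
    total number of reads mapped to protein-coding genes."""
    return rc_g * 1e9 / (length_bp * rc_pc)


def fpkm_uq(rc_g, length_bp, rc_g75):
    """FPKM-UQ: same as FPKM, but the sample-wide term is the
    75th percentile read count instead of the protein-coding total."""
    return rc_g * 1e9 / (length_bp * rc_g75)


# Sample 1, Gene A: values taken from the worked example above.
gene_a_fpkm = fpkm(rc_g=1_000, length_bp=3_000, rc_pc=1_000_000)
gene_a_fpkm_uq = fpkm_uq(rc_g=1_000, length_bp=3_000, rc_g75=2_000)

print(f"FPKM for Gene A = {gene_a_fpkm:,.2f}")
print(f"FPKM-UQ for Gene A = {gene_a_fpkm_uq:,.2f}")
```

Because both functions depend on a sample-wide value, recomputing that value is what changes when only a subset of genes is considered.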