STAR -quantMode GeneCounts vs HTSeq

In this post, we will compare STAR –quantMode GeneCounts vs HTSeq. Their goal is to perform gene quantification during or after read mapping, respectively. To achieve this, they need a GTF file containing the reference genome annotation.

Tools used

STAR --version

2.7.1a

htseq-count

Written by Simon Anders (sanders@fs.tum.de), European Molecular Biology
Laboratory (EMBL). (c) 2010. Released under the terms of the GNU General
Public License v3. Part of the 'HTSeq' framework, version 0.6.1p1.

STAR

STAR is a very efficient free software that enables to align reads to a FASTA file containing the reference genome of the species of interest. Interestingly, it has integrated the –quantMode parameter to count the number of reads per gene at the same type:

STAR \
--runMode alignReads \
--runThreadN ${SLURM_CPUS_PER_TASK} \
--genomeDir ${STARIndexDir} \
--readFilesIn ${reads1} ${reads2} \
--readFilesCommand gunzip -c \
--outFileNamePrefix STAR/${prefix}_ \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--quantMode GeneCounts \
--limitBAMsortRAM 30000000000

Result of the read mappping and gene counting:

head 10 STAR/${prefix}_ReadsPerGene.out.tab

Head of the gene count table produced by STAR --quantMode GeneCounts

tail 10 STAR/${prefix}_ReadsPerGene.out.tab

Tail of the gene count table produced by STAR --quantMode GeneCounts

HTSeq

HTSeq is another free software that quantifies the number of reads that map to genomic features (genes, exons, binding regions,…) in three different ways. The default mode (union) takes into account the reads that fully and partially overlap genes:

htseq-count -f bam -r pos -s no -a 0 STAR/${prefix}_Aligned.sortedByCoord.out.bam ${GTF} > HTSeq/${prefix}_ReadsPerGene.out.tab

I used the following parameters:

-f option for input file format:

SAM file: -f sam
BAM file: -f bam

-r option for sorting order (for paired-end reads only):

by read position: -r pos
by read name: -r name

-s option for strand specificity (depending on the library preparation method used):

unstranded: -s no (ex: SMART-seq v4 from Takara)
forward: -s yes (ex: QuantSeq 3′ mRNA FWD from Lexogen)
reverse: -s reverse (ex: TruSeq Stranded mRNA from Illumina)

-a option equal to 0 to not skip any alignment because of low quality

Result of the gene counting:

head 10 HTSeq/${prefix}_ReadsPerGene.out.tab

Head of the gene count table produced by HTSeq

tail 10 HTSeq/${prefix}_ReadsPerGene.out.tab

STAR vs HTSeq

Comparison of the structure of the output tables:

	STAR	HTSeq
number of columns	4	2
name of columns	gene_id unstranded read1 read2	gene_id -s parameter
position of the four special counters below	head	tail
unmapped	N_unmapped	__not_aligned
multimapped	N_multimapping	__alignment_not_unique
uniquely mapped to no feature	N_noFeature	__no_feature
uniquely mapped to more than one feature	N_ambiguous	__ambiguous

Comparison of the content of the output tables:

diff <(sort STAR/${prefix}_ReadsPerGene.out.tab | awk '{print $1"\t"$2}') HTSeq/${prefix}_ReadsPerGene.out.tab

Only the part describing the four special counters differ between the two files. I obtained the same number of:

not unique / multimapped reads
uniquely mapped to no feature reads
uniquely mapped to more than one feature / ambiguous reads

However, there is a difference of 20 reads for the number of reads which are not mapped. There are 11,750,839 reads in the fastq file, which corresponds to the sum of the HTSeq output.

What happens if we don’t keep the unmapped reads

Moreover, if I redo the same analysis but with removing the –outSAMunmapped Within command from STAR, I obtain a warning for HTSeq: “Warning: Mate records missing for 20 records”.

Conclusion

In conclusion, I obtained the same gene counts with STAR and HTSeq. Do you also get the same results?

STAR –quantMode GeneCounts vs HTSeq

Tools used

STAR

HTSeq

STAR vs HTSeq

What happens if we don’t keep the unmapped reads

Conclusion

Related posts

Like this:

Comments

Leave a ReplyCancel reply

Tools used

STAR

HTSeq

STAR vs HTSeq

What happens if we don’t keep the unmapped reads

Conclusion

Related posts

Share this:

Like this:

Comments

Leave a ReplyCancel reply