This time, we will compare STAR --waspOutputMode SAMtag with WASP. Both perform allele-specific read mapping.
STAR
STAR is a well-known, free software tool that aligns reads to a reference genome.
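As a rough sketch of how a run with WASP tagging could be launched, the snippet below only assembles the STAR command line. The helper name build_star_wasp_command and all file paths are hypothetical placeholders; the options themselves (--varVCFfile, --waspOutputMode SAMtag and the vA, vG and vW entries of --outSAMattributes) are the ones STAR provides for this mode, but the remaining settings would need to be adapted to the actual data.

```python
import shlex


def build_star_wasp_command(genome_dir, fastq_files, vcf_file, out_prefix):
    """Return the STAR argument list for an alignment with WASP tags.

    All paths are supplied by the caller; nothing is executed here.
    """
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastq_files,
        # Variants used for the allele-specific tags.
        "--varVCFfile", vcf_file,
        # Ask STAR to report the WASP filtering result as a SAM tag.
        "--waspOutputMode", "SAMtag",
        # vA, vG and vW are the tags described in this post.
        "--outSAMattributes", "NH", "HI", "AS", "nM", "vA", "vG", "vW",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical file names, for illustration only.
cmd = build_star_wasp_command(
    "genome_index",
    ["reads_1.fastq", "reads_2.fastq"],
    "variants.vcf",
    "sample_",
)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths point to real files.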
vA tag: variant allele
- 1 means variant base in the read is the reference allele
- 2 means variant base in the read is the alternative allele
- 3 means variant base in the read is neither the reference nor the alternative allele
- 4 means variant base in the read is N
Examples for reads overlapping 1, 2, 4 or 8 variants, respectively:
- vA:B:c,3
- vA:B:c,2,1
- vA:B:c,1,2,1,4
- vA:B:c,3,2,2,2,2,2,2,2
vG tag: 0-based genomic coordinate of the variant
Examples for reads overlapping 1, 2, 4 or 8 variants, respectively:
- vG:B:i,965124
- vG:B:i,965349,965349
- vG:B:i,1013489,1013540,1014227,1014273
- vG:B:i,41303372,41303440,41303441,41303473,41303485,41303499,41303661,41303664
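Since vA and vG list their values in the same order, they can be paired to recover which allele a read carries at each variant position. The following is a minimal sketch that works on a plain-text SAM line; the function name parse_variant_tags and the example record are made up for illustration, and it assumes both tags hold the same number of entries.

```python
ALLELE_LABELS = {1: "reference", 2: "alternative", 3: "other", 4: "N"}


def parse_variant_tags(sam_line):
    """Return (coordinate, allele_code) pairs from the vG and vA tags.

    Returns an empty list when the read carries neither tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    alleles, coords = [], []
    # Optional SAM fields start after the 11 mandatory columns.
    for tag in fields[11:]:
        if tag.startswith("vA:B:c,"):
            alleles = [int(x) for x in tag[len("vA:B:c,"):].split(",")]
        elif tag.startswith("vG:B:i,"):
            coords = [int(x) for x in tag[len("vG:B:i,"):].split(",")]
    if len(alleles) != len(coords):
        raise ValueError("vA and vG tags have different lengths")
    return list(zip(coords, alleles))


# Made-up SAM record: 11 mandatory fields followed by the two tags.
example = "\t".join([
    "read1", "0", "chr1", "1013480", "255", "100M", "*", "0", "0",
    "A" * 100, "I" * 100,
    "vA:B:c,2,1", "vG:B:i,1013489,1013540",
])

for coord, code in parse_variant_tags(example):
    print(coord, ALLELE_LABELS[code])
```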
vW tag: result of WASP filtering
- 1 means alignment passed (vW:i:1)
- 2 means multi-mapping read (vW:i:2)
- 3 means variant base in the read is N (vW:i:3)
- 4 means remapped read did not map (vW:i:4)
- 5 means remapped read multi-maps (vW:i:5)
- 6 means remapped read maps to a different locus (vW:i:6)
- 7 means read overlaps more than 10 variants (vW:i:7)
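The vW codes above make filtering straightforward: keep the alignments tagged vW:i:1 and discard the rest. Below is a small sketch of that filter on plain-text SAM lines. The function names are hypothetical, and keeping reads that carry no vW tag at all (assumed here to be reads that overlap no variant) is an analysis choice, exposed as the keep_untagged parameter.

```python
from collections import Counter

VW_MEANING = {
    1: "alignment passed",
    2: "multi-mapping read",
    3: "variant base in the read is N",
    4: "remapped read did not map",
    5: "remapped read multi-maps",
    6: "remapped read maps to a different locus",
    7: "read overlaps more than 10 variants",
}


def wasp_status(sam_line):
    """Return the vW code of a SAM record, or None when the tag is absent."""
    for tag in sam_line.rstrip("\n").split("\t")[11:]:
        if tag.startswith("vW:i:"):
            return int(tag[len("vW:i:"):])
    return None


def keep_read(sam_line, keep_untagged=True):
    """Keep reads that passed WASP filtering; untagged reads are optional."""
    status = wasp_status(sam_line)
    if status is None:
        return keep_untagged
    return status == 1


def make_record(name, *tags):
    """Build a made-up SAM record for the demonstration below."""
    core = [name, "0", "chr1", "100", "255", "5M", "*", "0", "0", "ACGTA", "IIIII"]
    return "\t".join(core + list(tags))


records = [
    make_record("r1", "vW:i:1"),
    make_record("r2", "vW:i:6"),
    make_record("r3"),
]
kept = [r.split("\t")[0] for r in records if keep_read(r)]
counts = Counter(wasp_status(r) for r in records)
print(kept, counts)
```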
WASP
WASP is a pipeline that corrects for allelic mapping biases, among other things.
STAR vs WASP
Feature | STAR (version 2.7.1a) | WASP (version 0.3.4) |
removes reads that overlap indels | no | yes |
maximum number of SNPs per read is configurable | no (fixed at 10) | yes (default = 6) |
takes phase information into account | no | yes |
tools used, in order | STAR, grep, HTSeq | snp2h5 or extract_vcf_snps.sh, STAR, find_intersecting_snps.py, STAR, filter_remapped_reads.py, samtools merge, samtools sort, samtools index, HTSeq |
Neither tool can make use of reads that overlap insertions / deletions (indels): WASP removes those reads, while STAR ignores the indels. However, we can:
- keep reads that overlap indels with WASP by removing indels from the VCF file
- remove reads that overlap indels with STAR by comparing the positions of the reads with the positions of the indels
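Both workarounds can be sketched in a few lines. The first part drops indel records from a VCF so that only SNPs remain; the second computes the reference span of a read from its POS and CIGAR and tests it against a list of indel intervals. All function names are hypothetical, the indel intervals are assumed to be 1-based, inclusive and on the same chromosome as the read, and a spliced read is treated as covering its whole span including introns, which is a simplification.

```python
import re

BASES = set("ACGT")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_snp(vcf_line):
    """True when REF and every ALT allele are single A/C/G/T bases."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3].upper(), fields[4].upper().split(",")
    return ref in BASES and all(alt in BASES for alt in alts)


def drop_indels(vcf_lines):
    """Yield header lines and SNP records only."""
    for line in vcf_lines:
        if line.startswith("#") or is_snp(line):
            yield line


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) covered on the reference."""
    # Only these operations consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + length - 1


def overlaps_indel(pos, cigar, indels):
    """True when the read span intersects any (start, end) indel interval."""
    start, end = reference_span(pos, cigar)
    return any(s <= end and e >= start for s, e in indels)


# Made-up VCF content: one SNP, one deletion, one insertion.
vcf = [
    "##fileformat=VCFv4.2\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t200\t.\tAT\tA\t.\tPASS\t.\n",
    "chr1\t300\t.\tC\tCTT\t.\tPASS\t.\n",
]
snps_only = list(drop_indels(vcf))
print(len(snps_only))
print(overlaps_indel(100, "50M", [(200, 201)]))
```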
Conclusion
In conclusion, WASP (the original algorithm) is configurable, but it requires many tools and is much slower than STAR --waspOutputMode (the re-implementation).