How to filter reads that overlap indels in a BAM file

Today, we will see how to filter reads in a BAM file that overlap indels defined in a VCF file using the samtools software.

Tools used

In this post, we will use the following version of bcftools and samtools:

bcftools --version
bcftools 1.11
Using htslib 1.11
Copyright (C) 2020 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

What are indels

The term “indel” corresponds to insertions or deletions of bases in the genome compared to a reference. In a VCF file, indels are recognized by the different length between the reference and the alternative alleles.

Extract indels from the VCF file

Before using samtools to work on the BAM file, we need to extract indels from the VCF file with bcftools for example:

bcftools view --types indels ${VCF_file} | awk '/#CHROM/,EOF {print $1"\t"$2-1"\t"$2}' | awk '{sub(/^chr/,"",$0)} 1' | sed 1d > ${VCF_file/.vcf.gz/_indels.bed}

Create index for the BAM file

The -M parameter from samtools (which will be used in the next section) requires an indexed file:

samtools index ${prefix}_Aligned.sortedByCoord.out.bam

Extract reads that overlap indels

On the one hand, you can use the following command to extract the reads that overlap indels:

samtools view -M -L ${VCF_file/.vcf.gz/_indels.bed} ${prefix}_Aligned.sortedByCoord.out.bam | awk '{print $0}' > ${prefix}_overlap_indels.txt

Remove reads that overlap indels

On the other hand, you can decide to remove the reads that overlap indels from the BAM file:

samtools view -h ${prefix}_Aligned.sortedByCoord.out.bam | grep -f ${prefix}_overlap_indels.txt -F -v -w | samtools view -h -b > ${prefix}_Aligned.sortedByCoord.out.bam

Conclusion

To sum up, we can use the samtools software to filter reads that overlap indels. Is this post helpful to you?

Related posts

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply