How to filter variants with a low MAF

To filter variants with a low minor allele frequency (MAF) from a variant call format (VCF) file, we can use tools such as bcftools, plink, or R.

What is minor allele frequency?

The MAF is the frequency of the least frequent allele at a given site. This least frequent allele can be either the reference allele or the alternate allele. A common threshold is 5% (0.05), but the appropriate value depends on your cohort size.
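
As a quick worked example of this definition, here is a minimal sketch that computes the MAF of a biallelic site from its alternate allele count (AC) and total number of alleles (AN). The function name maf is hypothetical and only serves the illustration.

```shell
# maf AC AN: print the minor allele frequency of a biallelic site,
# given the alternate allele count (AC) and the total allele number (AN).
# "maf" is a hypothetical helper name, not part of bcftools or plink.
maf() {
    awk -v ac="$1" -v an="$2" 'BEGIN {
        af = ac / an                                # alternate allele frequency
        printf "%.4f\n", (af < 0.5 ? af : 1 - af)   # keep the smaller of AF and 1 - AF
    }'
}

maf 30 200    # the alternate allele is the minor one
maf 180 200   # the reference allele is the minor one
```

With 100 diploid individuals (AN = 200), the first site has a MAF of 0.15 and the second a MAF of 0.10, so both would pass a 0.05 threshold.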

1. bcftools

To remove variants with a low MAF using bcftools, we need to use the view command with the --min-af parameter:

bcftools view --min-af 0.05:minor -Oz input.vcf.gz > output.vcf.gz

The abbreviated parameter -q is equivalent to --min-af:

bcftools view -q 0.05:minor -Oz input.vcf.gz > output.vcf.gz
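
To make the 0.05:minor criterion more concrete, the sketch below mimics it with awk on a three-variant toy VCF. It assumes biallelic sites that carry the INFO/AC and INFO/AN tags. The file names, the variant IDs and the filter_maf function are made up for the illustration, and the awk script only reproduces the rule; it is not what bcftools runs internally.

```shell
# Toy VCF (made-up data): three biallelic sites with INFO/AC and INFO/AN.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t100\trare_alt\tA\tG\t.\t.\tAC=2;AN=200\n'      # AF 0.01, MAF 0.01
    printf '1\t200\tcommon\tC\tT\t.\t.\tAC=60;AN=200\n'       # AF 0.30, MAF 0.30
    printf '1\t300\trare_ref\tG\tA\t.\t.\tAC=198;AN=200\n'    # AF 0.99, MAF 0.01
} > toy.vcf

# filter_maf MIN FILE: keep header lines and the sites whose minor allele
# frequency is at least MIN (hypothetical helper, biallelic sites only).
filter_maf() {
    awk -F'\t' -v min="$1" '
        /^#/ { print; next }
        {
            ac = ""; an = ""
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AC=/) ac = substr(info[i], 4)
                if (info[i] ~ /^AN=/) an = substr(info[i], 4)
            }
            if (ac == "" || an == "" || an + 0 == 0) next   # frequency not computable
            af = ac / an
            minor = (af < 0.5 ? af : 1 - af)
            if (minor >= min) print
        }' "$2"
}

filter_maf 0.05 toy.vcf > toy.filtered.vcf
grep -vc '^#' toy.filtered.vcf   # number of sites kept
```

Only the site named common is kept: the other two are dropped because their minor allele, alternate in one case and reference in the other, has a frequency of 0.01.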

2. plink

To make plink (v1.9) filter variants with a low MAF, you need to use the --maf parameter. Note that --out takes a file prefix, so the command below writes output.vcf:

plink --vcf input.vcf.gz --maf 0.05 --recode vcf-iid --out output

3. R

a) Import the dosages

If you don’t know how to open VCF files and extract dosages in R, I invite you to read my blog post about How to extract GT or DS fields from VCF files in R. The dosages variable is a matrix where the columns correspond to individuals and the rows correspond to variants.

b) Compute the MAF

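To filter variants with a low MAF, we first need to compute the frequency of the allele counted by the dosages (usually the alternate allele), which ranges from 0 to 1. Since dosages is a matrix, we append this frequency as a new column named MAF using cbind:

dosages = cbind(dosages, MAF = rowSums(round(dosages)) / (2 * ncol(dosages)))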

This command was slightly adapted from this topic.

c) Filter the variants

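Because the value computed above is the frequency of the counted allele rather than the MAF itself, we keep the variants whose frequency lies between 0.05 and 0.95, which is the same as requiring a MAF of at least 0.05: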

dosages = dosages[dosages[, "MAF"] >= 0.05 & dosages[, "MAF"] <= 0.95,]

Conclusion

In conclusion, we can easily filter variants with a low minor allele frequency using bcftools, plink and R. What else would you like to do with VCF files?
