In this blog post, we will see how to make Manhattan plots in R with real data. Indeed, we will use the association statistics from an article that reports risk loci from genome-wide association studies (GWAS) for one group of autoimmune diseases.
What is a Manhattan plot?
A Manhattan plot is a scatter plot that shows the result of a GWAS: the x axis corresponds to the genomic position of the variants in each chromosome and the y axis corresponds to the significance of the association with a disease or trait. The higher the variant, the more associated it is with the disease or trait.
Import the data
First, we will import the GWAS data with the fread function of the data.table library which allows files to be read faster:
library(data.table)
import_dataset = function(disease){
if (disease == "CD"){
filename = "cd_build37_40266_20161107.txt.gz"
} else if (disease == "IBD"){
filename = "ibd_build37_59957_20161107.txt.gz"
} else if (disease == "UC"){
filename = "uc_build37_45975_20161107.txt.gz"
}
dataset = as.data.frame(data.table::fread(filename))
return(dataset)
}
diseases = c("CD", "UC", "IBD")
manhattan = do.call("rbind", lapply(diseases, import_dataset))
Prepare the data
Next, we will prepare the data for plotting. We need to:
- extract the chromosome and genomic position of each variant
- compute the -log10(p-values)
- define a different color on every other chromosome
- compute the cumulative length and center of each chromosome
- compute the cumulative genomic position of each variant
Here is the implementation in base R:
manhattan[, "chr"] = sub(":.*", "", manhattan[, "MarkerName"])
manhattan[, "pos"] = as.numeric(sub(".*:", "", sub("_.*", "",
manhattan[, "MarkerName"])))
manhattan[, "log_p_value"] = -log10(manhattan[, "P.value"])
manhattan[, "col"] = ifelse(manhattan[, "chr"] %in% seq(1, 22, by = 2),
"black", "darkgrey")
chroms = NULL
cum_length = 0
for (chr in unique(manhattan[, "chr"])){
min_pos = min(manhattan[manhattan[, "chr"] == chr, "pos"])
max_pos = max(manhattan[manhattan[, "chr"] == chr, "pos"])
center = (min_pos + max_pos + 2 * cum_length) / 2
chroms = rbind(chroms, data.frame(chr, cum_length, center))
cum_length = cum_length + max_pos
}
manhattan = merge(manhattan, chroms, by = "chr")
manhattan[, "cum_pos"] = manhattan[, "cum_length"] + manhattan[, "pos"]
manhattan = manhattan[order(manhattan[, "cum_pos"]),]
Plot the data
Then, we will make the Manhattan plot in base R with a significance threshold of 7.3:
png(file = "GWAS.png", width = 2000, height = 500, type = "cairo")
plot(x = manhattan[, "cum_pos"], y = manhattan[, "log_p_value"],
col = manhattan[, "col"], xlab = "Chromosome", ylab = "-log10(p)",
cex.lab = 1.1, pch = 19, cex = 0.5, axes = FALSE)
axis(1, cex.axis = 1.1, at = unique(manhattan[, "center"]),
labels = unique(manhattan[, "chr"]), col = "white")
axis(2, cex.axis = 1.1)
abline(h = 7.3, col = "red", lty = "dashed", lwd = 2)
dev.off()
Conclusion
In conclusion, it is really easy to make Manhattan plots in base R! Have you found what you were looking for?