How to create Manhattan plots in R

In this blog post, we will see how to make Manhattan plots in R with real data. Indeed, we will use the association statistics from an article that reports risk loci from genome-wide association studies (GWAS) for one group of autoimmune diseases.

What is a Manhattan plot?

A Manhattan plot is a scatter plot that shows the result of a GWAS: the x axis corresponds to the genomic position of the variants in each chromosome and the y axis corresponds to the significance of the association with a disease or trait. The higher the variant, the more associated it is with the disease or trait.

Import the data

First, we will import the GWAS data with the fread function of the data.table library which allows files to be read faster:

library(data.table)

import_dataset = function(disease){
  
  if (disease == "CD"){
    filename = "cd_build37_40266_20161107.txt.gz"
    
  } else if (disease == "IBD"){
    filename = "ibd_build37_59957_20161107.txt.gz"
    
  } else if (disease == "UC"){
    filename = "uc_build37_45975_20161107.txt.gz"
  }
  
  dataset = as.data.frame(data.table::fread(filename))
  
  return(dataset)
}

diseases = c("CD", "UC", "IBD")
manhattan = do.call("rbind", lapply(diseases, import_dataset))

Prepare the data

Next, we will prepare the data for plotting. We need to:

extract the chromosome and genomic position of each variant
compute the -log10(p-values)
define a different color on every other chromosome
compute the cumulative length and center of each chromosome
compute the cumulative genomic position of each variant

Here is the implementation in base R:

manhattan[, "chr"] = sub(":.*", "", manhattan[, "MarkerName"])
manhattan[, "pos"] = as.numeric(sub(".*:", "", sub("_.*", "",
                                                   manhattan[, "MarkerName"])))
manhattan[, "log_p_value"] = -log10(manhattan[, "P.value"])
manhattan[, "col"] = ifelse(manhattan[, "chr"] %in% seq(1, 22, by = 2),
                            "black", "darkgrey")

chroms = NULL
cum_length = 0

for (chr in unique(manhattan[, "chr"])){
  min_pos = min(manhattan[manhattan[, "chr"] == chr, "pos"])
  max_pos = max(manhattan[manhattan[, "chr"] == chr, "pos"])
  center = (min_pos + max_pos + 2 * cum_length) / 2
  chroms = rbind(chroms, data.frame(chr, cum_length, center))
  cum_length = cum_length + max_pos
}

manhattan = merge(manhattan, chroms, by = "chr")
manhattan[, "cum_pos"] = manhattan[, "cum_length"] + manhattan[, "pos"]
manhattan = manhattan[order(manhattan[, "cum_pos"]),]

Plot the data

Then, we will make the Manhattan plot in base R with a significance threshold of 7.3:

png(file = "GWAS.png", width = 2000, height = 500, type = "cairo")

plot(x = manhattan[, "cum_pos"], y = manhattan[, "log_p_value"],
     col = manhattan[, "col"], xlab = "Chromosome", ylab = "-log10(p)",
     cex.lab = 1.1, pch = 19, cex = 0.5, axes = FALSE)

axis(1, cex.axis = 1.1, at = unique(manhattan[, "center"]),
     labels = unique(manhattan[, "chr"]), col = "white")

axis(2, cex.axis = 1.1)

abline(h = 7.3, col = "red", lty = "dashed", lwd = 2)

dev.off()

Conclusion

In conclusion, it is really easy to make Manhattan plots in base R! Have you found what you were looking for?

How to create Manhattan plots in R

What is a Manhattan plot?

Import the data

Prepare the data

Plot the data

Conclusion

Related posts

Like this:

Comments

Leave a ReplyCancel reply

What is a Manhattan plot?

Import the data

Prepare the data

Plot the data

Conclusion

Related posts

Share this:

Like this:

Comments

Leave a ReplyCancel reply