How to create Manhattan plots in R

In this blog post, we will see how to make Manhattan plots in R with real data. Indeed, we will use the association statistics from an article that reports risk loci from genome-wide association studies (GWAS) for one group of autoimmune diseases.

What is a Manhattan plot?

A Manhattan plot is a scatter plot that shows the result of a GWAS: the x axis corresponds to the genomic position of the variants in each chromosome and the y axis corresponds to the significance of the association with a disease or trait. The higher the variant, the more associated it is with the disease or trait.

Import the data

First, we will import the GWAS data with the fread function of the data.table library which allows files to be read faster:

library(data.table)

import_dataset = function(disease){
  
  if (disease == "CD"){
    filename = "cd_build37_40266_20161107.txt.gz"
    
  } else if (disease == "IBD"){
    filename = "ibd_build37_59957_20161107.txt.gz"
    
  } else if (disease == "UC"){
    filename = "uc_build37_45975_20161107.txt.gz"
  }
  
  dataset = as.data.frame(data.table::fread(filename))
  
  return(dataset)
}

diseases = c("CD", "UC", "IBD")
manhattan = do.call("rbind", lapply(diseases, import_dataset))

Prepare the data

Next, we will prepare the data for plotting. We need to:

  1. extract the chromosome and genomic position of each variant
  2. compute the -log10(p-values)
  3. define a different color on every other chromosome
  4. compute the cumulative length and center of each chromosome
  5. compute the cumulative genomic position of each variant

Here is the implementation in base R:

manhattan[, "chr"] = sub(":.*", "", manhattan[, "MarkerName"])
manhattan[, "pos"] = as.numeric(sub(".*:", "", sub("_.*", "",
                                                   manhattan[, "MarkerName"])))
manhattan[, "log_p_value"] = -log10(manhattan[, "P.value"])
manhattan[, "col"] = ifelse(manhattan[, "chr"] %in% seq(1, 22, by = 2),
                            "black", "darkgrey")

chroms = NULL
cum_length = 0

for (chr in unique(manhattan[, "chr"])){
  min_pos = min(manhattan[manhattan[, "chr"] == chr, "pos"])
  max_pos = max(manhattan[manhattan[, "chr"] == chr, "pos"])
  center = (min_pos + max_pos + 2 * cum_length) / 2
  chroms = rbind(chroms, data.frame(chr, cum_length, center))
  cum_length = cum_length + max_pos
}

manhattan = merge(manhattan, chroms, by = "chr")
manhattan[, "cum_pos"] = manhattan[, "cum_length"] + manhattan[, "pos"]
manhattan = manhattan[order(manhattan[, "cum_pos"]),]

Plot the data

Then, we will make the Manhattan plot in base R with a significance threshold of 7.3:

png(file = "GWAS.png", width = 2000, height = 500, type = "cairo")

plot(x = manhattan[, "cum_pos"], y = manhattan[, "log_p_value"],
     col = manhattan[, "col"], xlab = "Chromosome", ylab = "-log10(p)",
     cex.lab = 1.1, pch = 19, cex = 0.5, axes = FALSE)

axis(1, cex.axis = 1.1, at = unique(manhattan[, "center"]),
     labels = unique(manhattan[, "chr"]), col = "white")

axis(2, cex.axis = 1.1)

abline(h = 7.3, col = "red", lty = "dashed", lwd = 2)

dev.off()
Manhattan plot for CD, IBD and UC in R

Conclusion

In conclusion, it is really easy to make Manhattan plots in base R! Have you found what you were looking for?

Related posts

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply