How to perform a rank normal transformation in R

In this tutorial, we will see how to perform a rank-based inverse normal transformation in R before a genome-wide association study (GWAS) or an expression quantitative trait loci (eQTL) analysis.

Indeed, it is recommanded by some people to rank normal transform the phenotypes (to make them follow a normal distribution) before the linear regression to avoid false positive associations.

Here is a subset of my gene expression dataset before the transformation:

columns = read.table("input.bed", comment.char = "", nrows = 1)

input = read.table("input.bed", header = TRUE, stringsAsFactors = FALSE,
                   check.names = FALSE, comment.char = "",
                   colClasses = c(rep("NULL", 3), "character", rep("NULL", 2),
                                  rep("numeric", ncol(columns) - 6)))

View(input)
Screenshot of the input dataset containing gene expression data

Rank normal transformation in QTLtools

QTLtools can rank normal transform the gene quantifications with the –normal parameter:

QTLtools correct --bed input.bed --normal --out output.bed

We can look at the result in R:

columns = read.table("output.bed", comment.char = "", nrows = 1)

output = read.table("output.bed", header = TRUE, stringsAsFactors = FALSE,
                    check.names = FALSE, comment.char = "",
                    colClasses = c(rep("NULL", 3), "character", rep("NULL", 2),
                                   rep("numeric", ncol(columns) - 6)))

View(output)
Screenshot of the output dataset after the rank-based inverse normal transformation obtained with QTLtools correct --normal

Rank normal transformation in R

Method 1

First, we will reimplement this function from its source code in R:

normal_transform = function(x){
  R = rank(x, na.last = "keep")
  R = R - 0.5
  maximum = max(R) + 0.5
  R = R / maximum
  V = qnorm(R)
}

test1 = input
test1[, -1] = t(apply(test1[, -1], 1, normal_transform))
View(test1)
Screenshot of the output dataset after the rank-based inverse normal transformation reimplemented in R

Method 2

Then, we will apply the answer provided to this question to the dataset and compare the results:

test2 = input
test2[, -1] = t(apply(test2[, -1], 1, function(x){qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))}))

The two datasets are equal:

all(test1 == test2)

Conclusion

In conclusion, we reimplemented the rank-based inverse normal transformation in R and applied it to each gene. Is this blog post helpful to you?

Related posts

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply