In this tutorial, we will see how to perform a rank-based inverse normal transformation in R before a genome-wide association study (GWAS) or an expression quantitative trait loci (eQTL) analysis.
Indeed, it is recommanded by some people to rank normal transform the phenotypes (to make them follow a normal distribution) before the linear regression to avoid false positive associations.
Here is a subset of my gene expression dataset before the transformation:
columns = read.table("input.bed", comment.char = "", nrows = 1)
input = read.table("input.bed", header = TRUE, stringsAsFactors = FALSE,
check.names = FALSE, comment.char = "",
colClasses = c(rep("NULL", 3), "character", rep("NULL", 2),
rep("numeric", ncol(columns) - 6)))
View(input)
Rank normal transformation in QTLtools
QTLtools can rank normal transform the gene quantifications with the –normal parameter:
QTLtools correct --bed input.bed --normal --out output.bed
We can look at the result in R:
columns = read.table("output.bed", comment.char = "", nrows = 1)
output = read.table("output.bed", header = TRUE, stringsAsFactors = FALSE,
check.names = FALSE, comment.char = "",
colClasses = c(rep("NULL", 3), "character", rep("NULL", 2),
rep("numeric", ncol(columns) - 6)))
View(output)
Rank normal transformation in R
Method 1
First, we will reimplement this function from its source code in R:
normal_transform = function(x){
R = rank(x, na.last = "keep")
R = R - 0.5
maximum = max(R) + 0.5
R = R / maximum
V = qnorm(R)
}
test1 = input
test1[, -1] = t(apply(test1[, -1], 1, normal_transform))
View(test1)
Method 2
Then, we will apply the answer provided to this question to the dataset and compare the results:
test2 = input
test2[, -1] = t(apply(test2[, -1], 1, function(x){qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))}))
The two datasets are equal:
all(test1 == test2)
[1] TRUE
Conclusion
In conclusion, we reimplemented the rank-based inverse normal transformation in R and applied it to each gene. Is this blog post helpful to you?