How to add significance levels in R

Today, we will see how to add significance levels to box plots in R. The data are reshaped with the tidyr package, the plot is drawn with ggplot2, and the significance brackets are added with ggsignif.

Load the packages

library("tidyr")
library("ggplot2")
library("ggsignif")

Load the iris dataset

data("iris")

Variable assignment

dataset = iris
y_columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
fill_column = "Species"
fill_values = c("versicolor", "virginica")

From wide to long format

To reshape data from wide to long format, we will use the pivot_longer function from the tidyr package. Four other functions are introduced in the article From long to wide format in R.

# Keep only the rows that belong to the two groups being compared
dataset = dataset[dataset[, fill_column] %in% fill_values,]
# Stack every column except the grouping column into name/value pairs
results_gather = as.data.frame(pivot_longer(dataset,
                                            cols = -all_of(fill_column),
                                            names_to = "variables",
                                            values_to = "values"))

The iris dataset after gathering the four variables (sepal length, sepal width, petal length and petal width) into a column named "variables"
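To make the reshaping step concrete, here is a minimal sketch on a hypothetical two-row table (the `wide` and `long` names and the values are illustrative assumptions, not part of the original workflow). Each measurement column is stacked into a name/value pair while the grouping column is repeated.

```r
library(tidyr)

# Hypothetical wide table: one row per flower, one column per measurement
wide <- data.frame(Species = c("versicolor", "virginica"),
                   Sepal.Length = c(7.0, 6.3),
                   Sepal.Width = c(3.2, 3.3))

# Stack every column except Species into "variables"/"values" pairs
long <- as.data.frame(pivot_longer(wide, cols = -Species,
                                   names_to = "variables",
                                   values_to = "values"))
print(long)
```

Two rows with two measurements each become four rows, one per flower and measurement.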

Compute p values for each variable

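# Return the t-test p value comparing the two groups in fill_values
# for a single measurement (one level of the "variables" column)
compute_signif = function(variable, dataset, fill_column, fill_values){
  is_variable = dataset[, "variables"] == variable
  # Match group labels exactly; grepl() would also accept partial matches
  group_1 = dataset[is_variable & dataset[, fill_column] == fill_values[1], "values"]
  group_2 = dataset[is_variable & dataset[, fill_column] == fill_values[2], "values"]
  # t.test() runs Welch's two-sample t-test by default
  return(t.test(group_1, group_2)$p.value)
}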

p_values = do.call("cbind", lapply(y_columns, dataset = results_gather,
                                   fill_column = fill_column,
                                   fill_values = fill_values, compute_signif))
colnames(p_values) = y_columns

p values computed for the four variables (sepal length, sepal width, petal length and petal width)
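The p values above come from t.test(), whose default is Welch's two-sample test (var.equal = FALSE). The following sketch, using base R only, runs the same comparison for a single variable so the call is easier to inspect; the object names (`welch`, `student`, `p_sepal_length`) are illustrative assumptions.

```r
data("iris")

# Sepal length for each of the two species being compared
versicolor <- iris[iris$Species == "versicolor", "Sepal.Length"]
virginica  <- iris[iris$Species == "virginica", "Sepal.Length"]

# Default call: Welch's t-test, which does not assume equal variances
welch <- t.test(versicolor, virginica)
p_sepal_length <- welch$p.value

# Same comparison with the equal-variance (Student) t-test
student <- t.test(versicolor, virginica, var.equal = TRUE)

print(welch$method)
print(p_sepal_length)
```

If equal variances are a reasonable assumption for your data, var.equal = TRUE can be passed instead; the choice is left to the reader.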

Convert p values to significance codes

signif_code = symnum(p_values, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                     symbols = c("***", "**", "*", "NS"))
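
As a quick check of how the cutpoints map to symbols, this sketch applies the same symnum() call to four hypothetical p values, one per band (`example_p` and `codes_chr` are illustrative names, not part of the original script).

```r
# Hypothetical p values, one for each significance band
example_p <- c(0.0005, 0.005, 0.03, 0.2)

codes <- symnum(example_p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                symbols = c("***", "**", "*", "NS"))

# as.character() drops the legend attribute and the noquote class
codes_chr <- as.character(codes)
print(codes_chr)
```

The four values fall into the bands below 0.001, 0.01 and 0.05 and above 0.05, so they are labelled "***", "**", "*" and "NS" in that order.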

Add significance levels to box plots in ggplot2

n = length(y_columns)

# aes_string() is deprecated; the .data pronoun accepts a column name stored in a string
ggplot(results_gather, aes(x = variables, y = values,
                           fill = .data[[fill_column]])) +
  geom_boxplot() +
  # One bracket per variable, drawn just above the highest value
  geom_signif(y_position = rep(max(results_gather[, "values"], na.rm = TRUE) * 1.1,
                               n),
              xmin = seq(from = 0.8, to = 0.8 + n - 1, by = 1),
              xmax = seq(from = 1.2, to = 1.2 + n - 1, by = 1),
              annotation = signif_code) +
  labs(fill = fill_column)

Box plot with significance levels produced with ggplot2
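
The xmin and xmax sequences in the plot call rely on ggplot2 placing discrete categories at x = 1, 2, ..., n, with the two dodged boxes sitting slightly left and right of each position. The sketch below wraps that arithmetic in a hypothetical helper, `bracket_positions()`, which is not part of ggsignif; the 0.2 offset is the value used above and is assumed to land close enough to the box centres.

```r
# Hypothetical helper (not a ggsignif function): bracket end points for
# n categories on a discrete x axis with two dodged boxes per category
bracket_positions <- function(n, offset = 0.2) {
  centers <- seq_len(n)  # discrete categories sit at x = 1, 2, ..., n
  data.frame(xmin = centers - offset, xmax = centers + offset)
}

positions <- bracket_positions(4)
print(positions)
```

A wider or narrower offset may be needed if the box width or the number of groups per category changes.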

Conclusion

In conclusion, we can use the ggsignif R package to add significance brackets to plots made with ggplot2.
