Today, we will see how to add significance levels to box plots in R. The data are reshaped with the tidyr package, the plot is drawn with ggplot2, and the significance brackets are added with ggsignif.
Load the packages
library("tidyr")
library("ggplot2")
library("ggsignif")
Load the iris dataset
data("iris")
Variable assignment
dataset = iris
y_columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
fill_column = "Species"
fill_values = c("versicolor", "virginica")
From wide to long format
We first keep only the rows for the two groups listed in fill_values, then reshape the data from wide to long format with the pivot_longer function from the tidyr package. Four other functions are introduced in the article From long to wide format in R.
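# Keep only the rows for the two groups being compared
dataset = dataset[dataset[, fill_column] %in% fill_values, ]
# Stack every column except fill_column into variables/values pairs
results_gather = as.data.frame(pivot_longer(dataset,
                                            cols = -all_of(fill_column),
                                            names_to = "variables",
                                            values_to = "values"))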
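To make the reshaping step concrete, here is a minimal sketch on a tiny data frame. The data frame toy and its values are hypothetical and only serve as an illustration; the call to pivot_longer mirrors the one used on the iris data.

```r
library("tidyr")

# Hypothetical two-row data frame, used only for illustration
toy = data.frame(Species = c("versicolor", "virginica"),
                 Sepal.Length = c(7.0, 6.3),
                 Petal.Length = c(4.7, 6.0))

# Every column except Species is stacked into variables/values pairs
toy_long = as.data.frame(pivot_longer(toy,
                                      cols = -all_of("Species"),
                                      names_to = "variables",
                                      values_to = "values"))

# Two rows and two measurement columns become four rows in long format
print(toy_long)
```

Each row of the long table holds one measurement, which is the layout ggplot2 needs to draw one box per variable and group.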
Compute p values for each variable
# Two-sample t-test between the two groups for one variable
compute_signif = function(variable, dataset, fill_column, fill_values){
  is_variable = dataset[, "variables"] == variable
  # Compare group labels exactly with ==
  group_1 = dataset[is_variable & dataset[, fill_column] == fill_values[1], "values"]
  group_2 = dataset[is_variable & dataset[, fill_column] == fill_values[2], "values"]
  results = t.test(group_1, group_2)$p.value
  return(results)
}
# One p-value per variable, collected in a 1 x n matrix named by variable
p_values = do.call("cbind", lapply(y_columns, compute_signif,
                                   dataset = results_gather,
                                   fill_column = fill_column,
                                   fill_values = fill_values))
colnames(p_values) = y_columns
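Called with two numeric vectors, t.test runs Welch's test by default (var.equal = FALSE). As a quick sanity check, the sketch below computes the p-value for one variable in two ways with base R only: through the formula interface and through two vectors. The object names two_species, p_formula and p_vectors are introduced here for illustration.

```r
data("iris")

# Keep the two species being compared and drop the unused factor level
two_species = droplevels(iris[iris$Species %in% c("versicolor", "virginica"), ])

# Formula interface: Sepal.Length compared between the two species
p_formula = t.test(Sepal.Length ~ Species, data = two_species)$p.value

# Two-vector interface, as in compute_signif
p_vectors = t.test(
  two_species$Sepal.Length[two_species$Species == "versicolor"],
  two_species$Sepal.Length[two_species$Species == "virginica"]
)$p.value

# Both calls run the same test, so the p-values should agree
print(all.equal(p_formula, p_vectors))
```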
Convert p values to significance codes
# Map each p-value to a code: ***, **, * or NS (p > 0.05)
signif_code = symnum(p_values, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                     symbols = c("***", "**", "*", "NS"))
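To see how symnum bins values, here is a small sketch with made-up p-values, one per interval. The vector example_p is hypothetical; the cutpoints and symbols are the ones used above.

```r
# Hypothetical p-values, chosen to fall into each of the four intervals
example_p = c(0.0005, 0.005, 0.03, 0.2)

example_code = symnum(example_p, corr = FALSE, na = FALSE,
                      cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                      symbols = c("***", "**", "*", "NS"))

# as.character() drops the symnum class and leaves a plain character vector
print(as.character(example_code))
```

Each symbol covers the interval between two consecutive cutpoints, so there is always one symbol fewer than there are cutpoints.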
Add significance levels to box plots in ggplot2
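n = length(y_columns)
# Order the x axis like y_columns so each annotation sits over its own variable
# (a character column is sorted alphabetically on a discrete axis)
results_gather[, "variables"] = factor(results_gather[, "variables"],
                                       levels = y_columns)
# The .data pronoun maps the column whose name is stored in fill_column
ggplot(results_gather, aes(x = variables, y = values,
                           fill = .data[[fill_column]])) +
  geom_boxplot() +
  labs(fill = fill_column) +
  # One bracket per variable, spanning the two dodged boxes
  geom_signif(y_position = rep(max(results_gather[, "values"], na.rm = TRUE) * 1.1, n),
              xmin = seq(from = 0.8, to = 0.8 + n - 1, by = 1),
              xmax = seq(from = 1.2, to = 1.2 + n - 1, by = 1),
              annotations = as.character(signif_code))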
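The bracket positions follow from how ggplot2 lays out a discrete x axis: the levels sit at the integers 1 to n, and the two dodged boxes sit slightly to the left and right of each integer. A minimal sketch of the same positions, assuming a hypothetical half-width named offset of 0.2 as in the call above:

```r
# Number of variables on the x axis (four in this example)
n = 4
# Hypothetical half-width of each bracket around its integer x position
offset = 0.2

# Left and right ends of the n brackets
xmin = seq_len(n) - offset
xmax = seq_len(n) + offset

print(rbind(xmin, xmax))
```

Changing offset widens or narrows every bracket at once, which can help if the box width or the dodge is changed.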
Conclusion
In conclusion, we can use the ggsignif R package to add significance brackets to plots made with ggplot2.