On the insignificance of significance

In this post, Alexander Feckler talks about their recently published paper “When significance becomes insignificant: effect sizes and their uncertainties in Bayesian and frequentist frameworks as an alternative approach when analyzing ecotoxicological data”.

The statistical evaluation of test results is a key component of science. In ecotoxicology, for instance, we want to know whether test organisms under stress differ in their performance from their unstressed counterparts. Most commonly we employ null hypothesis significance testing (NHST), which yields the p-value: the probability of obtaining results at least as extreme as ours if chance alone were at work. When this p-value is small enough (conventionally < 0.05), we decide that pure chance is an unlikely explanation for our results. In this case, we call the results statistically significant and accept chemical stress as the actual reason for the change in the performance of our test organisms. The often-criticized aspect of NHST, however, is that it concentrates solely on statistical significance and says nothing about the relevance of the effect itself.
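To make the mechanics concrete, here is a minimal NHST sketch with purely hypothetical data (not the data from our studies): a two-sample t-test comparing a performance measure of unstressed and stressed test organisms, with group names, values, and sample sizes chosen only for illustration.

```python
# Minimal NHST sketch with hypothetical data: two-sample t-test comparing a
# performance measure (e.g., feeding rate) of unstressed vs. stressed organisms.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=1.00, scale=0.20, size=15)   # unstressed organisms
stressed = rng.normal(loc=0.80, scale=0.20, size=15)  # chemically stressed organisms

t_stat, p_value = stats.ttest_ind(control, stressed)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# Conventionally, p < 0.05 is called "statistically significant", but the
# p-value alone says nothing about how large or relevant the effect is.
```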

Bayesian probability distribution illustrating the effect size as well as its certainty and probability (graphic by A. Feckler)

For many questions in ecotoxicology, it is generally more valuable to know the magnitude and certainty of an effect, as well as its probability of occurring, in order to ultimately evaluate its biological relevance. We therefore assessed how well inferences based on NHST on the one hand, and on effect-size-based statistics in Bayesian and frequentist frameworks on the other, agree with each other, using data from previously published studies (see related posts “You are what you eat!”, “You are what you eat – continued”, and “Does the exposure pathway influence fungicides’ toxicity?”).
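As an illustration of the frequentist effect-size approach, the sketch below computes Cohen's d with a percentile-bootstrap 95% confidence interval for the same hypothetical data as above; the effect-size metric and bootstrap settings are illustrative choices, not necessarily those used in the paper.

```python
# Illustrative frequentist effect-size sketch (hypothetical data): Cohen's d
# for the group difference, with a bootstrap 95% confidence interval
# describing the uncertainty of that effect size.
import numpy as np

def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled_sd

rng = np.random.default_rng(42)
control = rng.normal(1.00, 0.20, 15)
stressed = rng.normal(0.80, 0.20, 15)

d_hat = cohens_d(control, stressed)
boot = [cohens_d(rng.choice(control, 15, replace=True),
                 rng.choice(stressed, 15, replace=True))
        for _ in range(5000)]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"d = {d_hat:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```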

Our results indicate that NHST dismissed the biological relevance of more effects than the other methods did. This is because effects that were non-significant under NHST (i.e., p > 0.05), and were therefore judged as unimportant, had effect sizes (and, in the Bayesian framework, probabilities) that suggested biological relevance. Conversely, NHST identified some differences between stressed and unstressed test organisms as significant even though the respective effect sizes were marginal.
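The following toy example, again with hypothetical data rather than our results, shows how such disagreement can arise: with a large sample size even a marginal effect tends to come out as “significant”, whereas with a small sample size a substantial effect can remain “non-significant”.

```python
# Toy illustration of why significance and effect size can disagree:
# a marginal effect with many replicates vs. a substantial effect with few.
import numpy as np
from scipy import stats

def summarise(a, b, label):
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    d = (np.mean(a) - np.mean(b)) / pooled_sd
    p = stats.ttest_ind(a, b).pvalue
    print(f"{label}: Cohen's d = {d:.2f}, p = {p:.4f}")

rng = np.random.default_rng(1)
# Marginal effect, n = 500 per group: usually p < 0.05 despite a tiny effect size.
summarise(rng.normal(1.00, 0.20, 500), rng.normal(0.97, 0.20, 500),
          "large n, small effect")
# Substantial effect, n = 6 per group: often p > 0.05 despite a large effect size.
summarise(rng.normal(1.00, 0.20, 6), rng.normal(0.80, 0.20, 6),
          "small n, large effect")
```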

Our paper therefore highlights that effect-size-based statistics can improve the statistical evaluation of ecotoxicological test results, since this approach shifts the focus from the significance of effects towards their magnitude, certainty, and probability. As a first step, we suggest that reporting effect sizes and their uncertainties alongside p-values would help others make an informed judgement on the biological relevance of the effects based on their expert knowledge. For a more comprehensive picture, we additionally propose furthering the use of Bayesian methods, as these allow us to add information on the probability of observed effect sizes, a feature that frequentist methods cannot provide.
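To illustrate the Bayesian idea in the simplest possible terms, and not the model actually used in the paper, the sketch below approximates the posterior of the mean difference by a normal distribution under a flat prior and reports the probability that the effect exceeds a purely hypothetical relevance threshold.

```python
# Deliberately simplified Bayesian sketch (not the paper's model): with a flat
# prior and a normal approximation, the posterior of the mean difference is
# roughly Normal(observed difference, standard error^2). From it we can report
# the probability that the effect exceeds an (illustrative) relevance threshold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(1.00, 0.20, 15)
stressed = rng.normal(0.80, 0.20, 15)

diff = np.mean(control) - np.mean(stressed)
se = np.sqrt(np.var(control, ddof=1) / len(control) +
             np.var(stressed, ddof=1) / len(stressed))
posterior = stats.norm(loc=diff, scale=se)

threshold = 0.10  # hypothetical "biologically relevant" reduction
p_relevant = 1 - posterior.cdf(threshold)
print(f"P(reduction > {threshold}) = {p_relevant:.2f}")
```

Whatever the specific model, the key output is a direct probability statement about the size of the effect, which is exactly what a frequentist p-value does not provide.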

The paper was authored by Alexander Feckler, Matthew Low, Jochen Zubrod, and Mirco Bundschuh, and published in Environmental Toxicology and Chemistry.

For further reading on how to calculate frequentist confidence intervals, see “Provide some confidence!”.