- 35 replications showed a statistically significant positive effect (p < 0.05), compared with 97 of the original studies
- 82 studies showed a stronger effect size in the original study than in the replication
- Comparing effect sizes, 47.4% of the original effect sizes fell within the 95% confidence interval of the replication effect size
- 39 studies were subjectively rated as successfully replicated
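
To make two of these criteria concrete, here is a minimal Python sketch. It computes the share of replications that reached significance from the counts above, and shows one common way to check whether an original effect size falls inside a replication's 95% confidence interval. The Fisher z interval for a correlation is an assumption chosen for illustration and is not necessarily the method the study's authors used; the function names and the example values (`r = 0.20`, `n = 100`) are hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation r from n observations (Fisher z)."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def original_within_replication_ci(r_original, r_replication, n_replication):
    """True if the original effect lies inside the replication's 95% CI."""
    lower, upper = fisher_ci(r_replication, n_replication)
    return lower <= r_original <= upper


# Counts taken from the list above: 35 significant replications,
# 97 significant original studies.
significant_rate = 35 / 97
print(f"Significant replications: {significant_rate:.1%}")  # 36.1%

# Hypothetical effect sizes, for illustration only.
print(original_within_replication_ci(0.30, 0.20, 100))
print(original_within_replication_ci(0.50, 0.20, 100))
```

The two criteria can disagree for the same pair of studies: a replication may fall short of significance while its confidence interval still contains the original effect, which is one reason the summary figures above differ from one another.
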
While some news coverage of this publication focused on the failures (e.g., Nature’s "Over half of psychology studies fail reproducibility test"), the Science article emphasized the challenges of reproducibility itself and the care with which successes and failures need to be interpreted. The authors of the study pointed out that, while the replications produced weaker evidence for the original findings, the results should be read with caution:
“It is too easy to conclude that successful replication means that the theoretical understanding of the original finding is correct. Direct replication mainly provides evidence for the reliability of a result. If there are alternative explanations for the original finding, those alternatives could likewise account for the replication. Understanding is achieved through multiple, diverse investigations that provide converging support for a theoretical interpretation and rule out alternative explanations.
It is also too easy to conclude that a failure to replicate a result means that the original evidence was a false positive. Replications can fail if the replication methodology differs from the original in ways that interfere with observing the effect. We conducted replications designed to minimize a priori reasons to expect a different result by using original materials, engaging original authors for review of the designs, and conducting internal reviews. Nonetheless, unanticipated factors in the sample, setting, or procedure could still have altered the observed effect magnitudes.”