Aug 31, 2015

Lessons from replication of research in psychology

Science magazine has published an article, “Estimating the reproducibility of psychological science”, which reports the first findings from 100 replications completed by 270 contributing authors. A quasi-random sample of studies was drawn from three psychology journals: Psychological Science (PSCI), Journal of Personality and Social Psychology (JPSP), and Journal of Experimental Psychology: Learning, Memory, and Cognition (JEP:LMC). Each replication was carried out by a team, then independently reviewed by other researchers, and its analysis reproduced by a separate analyst. Reproducibility was evaluated using significance and P values, effect sizes, subjective assessments by the replication teams, and meta-analyses of effect sizes. Some highlights from the results:

  • 35 of the replications produced a statistically significant effect (p < 0.05), compared with 97 of the original studies

  • 82 studies showed a stronger effect size in the original than in the replication

  • Comparing effect sizes yielded a 47.4% replication success rate

  • 39 replications were subjectively rated by the teams as successful
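To make the headline numbers concrete, here is a minimal sketch of how such summary counts could be tallied from per-study results. The data below are invented for illustration only; the actual dataset is available from the Open Science Framework, and the project's real analysis is considerably more involved.

```python
# Tally replication outcomes from per-study results (illustrative data only).
studies = [
    # (original_p, replication_p, original_effect_r, replication_effect_r)
    (0.010, 0.04, 0.45, 0.20),
    (0.030, 0.30, 0.38, 0.10),
    (0.001, 0.02, 0.60, 0.55),
    (0.040, 0.60, 0.25, 0.05),
]

ALPHA = 0.05  # conventional significance threshold

# Count significant results in originals vs. replications.
orig_sig = sum(op < ALPHA for op, rp, oe, re in studies)
repl_sig = sum(rp < ALPHA for op, rp, oe, re in studies)

# Count studies whose original effect size exceeded the replication's.
stronger_original = sum(oe > re for op, rp, oe, re in studies)

print(f"{orig_sig} of {len(studies)} original studies significant")
print(f"{repl_sig} of {len(studies)} replications significant")
print(f"{stronger_original} of {len(studies)} had a stronger original effect size")
```

Even this toy version shows the pattern the project reported: most originals clear the significance bar, fewer replications do, and original effect sizes tend to be larger.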


While some news coverage of this publication framed the results as a failure (e.g., Nature’s "Over half of psychology studies fail reproducibility test"), the Science article emphasized the inherent challenges of reproducibility and the care with which successes and failures need to be interpreted. The authors pointed out that while the replications produced weaker evidence for the original findings,
“It is too easy to conclude that successful replication means that the theoretical understanding of the original finding is correct. Direct replication mainly provides evidence for the reliability of a result. If there are alternative explanations for the original finding, those alternatives could likewise account for the replication. Understanding is achieved through multiple, diverse investigations that provide converging support for a theoretical interpretation and rule out alternative explanations.

“It is also too easy to conclude that a failure to replicate a result means that the original evidence was a false positive. Replications can fail if the replication methodology differs from the original in ways that interfere with observing the effect. We conducted replications designed to minimize a priori reasons to expect a different result by using original materials, engaging original authors for review of the designs, and conducting internal reviews. Nonetheless, unanticipated factors in the sample, setting, or procedure could still have altered the observed effect magnitudes.”