Data collection studies should be replicable and reproducible.

A replicable study yields the same results when it is repeated by other researchers. There are many reasons why a study may fail to replicate. For example, the results may have been a fluke or may not generalize to other populations or settings. In other cases, the original researchers may have made a mistake in data collection or analysis, invalidating the findings. Understanding the conditions under which a study does or does not replicate sharpens the scientific contribution of the original study.

A broad notion of scientific replication involves repeating a study multiple times using data collected from a new population, often in different settings or analyzed using different methods from the original study. As noted in the illustration above, the results may not replicate if they apply only to individuals with particular characteristics. The results may also not replicate if the original findings were a statistical fluke. When sample sizes are small, estimates are noisy, and an erroneously large estimate is therefore more likely to arise by chance. Researchers who run several experiments may choose to publish only the studies that find large effects and to ignore those with null findings, which generates statistical bias often referred to as “publication bias” or the “file drawer problem.” In these cases, rerunning the same experiment with a larger sample size will often result in a smaller or even non-existent effect.
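A minimal simulation can make this selection effect concrete. The sketch below (all numbers are hypothetical choices, not from the text) runs many small-sample experiments on the same true effect, "publishes" only those that reach conventional statistical significance, and then compares the average published estimate to a single large-sample replication:

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.2  # hypothetical small true effect
SMALL_N = 20       # sample size of the original small studies
LARGE_N = 2000     # sample size of the replication

def run_experiment(n):
    """Simulate one study: n noisy observations around the true effect."""
    sample = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    significant = abs(mean) > 1.96 * se  # rough two-sided z-test
    return mean, significant

# Many small studies; only the "significant" ones leave the file drawer.
published = [m for m, sig in (run_experiment(SMALL_N) for _ in range(5000)) if sig]

avg_published = statistics.fmean(published)
replication, _ = run_experiment(LARGE_N)

print(f"true effect:            {TRUE_EFFECT}")
print(f"avg published estimate: {avg_published:.2f}")
print(f"large-sample estimate:  {replication:.2f}")
```

Because a small study must observe an unusually large estimate to cross the significance threshold, the average published effect overstates the true effect, while the well-powered replication lands close to it, matching the pattern described above.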

Transparency in an Era of Data-Driven Policy: The Importance of Reproducible Research