News

Lies, damn lies, and research: Improving reproducibility in biomedical science

View on the News

Guidelines should not seem threatening

In the recent Nature editorial, “Repetitive flaws,” the journal comments on the new NIH guidelines that require grant proposals to account for biological variables and to describe how experimental materials will be authenticated (2016 Jan 21. doi: 10.1038/529256a). These requirements are intended to improve the quality and reproducibility of research, about which many concerns have been raised in the past few years. As the editorial states, the NIH guidelines “can help to make researchers aspire to the values that produced them” and can “inspire researchers to uphold their identity and integrity.”

To those investigators who strive to report only their best results following exhaustive and sincere confirmation, these guidelines will not seem threatening. Providing experimental details of one’s work is helpful in many ways (you can reproduce the work yourself with new and different lab personnel or after a lapse of time, you will have excellent experimental records, you will have excellent documentation when it comes time to write another grant, and so on), and I have personally been frustrated when my laboratory cannot duplicate the published work of others. However, questions remain: Who will pay for reproducing the work of others, and how will the sacrifice of additional animals or subjects be justified? Many laboratories are already financially strapped by current funding challenges, and time is equally precious. In addition, junior researchers face tenure and promotion timelines that pressure them to publish in order to establish independence and credibility, while established investigators must document continued productivity to secure continued funding.

The quality of peer review of research publications has also been challenged recently, adding to the concern over the veracity of published research. Many journals now require statistical review prior to acceptance, which further delays publication. In addition, the reviewers who generously perform peer review often do so at the cost of their valuable, uncompensated time.

Despite these hurdles and questions, those who perform valuable and needed research to improve the lives and care of our patients must continue to strive to produce the highest level of evidence.

Dr. Jennifer S. Lawton is a professor of surgery at the division of cardiothoracic surgery, Washington University, St. Louis. She is also an associate medical editor for Thoracic Surgery News.


The issue of scientific reproducibility has come to the fore in the past several years, driven by noteworthy failures to replicate critical findings in several much-publicized reports, coupled with a series of scandals calling into question the role of journals and granting agencies in maintaining quality and oversight.

In a special Nature online collection, the journal assembled articles and perspectives from 2011 to the present dealing with the issue of research reproducibility in science and medicine. These articles were supplemented with current editorial comment.

Seeing these broad-spectrum concerns pulled together in one place makes it difficult not to be pessimistic about the current state of research investigations across the board. The saving grace, however, is that these same reports show that many people realize there is a problem – people who are trying to make changes and who are in a position to be effective.

According to the reports presented in the collection, the problems in research accountability and reproducibility have grown to an alarming extent. By one estimate, irreproducibility costs biomedical research some $28 billion per year in wasted funding (Nature. 2015 Jun 9. doi: 10.1038/nature.2015.17711).

A litany of concerns

In 2012, scientists at Amgen (Thousand Oaks, Calif.) reported that, even when cooperating closely with the original investigators, they were able to reproduce only 6 of 53 studies considered to be benchmarks of cancer research (Nature. 2016 Feb 4. doi: 10.1038/nature.2016.19269).

Scientists at Bayer HealthCare reported in Nature Reviews Drug Discovery that they could successfully reproduce results in only a quarter of 67 so-called seminal studies (2011 Sep. doi: 10.1038/nrd3439-c1).

According to a 2013 report in The Economist, Dr. John Ioannidis, an expert in the field of scientific reproducibility, argued that in his field, “epidemiology, you might expect one in ten hypotheses to be true. In exploratory disciplines like genomics, which rely on combing through vast troves of data about genes and proteins for interesting relationships, you might expect just one in a thousand to prove correct.”

This growing litany of irreproducibility has raised alarm in the scientific community and has led to a search for answers, especially since so many preclinical studies provide the precursor data for eventual human trials.

Despite the concerns raised, human clinical trials seem to be less at risk for irreproducibility, according to an editorial by Dr. Francis S. Collins, director of the U.S. National Institutes of Health, and Dr. Lawrence A. Tabak, its principal deputy director, “because they are already governed by various regulations that stipulate rigorous design and independent oversight – including randomization, blinding, power estimates, pre-registration of outcome measures in standardized, public databases such as ClinicalTrials.gov and oversight by institutional review boards and data safety monitoring boards. Furthermore, the clinical trials community has taken important steps toward adopting standard reporting elements” (Nature. 2014 Jan. doi: 10.1038/505612a).

The paucity of P

Today, a P value of .05 or less is all too often considered the sine qua non of scientific proof. “Most statisticians consider this appalling, as the P value was never intended to be used as a strong indicator of certainty as it too often is today. Most scientists would look at [a] P value of .01 and say that there was just a 1% chance of [the] result being a false alarm. But they would be wrong.” The 2014 report goes on to state that, according to one widely used statistical calculation, a P value of .01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a P value of .05 raises that chance of a false alarm to at least 29% (Nature. 2014 Feb. doi: 10.1038/506150a).
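For readers curious where such figures come from, here is a minimal sketch of one way to reproduce them. It assumes the “minimum Bayes factor” bound attributed to Berger and Sellke, BF ≥ −e·P·ln(P), combined with 50-50 prior odds that a real effect exists; these modeling choices are our illustration and are not spelled out in the article itself.

```python
# A hedged sketch: lower bound on the false-alarm probability implied
# by a given P value, assuming (our assumption, not the article's)
# the minimum Bayes factor bound BF >= -e * p * ln(p) and 50-50
# prior odds that a genuine effect exists.
import math

def min_false_alarm_probability(p_value, prior_true=0.5):
    """Smallest plausible chance that a 'significant' result is a false alarm."""
    # Strongest possible evidence against the null consistent with this P value.
    bayes_factor_null = -math.e * p_value * math.log(p_value)
    prior_odds_null = (1 - prior_true) / prior_true
    posterior_odds_null = bayes_factor_null * prior_odds_null
    return posterior_odds_null / (1 + posterior_odds_null)

for p in (0.05, 0.01):
    print(f"P = {p}: false-alarm probability >= "
          f"{min_false_alarm_probability(p):.0%}")
# Prints about 29% for P = .05 and about 11% for P = .01 - the
# figures quoted above.
```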

Beyond this assessment problem, P values may allow for considerable researcher bias, conscious and unconscious, even to the extent of encouraging “P-hacking”: one of the few statistical terms to ever make it into the Urban Dictionary. “P-hacking is trying multiple things until you get the desired result” – even unconsciously, according to one researcher quoted.
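The inflation that P-hacking produces is easy to demonstrate with a toy simulation. In the sketch below, all parameters (20 outcomes, 30 subjects per group, 10,000 simulated experiments) are illustrative choices, not figures from the article: many outcomes are measured with no true effect anywhere, only the smallest P value is reported, and the false-positive rate climbs far past the nominal 5%.

```python
# Toy illustration of P-hacking: with NO true effect, testing many
# outcomes and keeping only the best P value inflates false positives.
# All parameters here are illustrative assumptions, not from the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_outcomes, n_subjects = 10_000, 20, 30

false_positives = 0
for _ in range(n_trials):
    # Both groups are drawn from the same distribution: the null is true.
    group_a = rng.normal(size=(n_outcomes, n_subjects))
    group_b = rng.normal(size=(n_outcomes, n_subjects))
    p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue
    if p_values.min() < 0.05:  # report only the "best" outcome
        false_positives += 1

print(f"Apparent false-positive rate: {false_positives / n_trials:.0%}")
# Roughly 1 - 0.95**20, or about 64%, instead of the nominal 5%.
```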

In addition, “unless statistical power is very high (and much higher than in most experiments), the P value should be interpreted tentatively at best” (Nat Methods. 2015 Feb 26. doi: 10.1038/nmeth.3288).
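The point about power can also be made with a brief simulation. In the hedged sketch below (effect size, sample size, and replication count are our illustrative assumptions), a real effect studied at roughly 50% power yields P values that swing across orders of magnitude from one replication to the next, which is why a single P value deserves tentative interpretation.

```python
# Sketch of P value instability at modest power: the same true effect,
# re-run many times, produces wildly different P values. Effect size
# and sample size here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
effect, n, reps = 0.5, 30, 1_000  # roughly 50% power for this design

p_values = np.array([
    stats.ttest_ind(rng.normal(effect, 1.0, n), rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(reps)
])

print(f"Median P = {np.median(p_values):.3f}; "
      f"range {p_values.min():.1e} to {p_values.max():.2f}; "
      f"share below .05 = {(p_values < 0.05).mean():.0%}")
```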

So bad is the problem that “misuse of the P value – a common test for judging the strength of scientific evidence – is contributing to the number of research findings that cannot be reproduced,” the American Statistical Association warns in a statement released in March, adding that the P value cannot be used to determine whether a hypothesis is true or even whether results are important (Nature. 2016 Mar 7. doi: 10.1038/nature.2016.19503).
