Commentary

Taking Critical Appraisal to Extremes


The Need for Balance in the Evaluation of Evidence


 


In January 2000 an article in The Lancet drew attention when it questioned the supporting evidence for screening mammography.1 Danish investigators Peter Gøtzsche and Ole Olsen presented a series of apparent flaws in the 8 randomized trials of mammography, ultimately concluding that screening is unjustified. Their cogent arguments and the press coverage they received left many physicians wondering whether they should continue to order mammograms. The story led the CBS Evening News2 and was featured in the Washington Post,3 Time,4 and Reuters.5

A Patient-Oriented Evidence that Matters (POEM) review in the April 2000 issue of The Journal of Family Practice6 that addressed The Lancet study lent apparent support to these concerns. The POEM related the arguments in The Lancet article without challenging them and concluded that “mammography screening has never been shown to help women to live longer.” The authors of this POEM suggested that the only reasons for screening to continue are “politics, patients’ preconceptions, and the fear of litigation.” Unlike most POEMs, this one included no critical appraisal of the methods or assumptions of the reviewed study. This lack of comment, combined with the authors’ negative remarks about mammography, may have convinced family physicians that the criticisms of Gøtzsche and Olsen were beyond dispute.

However, controversy does surround their arguments, as the many letters to the editor published in The Lancet attest.7 For example, The Lancet critique made much of inconsistent sample sizes and baseline dissimilarities between screened and unscreened women. The authors asserted that such age and socioeconomic differences were “incompatible with adequate randomization.” That premise is contestable. Some baseline variables can be expected to differ between groups by chance alone, no matter how perfect the randomization. Also, the observed age difference (1 to 6 months) would not explain the 21% reduction in mortality observed in the trials.8
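The point about chance imbalance can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `fraction_imbalanced`, the arm size, the number of baseline variables, and the conventional 1.96 cutoff are all hypothetical choices, and none of the numbers come from the mammography trials. It randomizes a simulated population perfectly and counts how many baseline variables still differ "significantly" between arms.

```python
import random
import statistics


def fraction_imbalanced(n_per_arm=200, n_covariates=1000, seed=0):
    """Share of baseline variables that differ 'significantly' between two
    arms under perfect randomization (all parameters are illustrative)."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_covariates):
        # One baseline variable measured on everyone before allocation.
        values = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_arm)]
        # Perfect randomization: shuffle, then split into two equal arms.
        rng.shuffle(values)
        arm_a = values[:n_per_arm]
        arm_b = values[n_per_arm:]
        # Two-sample z statistic for the difference in means.
        se = (statistics.variance(arm_a) / n_per_arm
              + statistics.variance(arm_b) / n_per_arm) ** 0.5
        z = (statistics.mean(arm_a) - statistics.mean(arm_b)) / se
        if abs(z) > 1.96:  # conventional two-sided 5% cutoff
            flagged += 1
    return flagged / n_covariates


if __name__ == "__main__":
    print(f"Share of variables flagged as imbalanced: {fraction_imbalanced():.3f}")
```

In this toy setting roughly 1 variable in 20 is flagged even though allocation was flawless, so a handful of baseline differences is not, by itself, proof of subverted randomization.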

For Gøtzsche and Olsen the discrepant age patterns and sample sizes were less a cause of the results than a warning sign that randomization had been subverted (because of failure to conceal allocation). Since mortality in the screened and unscreened groups differed by only a relatively small number of deaths, they reasoned that very little bias would be necessary to tip the scales in favor of mammography.

Several arguments weaken their case, however. First, they offered no evidence that subversion or unconcealed allocation actually occurred. They equated inexplicit documentation of procedures (and dissimilar group characteristics) with improper randomization. Second, even if unconcealed allocation occurred, it does not in itself thwart randomization. Investigators who know to which group a patient will be assigned can still follow the rules and make the correct assignment. Anecdotal reports of subversion (by deciphering assignment sequences to divert or target patients for allocation) do not offer denominator data to assess how often this occurs.9 It would have had to occur in every trial that favored mammography to uphold the authors’ allegations. Third, even if the trials were subverted, there is no indication that case mix differed enough to skew outcomes. Age differences were minor; the authors speculated that sizable imbalances in unmeasured factors could have altered results, but they gave no evidence. They cited reports that poorly concealed allocation is associated with a 37% to 41% exaggeration in odds ratios,10,11 but these reports concerned other trials and made arguable assumptions. Finally, their confirmatory finding (that only the 6 “flawed” trials reported a benefit for mammography and that the 2 acceptable trials showed no effect) was based on recalculated relative risks. The original trial data show no such pattern.8

This is not to suggest that weaknesses in the mammography trials do not merit scrutiny. Others have also voiced criticisms.12 But the alarm raised by Gøtzsche and Olsen goes further, compelling us to rethink the purpose of critical appraisal and the extremes at which it might cause more harm than good.

Excessive critical appraisal

We seek perfection in evidence to safeguard patients. Prematurely adopting (or abandoning) interventions through uncritical acceptance of findings risks overlooking potential harms or more effective alternatives. But critical appraisal can do harm if valid evidence is rejected. Deciding whether to accept evidence means weighing the risks of acceptance against the risks of rejection, which are inversely related. At one extreme of the spectrum, where data are accepted at face value (no appraisal), the risk of a type I error (accepting evidence of efficacy when the intervention does not work or causes harm) is high, and that of a type II error (discarding evidence when the intervention actually works) is low. At the other extreme (excessive scrutiny) the risk of a type II error is great; such errors harm patients because knowledge that can save (or improve) lives is rejected. Obviously, patients are best served somewhere in the middle, where the risks of type I and type II errors are optimally balanced.
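The inverse relation between the two risks can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not a method from the commentary: it supposes that each body of evidence yields a single "strength" score, drawn from a standard normal distribution when the intervention does not work and from a normal distribution shifted by a hypothetical `effect` when it does, and that evidence is accepted only when the score exceeds a `threshold` standing in for the strictness of appraisal. The function name `error_rates` and all numeric values are hypothetical.

```python
from statistics import NormalDist


def error_rates(threshold, effect=1.0):
    """Return (type I risk, type II risk) for a given strictness threshold
    in a toy model; the distributions and effect size are illustrative."""
    ineffective = NormalDist(mu=0.0, sigma=1.0)   # intervention does not work
    effective = NormalDist(mu=effect, sigma=1.0)  # intervention works
    # Type I: evidence accepted although the intervention does not work.
    type_i = 1.0 - ineffective.cdf(threshold)
    # Type II: evidence rejected although the intervention works.
    type_ii = effective.cdf(threshold)
    return type_i, type_ii


if __name__ == "__main__":
    # From almost no appraisal (very low bar) to excessive scrutiny (very high bar).
    for threshold in (-3.0, 0.5, 4.0):
        type_i, type_ii = error_rates(threshold)
        print(f"threshold={threshold:+.1f}  type I={type_i:.3f}  type II={type_ii:.3f}")
```

As the threshold rises, the type I risk falls and the type II risk climbs; neither extreme minimizes both, which is the sense in which patients are best served somewhere in the middle.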
