Two important systematic reviews of the literature appear in this issue of JFP. Smucny and colleagues1 address the question of whether people with cough following an acute respiratory infection should routinely be offered β-agonists. Scholten and coworkers2 ask which clinical examination maneuvers of the knee are accurate for diagnosing meniscus lesions. These questions are important because the answers might change the way we practice. What do these reviews mean, and what should we make of the increasing number of reviews of the literature? Our commentary provides a short description of systematic reviews, using these 2 papers as examples. First we will look at the requirements for a good systematic review and then see how these 2 papers measure up.
What characterizes a good systematic review?
Reviews of the literature are designed for clinicians who need to know what published research can offer for patient care but who lack the time (or the skills) to search it out for themselves. Systematic reviews are designed to minimize the biases of the reviewer, which are all too easily introduced unless the data coming from the literature are treated with the same objectivity and respect as the data in a primary trial. If certain questions are not addressed, the reviewer's opinions may unduly influence the conclusions.3
Did the review look at all the evidence?
What does it matter if one study is left out? The answer is that it might introduce bias if the omission is due to the study's results rather than mere chance. Perhaps trials of β-agonists for acute bronchitis were more likely to be published if they were positive than if they were negative. Negative trials might be found only in the "grey literature" (research that has not been indexed in formal peer-reviewed journals and is therefore harder to locate), which would give positive papers exaggerated prominence. Omitting negative studies would give the impression that β-agonists are more effective than they really are.
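A hypothetical example (illustrative numbers only, not drawn from either review) shows how large this distortion can be. Suppose 8 comparable trials exist: 4 published trials each showing a 20% improvement in cough, and 4 unpublished trials showing no improvement. A review restricted to the published literature would estimate a 20% benefit, when the average across all 8 trials is only 10%. This is why a search of the grey literature matters.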
Smucny and colleagues describe a thorough electronic search of the literature (including the standard collections of intervention trials, such as The Cochrane Library, MEDLINE, and EMBASE), conference proceedings, and other similar collections. They also wrote to people who might know of unpublished research, a careful investigation of the grey literature. Scholten and coworkers limited their search to English, French, German, and Dutch and did not attempt a grey literature search (although they obtained additional information from the authors of one study to clear up some uncertainty). Does this create a potential bias? The answer lies in judging how likely the omissions are to have skewed the results. If Thai research written in Thai was excluded but Thai research written in English was included,4 then such a bias is possible. But perhaps most scientific medical papers in Thailand are written in English. A question mark remains.
Were studies properly selected on quality criteria?
Papers must be selected on quality rather than anything else. One sure test of objectivity is whether the methods description would allow another group to replicate the review. Both papers describe a very clear process; the criteria were set before the process started. On the JFP Web site, www.jfponline.com, we can check each paper's quality score. Having individual investigators score each paper independently should minimize systematic bias. Smucny and colleagues checked each paper for randomization, blinding, and loss of patients from the analysis; these have been shown to be the most important issues to assess in randomized controlled trials (Schulz). Since the paper by Scholten and coworkers is about diagnostic accuracy, we should look for different items, particularly the selection of patients, the "gold standard," and any loss of patients to follow-up. It is always hard to know whether apparently positive results are truly positive, so it is important to have robust gold standards against which the physical examination findings of all patients were independently compared. Here the gold standards were arthrotomy, arthroscopy, or magnetic resonance imaging. The reviewers sensibly decided to exclude studies performed on cadavers: it would be difficult to know whether such findings generalize to living patients.
Were the right data extracted?
The next stage in a systematic review is the extraction of data from the studies so that they can be compared with each other. Clearly it is important to decide on the right things to extract. Smucny and colleagues included only outcomes of direct relevance to patients, such as cough (presumably dropping indirect outcomes, such as respiratory function tests). Scholten and coworkers looked at the sensitivity, specificity, and other measures of diagnostic accuracy for joint effusion, joint line tenderness, and the McMurray test.
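As a reminder of what these measures mean (these are the standard epidemiologic definitions, not figures from either review): with TP, FP, TN, and FN denoting the counts of true-positive, false-positive, true-negative, and false-negative examination findings judged against the gold standard,

Sensitivity = TP / (TP + FN)    Specificity = TN / (TN + FP)

The likelihood ratios combine the two: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. A maneuver with an LR+ well above 1 helps rule a meniscus lesion in; an LR− well below 1 helps rule it out. These are the kinds of numbers a reader should look for when deciding whether a knee examination finding is worth relying on.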