Original Research

The Effect of Cluster Randomization on Sample Size in Prevention Research

Finally, using the formula by Donner and coworkers14 and the ICC, the sample size for comparing 2 independent groups allowing for clustering for both up-to-datedness and inappropriateness was determined. The formula is: $n = 2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2 [1 + (\bar{x}_c - 1)\rho] / \delta^2$, where $n$ is the per-group sample size; $Z_{\alpha/2} = 1.96$ and $Z_{\beta} = 0.84$ are the standard normal percentiles for the type I and type II error rates at 0.05 and 0.20, respectively; $\sigma$ is the standard deviation in the outcome variable; $\delta$ is the expected difference between the two means; $\bar{x}_c$ is the average cluster size; and $\rho$ is the ICC. The sample size calculations were based on an expected difference of 0.09 between groups (with 80% power and 5% significance) and a standard deviation of 0.10. Table 3 shows the effect on per-group sample size for varying ICC values and average cluster sizes.
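As a minimal sketch of the calculation described above, the following Python function evaluates the formula for a given ICC and average cluster size. The function and parameter names are illustrative and do not come from the original study, the ICC values and cluster size in the usage loop are arbitrary, and rounding up to a whole number of physicians is an assumption.

```python
import math


def per_group_sample_size(delta, sigma, avg_cluster_size, icc,
                          z_alpha_2=1.96, z_beta=0.84):
    """Per-group sample size for comparing two means under clustering.

    delta            -- expected difference between the two group means
    sigma            -- standard deviation of the outcome variable
    avg_cluster_size -- average number of individuals per cluster
    icc              -- intraclass correlation coefficient
    z_alpha_2/z_beta -- standard normal percentiles (defaults: 5% two-sided
                        significance, 80% power, as stated in the text)
    """
    # Inflation term from the formula: 1 + (average cluster size - 1) * ICC
    design_effect = 1.0 + (avg_cluster_size - 1.0) * icc
    return (2.0 * (z_alpha_2 + z_beta) ** 2 * sigma ** 2
            * design_effect / delta ** 2)


if __name__ == "__main__":
    # Difference of 0.09 and standard deviation of 0.10 are taken from the text;
    # the ICC values and the cluster size of 3 are arbitrary illustrations.
    for icc in (0.0, 0.05, 0.10):
        n = per_group_sample_size(0.09, 0.10, avg_cluster_size=3, icc=icc)
        print(f"ICC={icc:.2f}: n per group = {math.ceil(n)}")
```

With an ICC of 0 the bracketed term equals 1, and the formula reduces to the usual sample size for comparing two independent means.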

Results

A total of 46 HSOs were recruited out of a possible 100 sites, for a response rate of 46% at baseline. The response rate to the physician questionnaire was 98% (106 of 108). Physicians in practices that agreed to participate differed significantly from those who did not. Participating physicians were younger, having graduated in 1977 on average compared with 1971 (t=4.58 [df=191], P<.001) and were more likely to be women, 30.4% compared with 9.9% for nonparticipating physicians (χ2=11.09 [df=1, N=193], P=.001). Table 2 provides descriptive information on practice and physician characteristics. Five practices of 46 needed to have the entire 100 charts re-audited. Final concordance between the 2 auditors for each practice verification was 85% (κ=.71).

The mean up-to-datedness score for the practices, that is, the mean proportion of A and B maneuvers performed, was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). In other words, on average, 53.5% of patients eligible for recommended preventive maneuvers received them and 21.5% of eligible patients received inappropriate preventive maneuvers.

Table 1 gives the practice mean square, the error mean square, the ICC, and the required sample size per group for the overall measures of up-to-datedness and inappropriateness as well as for 13 preventive maneuvers individually. For inappropriateness, there was more variability between practices than within practices among physicians, resulting in a larger practice mean square and a significant F statistic (P <.05). For up-to-datedness, the variability within practices among physicians was greater than the variability between practices, although not significantly so. Table 1 shows the intraclass correlation as 0.0365 for up-to-datedness and 0.1790 for inappropriateness. Inappropriateness scores were not normally distributed, and 2 physicians had scores greater than 0.60. However, with these extreme outliers removed, the ICC for inappropriateness remained high at 0.14.

The ICC ranges from 0.005 for blood pressure measurement to 0.66 for chest x-rays of smokers. For blood pressure measurement, the variability between clusters and the variability within clusters are the same. For chest x-rays of smokers, the variability between clusters is large and the variability within clusters is small, indicating that some practice clusters perform a larger number of chest x-rays on smokers than other practices. However, the performance of chest x-rays was not normally distributed, with 79% of physicians not performing them and one solo physician with an extreme score of 0.53. With this extreme outlier removed, the ICC for chest x-rays was 0.25, with a mean square between practices of 0.0024 and a mean square within practices of 0.0012 (P<.01). Table 1 shows the effect on sample size for analysis at the level of the physician as the ICC varies.
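The article does not spell out how the ICC was derived from the practice and error mean squares. A common choice, assumed here and not confirmed by the source, is the one-way analysis of variance estimator, ICC = (MSB - MSW) / (MSB + (m - 1) x MSW), where MSB and MSW are the between-practice and within-practice mean squares and m is the (adjusted) average cluster size. The sketch below uses illustrative names, and the cluster size in the usage line is hypothetical.

```python
def anova_icc(ms_between, ms_within, cluster_size):
    """One-way ANOVA estimator of the intraclass correlation (assumed form).

    ms_between   -- mean square between clusters (practices)
    ms_within    -- mean square within clusters (error mean square)
    cluster_size -- average (or adjusted average) number of individuals per cluster
    """
    return (ms_between - ms_within) / (
        ms_between + (cluster_size - 1.0) * ms_within
    )


if __name__ == "__main__":
    # Mean squares reported in the text for chest x-rays with the outlier removed.
    # cluster_size=3 is a hypothetical value; the article does not report the one used.
    print(anova_icc(0.0024, 0.0012, cluster_size=3))
```

Under this estimator, equal between-practice and within-practice mean squares give an ICC of 0, which is the pattern described above for blood pressure measurement.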

Discussion

Statistical theory shows that cluster randomization reduces the effective sample size. This occurs because the individuals within a cluster cannot be regarded as independent. The precise effect of cluster randomization on sample size requirements depends on both the size of the cluster and the degree of within-cluster dependence as measured by the ICC.2 Cluster randomized trials are increasingly being used in health services research, particularly for evaluating interventions involving organizational changes when it is not feasible to randomize at the level of the individual. Cluster randomization at the level of the practice minimizes the potential for contamination between treatment and control groups. However, the statistical power of a cluster randomized trial in which the unit of randomization is the practice and the unit of analysis is the health professional can be greatly reduced in comparison with an individually randomized trial.15
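The reduction in effective sample size can be illustrated by dividing the number of individuals by the inflation term 1 + (m - 1) x ICC that appears in the sample size formula. The following is a rough sketch with illustrative names. Treating 106/46 as the average cluster size is an assumption made for the example and not a figure reported by the study.

```python
def effective_sample_size(n_individuals, avg_cluster_size, icc):
    """Number of independent observations that n clustered individuals are worth."""
    design_effect = 1.0 + (avg_cluster_size - 1.0) * icc
    return n_individuals / design_effect


if __name__ == "__main__":
    # 106 responding physicians in 46 practices and the ICC of 0.1790 for
    # inappropriateness come from the text; the average cluster size of 106/46
    # is an assumption for illustration only.
    print(effective_sample_size(106, 106 / 46, 0.1790))
```

When the ICC is 0 or every cluster contains a single individual, the effective sample size equals the actual number of individuals. It shrinks as either the ICC or the cluster size grows.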

To preserve power the researcher should, whenever possible, ensure that the unit of randomization and the unit of analysis are the same.16 In this manner standard statistical tests can be used. Often this is not possible given secondary research questions that may be targeted to the health professionals within the practice and not the practice as a whole. If data are analyzed at the level of the individual and not at the level of the cluster (in effect ignoring the clustering effect), then there is a strong possibility that P values will be artificially extreme and confidence intervals will be overly narrow, increasing the chances of spuriously significant findings and misleading conclusions.15 When using the individual physician as the unit of analysis, one must take into account the correlation between responses of individuals within the same cluster. For continuous outcome variables that are normally distributed, a mixed-effects analysis of variance (or covariance) is appropriate, with clusters nested within the comparison groups.17 For dichotomous variables, Donner and Klar suggest that an adjusted chi-square test be used.8 Although we focus on the issue of clustering for study designs using random allocation, the issue of clustering is also apparent in cross-sectional and cohort studies, where the practice-level and/or physician-level factors may have an impact on patient-level data. Researchers need to be aware of the possibility of intracluster correlation and the implications for analysis in these studies as well.18
