The Effect of Cluster Randomization on Sample Size in Prevention Research

Original Research

J Fam Pract. 2001 March;50(3):242

References

1. Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol 1978;108:100-02.

2. Donner A. An empirical study of cluster randomization. Int J Epidemiol 1982;11:283-86.

3. Kerry SM, Bland JM. The intracluster correlation coefficient in cluster randomisation. BMJ 1998;316:1455.

4. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization based inference for community intervention trials. Stat Med 1996;15:1069-92.

5. Bero LA, Grilli R, Grimshaw JM, Harvey E, Oxman AD, Thomson MA. Closing the gap between research and practice: an overview of systematic reviews of interventions to promote the implementation of research findings: the Cochrane Effective Practice and Organization of Care Review Group. BMJ 1998;317:465-86.

6. Divine GW, Brown JT, Frazier LM. The unit of analysis error in studies about physicians’ patient care behavior. J Gen Intern Med 1992;7:623-29.

7. Simpson JM, Klar N, Donner A. Accounting for cluster randomization: a review of primary prevention trials, 1990 through 1993. Am J Public Health 1995;85:1378-83.

8. Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of allocation is the cluster. Am J Epidemiol 1994;140:279-89.

9. Murray DM, Perry CL, Griffen G, et al. Results from a statewide approach to adolescent tobacco use prevention. Prev Med 1992;21:449-72.

10. Lemelin J, Hogg W, Baskerville B. Evidence to action: a tailored multi-faceted approach to changing family physician practice patterns and improving preventive care. CMAJ. In press.

11. Canadian Task Force on the Periodic Health Examination. The Canadian guide to clinical preventive health care. Ottawa, Canada: Health Canada; 1994.

12. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons; 1981.

13. Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ 1996;313:41-42.

14. Donner A, Birkett N, Buck C. Randomization by cluster: sample size requirements and analysis. Am J Epidemiol 1981;114:906-14.

15. Campbell MK, Grimshaw JM. Cluster randomised trials: time for improvement. BMJ 1998;317:1171-72.

16. Bland JM, Kerry SM. Trials randomised in clusters. BMJ 1997;315:600.

17. Koepsell TD, Martin DC, Diehr PH, et al. Data analysis and sample size issues in evaluations of community based health promotion and disease prevention programs: a mixed-model analysis of variance approach. J Clin Epidemiol 1991;44:701-13.

18. Feldman HA, McKinlay SM. Cohort versus cross-sectional design in large field trials: precision, sample size, and a unifying model. Stat Med 1994;13:61-78.

19. Campbell M, Grimshaw J, Steen N. Sample size calculations for cluster randomised trials. J Health Serv Res Policy 2000;5:12-16.

Finally, using the formula by Donner and coworkers¹⁴ and the ICC, the sample size for comparing 2 independent groups allowing for clustering for both up-to-datedness and inappropriateness was determined. The formula is: $n = 2(Z_{\alpha/2} + Z_{\beta})^2 \sigma^2 [1 + (x_c - 1)\rho] / \delta^2$, where $n$ is the per group sample size; $Z_{\alpha/2} = 1.96$ and $Z_{\beta} = 0.84$ are the standard normal percentiles for the type I and type II error rates at 0.05 and 0.20, respectively; $\sigma$ is the standard deviation in the outcome variable; $\delta$ is the expected difference between the two means; $x_c$ is the average cluster size; and $\rho$ is the ICC. The sample size calculations were based on an expected difference of 0.09 between groups (with 80% power and 5% significance) and a standard deviation of 0.10. Table 3 shows the effect on per-group sample size for varying ICC values and average cluster sizes.
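As a rough illustration of the formula above, the following Python sketch computes the per-group sample size for a given ICC and average cluster size. The function name and the average cluster size of 3 used in the example are assumptions made for illustration only; they are not values reported by the study.

```python
import math


def per_group_sample_size(delta, sigma, cluster_size, icc,
                          z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for comparing two means under clustering.

    delta        -- expected difference between the two group means
    sigma        -- standard deviation of the outcome variable
    cluster_size -- average cluster size (x_c in the text)
    icc          -- intraclass correlation coefficient (rho in the text)
    """
    # Sample size that would be needed if all observations were independent.
    unadjusted = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    # Inflation factor for clustering: 1 + (x_c - 1) * rho.
    inflation = 1 + (cluster_size - 1) * icc
    return unadjusted * inflation


# Difference of 0.09 and standard deviation of 0.10, as stated in the text.
# cluster_size=1 with icc=0 corresponds to ignoring clustering altogether.
n_independent = per_group_sample_size(0.09, 0.10, cluster_size=1, icc=0.0)

# ICC of 0.1790 (inappropriateness); the cluster size of 3 is hypothetical.
n_clustered = per_group_sample_size(0.09, 0.10, cluster_size=3, icc=0.1790)

print(math.ceil(n_independent), math.ceil(n_clustered))
```

Under these assumed inputs the requirement rises from about 20 to about 27 per group. Rounding up with `math.ceil` is a design choice in this sketch, since a fractional participant cannot be recruited.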

Results

A total of 46 HSOs were recruited out of a possible 100 sites, for a response rate of 46% at baseline. The response rate to the physician questionnaire was 98% (106 of 108). Physicians in practices that agreed to participate differed significantly from those who did not. Participating physicians were younger, having graduated in 1977 on average compared with 1971 (t=4.58 [df=191], P<.001) and were more likely to be women, 30.4% compared with 9.9% for nonparticipating physicians (χ²=11.09 [df=1, N=193], P=.001). Table 2 provides descriptive information on practice and physician characteristics. Five practices of 46 needed to have the entire 100 charts re-audited. Final concordance between the 2 auditors for each practice verification was 85% (κ=.71).

The mean up-to-datedness score for the practices (the mean proportion of A and B maneuvers performed) was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). In other words, on average, 53.5% of patients eligible for recommended preventive maneuvers received them and 21.5% of eligible patients received inappropriate preventive maneuvers.

Table 1 gives the practice mean square, the error mean square, the ICC, and the required sample size per group for the overall measures of up-to-datedness and inappropriateness as well as for 13 preventive maneuvers individually. For inappropriateness, there was more variability between practices than within practices among physicians, resulting in a larger practice mean square and a significant F statistic (P <.05). For up-to-datedness, the variability within practices among physicians was greater than the variability between practices, although not significantly so. Table 1 shows the intraclass correlation as 0.0365 for up-to-datedness and 0.1790 for inappropriateness. Inappropriateness scores were not normally distributed, and 2 physicians had scores greater than 0.60. However, with these extreme outliers removed, the ICC for inappropriateness remained high at 0.14.

The ICC ranges from 0.005 for blood pressure measurement to 0.66 for chest x-rays of smokers. For blood pressure measurement, the variability between clusters is about the same as the variability within clusters. For chest x-rays of smokers, the variability between clusters is large and the variability within clusters is small, indicating that some practice clusters perform a larger number of chest x-rays on smokers than other practices. However, the performance of chest x-rays was not normally distributed, with 79% of physicians not performing them and one solo physician with an extreme score of 0.53. With this extreme outlier removed, the ICC for chest x-rays was 0.25, with a mean square between practices of 0.0024 and a mean square within practices of 0.0012 (P<.01). Table 1 shows the effect on sample size for analysis at the level of the physician as the ICC varies.
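To make the link between the mean squares and the ICC concrete, the sketch below uses one common one-way analysis of variance estimator, ICC = (MSB - MSW) / (MSB + (m0 - 1) MSW), where MSB and MSW are the between-practice and within-practice mean squares and m0 is the average cluster size. This section does not state which estimator produced Table 1, so the choice of estimator, the function name, and the cluster size of 3 are all assumptions.

```python
def icc_from_mean_squares(ms_between, ms_within, avg_cluster_size):
    """One-way ANOVA estimate of the intraclass correlation coefficient.

    ms_between       -- mean square between clusters (practices)
    ms_within        -- mean square within clusters (error mean square)
    avg_cluster_size -- average number of individuals per cluster (m0)
    """
    numerator = ms_between - ms_within
    denominator = ms_between + (avg_cluster_size - 1) * ms_within
    return numerator / denominator


# Mean squares reported in the text for chest x-rays with the outlier removed.
# The average cluster size of 3 is hypothetical, not a reported value.
icc_chest_xray = icc_from_mean_squares(0.0024, 0.0012, avg_cluster_size=3)
print(round(icc_chest_xray, 2))
```

With equal between-cluster and within-cluster mean squares this estimator returns 0, and it grows toward 1 as the between-cluster mean square dominates.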

Discussion

Statistical theory points to the consequences of cluster randomization as a reduction in effective sample size. This occurs because the individuals within a cluster cannot be regarded as independent. The precise effect of cluster randomization on sample size requirements depends on both the size of the cluster and the degree of within-cluster dependence as measured by the ICC.² Cluster randomized trials are increasingly being used in health services research, particularly for evaluating interventions involving organizational changes when it is not feasible to randomize at the level of the individual. Cluster randomization at the level of the practice minimizes the potential for contamination between treatment and control groups. However, the statistical power of a cluster randomized trial when the unit of randomization is the practice and the unit of analysis is the health professional can be greatly reduced in comparison to an individually randomized trial.¹⁵
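The loss of power described above can be sketched numerically. The following Python example uses a standard normal approximation for a two-sided comparison of two means and inflates the variance by the factor 1 + (m - 1) × ICC. The function name, the per-group sample size of 20, and the cluster size of 3 are assumptions chosen for illustration; only the difference of 0.09, the standard deviation of 0.10, and the ICC of 0.1790 come from the article.

```python
from statistics import NormalDist


def approx_power(n_per_group, delta, sigma, cluster_size, icc, alpha=0.05):
    """Approximate power to detect a difference in means under clustering.

    Normal approximation that ignores the negligible opposite tail.
    """
    # Variance inflation caused by within-cluster correlation.
    inflation = 1 + (cluster_size - 1) * icc
    # Standard error of the difference between the two group means.
    std_error = sigma * (2 * inflation / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / std_error - z_crit)


# Same hypothetical 20 individuals per group, with and without clustering.
power_individual = approx_power(20, 0.09, 0.10, cluster_size=1, icc=0.0)
power_clustered = approx_power(20, 0.09, 0.10, cluster_size=3, icc=0.1790)

print(f"individual: {power_individual:.2f}, clustered: {power_clustered:.2f}")
```

In this sketch the same number of individuals yields noticeably lower power once they are grouped into correlated clusters, which is the reduction in effective sample size that the paragraph describes.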

To preserve power the researcher should, whenever possible, ensure that the unit of randomization and the unit of analysis are the same.¹⁶ In this manner standard statistical tests can be used. Often this is not possible given secondary research questions that may be targeted to the health professionals within the practice and not the practice as a whole. If data are analyzed at the level of the individual and not at the level of the cluster (in effect ignoring the clustering effect), then there is a strong possibility that P values will be artificially extreme and confidence intervals will be overly narrow, increasing the chances of spuriously significant findings and misleading conclusions.¹⁵ When using the individual physician as the unit of analysis, one must take into account the correlation between responses of individuals within the same cluster. For continuous outcome variables that are normally distributed, a mixed-effects analysis of variance (or covariance) is appropriate, with clusters nested within the comparison groups.¹⁷ For dichotomous variables, Donner and Klar suggest that an adjusted chi-square test be used.⁸ Although we focus on the issue of clustering for study designs using random allocation, the issue of clustering is also apparent in cross-sectional and cohort studies, where the practice-level and/or physician-level factors may have an impact on patient-level data. Researchers need to be aware of the possibility of intracluster correlation and the implications for analysis in these studies as well.¹⁸