Original research
Susan D. Mathias MPH
Abstract
The Brief Pain Inventory–Short Form (BPI-SF) is widely used for assessing pain in clinical and research studies. The worst pain rating is often the primary outcome of interest; yet, no published data are available on its minimally important difference (MID). Breast cancer patients with bone metastases were enrolled in a randomized, double-blind, phase III study comparing denosumab with zoledronic acid for preventing skeletal-related events and completed the BPI-SF, FACT-B, and EQ-5D at baseline, week 5, and monthly through the end of the study. Anchor- and distribution-based MID estimates were computed. Data from 1,564 patients were available. Spearman correlation coefficients for anchors ranged from 0.33 to 0.65. Mean change scores for worst pain ratings corresponding to one-category improvement in each anchor were 0.26–1.04 for BPI-SF current pain, −1.40 to −2.42 for EQ-5D Index score, 1.71–1.98 for EQ-5D Pain item, −2.22 to −0.51 for FACT-B TOI, −1.61 to −0.16 for FACT-G Physical, and −1.31 to −0.12 for FACT-G total. Distribution-based results were 1 SEM = 1.6, 0.5 effect size = 1.4, and Guyatt's statistic = 1.4. Combining anchor- and distribution-based results yielded a two-point MID estimate. An MID estimate of two points is useful for interpreting how much change in worst pain is considered clinically meaningful.
The MID may be estimated through distribution-based methods and/or anchor-based methods. Distribution-based methods are based on the distribution of the data. Examples of distribution-based methods include effect size measures, the standard error of measurement (SEM), one-half times the standard deviation, and the responsiveness index.[2] and [3] Anchor-based methods are based on the association between the PRO measure and an interpretable external measure, such as a global rating of change or a response to treatment. These methods may result in somewhat different estimates, and no particular estimate is considered the most valid.[2], [3] and [4] Therefore, researchers are encouraged to use more than one method and to present a range of MID estimates.
A frequently used PRO measure for the assessment of pain is the Brief Pain Inventory–Short Form (BPI-SF). The foundation of the BPI-SF is the Wisconsin Brief Pain Questionnaire, which was developed over 25 years ago based on interviews with cancer patients, expert opinion, and then-current psychometric standards.[5] Over time, the Wisconsin Brief Pain Questionnaire evolved into the Brief Pain Inventory, which was later reduced to a shorter version, the BPI-SF. Today, the BPI-SF is the standard for clinical and research use. It has been used in over 400 studies, including psychometric evaluations and clinical applications with a wide range of conditions (e.g., cancer pain, fibromyalgia, neuropathic pain, and joint diseases).[6]
The BPI-SF includes two domains: pain severity and pain interference. The pain severity domain, the focus of this report, includes items specific to pain at “worst,” “least,” “average,” and “now” (current pain), with a numerical response scale ranging from 0 (no pain) to 10 (pain as bad as you can imagine). In clinical trials, the worst pain item has been used alone as a measure of pain severity.[6] Its use as a single item is supported by a consensus panel on outcome measures for chronic pain clinical trials.[7] In addition, the Food and Drug Administration's (FDA) guidance on PROs states that a single-item PRO measure of pain severity is appropriate for assessing the effect of a treatment on pain.[8] Although extensive psychometric evaluation of the BPI-SF has been conducted, no estimates of the MID are available for the BPI-SF worst pain item. Establishing the MID for the BPI-SF worst pain item is important because it will provide a clinically relevant reference for interpreting changes in pain scores. Therefore, the objective of this report was to estimate the MID of the worst pain item of the BPI-SF.
Methods
Study Design
Patients with advanced breast cancer and bone metastases were enrolled in an international, randomized, double-blind, double-dummy, active-controlled phase III study comparing denosumab with zoledronic acid for delaying or preventing skeletal-related events. Patients were eligible to participate if they had histologically or cytologically confirmed breast adenocarcinoma; current or prior radiologic, computed tomography, or magnetic resonance imaging evidence of at least one bone metastasis; and an Eastern Cooperative Oncology Group (ECOG) performance status of 0, 1, or 2. Patients with current or prior intravenous bisphosphonate administration were excluded. Patients completed PRO assessments, including the BPI-SF, at baseline, week 5, and every 4 weeks thereafter until the end of the study. Assessments were scheduled to take place prior to any study procedures and prior to study drug administration. Although data collection continued, PRO analyses for efficacy were truncated when approximately 30% of patients had dropped out of the study due to death, disease progression, or withdrawn consent.
Outcome Measures and Assessment Intervals
A number of outcome measures were assessed in the study and considered for use as anchors for evaluating the MID of the BPI-SF worst pain item, including one clinician-reported measure (ECOG Performance Status) and several PRO measures: the EuroQoL 5 Dimensions (EQ-5D) Index score, the Functional Assessment of Cancer Therapy-Breast Cancer (FACT-B), and the BPI-SF current pain rating.
The ECOG Performance Status, which assesses how a patient's disease or its treatment is progressing and how the disease affects the daily living abilities of the patient, is a single-item, six-point, clinician-rated assessment of performance ranging from 0 (fully active, no restrictions) to 5 (dead).[9] The EQ-5D Index score is a measure of health status, which assesses five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three response options: no problems, some/moderate problems, and extreme problems. Responses are converted to a weighted health state index, with scores ranging from −0.594 (worst health) to 1.0 (full health). The single item on pain from the EQ-5D was also evaluated separately as an anchor. The FACT-B includes the four core FACT-General (FACT-G) dimensions of physical well-being, social/family well-being, emotional well-being, and functional well-being, for which scale scores and a total score can be computed. In addition, the FACT-B includes a breast cancer–specific subscale.[10] The FACT-B Trial Outcome Index (TOI) is the sum of the physical well-being score, the functional well-being score, and the breast cancer subscale. The four FACT-G scale scores, the FACT-G total score, the FACT-B TOI, and a single-item overall quality-of-life (QOL) rating from the functional well-being section were all evaluated as potential anchors. The single overall QOL item from the functional well-being scale was selected to balance the single pain item selected from the EQ-5D by serving as a potential anchor that is more general in breadth and scope. For all of these FACT outcome measures, a higher score indicates better health-related QOL.
Finally, the current pain rating from the BPI-SF, ranging from 0 (no pain) to 10 (pain as bad as you can imagine), was also considered as an anchor because it was hypothesized to be highly correlated with the worst pain rating and because it would assist in understanding the behavior of other potential anchors.
Several assessment intervals were considered for evaluation of the MID for the BPI-SF worst pain item: baseline to week 5, baseline to week 13, and baseline to week 25. The analysis for each time interval included only those patients with complete baseline and end-of-interval (i.e., week 5, week 13, or week 25) assessments on the BPI-SF worst pain item and the relevant anchor of interest. In addition, a post hoc confirmatory analysis was conducted using a longer interval of time, from baseline to week 49. No imputation of missing data was performed. Analysis was performed on pooled data, regardless of treatment assignment.
Anchor-Based Analysis
The usefulness of an anchor depends on the correlation between the PRO change score and the anchor.[11] Therefore, to select the most appropriate anchors and time interval for estimating the MID for the BPI-SF worst pain item, Spearman correlation coefficients were calculated between changes in the BPI-SF worst pain rating and changes in potential anchors across each of the potential time intervals. The time interval with the highest correlations and the anchors with statistically significant (P < 0.05) correlations above the a priori specified threshold of 0.30 were selected for inclusion in the MID analysis.[12]
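The anchor-screening step can be illustrated with a minimal sketch. It assumes change scores are computed as end-of-interval minus baseline, uses made-up data and hypothetical anchor names, and applies the 0.30 threshold to the absolute value of the coefficient (an assumption, since anchors scored in the opposite direction to pain would correlate negatively). The significance test (P < 0.05) is omitted here to keep the example dependency-free.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical change scores for eight patients (illustration only).
worst_pain_change = [-3, -2, -2, 0, 1, 0, 2, -1]
anchor_changes = {
    "current_pain": [-2, -2, -1, 0, 1, 1, 2, 0],
    "unrelated_anchor": [1, -1, 0, 2, -2, 0, 1, -1],
}

THRESHOLD = 0.30  # a priori correlation threshold from the text
selected = {}
for name, change in anchor_changes.items():
    rho = spearman(worst_pain_change, change)
    if abs(rho) > THRESHOLD:  # assumption: magnitude, not sign, is screened
        selected[name] = rho
```

In practice the same screening would be repeated for each candidate interval (baseline to week 5, 13, and 25), and a statistics package would supply the P value alongside each coefficient.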
A one-category change was defined as a one-point change for the BPI-SF current pain item, a one-point change for the EQ-5D pain item, a three-point change for the FACT-G Physical Well-Being scale,[13] a six-point change for the FACT-G total and FACT-B TOI scores,[14] and a 0.20 change for the EQ-5D Index score. For the selected interval and anchors, the mean change in the BPI-SF worst pain item that corresponds to a one-category increase and decrease in each anchor was calculated. In addition, ordinary least squares regression models were used to regress changes in BPI-SF worst pain ratings on changes in each of the anchors.[15] and [16] The regression models included a main effect for change in each anchor and an interaction term between change in the anchor and the baseline anchor value.
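As a rough illustration of the category-based calculation, the sketch below groups patients by how many whole categories their anchor score changed and averages the worst pain change within each group. The binning rule (truncating toward zero, so a change of 6 to 11 points on a six-point category counts as one category) and the data are assumptions made for the example, not details taken from the study.

```python
from collections import defaultdict
from statistics import mean

def mean_change_by_category(anchor_change, outcome_change, category_width):
    """Average outcome change for each whole-category change in the anchor.

    Assumption: an anchor change counts as k categories when it spans at
    least k full category widths (truncation toward zero).
    """
    groups = defaultdict(list)
    for a, y in zip(anchor_change, outcome_change):
        # round(..., 6) guards against float artifacts with widths like 0.20.
        category = int(round(a / category_width, 6))
        groups[category].append(y)
    return {cat: mean(vals) for cat, vals in sorted(groups.items())}

# Hypothetical FACT-G total changes (one category = 6 points) and the
# matching changes in BPI-SF worst pain for ten patients.
fact_g_change = [7, 6, 12, -6, -8, 0, 3, -2, 6, -7]
worst_pain_change = [-2, -1, -3, 1, 2, 0, 0, 1, -1, 1]

by_category = mean_change_by_category(fact_g_change, worst_pain_change, 6)
one_category_improvement = by_category[1]    # anchor up by one category
one_category_worsening = by_category[-1]     # anchor down by one category
```

The values for categories +1 and −1 are the quantities of interest; the regression models described above offer a complementary estimate that also adjusts for the baseline anchor value.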
Distribution-Based Analysis
The following distribution-based measures were calculated for the BPI-SF worst pain item: (1) the SEM, (2) effect size (Cohen's d), and (3) Guyatt's statistic. The SEM is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. While the standard deviation and the reliability of a measure are sample-dependent, their relationship (and hence the SEM) remains relatively constant across samples. Therefore, the SEM is considered to be an attribute of the measure and not a characteristic of the sample per se.[17] Threshold values of 1 SEM have been suggested for defining clinically meaningful differences.[18] The reliability coefficient was estimated for the BPI-SF worst pain item by calculating intraclass correlation coefficients (ICCs) over two intervals of time. One used a 7-day interval (days 1–8), a more typical interval for assessing reproducibility, while the other used a later interval, from week 105 to week 109. (Note: The 1-month interval was dictated by the schedule of assessments.) For both ICC values, only those patients whose FACT-B overall QOL ratings changed by 10% or less during the respective intervals were included. The 10% criterion was selected after reviewing the full distribution of change scores and their associated sample sizes, to arrive at a reasonable sample size of approximately 100 subjects.
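The SEM is conventionally computed as the standard deviation multiplied by the square root of one minus the reliability coefficient. A minimal sketch of that formula follows; the function name and the input values are illustrative and are not the study's estimates.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability such as an ICC."""
    if reliability > 1.0:
        raise ValueError("reliability cannot exceed 1")
    return sd * math.sqrt(1.0 - reliability)

# Illustrative inputs only: SD of 2.5 points and an ICC of 0.75.
sem = standard_error_of_measurement(sd=2.5, reliability=0.75)
```

With these inputs the result is 1.25 points; a higher ICC shrinks the SEM, which is why the choice of the stable subgroup used to estimate reliability matters.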
Cohen's d, alternatively referred to as the “standardized effect size,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation at baseline.[19] The effect size expresses change as a number of baseline standard deviations. Conventionally, a value of 0.20 represents a small effect, 0.50 a medium effect, and 0.80 a large effect. Change scores corresponding to effect sizes of 0.20, 0.50, and 0.80 were calculated in this study.
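The calculation can be sketched as follows, assuming change is taken as follow-up minus baseline and that the sample (n − 1) standard deviation is used; both conventions, along with the data, are assumptions for illustration.

```python
from statistics import mean, stdev

def standardized_effect_size(baseline, follow_up):
    """Mean change (follow-up minus baseline) divided by the baseline SD."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)

def change_for_effect_size(baseline, target_effect_size):
    """Raw score change that corresponds to a given effect size."""
    return target_effect_size * stdev(baseline)

# Hypothetical worst pain scores for five patients.
baseline_scores = [2, 2, 4, 6, 6]   # sample SD = 2.0
week25_scores = [1, 2, 3, 4, 5]

d = standardized_effect_size(baseline_scores, week25_scores)
thresholds = {es: change_for_effect_size(baseline_scores, es)
              for es in (0.20, 0.50, 0.80)}
```

Here `thresholds` maps each conventional effect size to the number of scale points it represents in this sample, which is the form in which distribution-based MID candidates are usually reported.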
Guyatt's statistic, also referred to as the “responsiveness statistic,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation of change observed for a group of stable patients.[20] The denominator of the responsiveness statistic adjusts for spurious change due to measurement error. Values of 0.20 and 0.50 have been used to represent “small” and “medium” changes, respectively.[21] Change scores corresponding to values of 0.20 and 0.50 were calculated in this study. Stable patients were defined as those whose ECOG Performance rating did not change during the assessment interval. Different variables were used to define the stable population for the SEM and for Guyatt's statistic because the two variables were not consistently collected on the same schedule of assessments.
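A minimal sketch of the responsiveness calculation is shown below. It assumes change is follow-up minus baseline and uses the sample standard deviation of change among stable patients; the data and function names are hypothetical.

```python
from statistics import mean, stdev

def guyatt_responsiveness(change_all, change_stable):
    """Mean change divided by the SD of change among stable patients."""
    return mean(change_all) / stdev(change_stable)

def change_for_responsiveness(change_stable, target_value):
    """Raw score change that corresponds to a given responsiveness value."""
    return target_value * stdev(change_stable)

# Hypothetical change scores (week 25 minus baseline).
change_all_patients = [-3, -2, -1, -2, -2]
change_stable_patients = [-1, -1, 0, 1, 1]   # e.g., unchanged ECOG rating; SD = 1.0

responsiveness = guyatt_responsiveness(change_all_patients, change_stable_patients)
small_change = change_for_responsiveness(change_stable_patients, 0.20)
medium_change = change_for_responsiveness(change_stable_patients, 0.50)
```

Because the denominator comes only from patients who are presumed not to have changed, it reflects measurement noise rather than between-patient spread, which distinguishes this statistic from Cohen's d.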
Integrating Anchor-Based and Distribution-Based MID Estimates
The minimal detectable change (MDC) for the worst pain item was established by comparing distribution-based estimates. The MDC represents the smallest change that can be reliably distinguished from random fluctuation and, thus, the lower bound for establishing the MID.[11] If the MID were lower than the MDC, then the instrument would not be capable of distinguishing the MID. The SEM was considered the primary distribution-based estimate because it takes into account the reliability of the measure and, thus, estimates the precision of the instrument.[11] Other distribution-based measures were also considered in establishing the MDC. Standardized effect size was considered a secondary distribution-based estimate because of its reliance on interperson variability, which is generally higher and less consistent than intraperson variability. Anchor-based estimates of the MID range were then compared. A final MID range was established that is greater than the MDC and integrates estimates from the various anchors.
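The logic of using the MDC as a floor can be sketched in a few lines. This is a deliberate simplification of what is described above as a comparative, judgment-based process: it assumes the MDC is set equal to the SEM and that anchor-based estimates are compared by magnitude. The anchor names and numbers are invented for the example.

```python
def integrate_mid(sem, anchor_estimates):
    """Return the MDC and the range of anchor-based estimates at or above it.

    Assumption: MDC = 1 SEM; anchor estimates are compared by absolute value.
    """
    mdc = sem
    usable = [abs(v) for v in anchor_estimates.values() if abs(v) >= mdc]
    if not usable:
        return mdc, None  # no anchor-based estimate is detectable
    return mdc, (min(usable), max(usable))

# Hypothetical inputs: SEM in scale points and mean worst pain changes
# associated with a one-category change in three anchors.
mdc, mid_range = integrate_mid(
    sem=1.5,
    anchor_estimates={"anchor_a": -2.0, "anchor_b": 1.8, "anchor_c": -0.6},
)
```

In this toy case the estimate from `anchor_c` falls below the MDC and is set aside, leaving a candidate MID range bounded by the remaining anchors.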
Results
Patient Population
Demographic and clinical characteristics for patients included in the baseline to week 25 interval are presented in Table 1. Data from 1,564 of the 2,049 patients who participated in the study were used in these analyses; these patients had valid (i.e., nonmissing) baseline and end-of-interval scores for the BPI-SF and anchors. Patients were predominantly female, with an average age of 57.2 ± 11.2 years. The majority of patients were white (80.9%). Average pain scores at baseline were 2.45 ± 2.51, with the full range of scores (0–10) being used. Clinical results from the study have been presented previously.[22]