What Does This Score Mean? A Clinical Standard Setting Method Applied to Neuro-QOL Outcomes in a Sample of Persons with MS

Thursday, May 29, 2014
Trinity Exhibit Hall
Deborah M Miller, Ph.D., LISW, Mellen Center, Cleveland Clinic, Cleveland, OH
Karon Cook, PhD, Medical Social Sciences, Northwestern University, Chicago, IL
David Victorson, Ph.D., Medical Social Sciences, Northwestern University, Chicago, IL

Background: Few established methods incorporate qualitative input when defining clinically meaningful classifications of scores on standardized patient-reported outcome measures.

Objectives: To establish clinically relevant classifications for four Neuro-QOL measures (mobility, upper body function, fatigue, and sleep).

Methods: We adapted an educational standard-setting methodology, bookmarking, to identify cut scores for symptom severity based on Neuro-QOL scores. Following this method, clinical “vignettes” were developed at multiple points along the continuum of symptom severity. Each vignette included five Neuro-QOL items selected from the item bank and the corresponding IRT-predicted response for each item at the specified severity level. Two expert panels were identified: a group of clinicians and a group of persons with multiple sclerosis (PwMS). Panelists first individually rated the vignettes for a given domain by symptom severity. Then, in separate one-day, in-person workshops, each panel identified the adjacent vignettes it judged to represent the threshold between two levels of severity for a given domain. After an iterative process of discussion, judgment, review of validity evidence, and reconsideration of thresholds, the PwMS panel and the clinician panel each reached consensus on thresholds for the four targeted measures. Cut scores were defined as the mean location of each pair of threshold vignettes.
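The two computational steps in the Methods can be illustrated with a minimal sketch. It assumes a graded response model for the IRT-predicted responses and takes the "predicted response" to be the most likely category; neither detail is stated above. The item parameters, vignette locations, and function names are hypothetical and are not taken from the Neuro-QOL item banks.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities under a graded response model (assumed here).

    theta: severity level on the standardized (z-score) metric.
    discrimination: item slope (hypothetical value in this sketch).
    thresholds: ordered item threshold parameters (hypothetical values).
    """
    # Probability of responding in category k or higher, bracketed by 1 and 0.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Adjacent differences give the probability of each single category.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def predicted_response(theta, discrimination, thresholds):
    """Index of the most likely response category at a given severity level."""
    probs = grm_category_probs(theta, discrimination, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


def cut_score(lower_vignette, upper_vignette):
    """Cut score as the mean location of a pair of threshold vignettes."""
    return (lower_vignette + upper_vignette) / 2.0


# Hypothetical item: slope 2.0, four thresholds, so five response categories.
item_slope = 2.0
item_thresholds = [-1.0, 0.5, 1.5, 2.5]

# Predicted response for this item in vignettes placed at two severity levels.
response_at_mean = predicted_response(0.0, item_slope, item_thresholds)
response_at_plus_2sd = predicted_response(2.0, item_slope, item_thresholds)

# A panel that places a threshold between vignettes located at 50 and 55
# (hypothetical locations) yields a cut score midway between them.
example_cut = cut_score(50.0, 55.0)

print(response_at_mean, response_at_plus_2sd, example_cut)
```

Taking the midpoint reflects that the panel judges the boundary to fall somewhere between the two adjacent vignettes rather than at either one.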

Results: PwMS and clinician panels derived identical thresholds for severity levels of mobility and sleep. For the domains of upper extremity and fatigue, there was 75% and 88% concordance, respectively. In every case of divergence, PwMS set the threshold for the more severe classification of symptoms 0.5 SD higher than did clinicians.

Conclusions: We adapted a standard-setting exercise commonly used in educational testing to establish interpretation thresholds for four Neuro-QOL measures, and we achieved strong congruence between the PwMS and clinician panels on where those thresholds lie for each measure.