Burden – the time, effort and other demands placed on a patient or research participant to complete an outcome measure (respondent burden) or on those administering an outcome measure (administrative burden). Respondent burden relates to issues such as the number of questions, reading and comprehension level, and acceptability of the instrument to the respondent. Administrative burden relates to issues such as the time and training required to score responses and administer the outcome measure.
Concept – an abstraction that cannot be measured directly and is based on observations of certain behaviours or characteristics, such as pain or stress.
Conceptual model or framework – refers to how variables are expected to relate to each other and why (e.g., variables related to CAM use). At a higher level of abstraction, it may be defined as concepts that are interrelated by virtue of their relevance to a common theme (e.g., psychological and social factors related to the decision-making process).
Construct – a highly abstract concept that cannot be measured directly. Such a concept is invented (constructed) by researchers for the purposes of research. Constructs are measured using multiple items that in combination assess their meaning. Examples of constructs are locus of control, self-efficacy and CAM. The terms concept and construct are often used interchangeably as they are very close in meaning.
Directly measurable health outcomes – outcomes that do not require a standardized, pre-tested measuring instrument to be assessed. Some examples are cost, many biological markers and number of days of work lost.
Health – a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity (WHO).
Health outcome domains – groupings of different types of health outcomes that are intended to capture changes in a person's health status, quality of life, level of function and sense of well-being that can be attributed to an intervention.
Indirectly measurable health outcomes – outcomes that measure a construct and thus require a standardized, pre-tested, often multidimensional instrument to be assessed in a valid manner. Some examples are stress, well-being, power and adjustment.
Internal consistency – the degree of correlation among the items of a multi-item outcome measure (see comment on FAQ).
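Internal consistency is commonly quantified with Cronbach's alpha, computed from the item variances and the variance of the total score. A minimal Python sketch (the data and the helper name `cronbach_alpha` are invented for illustration, not taken from any instrument mentioned here):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering a three-item scale (invented data)
scores = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 3, 2],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 2))  # → 0.96
```

Values near 1 indicate that the items vary together, i.e., that they appear to tap the same underlying concept.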
Interpretability – the degree to which one can assign easily understood meaning to an instrument's score. For example, what a score of 70 means on a functional ability scale that ranges from 0 to 100.
Inter-rater reliability – the degree of stability of instrument scores between two raters (or observers) using the same instrument at the same point in time. This should be assessed in a situation in which scoring of the instrument is complex.
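For categorical ratings, inter-rater agreement is often summarized with Cohen's kappa, which corrects the observed proportion of agreement for the agreement expected by chance. A minimal Python sketch with invented ratings:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    # Proportion of cases where the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters classifying ten patients as ready/not ready for discharge (invented data)
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # → 0.8
```

A kappa of 1 indicates perfect agreement; 0 indicates agreement no better than chance.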
Intra-rater reliability – the degree of stability of instrument scores for one rater (or observer) across two or more time points. It is assumed that the outcome of interest remains the same between time points.
Measurement model – an outcome measure's scale and sub-scale structure, which includes direction regarding the procedures to be followed to calculate the scores. This applies to complex instruments that consist of several sub-scales, such as the SF-36.
Reliability – the consistency with which an instrument measures what it is designed to measure, or the extent to which random variation influences the result. Reliability can be assessed by testing for:
• stability, i.e., how stable scores are over time (test-retest reliability);
• equivalence, i.e., the consistency of results across different raters (inter-rater reliability) or across similar tests administered at the same time; and
• internal consistency, i.e., whether the concept is measured consistently in all parts of the test.
Responsiveness – the degree to which an outcome measure can detect change, often defined as the minimal change considered to be important by a person. Evidence for responsiveness is an important factor when assessing construct validity.
Test-retest reliability – the consistency of test scores over time, based on a correlation between test and retest scores within the same sample. It is assumed that the outcome of interest remains the same over that time period.
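Test-retest reliability is often reported as the correlation between the two administrations. A small Python sketch with invented scores:

```python
import numpy as np

# Scores from the same six people at two time points (invented data)
test = np.array([12, 18, 15, 22, 9, 17])
retest = np.array([13, 17, 16, 21, 10, 18])

# Pearson correlation between test and retest scores
r = np.corrcoef(test, retest)[0, 1]
print(round(r, 2))  # → 0.98
```

A correlation near 1 suggests that respondents keep the same relative standing across the two administrations, consistent with the assumption that the outcome did not change in between.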
Validity – the degree to which an outcome measure measures what it purports to measure. Evidence for the validity of an outcome measure is assessed in three primary ways: criterion validity, construct validity and content validity, as described below.
Criterion validity – validity established from multiple measures of the same concept: one instrument is compared to a second instrument that measures the same concept. This second instrument is the criterion by which the validity of the new instrument is assessed; it should be a measure of the target construct that is widely accepted as a valid measure of that construct (a criterion measure). Criterion validity is divided into two subtypes: concurrent validity and predictive validity.
Concurrent validity – a type of criterion validity that assesses the degree to which the scores of a measure relate to the score(s) of a criterion measure, when the two scores are assessed concurrently. For example, the relationship between behavioural ratings from the staff in a mental institution regarding readiness for discharge and a formal test that assesses readiness for discharge.
Predictive validity – a type of criterion validity that assesses whether scores on a new instrument can predict future standing, status or performance. For example, whether grades can predict academic success, or whether a test that assesses readiness for discharge can predict re-hospitalization.
Construct validity – the degree to which an instrument measures the construct under investigation. There are various ways to assess construct validity; however, they are all based on the logical analysis of hypothesized relationships between constructs, in the form: if constructs A and B are related, then instruments for A and B should also be related; or, if C and D are instruments that measure the same construct, then instruments C and D should be related. Convergent validity and discriminant validity are subtypes of construct validity, as described below.
Convergent validity – the degree to which measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (i.e., they converge).
Discriminant validity – the degree to which measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to each other (i.e., they discriminate).
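Convergent and discriminant validity are often examined by correlating instrument scores. A Python sketch with simulated data, assuming two hypothetical measures of the same construct (anxiety) and one theoretically unrelated measure (height); all names and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two hypothetical anxiety questionnaires measuring the same construct,
# and an unrelated measure; all data are simulated.
anxiety_a = rng.normal(size=n)
anxiety_b = anxiety_a + rng.normal(scale=0.3, size=n)  # same construct + noise
height = rng.normal(size=n)                            # unrelated construct

convergent = np.corrcoef(anxiety_a, anxiety_b)[0, 1]   # expected: high
discriminant = np.corrcoef(anxiety_a, height)[0, 1]    # expected: near zero
print(round(convergent, 2), round(discriminant, 2))
```

The high correlation between the two anxiety measures is evidence of convergence; the near-zero correlation with height is evidence of discrimination.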
Content validity – the degree to which an instrument is consistent with 1) the known literature about the construct that the instrument attempts to measure, and 2) the opinion of experts who have done work in the field.