Compare and contrast single case research designs (SCRD) and group research designs used to investigate the effectiveness of interventions
Task summary: Dear freelancer, please write a 300-word APA-style discussion post.
Full order description: Use at least 2 of the 3 attached sources and 1 outside academic source. Include at least 1 in-text citation.
Attached:
Instructions
Week’s readings
Requirements: 300 words
This week, you have learned about the characteristics of SCRDs, including how they differ from traditional group design research. You have also been introduced to the types of SCRD graphs and the features of the design that contribute to the determination of experimental control within evidence-based practices, along with the importance that internal and external validity have within SCRDs.
Please respond to the following:
Provide a discussion that compares and contrasts single case research designs (SCRD) and group research designs used to investigate the effectiveness of interventions (e.g., studying a problem in an applied setting).
Make sure to include a detailed description of the characteristics of SCRD, including the considerations regarding internal and external validity.
Explain the primary purpose of a comparative, component, and parametric analysis, and provide an original example of how each might be used as part of a single case research design and incorporated purposefully into a practical intervention plan.
What Works Clearinghouse
SINGLE-CASE DESIGN TECHNICAL DOCUMENTATION
Developed for the What Works Clearinghouse by the following panel: Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. June 2010
Recommended citation: Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case designs technical documentation. Retrieved from What Works Clearinghouse website: http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf
In an effort to expand the pool of scientific evidence available for review, the What Works Clearinghouse (WWC) assembled a panel of national experts in single-case design (SCD) and analysis to draft SCD Standards. In this paper, the panel provides an overview of SCDs, specifies the types of questions that SCDs are designed to answer, and discusses the internal validity of SCDs. The panel then proposes SCD Standards to be implemented by the WWC. The Standards are bifurcated into Design and Evidence Standards (see Figure 1). The Design Standards evaluate the internal validity of the design. Reviewers assign the categories of Meets Standards, Meets Standards with Reservations, and Does Not Meet Standards to each study based on the Design Standards. Reviewers trained in visual analysis will then apply the Evidence Standards to studies that meet standards (with or without reservations), resulting in the categorization of each outcome variable as demonstrating Strong Evidence, Moderate Evidence, or No Evidence.

A. OVERVIEW OF SINGLE-CASE DESIGNS

SCDs are adaptations of interrupted time-series designs and can provide a rigorous experimental evaluation of intervention effects (Horner & Spaulding, in press; Kazdin, 1982, in press; Kratochwill, 1978; Kratochwill & Levin, 1992; Shadish, Cook, & Campbell, 2002). Although the basic SCD has many variations, these designs often involve repeated, systematic measurement of a dependent variable before, during, and after the active manipulation of an independent variable (e.g., applying an intervention). SCDs can provide a strong basis for establishing causal inference, and these designs are widely used in applied and clinical disciplines in psychology and education, such as school psychology and the field of special education. SCDs are identified by the following features:

• An individual "case" is the unit of intervention and unit of data analysis (Kratochwill & Levin, in press). A case may be a single participant or a cluster of participants (e.g., a classroom or a community).

• Within the design, the case provides its own control for purposes of comparison. For example, the case's series of outcome variables are measured prior to the intervention and compared with measurements taken during (and after) the intervention.

• The outcome variable is measured repeatedly within and across different conditions or levels of the independent variable. These different conditions are referred to as phases (e.g., baseline phase, intervention phase).

As experimental designs, a central goal of SCDs is to determine whether a causal relation (i.e., functional relation) exists between the introduction of a researcher-manipulated independent variable (i.e., an intervention) and change in a dependent (i.e., outcome) variable (Horner & Spaulding, in press; Levin, O'Donnell, & Kratochwill, 2003).
Experimental control involves replication of the intervention in the experiment, and this replication is addressed with one of the following methods (Horner et al., 2005):

• Introduction and withdrawal (i.e., reversal) of the independent variable (e.g., ABAB design)

• Iterative manipulation of the independent variable across different observational phases (e.g., alternating treatments design)

• Staggered introduction of the independent variable across different points in time (e.g., multiple baseline design)

SCDs have many variants. Although flexible and adaptive, a SCD is shaped by its research question(s) and objective(s), which must be defined with precision, taking into consideration the specifics of the independent variable tailored to the case(s), setting(s), and the desired outcome(s) (i.e., a primary dependent variable). For example, if the dependent variable is unlikely to be reversed after responding to the initial intervention, then an ABAB reversal design would not be appropriate, whereas a multiple baseline design across cases would be appropriate. Therefore, the research question generally drives the selection of an appropriate SCD.

B. CAUSAL QUESTIONS THAT SCDS ARE DESIGNED TO ANSWER

The goal of a SCD is usually to answer the question "Is this intervention more effective than the current 'baseline' or 'business-as-usual' condition?" SCDs are particularly appropriate for understanding the responses of one or more cases to an intervention under specific conditions (Horner & Spaulding, in press). SCDs are implemented when pursuing the following research objectives (Horner et al., 2005):

• Determining whether a causal relation exists between the introduction of an independent variable and a change in the dependent variable. For example, a research question might be "Does Intervention B reduce a problem behavior for this case (or these cases)?"

• Evaluating the effect of altering a component of a multi-component independent variable on a dependent variable. For example, a research question might be "Does adding Intervention C to Intervention B further reduce a problem behavior for this case (or these cases)?"

• Evaluating the relative effects of two or more independent variables (e.g., alternating treatments) on a dependent variable. For example, a research question might be "Is Intervention B or Intervention C more effective in reducing a problem behavior for this case (or these cases)?"
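To make these structural features concrete, here is a minimal sketch of how one case's repeated measurements might be organized by phase for an ABAB (reversal) design. This is purely illustrative; the Python names and data are our own invention, not WWC terminology.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    """One condition of the independent variable (e.g., 'A' = baseline, 'B' = intervention)."""
    condition: str        # condition label, e.g., "A" or "B"
    scores: list[float]   # repeated measurements of the dependent variable

@dataclass
class Case:
    """A 'case' (one participant or one cluster) that serves as its own control."""
    name: str
    phases: list[Phase]   # e.g., A, B, A, B for an ABAB (reversal) design

# An ABAB design: the case's baseline series is the comparison for each intervention series.
case = Case("Student 1", [
    Phase("A", [7, 8, 8, 9, 8]),   # baseline: problem behaviors per session
    Phase("B", [5, 4, 3, 3, 2]),   # intervention introduced
    Phase("A", [6, 7, 8, 8, 7]),   # intervention withdrawn (reversal)
    Phase("B", [3, 2, 2, 1, 2]),   # intervention reintroduced
])
print("Phase sequence:", "".join(p.condition for p in case.phases))  # -> ABAB
```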
SCDs are especially appropriate for pursuing research questions in applied and clinical fields. This application is largely because disorders with low prevalence may be difficult to study with traditional group designs that require a large number of participants for adequate statistical power (Odom et al., 2005). Further, in group designs, the particulars of who responded to an intervention under which conditions might be obscured when reporting only group means and associated effect sizes (Horner et al., 2005). SCDs afford the researcher an opportunity to provide detailed documentation of the characteristics of those cases that did respond to an intervention and those that did not (i.e., nonresponders). For this reason, the panel recommends that What Works Clearinghouse (WWC) reviewers systematically specify the conditions under which an intervention is and is not effective for cases being considered, if this information is available in the research report.

Because the underlying goal of SCDs is most often to determine "Which intervention is effective for this case (or these cases)?" the designs are intentionally flexible and adaptive. For example, if a participant is not responding to an intervention, then the independent variables can be manipulated while continuing to assess the dependent variable (Horner et al., 2005). Because of the adaptive nature of SCDs, nonresponders might ultimately be considered "responders" under particular conditions.[1] In this regard, SCDs provide a window into the process of participant change. SCDs can also be flexible in terms of lengthening the number of data points collected during a phase to promote a stable set of observations, and this feature may provide additional insight into participant change.

C. THREATS TO INTERNAL VALIDITY IN SINGLE-CASE DESIGN[2]

Similar to group randomized controlled trial designs, SCDs are structured to address major threats to internal validity in the experiment. Internal validity in SCDs can be improved through replication and/or randomization (Kratochwill & Levin, in press). Although it is possible to use randomization in structuring experimental SCDs, these applications are still rare. Unlike most randomized controlled trial group intervention designs, most single-case researchers have addressed internal validity concerns through the structure of the design and systematic replication of the effect within the course of the experiment (e.g., Hersen & Barlow, 1976; Horner et al., 2005; Kazdin, 1982; Kratochwill, 1978; Kratochwill & Levin, 1992). The former (design structure, discussed in the Standards as "Criteria for Designs…") can be referred to as "methodological soundness," and the latter (effect replication, discussed in the Standards as "Criteria for Demonstrating Evidence…") is a part of what can be called "evidence credibility" (see, for example, Kratochwill & Levin, in press).

[1] WWC Principal Investigators (PIs) will need to consider whether variants of interventions constitute distinct interventions. Distinct interventions will be evaluated individually with the SCD Standards. For example, if the independent variable is changed during the course of the study, then the researcher must begin the replication series again to meet the design standards.

[2] Prepared by Thomas Kratochwill with input from Joel Levin, Robert Horner, and William Shadish.
In SCD research, effect replication is an important mechanism for controlling threats to internal validity, and its role is central for each of the various threats discussed below. In fact, the replication criterion discussed by Horner et al. (2005, p. 168) represents a fundamental characteristic of SCDs: "In most [instances] experimental control is demonstrated when the design documents three demonstrations of the experimental effect at three different points in time with a single case (within-case replication), or across different cases (inter-case replication)" (emphasis added). As these authors note, an experimental effect is demonstrated when the predicted changes in the dependent measures covary with manipulation of the independent variable. This criterion of three replications has been included in the Standards for designs to "meet evidence" standards. Currently, there is no formal basis for the "three demonstrations" recommendation; rather, it represents a conceptual norm in published articles, research, and textbooks that recommend methodological standards for single-case experimental designs (Kratochwill & Levin, in press).

Important to note are the terms level, trend, and variability. "Level" refers to the mean score for the data within a phase. "Trend" refers to the slope of the best-fitting straight line for the data within a phase, and "variability" refers to the fluctuation of the data (as reflected by the data's range or standard deviation) around the mean. See pages 17-20 for greater detail.

Table 1, adapted from Hayes (1981) but without including the original "design type" designations, presents the three major types of SCDs and their variations. In AB designs, a case's performance is measured within each condition of the investigation and compared between or among conditions. In the most basic two-phase AB design, the A condition is a baseline or preintervention series/phase and the B condition is an intervention series/phase. It is difficult to draw valid causal inferences from traditional two-phase AB designs because the lack of replication in such designs makes it more difficult to rule out alternative explanations for the observed effect (Kratochwill & Levin, in press). Furthermore, repeating an AB design across several cases in separate or independent studies would typically not allow for drawing valid inferences from the data (Note: this differs from multiple baseline designs, described below, which introduce the intervention at different points in time). The Standards require a minimum of four A and B phases, such as the ABAB design.

There are three major classes of SCD that incorporate phase repetition, each of which can accommodate some form of randomization to strengthen the researcher's ability to draw valid causal inferences (see Kratochwill & Levin, in press, for discussion of such randomization applications). These design types include the ABAB design (as well as the changing criterion design, which is considered a variant of the ABAB design), the multiple baseline design, and the alternating treatments design. Valid inferences associated with the ABAB design are tied to the design's structured repetition. The phase repetition occurs initially during the first B phase, again in the second A phase, and finally in the return to the second B phase (Horner et al., 2005).
This design and its effect replication standard can be extended to multiple repetitions of the treatment (e.g., ABABABAB) and might include multiple treatments in combination that are introduced in a repetition sequence as, for example, A/(B+C)/A/(B+C)/A (see Table 1). In the case of the changing criterion design, the researcher begins with a baseline phase and then schedules a series of criterion changes or shifts that set a standard for participant performance over time. The criteria are typically pre-selected and change is documented by outcome measures changing with the criterion shifts over the course of the experiment.
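Level, trend, and variability, as defined above, are ordinary descriptive statistics, so each phase of a series can be summarized directly. Below is a minimal sketch with hypothetical ABAB data; the least-squares slope is one common way to operationalize trend, and the standard deviation one way to operationalize variability (the document also mentions the range).

```python
import numpy as np

def phase_summary(scores):
    """Level, trend, and variability for one phase's data series.

    Level: mean of the data within the phase.
    Trend: slope of the best-fitting (least-squares) straight line.
    Variability: standard deviation of the data around the mean.
    """
    y = np.asarray(scores, dtype=float)
    x = np.arange(len(y))  # session index within the phase
    slope = np.polyfit(x, y, 1)[0] if len(y) > 1 else 0.0
    sd = y.std(ddof=1) if len(y) > 1 else 0.0
    return {"level": y.mean(), "trend": slope, "variability": sd}

# Hypothetical ABAB series: compare level/trend/variability between adjacent phases.
for label, data in [("A1", [7, 8, 8, 9, 8]), ("B1", [5, 4, 3, 3, 2]),
                    ("A2", [6, 7, 8, 8, 7]), ("B2", [3, 2, 2, 1, 2])]:
    s = phase_summary(data)
    print(f"{label}: level={s['level']:.1f} trend={s['trend']:+.2f} sd={s['variability']:.2f}")
```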
TABLE 1
EXAMPLE SINGLE-CASE DESIGNS AND ASSOCIATED CHARACTERISTICS

Simple phase change designs [e.g., ABAB; BCBC].* (In the literature, ABAB designs are sometimes referred to as withdrawal designs, intrasubject replication designs, or reversal designs.) Complex phase change designs [e.g., interaction element: B(B+C)B; C(B+C)C]
In these designs, estimates of level, trend, and variability within a data series are assessed under similar conditions; the manipulated variable is introduced, and concomitant changes in the outcome measure(s) are assessed in the level, trend, and variability between phases of the series, with special attention to the degree of overlap, immediacy of effect, and similarity of data patterns in similar phases (e.g., all baseline phases). Estimates of level, trend, and variability in a data series are assessed on measures within specific conditions and across time.

Changing criterion design
In this design, the researcher examines the outcome measure to determine whether it covaries with changing criteria that are scheduled in a series of predetermined steps within the experiment. An A phase is followed by a series of B phases (e.g., B1, B2, B3…BT), with the Bs implemented with criterion levels set for specified changes. Changes/differences in the outcome measure(s) are assessed by comparing the series associated with the changing criteria.

Alternating treatments design (In the literature, alternating treatment designs are sometimes referred to as part of a class of multi-element designs.)
In these designs, estimates of level, trend, and variability in a data series are assessed on measures within specific conditions and across time. Changes/differences in the outcome measure(s) are assessed by comparing the series associated with different conditions.

Simultaneous treatments design (In the literature, simultaneous treatment designs are sometimes referred to as concurrent schedule designs.)
In these designs, estimates of level, trend, and variability in a data series are assessed on measures within specific conditions and across time. Changes/differences in the outcome measure(s) are assessed by comparing the series across conditions.

Multiple baseline designs (e.g., across cases, across behaviors, across situations)
In these designs, multiple AB data series are compared, and introduction of the intervention is staggered across time. Comparisons are made both between and within a data series. Repetitions of a single simple phase change are scheduled, each with a new series and in which both the length and timing of the phase change differ across replications.

Source: Adapted from Hayes (1981) and Kratochwill & Levin (in press). To be reproduced with permission.
* "A" represents a baseline series; "B" and "C" represent two different intervention series.
Another variation of SCD methodology is the alternating treatments design, which, relative to the ABAB and multiple baseline designs, potentially allows for more rapid comparison of two or more conditions (Barlow & Hayes, 1979; Hayes, Barlow, & Nelson-Gray, 1999). In the typical application of the design, two separate interventions are alternated following the baseline phase. The alternating feature of the design occurs when, subsequent to a baseline phase, the interventions are alternated in rapid succession for some specified number of sessions or trials. As an example, Intervention B could be implemented on one day and Intervention C on the next, with alternating interventions implemented over multiple days. In addition to a direct comparison of two interventions, the baseline (A) condition could be continued and compared with each intervention condition in the alternating phases. The order of this alternation of interventions across days may be based on either counterbalancing or a random schedule. Another variation, called the simultaneous treatment design (sometimes called the concurrent schedule design), involves exposing individual participants to the interventions simultaneously, with the participant's differential preference for the two interventions being the focus of the investigation. This latter design is used relatively infrequently in educational and psychological research, however.

The multiple baseline design involves an effect replication option across participants, settings, or behaviors. Multiple AB data series are compared, and introduction of the intervention is staggered across time. In this design, more valid causal inferences are possible by staggering the intervention across one of the aforementioned units (i.e., sequential introduction of the intervention across time). The minimum number of phase repetitions needed to meet the standard advanced by Horner et al. (2005) is three, but four or more is recognized as more desirable (and statistically advantageous in cases in which, for example, the researcher is applying a randomization statistical test). Adding phase repetitions increases the power of the statistical test, similar to adding participants in a traditional group design (Kratochwill & Levin, in press). The number and timing of the repetitions can vary, depending on the outcomes of the intervention. For example, if change in the dependent variable is slow to occur, more time might be needed to demonstrate experimental control. Such a circumstance might also reduce the number of phase repetitions that can be scheduled due to cost and logistical factors. Among the characteristics of this design, effect replication across series is regarded as the characteristic with the greatest potential for enhancing internal and statistical-conclusion validity (see, for example, Levin, 1992).

Well-structured SCD research that embraces phase repetition and effect replication can rule out major threats to internal validity. (A sketch of one simple randomization test follows.)
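To illustrate the kind of randomization test mentioned above (see Kratochwill & Levin, in press, for discussion of such applications), here is a simplified start-point randomization test for a single AB series: the intervention start is assumed to have been chosen at random from a set of eligible sessions, and the observed baseline-to-intervention change is compared with the change computed at every eligible start point. This is our sketch of the general idea, not a procedure the Standards prescribe.

```python
import numpy as np

def start_point_randomization_test(series, actual_start, eligible_starts):
    """Simplified start-point randomization test for an AB series.

    Assumes the intervention start was randomly chosen from `eligible_starts`.
    The test statistic is the drop from baseline (A) mean to intervention (B)
    mean; the p-value is the proportion of eligible start points whose
    statistic is at least as extreme as the one actually observed.
    """
    y = np.asarray(series, dtype=float)

    def effect(start):
        return y[:start].mean() - y[start:].mean()  # positive if behavior decreased

    observed = effect(actual_start)
    reference = [effect(s) for s in eligible_starts]
    p_value = sum(e >= observed for e in reference) / len(reference)
    return observed, p_value

# 20 sessions; the intervention actually began at session 10 (0-indexed),
# chosen in advance at random from sessions 5..15.
series = [8, 7, 8, 9, 8, 7, 8, 8, 9, 8, 4, 3, 3, 2, 3, 2, 2, 3, 2, 2]
obs, p = start_point_randomization_test(series, 10, range(5, 16))
print(f"observed effect = {obs:.2f}, randomization p = {p:.3f}")
```

With only 11 eligible start points, the smallest attainable p-value is 1/11; adding repetitions or randomization opportunities enlarges the reference set, which is one way to see why doing so increases the power of such tests.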
The possible threats to internal validity in single-case research include the following (see also Shadish et al., 2002, p. 55):

1. Ambiguous Temporal Precedence: Lack of clarity about which variable occurred first may yield confusion about which variable is the cause and which is the effect.

Embedded in the SCD Standards is a criterion that the independent variable is actively manipulated by the researcher, with measurement of the dependent variable occurring after that manipulation. This sequencing ensures the presumed cause precedes the presumed effect. A SCD cannot meet Standards unless there is active manipulation of the independent variable.[3] Replication of this manipulation-measurement sequence in the experiment further contributes to an argument of unidirectional causation (Shadish et al., 2002). Effect replication, as specified in the Standards, can occur either through within-case replication or multiple-case replication in a single experiment, or by conducting two or more experiments with the same or highly similar intervention conditions included. The Standards specify that the study must show a minimum of three demonstrations of the effect through the use of the same design and procedures. Overall, studies that can meet standards are designed to mitigate the threat of ambiguous temporal precedence.

2. Selection: Systematic differences between/among conditions in participant characteristics could cause the observed effect.

In most single-case research, selection is generally not a concern because one participant is exposed to both (or all) of the conditions of the experiment (i.e., each case serves as its own control, as noted in the features for identifying a SCD in the Standards). However, there are some conditions under which selection might affect the design's internal validity. First, in SCDs that involve two or more between-case intervention conditions composed of intact "units" (e.g., pairs, small groups, and classrooms), differential selection might occur. The problem is that the selected units might differ in various respects before the study begins. Because in most single-case research the units are not randomly assigned to the experiment's different intervention conditions, selection might then be a problem. This threat can further interact with other invalidating influences so as to confound variables (a methodological soundness problem) and compromise the results (an evidence credibility problem). Second, the composition of intact units (i.e., groups) can change (generally decrease in size, as a result of participant attrition) over time in a way that could compromise interpretations of a treatment effect. This is a particular concern when within-group individual participants drop out of a research study in a treatment-related (nonrandom) fashion (see also No. 6 below). The SCD Standards address traditional SCDs and do not address between-case group design features (for Standards for group designs, see the WWC Handbook). Third, in the multiple baseline design across cases, selection might be an issue when different cases sequentially begin the intervention based on "need" rather than on a randomly determined basis (e.g., a child with the most serious behavior problem among several candidate participants might be selected to receive the treatment first, thereby weakening the study's external validity).

[3] Manipulation of the independent variable is usually either described explicitly in the Method section of the text of the study or inferred from the discussion of the results. Reviewers will be trained to identify cases in which the independent variable is not actively manipulated, and in that case, a study Does Not Meet Standards.
3. History: Events occurring concurrently with the intervention could cause the observed effect.

History is typically the most important threat to any time series, including SCDs. This is especially the case in ex post facto single-case research, because the researcher has so little ability to investigate what other events might have occurred in the past and affected the outcome, and in simple (e.g., ABA) designs, because one need find only a single plausible alternative event at about the same time as treatment. The most problematic studies, for example, typically involve examination of existing databases or archived measures in some system or institution (such as a school, prison, or hospital). Nevertheless, the study might not always be historically confounded in such circumstances; the researcher can investigate the conditions surrounding the treatment and build a case implicating the intervention as being more plausibly responsible for the observed outcomes relative to competing factors. Even in prospective studies, however, the researcher might not be the only person trying to improve the outcome. For instance, the patient might make other outcome-related changes in his or her own life, or a teacher or parent might make extra-treatment changes to improve the behavior of a child. SCD researchers should be diligent in exploring such possibilities. However, history threats are lessened in single-case research that involves one of the types of phase repetition necessary to meet standards (e.g., the ABAB design discussed above). Such designs reduce the plausibility that extraneous events account for changes in the dependent variable(s) because they require that the extraneous events occur at about the same time as the multiple introductions of the intervention over time, which is less likely to be true than is the case when only a single intervention is done.

4. Maturation: Naturally occurring changes over time could be confused with an intervention effect.

In single-case experiments, because data are gathered across time periods (for example, sessions, days, weeks, months, or years), participants in the experiment might change in some way due to the passage of time (e.g., participants get older, learn new skills). It is possible that the observed change in a dependent variable is due to these natural sources of maturation rather than to the independent variable. This threat to internal validity is accounted for in the Standards by requiring not only that the design document three replications/demonstrations of the effect, but that these effects must be demonstrated at a minimum of three different points in time. As required in the Standards, selection of an appropriate design with repeated assessment over time can reduce the probability that maturation is a confounding factor. In addition, adding a control series (i.e., an A phase or control unit such as a comparison group) to the experiment can help diagnose or reduce the plausibility of maturation and related threats (e.g., history, statistical regression). For example, see Shadish and Cook (2009).

5. Statistical Regression (Regression toward the Mean): When cases (e.g., single participants, classrooms, schools) are selected on the basis of their extreme scores, their scores on other measured variables (including re-measured initial variables) typically will be less extreme, a psychometric occurrence that can be confused with an intervention effect.
In single-case research, cases are often selected because their pre-experimental or baseline measures suggest high need or priority for intervention (e.g., immediate treatment for some problem is necessary). If only pretest and posttest scores were used to evaluate outcomes, statistical regression would be a major concern. However, the repeated assessment identified as a distinguishing feature of SCDs in the Standards (wherein performance is monitored to evaluate level, trend, and variability, coupled with phase repetition in the design) makes regression easy to diagnose as an internal validity threat. As noted in the Standards, data are repeatedly collected during baseline and intervention phases, and this repeated measurement enables the researcher to examine characteristics of the data for the possibility of regression effects under various conditions.

6. Attrition: Loss of respondents during a single-case time-series intervention study can produce artifactual effects if that loss is systematically related to the experimental conditions.

Attrition (participant dropout) can occur in single-case research and is especially a concern under at least three conditions. First, premature departure of participants from the experiment could render the data series too short to examine level, trend, variability, and related statistical properties of the data, which thereby may threaten data interpretation. Hence, the Standards require a minimum of five data points in a phase to meet evidence standards without reservations. Second, attrition of one or more participants at a critical time might compromise the study's internal validity and render any causal inferences invalid; hence, the Standards require a minimum of three phase repetitions to meet evidence standards. Third, in some single-case experiments, intact groups comprise the experimental units (e.g., group-focused treatments, teams of participants, and classrooms). In such cases, differential attrition of participants from one or more of these groups might influence the outcome of the experiment, especially when the unit composition change occurs at the point of introduction of the intervention.

Although the Standards do not automatically exclude studies with attrition, reviewers are asked to attend to attrition when it is reported. Reviewers are encouraged to note that attrition can occur when (1) an individual fails to complete all required phases of a study, (2) the case is a group and individuals attrite from the group, or (3) the individual does not have adequate data points within a phase. Reviewers should also note when the researcher reports that cases were dropped and record the reason for that (for example, being dropped for nonresponsiveness to treatment). To monitor attrition through the various phases of single-case research, reviewers are asked to apply a template embedded in the coding guide similar to the flow diagram illustrated in the CONSORT Statement (Moher, Schulz, & Altman, 2001) and adopted by the American Psychological Association for randomized controlled trials research (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008). See Appendix A for the WWC SCD attrition diagram. Attrition noted by reviewers should be brought to the attention of principal investigators (PIs) to assess whether the attrition may impact the integrity of the study design or evidence that is presented.

7. Testing: Exposure to a test can affect scores on subsequent exposures to that test, an occurrence that can be confused with an intervention effect.
In SCDs, there are several different possibilities for testing effects; in particular, many measurements are likely to be "reactive" when administered repeatedly over time. For example, continuous exposure of participants to some curriculum measures might improve their performance over time. Sometimes the assessment process itself influences the outcomes of the study, such as when direct classroom observation causes change in student and teacher behaviors. Strategies to reduce or eliminate these influences have been proposed (Cone, 2001). In single-case research, the repeated assessment of the dependent variable(s) across phases of the design can help identify this potential threat. The effect replication standard can enable the researcher to reduce the plausibility of a claim that testing per se accounted for the intervention effect (see Standards).

8. Instrumentation: The conditions or nature of a measure might change over time in a way that could be confused with an intervention effect.

Confounding due to instrumentation can occur in single-case research when changes in a data series occur as a function of changes in the method of assessing the dependent variable over time. One of the most common examples occurs when data are collected by assessors who change their method of assessment over phases of the experiment. Such factors as reactivity, drift, bias, and complexity in recording might influence the data and implicate instrumentation as a potential confounding influence. Reactivity refers to the possibility that observational scores are higher as a result of the researcher monitoring the observers or observational process. Observer drift refers to the possibility that observers may change their observational definitions of the construct being measured over time, thereby not making scores comparable across phases of the experiment. Observational bias refers to the possibility that observers may be influenced by a variety of factors associated with expected or desired experimental outcomes, thereby changing the construct under assessment. Complexity may influence observational assessment in that more complex observational codes present more challenges than less complex codes with respect to obtaining acceptable levels of observer agreement. Numerous recommendations to control these factors have been advanced and can be taken into account (Hartmann, Barrios, & Wood, 2004; Kazdin, 1982).

9. Additive and Interactive Effects of Threats to Internal Validity: The impact of a threat can be added to that of another threat or may be moderated by levels of another threat.

In SCDs, the aforementioned threats to validity may be additive or interactive. Nevertheless, the "Criteria for Designs that Meet Evidence Standards" and the "Criteria for Demonstrating Evidence of a Relation between an Independent and an Outcome Variable" have been crafted largely to address the internal validity threats noted above. Further, reviewers are encouraged to follow the approach taken with group designs, namely, to consider other confounding factors that might have a separate effect on the outcome variable (i.e., an effect that is not controlled for by the study design). Such confounding factors should be discussed with PIs to determine whether the study Meets Standards.
D. THE SINGLE-CASE DESIGN STANDARDS

The PI within each topic area will: (1) define the independent and outcome variables under investigation,[4] (2) establish parameters for considering fidelity of intervention implementation,[5] and (3) consider the reasonable application of the Standards to the topic area and specify any deviations from the Standards in that area protocol. For example, when measuring self-injurious behavior, a baseline phase of fewer than five data points may be appropriate. PIs might need to make decisions about whether the design is appropriate for evaluating an intervention. For example, an intervention associated with a permanent change in participant behavior should be evaluated with a multiple baseline design rather than an ABAB design. PIs will also consider the various threats to validity and how the researcher was able to address these concerns, especially in cases in which the Standards do not necessarily mitigate the validity threat in question (e.g., testing, instrumentation).

Note that the SCD Standards apply to both observational measures and standard academic assessments. Similar to the approach with group designs, PIs are encouraged to define the parameters associated with "acceptable" assessments in their protocols. For example, repeated measures with alternate forms of an assessment may be acceptable, and WWC psychometric criteria would apply. PIs might also need to make decisions about particular studies. Several questions will need to be considered, such as: (a) Will generalization variables be reported? (b) Will follow-up phases be assessed? (c) If more than one consecutive baseline phase is present, are these treated as one phase or two distinct phases? and (d) Are multiple treatments conceptually distinct or multiple components of the same intervention?

SINGLE-CASE DESIGN STANDARDS

These Standards are intended to guide WWC reviewers in identifying and evaluating SCDs. The first section of the Standards assists with identifying whether a study is a SCD. As depicted in Figure 1, a SCD should be reviewed using the "Criteria for Designs that Meet Evidence Standards" to determine those that Meet Evidence Standards, those that Meet Evidence Standards with Reservations, and those that Do Not Meet Evidence Standards. Studies that meet evidence standards (with or without reservations) should then be reviewed using the "Criteria for Demonstrating Evidence of a Relation between an Independent Variable and a Dependent Variable" (see Figure 1).[6] This review will result in a sorting of SCD studies into three groups: those that have Strong Evidence of a Causal Relation, those that have Moderate Evidence of a Causal Relation, and those that have No Evidence of a Causal Relation.

[4] Because SCDs are reliant on phase repetition and effect replication across participants, settings, and researchers to establish external validity, specification of the intervention materials, procedures, and context of the research is particularly important within these studies (Horner et al., 2005).

[5] Because interventions are applied over time, continuous measurement of implementation is a relevant consideration.

[6] This process results in a categorization scheme that is similar to that used for evaluating evidence credibility by inferential statistical techniques (hypothesis testing, effect-size estimation, and confidence-interval construction) in traditional group designs.
FIGURE 1. PROCEDURE FOR APPLYING SCD STANDARDS: FIRST EVALUATE DESIGN, THEN, IF APPLICABLE, EVALUATE EVIDENCE

[Flowchart: "Evaluate the Design" leads to one of three design ratings: Meets Evidence Standards, Meets Evidence Standards with Reservations, or Does Not Meet Evidence Standards. For studies with either of the first two ratings, "Conduct Visual Analysis for Each Outcome Variable" yields Strong Evidence, Moderate Evidence, or No Evidence; Strong or Moderate Evidence is followed by Effect-Size Estimation.]
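The two-stage flow of Figure 1, combined with the evidence thresholds stated later in Section C, reduces to a few lines of decision logic. The function below is our paraphrase for illustration, not WWC software.

```python
def rate_study(design_rating, demonstrations_of_effect=0, non_effects=0):
    """Two-stage review flow from Figure 1 (a paraphrase for illustration).

    Stage 1: the Design Standards yield one of three design ratings.
    Stage 2: only studies meeting standards (with or without reservations)
    receive a visual analysis of each outcome. Per the criteria in Section C,
    Strong Evidence requires at least three demonstrations of the effect with
    no non-effects; any demonstrated non-effect caps the rating at Moderate.
    """
    if design_rating == "Does Not Meet Evidence Standards":
        return "not reviewed for evidence"
    if demonstrations_of_effect < 3:
        return "No Evidence"
    if non_effects > 0:
        return "Moderate Evidence"
    return "Strong Evidence"  # followed by effect-size estimation

print(rate_study("Meets Evidence Standards", demonstrations_of_effect=3))
# -> Strong Evidence
print(rate_study("Meets Evidence Standards with Reservations", 3, 1))
# -> Moderate Evidence
```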
A. SINGLE-CASE DESIGN CHARACTERISTICS

SCDs are identified by the following features:

• An individual "case" is the unit of intervention and the unit of data analysis. A case may be a single participant or a cluster of participants (e.g., a classroom or community).

• Within the design, the case provides its own control for purposes of comparison. For example, the case's series of outcome variables prior to the intervention is compared with the series of outcome variables during (and after) the intervention.

• The outcome variable is measured repeatedly within and across different conditions or levels of the independent variable. These different conditions are referred to as "phases" (e.g., baseline phase, intervention phase).[7]

The Standards for SCDs apply to a wide range of designs, including ABAB designs, multiple baseline designs, alternating and simultaneous treatment designs, changing criterion designs, and variations of these core designs. Even though SCDs can be augmented by including one or more independent comparison cases (i.e., a comparison group), in this document the Standards address only the core SCDs and are not applicable to the augmented independent comparison SCDs.

B. CRITERIA FOR DESIGNS THAT MEET EVIDENCE STANDARDS

If the study appears to be a SCD, the following rules are used to determine whether the study's design Meets Evidence Standards, Meets Evidence Standards with Reservations, or Does Not Meet Evidence Standards. In order to Meet Evidence Standards, the following design criteria must be present:

• The independent variable (i.e., the intervention) must be systematically manipulated, with the researcher determining when and how the independent variable conditions change. If this standard is not met, the study Does Not Meet Evidence Standards.

[7] In SCDs, the ratio of data points (measures) to the number of cases usually is large so as to distinguish SCDs from other longitudinal designs (e.g., traditional pretest-posttest and general repeated-measures designs). Although specific prescriptive and proscriptive statements would be difficult to provide here, what can be stated is: (1) parametric univariate repeated-measures analysis cannot be performed when there is only one experimental case; (2) parametric multivariate repeated-measures analysis cannot be performed when the number of cases is less than or equal to the number of measures; and (3) for both parametric univariate and multivariate repeated-measures analysis, standard large-sample (represented here by large numbers of cases) statistical theory assumptions must be satisfied for the analyses to be credible (see also Kratochwill & Levin, in press, Footnote 1).
• Each outcome variable must be measured systematically over time by more than one assessor, and the study needs to collect inter-assessor agreement in each phase and on at least twenty percent of the data points in each condition (e.g., baseline, intervention), and the inter-assessor agreement must meet minimal thresholds. Inter-assessor agreement (commonly called interobserver agreement) must be documented on the basis of a statistical measure of assessor consistency. Although there are more than 20 statistical measures to represent inter-assessor agreement (see Berk, 1979; Suen & Ary, 1989), commonly used measures include percentage agreement (or proportional agreement) and Cohen's kappa coefficient (Hartmann, Barrios, & Wood, 2004). According to Hartmann et al. (2004), minimum acceptable values of inter-assessor agreement range from 0.80 to 0.90 (on average) if measured by percentage agreement and at least 0.60 if measured by Cohen's kappa. Regardless of the statistic, inter-assessor agreement must be assessed for each case on each outcome variable. A study needs to collect inter-assessor agreement in all phases. It must also collect inter-assessor agreement on at least 20% of all sessions (total across phases) for a condition (e.g., baseline, intervention).[8] If this standard is not met, the study Does Not Meet Evidence Standards. (A computational sketch of these two agreement measures appears at the end of this section.)

• The study must include at least three attempts to demonstrate an intervention effect at three different points in time or with three different phase repetitions. If this standard is not met, the study Does Not Meet Evidence Standards.[9] Examples of designs meeting this standard include ABAB designs, multiple baseline designs with at least three baseline conditions, alternating/simultaneous treatment designs with either at least three alternating treatments compared with a baseline condition or two alternating treatments compared with each other, changing criterion designs with at least three different criteria, and more complex variants of these designs. Examples of designs not meeting this standard include AB, ABA, and BAB designs.[10]

• For a phase to qualify as an attempt to demonstrate an effect, the phase must have a minimum of three data points.[11]

o To Meet Standards, a reversal/withdrawal (e.g., ABAB) design must have a minimum of four phases per case with at least 5 data points per phase. To Meet Standards with Reservations, a reversal/withdrawal (e.g., ABAB) design must have a minimum of four phases per case with at least 3 data points per phase. Any phases based on fewer than three data points cannot be used to demonstrate existence or lack of an effect.

o To Meet Standards, a multiple baseline design must have a minimum of six phases with at least 5 data points per phase. To Meet Standards with Reservations, a multiple baseline design must have a minimum of six phases with at least 3 data points per phase. Any phases based on fewer than three data points cannot be used to demonstrate existence or lack of an effect.

• An alternating treatment design needs five repetitions of the alternating sequence to Meet Standards. Designs such as ABABBABAABBA, BCBCBCBCBC, and AABBAABBAABB would qualify, even though randomization or brief functional assessment may lead to one or two data points in a phase. A design with four repetitions would Meet Standards with Reservations, and a design with fewer than four repetitions Does Not Meet Standards.

[8] If the PI determines that there are exceptions to this Standard, they will be specified in the topic area or practice guide protocol. These determinations are based on the PI's content knowledge of the outcome variable.

[9] The three demonstrations criterion is based on professional convention (Horner, Swaminathan, Sugai, & Smolkowski, under review). More demonstrations further increase confidence in experimental control (Kratochwill & Levin, 2009).

[10] Although atypical, there might be circumstances in which designs without three replications meet the standards. A case must be made by the WWC PI researcher (based on content expertise) and at least two WWC reviewers must agree with this decision.

[11] If the PI determines that there are exceptions to this standard, these will be specified in the topic area or practice guide protocol. (For example, extreme self-injurious behavior might warrant a lower threshold of only one or two data points.)
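For illustration, the two commonly used agreement measures named above can be computed as follows. The interval codes are hypothetical; the thresholds echoed in the printout are those attributed to Hartmann et al. (2004) above.

```python
from collections import Counter

def percent_agreement(obs1, obs2):
    """Proportion of intervals on which two assessors record the same code."""
    matches = sum(a == b for a, b in zip(obs1, obs2))
    return matches / len(obs1)

def cohens_kappa(obs1, obs2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(obs1)
    po = percent_agreement(obs1, obs2)          # observed agreement
    c1, c2 = Counter(obs1), Counter(obs2)
    pe = sum(c1[k] * c2[k] for k in c1) / n**2  # expected (chance) agreement
    return (po - pe) / (1 - pe)

# Two observers code the same 10 intervals for tantrum (1) / no tantrum (0).
o1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
o2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(f"percent agreement = {percent_agreement(o1, o2):.2f} (threshold ~0.80-0.90)")
print(f"Cohen's kappa     = {cohens_kappa(o1, o2):.2f} (threshold ~0.60)")
```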
C. CRITERIA FOR DEMONSTRATING EVIDENCE OF A RELATION BETWEEN AN INDEPENDENT VARIABLE AND AN OUTCOME VARIABLE

For studies that meet standards (with and without reservations), the following rules are used to determine whether the study provides Strong Evidence, Moderate Evidence, or No Evidence of a causal relation. In order to provide Strong Evidence, at least two WWC reviewers certified in visual (or graphical) analysis must verify that a causal relation was documented. Specifically, this is operationalized as at least three demonstrations of the intervention effect, along with no non-effects, by:[12]

• Documenting the consistency of level, trend, and variability within each phase

• Documenting the immediacy of the effect, the proportion of overlap, and the consistency of the data across phases in order to demonstrate an intervention effect, and comparing the observed and projected patterns of the outcome variable

• Examining external factors and anomalies (e.g., a sudden change of level within a phase)

[12] This section assumes that the demonstration of an effect will be established through "visual analysis," as described later. As the field reaches greater consensus about appropriate statistical analyses and quantitative effect-size measures, new standards for effect demonstration will need to be developed.

If a SCD does not provide three demonstrations of an effect, then the study is rated as No Evidence. If a study provides three demonstrations of an effect and also includes at least one demonstration of a non-effect, the study is rated as Moderate Evidence. The following characteristics must be considered when identifying a non-effect:
– Data within the baseline phase do not provide sufficient demonstration of a clearly defined pattern of responding that can be used to extrapolate the expected performance forward in time, assuming no changes to the independent variable

– Failure to establish a consistent pattern within any phase (e.g., high variability within a phase)

– Either long latency between introduction of the independent variable and change in the outcome variable, or overlap between observed and projected patterns of the outcome variable between baseline and intervention phases, makes it difficult to determine whether the intervention is responsible for a claimed effect

– Inconsistent patterns across similar phases (e.g., an ABAB design in which the first time an intervention is introduced the outcome variable data points are high, the second time an intervention is introduced the outcome variable data points are low, and so on)

– Comparing the observed and projected patterns of the outcome variable between phases does not demonstrate evidence of a causal relation

When examining a multiple baseline design, also consider the extent to which the time in which a basic effect is initially demonstrated with one series (e.g., the first five days following introduction of the intervention for participant #1) is associated with change in the data pattern over the same time frame in the other series of the design (e.g., the same five days for participants #2, #3, and #4). If a basic effect is demonstrated within one series and there is a change in the data patterns in other series, the highest possible design rating is Moderate Evidence. If a study has either Strong Evidence or Moderate Evidence, then effect-size estimation follows.

D. VISUAL ANALYSIS OF SINGLE-CASE RESEARCH RESULTS[13]

Single-case researchers traditionally have relied on visual analysis of the data to determine (a) whether evidence of a relation between an independent variable and an outcome variable exists; and (b) the strength or magnitude of that relation (Hersen & Barlow, 1976; Kazdin, 1982; Kennedy, 2005; Kratochwill, 1978; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983; Richards, Taylor, Ramasamy, & Richards, 1999; Tawney & Gast, 1984; White & Haring, 1980). An inferred causal relation requires that changes in the outcome measure resulted from manipulation of the independent variable. A causal relation is demonstrated if the data across all phases of the study document at least three demonstrations of an effect at a minimum of three different points in time (as specified in the Standards).

[13] Prepared by Robert Horner, Thomas Kratochwill, and Samuel Odom.
An effect is documented when the data pattern in one phase (e.g., an intervention phase) differs more than would be expected from the data pattern observed or extrapolated from the previous phase (e.g., a baseline phase) (Horner et al., 2005).

Our rules for conducting visual analysis involve four steps and six variables (Parsonson & Baer, 1978). The first step is documentation of a predictable baseline pattern of data (e.g., student is reading with many errors; student is engaging in high rates of screaming). If a convincing baseline pattern is documented, then the second step consists of examining the data within each phase of the study to assess the within-phase pattern(s). The key question is to assess whether there are sufficient data with sufficient consistency to demonstrate a predictable pattern of responding (see below). The third step in the visual analysis process is to compare the data from each phase with the data in the adjacent (or similar) phase to assess whether manipulation of the independent variable was associated with an "effect." An effect is demonstrated if manipulation of the independent variable is associated with predicted change in the pattern of the dependent variable. The fourth step in visual analysis is to integrate all the information from all phases of the study to determine whether there are at least three demonstrations of an effect at different points in time (i.e., documentation of a causal or functional relation) (Horner et al., in press).

To assess the effects within SCDs, six features are used to examine within- and between-phase data patterns: (1) level, (2) trend, (3) variability, (4) immediacy of the effect, (5) overlap, and (6) consistency of data patterns across similar phases (Fisher, Kelley, & Lomas, 2003; Hersen & Barlow, 1976; Kazdin, 1982; Kennedy, 2005; Morgan & Morgan, 2009; Parsonson & Baer, 1978). These six features are assessed individually and collectively to determine whether the results from a single-case study demonstrate a causal relation, and they are represented in the "Criteria for Demonstrating Evidence of a Relation between an Independent Variable and Outcome Variable" in the Standards. "Level" refers to the mean score for the data within a phase. "Trend" refers to the slope of the best-fitting straight line for the data within a phase, and "variability" refers to the range or standard deviation of data about the best-fitting straight line. Examination of the data within a phase is used both (a) to describe the observed pattern of a unit's performance and (b) to extrapolate the expected performance forward in time, assuming no changes in the independent variable were to occur (Furlong & Wampold, 1981). The six visual analysis features are used collectively to compare the observed and projected patterns for each phase with the actual pattern observed after manipulation of the independent variable. This comparison of observed and projected patterns is conducted across all phases of the design (e.g., baseline to treatment, treatment to baseline, treatment to treatment, etc.).

In addition to comparing the level, trend, and variability of data within each phase, the researcher also examines data patterns across phases by considering the immediacy of the effect, overlap, and consistency of data in similar phases. "Immediacy of the effect" refers to the change in level between the last three data points in one phase and the first three data points of the next. The more rapid (or immediate) the effect, the more convincing the inference that change in the outcome measure was due to manipulation of the independent variable.
Delayed effects might actually compromise the internal validity of the design. However, predicted delayed effects or gradual effects of the intervention may be built into the design of the experiment, which would then influence decisions about phase length in a particular study. "Overlap" refers to the proportion of data from one phase that overlaps with data from the previous phase. The smaller the proportion of overlapping data points (or conversely, the larger the separation), the more compelling the demonstration of an effect.
"Consistency of data in similar phases" involves looking at data from all phases within the same condition (e.g., all "baseline" phases; all "peer-tutoring" phases) and examining the extent to which there is consistency in the data patterns from phases with the same conditions. The greater the consistency, the more likely the data represent a causal relation. These six features are assessed both individually and collectively to determine whether the results from a single-case study demonstrate a causal relation. Regardless of the type of SCD used in a study, visual analysis of (1) level, (2) trend, (3) variability, (4) overlap, (5) immediacy of the effect, and (6) consistency of data patterns across similar phases is used to assess whether the data demonstrate at least three indications of an effect at different points in time. If this criterion is met, the data are deemed to document a causal relation, and an inference may be made that change in the outcome variable is causally related to manipulation of the independent variable (see Standards).

Figures 1-8 in Appendix B provide examples of the visual analysis process for one common SCD, the ABAB design, using the proportion of 10-second observation intervals with child tantrums as the dependent variable and a tantrum intervention as the independent variable. The design is appropriate for interpretation because the ABAB design format allows the opportunity to assess a causal relation (e.g., to assess whether there are three demonstrations of an effect at three different points in time, namely the B, A, and B phases following the initial A phase).

Step 1: The first step in the analysis is to determine whether the data in the Baseline 1 (first A) phase document that (a) the proposed concern/problem is demonstrated (tantrums occur too frequently) and (b) the data provide sufficient demonstration of a clearly defined (e.g., predictable) baseline pattern of responding that can be used to assess the effects of an intervention. This step is represented in the Evidence Standards because if a proposed concern is not demonstrated, or a predictable pattern of the concern is not documented, the effect of the independent variable cannot be assessed. The data in Figure 1 in Appendix B demonstrate a Baseline 1 phase with 11 sessions, with an average of 66 percent of intervals with tantrums across these 11 sessions. The range of tantrums per session is from 50 percent to 75 percent, with an increasing trend across the phase and the last three data points averaging 70 percent. These data provide a clear pattern of responding that would be outside socially acceptable levels and, if left unaddressed, would be expected to continue in the 50 percent to 80 percent range.

The two purposes of a baseline are to (a) document a pattern of behavior in need of change and (b) document a pattern that has sufficiently consistent level and variability, with little or no trend, to allow comparison with a new pattern following intervention. Generally, stability of a baseline depends on a number of factors and the options the researcher has selected to deal with instability in the baseline (Hayes et al., 1999). One question that often arises in single-case design research is how many data points are needed to establish baseline stability. First, the amount of variability in the data series must be considered. Highly variable data may require a longer phase to establish stability.
Second, if the effect of the intervention is expected to be large and to produce a data pattern that far exceeds the baseline variance, a shorter baseline with some instability may be sufficient to move forward with intervention implementation. Third, the quality of the measures selected for the study may affect how willing the researcher/reviewer is to accept the length of the baseline. In terms of addressing an unstable baseline series, the researcher has the options of (a) analyzing and reporting the source of variability; (b) waiting to
see whether the series stabilizes as more data are gathered; (c) considering whether the correct unit of analysis has been selected for measurement and whether it represents the reason for instability in the data; and (d) moving forward with the intervention despite the presence of baseline instability. Professional standards for acceptable baselines are emerging, but the decision to end any baseline with fewer than five data points, or to end a baseline with an outlying data point, should be defended. In each case it would be helpful for reviewers to have this information and/or to contact the researcher to determine how baseline instability was addressed, along with a rationale.

Step 2: The second step in the visual analysis process is to assess the level, trend, and variability of the data within each phase and to compare the observed pattern of data in each phase with the pattern of data in adjacent phases. The horizontal lines in Figure 2 illustrate the comparison of phase levels, and the lines in Figure 3 illustrate the comparison of phase trends. The upper and lower defining range lines in Figure 4 illustrate the phase comparison for phase variability. In Figures 2–4, the level and trend of the data differ dramatically from phase to phase; however, changes in variability appear to be less dramatic.

Step 3: The information gleaned through examination of level, trend, and variability is supplemented by comparing the overlap, immediacy of the effect, and consistency of patterns in similar phases. Figure 5 illustrates the concept of overlap. There is no overlap between the data in Baseline 1 (A1) and the data in Intervention 1 (B1). There is one overlapping data point (10 percent; session 28) between Intervention 1 (B1) and Baseline 2 (A2), and there is no overlap between Baseline 2 (A2) and Intervention 2 (B2). Immediacy of the effect compares the extent to which the level, trend, and variability of the last three data points in one phase are discriminably different from the first three data points in the next phase. The data in the ovals, squares, and triangles of Figure 6 illustrate the use of immediacy of the effect in visual analysis. The observed effects are immediate in each of the three comparisons (Baseline 1 and Intervention 1, Intervention 1 and Baseline 2, Baseline 2 and Intervention 2). Consistency of similar phases examines the extent to which the data patterns in phases with the same (or similar) procedures are similar. The linked ovals in Figure 7 illustrate the application of this visual analysis feature. Phases with similar procedures (Baseline 1 and Baseline 2, Intervention 1 and Intervention 2) are associated with consistent patterns of responding.
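To make two of these features concrete, the following minimal sketch (not part of the WWC Standards; data and function names are hypothetical) computes the overlap between adjacent phases and the immediacy of the effect for a single phase change.

```python
# Illustrative sketch only -- not part of the WWC Standards.
# Hypothetical percent-of-intervals data, loosely patterned on the tantrum example.

def overlap_proportion(prev_phase, next_phase):
    """Proportion of next-phase points falling within the range of the previous phase."""
    lo, hi = min(prev_phase), max(prev_phase)
    return sum(lo <= x <= hi for x in next_phase) / len(next_phase)

def immediacy(prev_phase, next_phase, k=3):
    """Mean of the first k points of the next phase minus the mean of the
    last k points of the previous phase; a large gap suggests an immediate effect."""
    last_k, first_k = prev_phase[-k:], next_phase[:k]
    return sum(first_k) / len(first_k) - sum(last_k) / len(last_k)

baseline_1 = [60, 65, 70, 75, 70, 65, 60, 65, 70, 70, 70]  # hypothetical A1 data
intervention_1 = [40, 30, 25, 20, 15, 10, 10, 5, 5]        # hypothetical B1 data

print(overlap_proportion(baseline_1, intervention_1))  # 0.0 -> no overlapping points
print(immediacy(baseline_1, intervention_1))           # about -38.3 -> immediate, large drop
```

A reviewer would of course weigh these quantities alongside level, trend, variability, and consistency rather than in isolation.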
Step 4: The final step of the visual analysis process involves combining the information from each of the phase comparisons to determine whether all the data in the design (data across all phases) meet the standard for documenting three demonstrations of an effect at different points in time. The bracketed segments in Figure 8 (A, B, C) indicate the observed and projected patterns of responding that would be compared with actual performance. Because the observed data in the Intervention 1 phase are outside the observed and projected data pattern of Baseline 1, the Baseline 1 and Intervention 1 comparison demonstrates an effect (Figure 8A). Similarly, because the data in Baseline 2 are outside of the observed and projected patterns of responding in Intervention 1, the Intervention 1 and Baseline 2 comparison demonstrates an effect (Figure 8B). The same logic allows for identification of an effect in the Baseline 2 and Intervention 2 comparison. Because the three demonstrations of an effect occur at different points in time, the full set of data in this study is considered to document a causal relation as specified in the Standards.

The rationale underlying visual analysis in SCDs is that predicted and replicated changes in a dependent variable are associated with active manipulation of an independent variable. The process of visual analysis is analogous to the efforts in group-design research to document changes that are causally related to introduction of the independent variable. In group-design inferential statistical analysis, a statistically significant effect is claimed when the observed outcomes are sufficiently different from the expected outcomes that they are deemed unlikely to have occurred by chance. In single-case research, an effect is claimed when three demonstrations of an effect are documented at different points in time. Making this determination, however, requires that the reader be presented with the individual unit's raw data (typically in graphical format) and actively participate in the interpretation process. There will be studies in which some participants demonstrate an intervention effect and others do not. The evidence rating (Strong Evidence, Moderate Evidence, or No Evidence) accounts for mixed effects.

E. RECOMMENDATIONS FOR COMBINING STUDIES

When implemented with multiple design features (e.g., within- and between-case comparisons), SCDs can provide a strong basis for causal inference (Horner et al., 2005). Confidence in the validity of intervention effects demonstrated within cases is enhanced by replication of effects across different cases, studies, and research groups (Horner & Spaulding, in press). The Single-Case Designs Standards Panel recommends the following threshold for including a body of SCDs in an intervention report:14

1. A minimum of five SCD research papers examining the intervention that Meet Evidence Standards or Meet Evidence Standards with Reservations

2. The SCD studies must be conducted by at least three different research teams at three different geographical locations

3. The combined number of experiments (i.e., single-case design examples) across the papers totals at least 20

14 These criteria are based on professional conventions. Future work with SCD meta-analysis can offer an empirical basis for determining appropriate criteria, and these recommendations might be revised.
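As a purely illustrative aside, the threshold above lends itself to a trivial programmatic check; the function below is a hypothetical sketch, not part of the Standards.

```python
# Illustrative sketch only -- hypothetical helper, not part of the Standards.

def meets_combination_threshold(num_papers, num_teams, num_experiments):
    """True if a body of SCD research meets the panel's recommendation:
    at least 5 qualifying papers, at least 3 research teams (in 3 locations),
    and at least 20 experiments in total."""
    return num_papers >= 5 and num_teams >= 3 and num_experiments >= 20

print(meets_combination_threshold(6, 4, 22))  # True
print(meets_combination_threshold(5, 2, 25))  # False: only two research teams
```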
F. EFFECT-SIZE ESTIMATES FOR SINGLE-CASE DESIGNS15

Effect-size estimates are available for most designs involving group comparisons, and in meta-analyses there is widespread agreement about how these effect sizes (ES) should be expressed, what the statistical properties of the estimators are (e.g., distribution theory, conditional variance), and how to translate from one measure (e.g., a correlation) to another (e.g., Hedges' g). This is not true for SCDs; the field is much less well developed, and there are no agreed-upon methods or standards for effect-size estimation. What follows is a brief summary of the main issues, with a more extensive discussion in an article by Shadish, Rindskopf, and Hedges (2008).

Several issues are involved in creating effect-size estimates. First is the general issue of how to quantify the size of an effect. One can quantify the effect for a single case, for a group of cases within one study, or across several SCD studies. Along with a quantitative ES estimate, one must also consider the accuracy of the estimate; generally the issues here are estimating a standard error, constructing confidence intervals, and testing hypotheses about effect sizes. Next is the issue of comparability of different effect sizes for SCDs. Finally, the panel considers comparability of ES estimates for SCDs and for group-based designs.

Most researchers using SCDs still base their inferences on visual analysis, but several quantitative methods have been proposed. Each has flaws, but some methods are likely to be more useful than others; the panel recommends using some of these until better methods are developed. A number of nonparametric methods have been used to analyze SCDs (e.g., Percentage of Nonoverlapping Data [PND], Percentage of All Nonoverlapping Data [PAND], or Percent Exceeding the Median [PEM]). Some of these have been accompanied by efforts to convert them to parametric estimators such as the phi coefficient, which might in turn be comparable to typical between-groups measures. If that could be done validly, then one could use distribution theory from standard estimators to create standard errors and significance tests. However, most such efforts make the erroneous assumption that nonparametric methods need not be concerned with the assumption of independence of errors, and so the conversions might not be valid. In such cases, the distributional properties of these measures are unknown, and standard errors and statistical tests are not formally justified. Nonetheless, if all one wanted was a rough measure of the approximate size of the effect, without formal statistical justification or distribution theory, selecting one of these methods would make sense. However, none of these indices deals with trend, so the data would need to be detrended16 with, say, first-order differencing before computing the index. One could combine the results with ordinary unweighted averages, or one could weight by the number of cases in a study.

Various parametric methods have been proposed, including regression estimates and multilevel models. Regression estimates have three advantages. First, many primary researchers are familiar with regression, so both the analyses and the results are likely to be easily understood. Second, these methods can model trends in the data, and so do not require prior detrending of the data. Third, regression can be applied to obtain an effect size from a single case, whereas multilevel models require several cases within a study. But regression estimates also come with disadvantages. Although regression models do permit some basic modeling of error structures, they are less flexible than multilevel models in dealing with the complex error structures that are likely to be present in SCD data. For multilevel models, many researchers are less familiar with both the analytic methods and the interpretation of results, so their widespread use is probably less likely than with regression. Also, practical implementation of multilevel models for SCDs is technically challenging, probably requiring the most intense supervision and problem-solving of any method. Even if these technical developments were to be solved, the resulting estimates would still be in a different metric than effect-size estimates based on between-group studies, so one could not compare effect sizes from SCDs to those from group studies. A somewhat more optimistic scenario is that methods based on multilevel models can be used when data from several cases are available and the same outcome measure is used in all cases. Such instances do not require a standardized effect-size estimator because the data are already in the same metric. However, other technical problems remain, estimators are still not comparable with those from between-groups studies (see further discussion below), and such instances tend to be rare across studies.

The quantitative methods that have been proposed are not comparable with those used in group-comparison studies. In group studies, the simplest case would involve the comparison of two groups, and the mean difference would typically be standardized by dividing by the control-group variance or a pooled within-group variance. These variances reflect variation across people. In contrast, single-case designs, by definition, involve comparison of behavior within an individual (or other unit) across different conditions. Attempts to standardize these effects have usually involved dividing by some version of a within-phase variance, which measures variation of one person's behavior at different times (instead of variation across different people). Although there is nothing statistically wrong with doing this, it is not comparable with the usual between-groups standardized mean difference statistic. Comparability is crucial if one wishes to compare results from group designs with SCDs.

That being said, some researchers would argue that there is still merit in computing some effect-size index like those above. One reason is to encourage the inclusion of SCD data in recommendations about effective interventions. Another reason is that it seems likely that the rank ordering of most to least effective treatments would be highly similar no matter what effect-size metric is used.

15 Prepared by David Rindskopf and William Shadish.

16 When a trend is a steady increase or decrease in the dependent variable over time (within a phase), such a trend would produce a bias in many methods of analysis of SCD data. For example, if, with no treatment, the number of times a student is out of her seat each day for 10 days is 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, this is a decreasing trend. If a "treatment" is introduced after the fifth day, so that the last 5 days' data are during a treatment phase, some methods would find the treatment very effective: all of the measurements after the treatment are lower than any of the measurements before the treatment, apparently showing a strong effect. To correct for the effect of trend (i.e., to "detrend" the data), one can either subtract successive observations (e.g., 19-20, 18-19, etc.) and compile these in a vector within a phase (one cannot subtract from the final observation, so it is excluded), which is called differencing, or use statistical methods that adjust for this trend.
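The numbers in footnote 16 lend themselves to a short illustration. The following minimal sketch (hypothetical helper names; not part of the WWC document) computes the PND index on the raw series and then applies first-order differencing, showing how an untreated trend can masquerade as a treatment effect.

```python
# Illustrative sketch only, using the hypothetical out-of-seat counts from footnote 16.

def first_differences(series):
    """Subtract successive observations (e.g., 19 - 20, 18 - 19, ...); the final
    observation has no successor, so the result has one fewer value."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def pnd(baseline, treatment):
    """Proportion of treatment points below the lowest baseline point
    (assuming lower scores are better, as in the footnote's example)."""
    floor = min(baseline)
    return sum(x < floor for x in treatment) / len(treatment)

data = [20, 19, 18, 17, 16, 15, 14, 13, 12, 11]
baseline, treatment = data[:5], data[5:]  # "treatment" introduced after day 5

print(pnd(baseline, treatment))        # 1.0 -- apparently a very strong effect...
print(first_differences(baseline))     # [-1, -1, -1, -1]
print(first_differences(treatment))    # [-1, -1, -1, -1] -- ...but the trend never changed
```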
This latter hypothesis could be partially tested by computing more than one of these indices and comparing their rank orderings. An effect-size estimator for SCDs that is comparable to those used in between-groups studies is badly needed. Shadish et al. (2008) have developed an estimator for continuous outcomes that is promising in this regard, though the distribution theory is still being derived and tested. However, the small number of cases in most studies would make such an estimate imprecise (that is, it would have a large standard error and an associated wide confidence interval). Further, major problems remain to be solved involving accurate estimation of error structures for noncontinuous data, for example, the different distributional assumptions that might be present in SCDs (e.g., count data should be treated as Poisson distributed). Because many outcomes in SCDs are likely to be counts or rates, this is a nontrivial limitation of the Shadish et al. (2008) procedure. Finally, this method does not deal adequately with trend as currently developed, although standard methods for detrending the data might be reasonable to use. Hence, it might be premature to advise the use of these methods except to investigate their statistical properties further.

Until multilevel methods receive more thorough investigation, the panel suggests the following guidelines for estimating effect sizes in SCDs. First, in those rare cases in which the dependent variable is already in a common metric, such as proportions or rates, these are preferred to standardized scales. Second, if only one standardized effect-size estimate is to be chosen, the regression-based estimators are probably best justified from both technical and practical points of view, in that SCD researchers are familiar with regression. Third, the panel strongly recommends doing sensitivity analyses. For example, one could report one or more nonparametric estimates (but not the PND estimator, because it has undesirable statistical properties) in addition to the regression estimator. Results can then be compared over estimators to see whether they yield consistent conclusions about which interventions are more or less effective. Fourth, summaries across cases within studies and across studies (e.g., mean and standard deviation of effect sizes) can be computed when the estimators are in a common metric, either by nature (e.g., proportions) or through standardization. Lacking appropriate standard errors to use with the usual inverse-variance weighting, one might report either unweighted estimators or estimators weighted by a function of either the number of cases within studies or the number of time points within cases, although neither of these weights has any strong statistical justification in the SCD context.
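As one hedged reading of the second guideline, the sketch below fits an ordinary least-squares regression with an intercept, a linear time trend, and a phase indicator to a hypothetical AB series, then standardizes the phase coefficient by the residual standard deviation. This is one plausible regression-based estimator, not a formula prescribed by the panel, and, per the discussion above, the result is a within-case metric that is not comparable to a between-groups standardized mean difference.

```python
# Illustrative sketch only -- one plausible regression-based effect estimate,
# not the panel's prescribed method. Data are hypothetical.

import numpy as np

y = np.array([66, 70, 72, 75, 74, 30, 24, 20, 15, 12], dtype=float)  # outcome
phase = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)        # 0 = A, 1 = B
time = np.arange(len(y), dtype=float)

X = np.column_stack([np.ones_like(y), time, phase])  # intercept, trend, phase effect
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
resid_sd = residuals.std(ddof=X.shape[1])  # residual SD with a df correction

print(round(beta[2] / resid_sd, 2))  # standardized change in level at the phase change
```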
REFERENCES

APA Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839–851.

Barlow, D. H., & Hayes, S. C. (1979). Alternating treatments design: One strategy for comparing the effects of two treatments in a single subject. Journal of Applied Behavior Analysis, 12, 199–210.

Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83, 460–472.

Cone, J. D. (2001). Evaluating outcomes: Empirical tools for effective practice. Washington, DC: American Psychological Association.

Fisher, W., Kelley, M., & Lomas, J. (2003). Visual aids and structured criteria for improving visual inspection and interpretation of single-case designs. Journal of Applied Behavior Analysis, 36, 387–406.

Furlong, M., & Wampold, B. (1981). Visual analysis of single-subject studies by school psychologists. Psychology in the Schools, 18, 80–86.

Hartmann, D. P., Barrios, B. A., & Wood, D. D. (2004). Principles of behavioral observation. In S. N. Haynes & E. M. Hieby (Eds.), Comprehensive handbook of psychological assessment: Vol. 3. Behavioral assessment (pp. 108–127). New York: John Wiley & Sons.

Hayes, S. C. (1981). Single-case experimental designs and empirical clinical practice. Journal of Consulting and Clinical Psychology, 49, 193–211.

Hayes, S. C., Barlow, D. H., & Nelson-Gray, R. O. (1999). The scientist practitioner: Research and accountability in the age of managed care (2nd ed.). Needham Heights, MA: Allyn & Bacon.

Hersen, M., & Barlow, D. H. (1976). Single-case experimental designs: Strategies for studying behavior change. New York: Pergamon.

Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single subject research to identify evidence-based practice in special education. Exceptional Children, 71(2), 165–179.

Horner, R., & Spaulding, S. (in press). Single-case research designs. Encyclopedia. Springer.

Horner, R., Swaminathan, H., Sugai, G., & Smolkowski, K. (in press). Expanding analysis of single case research. Washington, DC: Institute of Education Sciences, U.S. Department of Education.

Kazdin, A. E. (1982). Single-case research designs: Methods for clinical and applied settings. New York: Oxford University Press.

Kazdin, A. E. (in press). Single-case research designs: Methods for clinical and applied settings (2nd ed.). New York: Oxford University Press.

Kennedy, C. H. (2005). Single-case designs for educational research. Boston: Allyn and Bacon.

Kratochwill, T. R. (Ed.). (1978). Single subject research: Strategies for evaluating change. New York: Academic Press.

Kratochwill, T. R. (1992). Single-case research design and analysis: An overview. In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and education (pp. 1–14). Hillsdale, NJ: Erlbaum.

Kratochwill, T. R., & Levin, J. R. (Eds.). (1992). Single-case research design and analysis: New directions for psychology and education. Hillsdale, NJ: Erlbaum.

Kratochwill, T. R., & Levin, J. R. (in press). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods.

Levin, J. R. (1994). Crafting educational intervention research that's both credible and creditable. Educational Psychology Review, 6, 231–243.

Levin, J. R., O'Donnell, A. M., & Kratochwill, T. R. (2003). Educational/psychological intervention research. In I. B. Weiner (Series Ed.), W. M. Reynolds, & G. E. Miller (Vol. Eds.), Handbook of psychology: Vol. 7. Educational psychology (pp. 557–581). New York: Wiley.

McReynolds, L., & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Baltimore: University Park Press.

Moher, D., Schulz, K. F., & Altman, D. G. (2001). The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Annals of Internal Medicine, 134, 657–662.

Morgan, D., & Morgan, R. (2009). Single-case research methods for the behavioral and health sciences. Los Angeles: Sage Publications.

Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71(2), 137–148.

Parsonson, B., & Baer, D. (1978). The analysis and presentation of graphic data. In T. Kratochwill (Ed.), Single subject research (pp. 101–166). New York: Academic Press.

Richards, S. B., Taylor, R., Ramasamy, R., & Richards, R. Y. (1999). Single subject research: Applications in educational and clinical settings. Belmont, CA: Wadsworth.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.

Shadish, W. R., & Cook, T. D. (2009). The renaissance of field experimentation in evaluating interventions. Annual Review of Psychology, 60, 607–629.

Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental designs. Evidence-Based Communication Assessment and Intervention, 3, 188–196.

Suen, H. K., & Ary, D. (1989). Analyzing quantitative behavioral observation data. Hillsdale, NJ: Erlbaum.

Tawney, J. W., & Gast, D. L. (1984). Single subject research in special education. Columbus, OH: Merrill.

What Works Clearinghouse. (2008). Procedures and standards handbook (version 2.0). Retrieved July 10, 2009, from http://ies.ed.gov/ncee/wwc/references/idocviewer/doc.aspx?docid=19&tocid=1

White, O. R., & Haring, N. G. (1980). Exceptional teaching (2nd ed.). Columbus, OH: Charles E. Merrill.
APPENDIX A ATTRITION DIAGRAM
ATTRITION DIAGRAM

Assessed for eligibility (individual n = …, group n = …, individuals within groups n = …)

Excluded (individual n = …, group n = …, individuals within groups n = …)
• Did not meet inclusion criteria (individual n = …, group n = …, individuals within groups n = …)
• Refused to participate (individual n = …, group n = …, individuals within groups n = …)
• Other reasons (give reasons) (individual n = …, group n = …, individuals within groups n = …)

Allocated to intervention (individual n = …, group n = …, individuals within groups n = …)
• Received allocated intervention (individual n = …, group n = …, individuals within groups n = …)
• Did not receive allocated intervention (individual n = …, group n = …, individuals within groups n = …)

ABAB, Multiple Baseline, and Alternating Treatment Designs
• Received required number of phases (or alternations) (individual n = …, group n = …, individuals within groups n = …)
• Discontinued intervention (give reasons) (individual n = …, group n = …, individuals within groups n = …)

ABAB and Multiple Baseline Designs
• Had adequate number of data points (individual n = …, group n = …, individuals within groups n = …)
• Did not have a minimum number of data points (give reasons) (individual n = …, group n = …, individuals within groups n = …)

Note whether the case (i.e., unit of analysis) is an individual or a group.
APPENDIX B VISUAL ANALYSIS
VISUAL ANALYSIS

[Figures 1–8 are graphs of hypothetical ABAB-design data; in each, the y-axis is the percentage of intervals with tantrums (0–100) and the x-axis is sessions. Only the captions are reproduced here.]

Figure 1: Depiction of an ABAB design.
Figure 2: An example of assessing level within the four phases of an ABAB design.
Figure 3: An example of assessing trend in each phase of an ABAB design.
Figure 4: Assess variability within each phase.
Figure 5: Consider overlap between phases.
Figure 6: Examine the immediacy of effect with each phase change.
Figure 7: Examine consistency across similar phases.
Figure 8A: Examine observed and projected comparison, Baseline 1 to Intervention 1.
Figure 8B: Examine observed and projected comparison, Intervention 1 to Baseline 2.
Figure 8C: Examine observed and projected comparison, Baseline 2 to Intervention 2.
Single-case experimental designs: Characteristics, changes, and challenges

Alan E. Kazdin, Yale University

Abstract: Tactics of Scientific Research (Sidman, 1960) provides a visionary treatise on single-case designs, their scientific underpinnings, and their critical role in understanding behavior. Since the foundational base was provided, single-case designs have proliferated, especially in areas of application where they have been used to evaluate interventions with an extraordinary range of clients, settings, and target foci. This article highlights core features of single-case experimental designs, how key and ancillary features of the designs have evolved, the special strengths of the designs, and challenges that have impeded their integration in many areas where their contributions are sorely needed. The article ends by placing the methodological approach in the context of other research traditions. In this way, the discussion moves from the specific designs toward foundations and philosophy of science issues, in keeping with the strengths of the person and book we are honoring.

Keywords: characteristics, changes, challenges

Tactics of Scientific Research (Sidman, 1960) articulates the underpinnings and foundations of single-case experimental designs. The book explained what science is trying to accomplish and how the study of the individual subject provides an excellent way to accomplish that. It is significant that Sidman emphasized processes of science and the underpinnings of methodology rather than a fixed set of procedures or variations of the arrangements or designs. To be sure, he commented on facets of the designs that are central (e.g., establishing experimental control, steady states, replication of effects, and the self-corrective nature of science). By the time the book was published, the viability of the approach had already proven to be of use in years of laboratory research (e.g., Skinner, 1938, 1956), a point that Sidman noted. The messages in Tactics were enhanced further by Sidman's many substantive contributions (e.g., free-operant avoidance, stimulus control, stimulus equivalence), which provided rich examples of experimental control over behavior and the yield from focusing on the individual case. As to the book and its impact and role in behavioral psychology, more than one author has referred to Tactics as the Bible among operant conditioners (Fuqua, 1990; Skinner, 1984). It is hard to improve on that, except to say that it is also a book about science and scientific research that articulately describes what we are trying to accomplish, why, and how in our roles as scientists.

From the foundations of Tactics, fast forward to single-case designs in contemporary research. There is now a rich array of designs and procedures governing their use and implementation, as evident in multiple texts (e.g., Barlow et al., 2009; Cooper et al., 2020; Gast & Ledford, 2018; Kazdin, 2021; Kratochwill & Levin, 2015). Vast literatures in both basic and applied research have drawn heavily on these designs to elaborate fundamental processes of human and nonhuman animal behavior and to devise and apply effective interventions in education, counseling, psychology, medicine, rehabilitation, and other areas. In this article I discuss single-case designs in the context of applied research. Many of the points apply broadly to the designs apart from the substantive focus of a given study. Yet, the focus on applied areas underscores the widespread use and potential for extensions to many disciplines and domains of everyday life. This article highlights the designs and their essential features, the logic of how causal inferences are drawn, how the designs have evolved, the special strengths of the designs, and challenges that have impeded their integration in many areas where their contributions are sorely needed. The article places single-case methods in the context of other research traditions and conveys the limits of restricting research to any one approach.

Address correspondence to: Alan E. Kazdin, PhD, ABPP, Department of Psychology, Yale University, 2 Hillhouse Avenue, New Haven, [email protected]. doi: 10.1002/jeab.638. Journal of the Experimental Analysis of Behavior, 2021, 115, 56–85 (January). © 2020 Society for the Experimental Analysis of Behavior.
Single-Case Experimental Designs

Overview

There are different terms to refer to these designs that can foster considerable confusion. To begin, "single-case," "N = 1," "N-of-1," and "single-patient trials" imply that only one subject is included in an investigation. Often this is true, but it is not an inherent feature of the designs. Indeed, single-case designs in applied settings have a long history of evaluating interventions in which multiple schools, classrooms, and students participate and in which the actual or potential participants include hundreds, thousands, or even more than a million participants (e.g., Cox et al., 2000; Fournier et al., 2004; McSweeney, 1978; Parsons et al., 1987; Schnelle et al., 1978).

The terms "intrasubject design" and "intrasubject-replication design" are sometimes used. These terms accurately reflect the fact that the methodology usually focuses on performance of the same person(s) over time. Yet, the terms are partially misleading because some of the designs depend on looking at the effects of interventions across subjects, or intersubject. Some uses of the designs in community settings (e.g., with a focus on the proportion of people who text while driving, who use reusable shopping bags, who wear protective masks) do not much care about, or at least do not measure, who the subject is and whether the behavior of any individual subject changes. "Intensive designs," yet another term, suggests that one or a few individuals may be studied more intensively (e.g., many assessment occasions) in contrast to the few assessments in a between-groups study. Yet, the term "intensive" is ambiguous and has the unfortunate connotation that the investigator is working intensively to study the subject, which probably is true but is beside the point.

"Case" can foster its own set of objections in the term "single-case designs." A case can be one individual, one school, business, organization, or country, and obviously has no inherent boundaries. Yet, the word has all sorts of baggage because it conjures up a case study, and with that an anecdotal case study (e.g., à la Freud), which might be a fascinating narrative description but usually omits objective measures, controls, or bases to draw valid inferences. In addition, "case" is part of another area of work, namely "the case study" approach, which is recognized as a way of presenting information and has no connection to experimental research methods (e.g., Gary, 2016; Hyzy, 2017). Similarly, qualitative research, a rigorous method in its own right, often uses the term "case" to refer to its approach (e.g., de Chesnay, 2017), and that too distracts from the use of "case" in single-case experimental designs. Overall, single-case designs provide a rich array of experimental arrangements that can evaluate individuals as well as large groups. This is lost to the uninitiated because too much is embedded in "single" and "case" that misleads or conjures up very different approaches.

Key Characteristics

Two main characteristics are defining or essential features of single-case designs. First, the designs require ongoing or continuous assessment over time. Measures are administered on multiple occasions within separate phases. Second, intervention effects are replicated, usually within the same participant(s) over time. The designs differ in the precise way in which intervention effects are replicated, but each design takes advantage of repeated assessment over time and evaluation of performance under different conditions.

Ongoing or continuous assessment of the performance of one or more individuals over time is in sharp contrast to between-group intervention research, where there usually is a pre- and postintervention assessment. Conclusions about intervention effects in group research are based primarily on comparisons of means at the final assessment. In single-case designs, the assessment is continued throughout the course of the study, and this information is used to make and test predictions using the subjects as their own controls. Over the course of an investigation, there usually are various phases or periods of time consisting of several occasions (e.g., days) in which a specific condition is in effect. The different phases and the patterns of behavior to which they lead are usually replicated in some way in the designs. The designs depend on stable rates of performance within a given phase. This refers to little or no trend (slope) and little variability in the data.1

1 The methodology also includes features related to assessment, reliability, validity, accuracy, and agreement of measures, evaluation of intervention integrity, and others (see Kazdin, 2021).
Drawing on these basic characteristics, there are many different single-case designs. The designs are specific ways in which the situation is arranged to demonstrate a functional relation between an experimental manipulation or intervention and behavior change. Table 1 highlights major designs. In each case, there are multiple variations, so that each design type is more a family of designs than a single, fixed variation. Duration, number of phases, and number of interventions all can vary within a single type of design (Kazdin, 2021). Moreover, components from the different designs are frequently combined to strengthen the inferences, so that the vast range of variations is difficult to count.

Table 1. Overview of Selected Designs in Single-Case Research

ABAB Designs: Typically, the effects of an intervention are evaluated by alternating the baseline condition (A phase), when no intervention is in effect, with the intervention condition (B phase). The A and B phases are repeated to complete the four phases.

Multiple-Baseline Designs: Data are collected separately on three or more behaviors, the behaviors of three or more individuals, or behavior in three or more situations. (A minimum of two is possible but infrequently used.) The effects of the intervention are demonstrated by introducing (applying) the intervention to different baselines at different points in time.

Multielement Designs: The design begins with a baseline phase, followed by a phase in which two or more interventions are evaluated in the same phase. The separate interventions are associated or consistently paired with distinct stimulus conditions.

Multiple-Treatment Designs: The design begins with a baseline phase, followed by a phase in which two or more interventions are evaluated in the same phase. The interventions are not consistently paired with specific stimulus conditions. Rather, they are balanced (systematically varied) across those periods or conditions. There are variations (alternating-treatment design, randomization design) in how the treatments are balanced and alternated in the intervention phase.

Changing-Criterion Designs: The design begins with a baseline phase, followed by an intervention phase in which a specific criterion is set for performance (e.g., to earn a reinforcer). Over the course of the intervention phase the criterion is altered to require higher (improved) levels of performance. Each criterion level is in place for a brief period to allow performance to meet that level before shifting to the next.

Note. Each design is a family (group, set) of designs with multiple variations. In each design, the logic of predicting performance and testing predictions is achieved in a way that draws on the key characteristics (continuous assessment, multiple phases) and in which the effect of the experimental manipulation or intervention is replicated in some way. In texts and articles on single-case designs, there are inconsistencies in the use of terms by which to refer to the various designs, especially the multitreatment and multielement designs and their variations (see Kazdin, 2021).

Logic of the Designs

Each of the designs functions by making and testing predictions, with variations on how this is done, in each case relying on the key characteristics. The logic of how this works is nicely illustrated by mentioning one variation, namely, the ABAB design. In the basic variation, the effects of an intervention (B) are evaluated by alternating the baseline condition (A phase), when no intervention is in effect, with the intervention condition (B phase). In the design, the A and B phases are repeated to complete the four phases.

The initial phase begins with baseline observations (ongoing assessment), when behavior is observed under the conditions in place before an intervention is implemented. This phase is continued until the rate of the behavior appears to be stable. This phase and the other phases have no fixed or preset duration; the duration is based on the ease of identifying a pattern in the data, referred to by Sidman (1960) as steady states. Baseline observations serve two purposes, namely, to describe the current level of behavior and to predict what behavior would be like in
the immediate future if no intervention were implemented. The description of behavior before the intervention is obviously necessary to convey the nature and scope of the problem. From the standpoint of the design, the crucial feature of baseline is the prediction of behavior in the future. A stable rate of behavior is needed to project what behavior would probably be like in the immediate future.

Figure 1 shows how the information about the rate of behavior is used. During the initial baseline (A) phase, the level of behavior is assessed (solid line), and this line is projected to predict the level of behavior into the future (dashed line). When a projection can be made with some degree of confidence, the intervention (B) phase is implemented.

[Figure 1. Hypothetical Data for an ABAB Design. Note. The solid lines in each phase reflect the actual data. The dashed lines indicate the projected or predicted level of performance from the previous phase. Please see the text for how the data in each phase are used to describe, predict, and test the prediction(s).]

The intervention phase has purposes similar to those of the baseline phase, namely, to describe current performance and to predict performance in the future if conditions were unchanged. However, there is an added purpose of the intervention phase. In the baseline phase, a prediction was made about future performance. In the intervention phase, the investigator can test whether performance during the intervention phase (phase B, solid line) departs from the projected level of baseline (phase B, dashed line). In effect, baseline observations were used to make a prediction about performance. During the first intervention (B) phase, data can test that prediction. Do the data during the intervention phase depart from the projected level of baseline? If the answer is yes, this shows that there is a change in performance. In Figure 1, performance changed during the first intervention phase. At this point in the design, it is not entirely clear that the intervention was responsible for the change. Other factors, often referred to as threats to experimental validity (e.g., coincidental historical events in the participant's life, maturational processes), might be proposed to account for change and cannot be convincingly ruled out (Kazdin, 2017). Generally, just the first two (AB) phases may not make a very plausible case that the intervention, rather than some other influence or artifact, led to change. We need at least the second A phase (to have ABA) to carry out the three functions I have noted: describe, predict, and test the prediction.

In the third phase (the second A of ABA), the intervention often is withdrawn, and the conditions of baseline are restored. This second A phase has three purposes, as I just mentioned. The two purposes common to the other phases are included, namely, to describe current performance and to predict what performance would be like in the future if this phase were continued. A third purpose is like that of the intervention phase, namely, to test the prediction from a prior phase. Let us break this down a bit. One purpose of the intervention phase was to make a prediction of what performance would be like in the future if the conditions remained unchanged (see dashed line, second A phase). The second A phase tests whether this level of performance in fact occurred. By comparing the solid and dashed lines in the second A phase, it is clear that the predicted and obtained levels of performance differ. Thus, the change that occurs suggests that something altered performance from its projected course.

There is one final and unique purpose of the second A phase. The first A phase made a prediction of what performance would be like in the future (the dashed line in the first B phase). This was the first prediction in the design, and like any prediction, it may be incorrect. The second A phase restores the conditions of baseline and can test the first prediction. If behavior had continued without an intervention, would it have continued at the same level as the original baseline, or would it have changed markedly? The second A phase examines whether performance would have been at or near the level predicted originally. A comparison of the solid line of the second A phase with the dashed line of the first B phase, in Figure 1, shows that the lines (of the two A phases) really are no different. Thus, the performance predicted by the original baseline phase was generally accurate. Performance would have remained at this level without the intervention. In the final phase of the ABAB design, the intervention is reinstated again. This second B phase serves the same purposes as the previous phase, namely, to describe performance, to test whether performance departs from the projected level of the previous phase, and to test whether performance is the same (replicated) as predicted from the previous intervention phase.

In short, the logic of the ABAB design and its variations consists of making and testing predictions about performance under different conditions and replicating the effects with the different phase changes. Essentially, data in the separate phases provide information about present performance, predict the probable level of future performance, and test the extent to which predictions of performance from previous phases were accurate. By repeatedly altering experimental conditions in the design, there are multiple opportunities to compare phases and to test whether performance is altered by the intervention and in this way establish experimental control. If behavior changes when the intervention is introduced, reverts to or near baseline levels after the intervention is withdrawn, and again improves when treatment is reinstated, then the pattern of results suggests rather strongly that the intervention was responsible for change. Extraneous influences rarely are parsimonious as explanations of the pattern of the data across phases. Usually, the most plausible explanation is that the intervention and its withdrawal accounted for the changes. Of necessity, this is a brief version of what the design accomplishes. The variations of ABAB designs and the other designs in Table 1 include multiple ways in which behaviors are studied across and within phases. However, in each case the logic is similar, namely, describing, predicting, and testing predictions. Replication of the intervention effect is included in some way, and the overall pattern across the different phases can make clear whether functional control has been achieved.
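As a concrete (and purely illustrative) rendering of the describe-predict-test logic, the sketch below fits a line to a hypothetical baseline phase, projects it forward, and asks whether the intervention-phase data depart from the projection. The data, names, and approach are assumptions for illustration, not procedures from the article.

```python
# Illustrative sketch only -- not from the article. Describe a phase with a
# fitted line, predict forward, and test whether the next phase departs from it.

import numpy as np

def project_phase(phase_data, n_ahead):
    """Fit a least-squares line to one phase and project it n_ahead sessions forward."""
    t = np.arange(len(phase_data))
    slope, intercept = np.polyfit(t, phase_data, 1)
    future_t = np.arange(len(phase_data), len(phase_data) + n_ahead)
    return slope * future_t + intercept

baseline = np.array([22.0, 20, 21, 23, 22, 21])  # A phase: stable, little trend
intervention = np.array([14.0, 11, 9, 8, 7, 6])  # B phase

predicted = project_phase(baseline, len(intervention))  # the dashed-line projection
departure = intervention - predicted                    # observed minus projected
print(np.round(departure, 1))  # consistently negative -> departs from the projection
```

In an ABAB design, the same comparison would be repeated at each phase change, and it is the replicated pattern, not any single comparison, that supports the inference.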
Evolution and Change

As the designs have expanded and evolved, so too have their characteristics, including those that might be regarded as core features. I highlight several, not only to convey these changes but also to underscore how the options have increased the opportunities to extend the designs in science, practice, and society.

Decision Making and Steady States

Single-case designs usually include separate phases (e.g., baseline, intervention). Decisions about when to change phases are based on achieving a steady state in a given phase, in large part to address the describe, predict, and test features I noted previously. Steady state usually translates to little or no slope (trend) in the direction of improvement and relatively little variability in the observed behavior.
Steady states continue to be the rule as a basis for changing phases, but this has been expanded with other options.

A prime example is the addition of randomization to single-case designs. The term "randomization" in single-case designs has been used in at least three overlapping ways: as a procedure that can be incorporated into different designs, as a type of design, and as a set of statistical tests. As a procedure, randomization can include deciding randomly (through a number table or generator) when to change phases (e.g., ABAB, changing-criterion designs), who receives the intervention and in what order (e.g., multiple-baseline design), or what intervention is introduced on a given day or period (e.g., multielement designs). As a design, randomization is a variation of a multielement and multitreatment design in which the various interventions are implemented for the same subject. Which intervention to introduce at a given point is determined randomly. As a statistical procedure, there are a range of statistical tests that can be used once an assumption is met that key procedures were implemented randomly (e.g., Craig & Fisher, 2019; Heyvaert & Onghena, 2014; Levin et al., 2019; Tanious & Onghena, 2019).

In relation to decision making within a single-case study, randomization is intended to strengthen the inferences by assuring that any decision making is not biased in some way by human judgment. The benefit is the implausibility of any bias in selecting when to make a change. The cost is that the steady state may not be evident. That is, it might have been prudent for the investigator to be controlled by the data rather than by a random-numbers algorithm. There are compromise options, such as randomization after at least three or more days in which the behavior has not exceeded one standard deviation above or below the mean of that phase. Other options can be contrived, if the decision point is random after whatever those nuanced requirements may be. It is important to mention this change because it is a fundamental modification of relying solely on steady states.

A second type of change has been the development of a plethora of concrete recommendations, guidelines, and rules to aid decision making as these designs are used in applied settings. These guidelines and recommendations cover quite specific facets of the single-case designs related to assessment (e.g., Artman et al., 2012; Fiske & Delmolino, 2012), the uses of various designs (Coon & Rapp, 2018), graphing the data (Kubina et al., 2017; Ledford et al., 2019), evaluation of treatment integrity (e.g., Collier-Meek et al., 2017), criteria for visual inspection (e.g., Lane & Gast, 2014), and what statistical analyses to use or be wary of (e.g., Manolov & Solanas, 2018; Moeyaert et al., 2017; Parker & Brossart, 2003; Shadish, 2014b), among others. At this point, there is no single set of guidelines or recommendations for any of these facets that is routinely used, widely subscribed to, or embraced as a matter of publication practice or policy. Also, many of these specific guidelines are a bit more "how-to" or "what-to-do." Guidelines can be useful but are no substitute for understanding what we are trying to accomplish. It is here again that we bow to Tactics for providing us the foundations and for keeping the focus on the underlying goals and the options for achieving them.

Search for Marked Effects

In single-case designs, visual inspection has been the dominant method of evaluating interventions, based on graphical displays of the data.2

2 This discussion focuses on drawing inferences about the reliability of an intervention effect as single-case designs are used in applied behavior analysis. In this context, visual inspection, as opposed to statistical evaluation, dominates as the method of data evaluation. In contrast, basic research in behavior analysis has relied heavily on quantitative evaluation of the data, drawing heavily on mathematical modeling (e.g., Mazur, 2006). Spanning decades, mathematical formulations have been used to characterize basic behavioral and cognitive processes, to derive and test predictions, and to compare the utility and fit of different models with real and computer-simulated data. Many basic processes (e.g., punishment, reinforcement schedules, delay-of-reinforcement gradient, decision making and choice, behavioral economic analyses) have been elaborated (e.g., Caron, 2019; Gilroy et al., 2017; Killeen, 2011). Moreover, math models tested in laboratory settings have been extended to applied issues to predict such behaviors as sexual activity, drug use and addiction, and classroom behavior. Research reflecting the broad range of topics to which math models have been applied can be found in the Journal of the Experimental Analysis of Behavior and in special series made available by the Society for the Quantitative Analyses of Behavior (https://www.sqab.org/BehavioralProcesses.html).

In basic laboratory research, visual inspection has dramatic precedents for providing very clear effects. For example, research on reinforcement schedules allows us to see
changes in steady states, powerful experimental control, regularity of behavior, and replication of effects (e.g., Ferster & Skinner, 1957; Skinner, 1938, 1956). With continuous assessment, establishing control over performance and replicating that effect within the design are fundamental departures from the use of inferential statistics, not just in data evaluation but in establishing causal relations and elaborating behavioral processes (Perone, 1999; Sidman, 1960). The use of inferential statistics to tease out or detect reliable patterns against a background of variable performance is, in principle, a very different focus.

In applied research, statistical analysis for drawing inferences was objected to on further grounds. Visual inspection was intended to serve as a filter or screening device to allow only clear and potent interventions to be interpreted as producing reliable effects. The insensitivity of visual inspection for detecting weak effects was viewed as an advantage because it encouraged investigators to look for potent interventions or to develop weak interventions to the point that large effects are produced (Baer, 1977; Parsonson & Baer, 1978, 1992). The goal could be contrasted with a statistical approach to data evaluation, which may detect chance effects, weak effects, and statistically significant effects based on means of groups.

In applied behavior analyses, the criteria for invoking visual inspection have been elaborated over the years. Some combination of the criteria highlighted in Table 2 is used, although there is considerable variation in their use. The goal remains, namely, to find strong and unequivocal effects. Yet with the perspective of time, some changes in intervention priorities, the extension of single-case designs to novel areas, and current emphases in science, the original rationale is less applicable today.

Table 2. Frequently Used Criteria for Invoking Visual Inspection in Applied Research

Mean changes across phases: The average (the arithmetic mean is only one index of "average" level of performance) within a phase and then a comparison across phases.

Trend (slope) changes across phases: A slope is usually a best-fitting straight line to characterize the data within a phase and then also to compare across phases.

Shift in level at point of phase change: Discontinuity or abrupt change in performance on the dependent measure at the point at which a change is made from one phase or condition to another.

Latency of the change: Latency refers to the period between the onset or termination of one condition (e.g., intervention, return to baseline) and a change in performance. The more closely in time the change occurs after the phase has been altered, the clearer the intervention effect.

Overlap of data points: The proportion of data points in one phase (e.g., baseline) that overlaps with data points from another phase (e.g., intervention).

Variability differences across phases: The extent of fluctuation in the data within a phase and the extent of changes across phases.

Consistency in the overall pattern: The extent to which the overall pattern of the data shows a similar pattern (e.g., the expected pattern across all phases in an ABAB or multiple-baseline design); replication of the effect within the design.

Note. The criteria for visual inspection have been elaborated in other sources (e.g., Barton et al., 2018; Kazdin, 2021; Kratochwill et al., 2010; Lane & Gast, 2014; Manolov & Vannest, 2019; Ninci, 2019; Spriggs et al., 2018; Vannest & Ninci, 2015). The terminology sometimes varies among resources. There is no standard application of these criteria, and it is often the case that only a subset is applied.
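As a small, hypothetical illustration (not from the article), two of the Table 2 criteria, within-phase trend and the shift in level at the point of the phase change, can be computed directly:

```python
# Illustrative sketch only -- hypothetical data and helper names.

import numpy as np

def phase_slope(phase_data):
    """Best-fitting straight-line slope within a phase (the trend criterion)."""
    t = np.arange(len(phase_data))
    slope, _ = np.polyfit(t, phase_data, 1)
    return slope

def level_shift(prev_phase, next_phase):
    """Abrupt change at the phase change: first point of the next phase
    minus the last point of the previous phase."""
    return next_phase[0] - prev_phase[-1]

a_phase = np.array([12.0, 13, 12, 14, 13, 13])
b_phase = np.array([6.0, 5, 5, 4, 3, 3])

print(round(phase_slope(a_phase), 2), round(phase_slope(b_phase), 2))  # within-phase trends
print(level_shift(a_phase, b_phase))  # -7.0 -> a marked discontinuity at the phase change
```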
First, we have known for some time, via analyses of published single-case data, that many studies do not produce strong intervention effects (e.g., Glass, 1997; Parker et al., 2006). Hence, the rationale of using visual inspection as a filter to detect only strong effects rests on a condition that is not uniformly met.

Second, the search for marked effects is based on the view that unclear and marginal effects will not pass the threshold visual inspection provides. Yet, when judges make errors in invoking visual inspection, they are more likely to say there is an effect when there is not one than to fail to detect existing effects (e.g., Krueger et al., 2013; Mercer & Sterling, 2012; Normand & Bailey, 2006; Ximenes et al., 2009). That is, visual analysis is not always conservative; judges "detect" nonexistent effects. This is quite the opposite of considering visual inspection as a filter and as a means of selecting only marked effects.

Third, looking for marked effects confuses the experimental and applied criteria for evaluating data (Kazdin, 2021; Risley, 1970). The experimental criterion focuses on the reliability of the finding and whether change can be explained by fluctuations in performance, preexisting patterns in the data, and sources of bias or chance effects. For this criterion, the strength of the effect is not critical per se. In sharp contrast, the applied criterion refers to whether the impact is so large as to make a palpable difference. It is possible that relatively small changes (experimental criterion) have huge applied impact (e.g., improving marital satisfaction enough to traverse a rough period, improving quality of life). On the other side, it is possible that an intervention can produce a reliable effect and meet the experimental criterion without having genuine impact on the life of an individual, or without a sufficient impact to make the needed difference. Perhaps we have reduced the person's cigarette smoking from two packs to one pack a day, or self-injurious behavior (e.g., self-cutting, piercing) from 10 to five times per day. The graph might show that the visual criteria are easily met, and we conclude the change was reliable. But clearly more is needed from an applied perspective.

Overlooking weak but reliable effects can have unfortunate consequences. The possibility exists that interventions when first developed may have weak effects. It would be unfortunate if these interventions were prematurely discarded before they could be developed further and possibly become interventions with strong effects. In addition, interventions with weak effects, when scaled in the population, can have huge impact. Weak interventions often are low cost and more easily scalable to a population and can make a difference in mental and physical health and rates of mortality (Kazdin, 2018b). Insofar as the stringent criteria of visual inspection discourage the pursuit of interventions that do not have potent effects, they may be a detriment to developing effective as well as scalable interventions.

Finally, there is a renewed concern in science more generally about publication bias, in the sense of promoting and publishing only those effects that are positive (e.g., Bakker & Wicherts, 2011; Simmons et al., 2011). Selectively reporting results introduces biases into individual studies as well as bodies of literature. For example, single-case studies that are published have larger effects than those that are not published (Sham & Smith, 2014). This suggests the possibility of a well-known bias in science in general to publish "positive" findings (i.e., results that "came out") more often than findings that are less clear or with so-called "negative" results. In relation to the present discussion, if single-case data evaluation methods were to filter out weak effects, this very much slants the literature. We want to learn about intervention effects, whether strong, weak, positive, or negative, from well-designed studies. Withholding weak or nonexistent effects also means that many researchers might pursue the same interventions or variations, not knowing that these interventions have consistently produced little effect across several completed but unavailable and unpublished studies. From a broader science base, the original rationale for using visual inspection as a stringent filter is less in keeping with contemporary concerns about the scope of bias evident in reported studies. This is not an argument against the use of visual inspection, but rather a departure from a facet of the rationale for its use in applied research.

Statistical Analyses of the Data

The use of inferential statistics as a means for deciding the effects of an experimental manipulation or intervention in single-case designs has been objected to in principle and in practice (e.g., Hopkins et al., 1998; Parsonson & Baer, 1978; Sidman, 1960; Skinner, 1956). The goal is to understand variability in performance and establish experimental control. Comparing means among groups that receive different conditions is not a direct way to examine the performance of individuals or behavioral processes. Despite the objections, statistical tests to evaluate intervention effects have often been used in behavior analysis, and for some time (Baron, 1999). The analyses go well beyond
comparing group means and draw on a variety of ways to understand the data (e.g., generalized linear modeling, multilevel modeling, and Bayesian analysis) (Young, 2018).3 Even so, there have been changes in the use and development of statistical tests in applied research. I highlight the reasons, although each is a weighty topic in its own right (see Kazdin, 2021).

3 Several excellent series of articles have focused on data evaluation and statistical analyses of single-case data in applied research (e.g., Burns, 2012; Evans et al., 2014; Machalicek & Horner, 2018; Maggin & Chafouleas, 2013; Maggin et al., 2017; Shadish, 2014a; The Behavior Analyst, 1999; Young, 2019).

First, invoking visual inspection is not invariably straightforward. Often, naïve and well-experienced researchers in behavior analysis do not reliably invoke visual inspection criteria, or selectively invoke some criteria for their conclusions but ignore others (e.g., Diller et al., 2016; Ninci et al., 2015; Normand & Bailey, 2006; Wolfe et al., 2016). In addition, the different criteria, such as those provided in Table 2, do not always lead to the same conclusion. Some criteria would lead to the conclusion that there is an effect, but others would not.

Second, we know that there can be trends in single-case data that are invisible to the naked eye. That is, there are special characteristics of single-case data that cannot be detected visually and therefore cannot be considered when invoking visual inspection. Briefly, two of these are worth noting. The first is that the data from one occasion to the next can be correlated in varying degrees. This phenomenon, referred to as serial dependence (autocorrelation), cannot be easily "seen" and requires statistical evaluation to detect (e.g., Howard et al., 2015; Ximenes et al., 2009). However, data that are correlated in this way are associated with even less agreement when invoking visual inspection. Relatedly, trends in the data (e.g., baseline) are not all straight lines that are easily detected. Some trends can be identified only through statistical evaluation of the data. Hidden trends can obscure or mislead when one is trying to determine whether the intervention produced a reliable change.

Third, as statistics have come into greater use, the range of statistical tests for single-case designs has expanded greatly (Kazdin, 2021). Scores of tests and many reviews are available evaluating the same data with various tests. Among the conclusions that have emerged is one worth underscoring here, namely, that different statistical tests, when applied to the same single-case data, often lead to quite different conclusions. These discrepancies are due in part to the emphasis of the tests on different aspects of the data (e.g., means, trends, variability, the last data point in the phase), characteristics of the data (e.g., too few observations to estimate parameters for the statistics, low statistical power), and assumptions about the data. This is especially interesting in part because one impetus for turning from visual inspection to statistical tests was to overcome the ambiguities that visual inspection often raises. Statistical tests add their own sets of challenges, although slightly different from those of visual inspection.

A final reason for the increased use of statistics pertains to the adoption of single-case designs well outside the tradition of behavior analysis (e.g., Kravitz et al., 2014; Schork & Goetz, 2017). For example, in medicine the designs have enjoyed increased use. Yet, in medical and biological research, statistical evaluation in the quantitative research tradition is the rule or default method of data analysis rather than the exception, so single-case designs are less well connected there to visual inspection.

For the reasons I have noted, the use of statistical analyses of single-case data has increased. When single-case designs were emerging, statistical significance testing was the enemy and was considered to provide data and comparisons (of group means) of little use. One can still find that view in the behavior analysis literature. Yet, a measured view is that both visual inspection and statistical significance testing have subjective components, have potential sources of bias as they are invoked and reported, and can give complementary as well as different pictures of the data. And for single-case research, we do not wish to lose sight of experimental control of the behavior of individuals.
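The serial-dependence point above is easy to demonstrate. The following minimal sketch (hypothetical data; not from the article) estimates a lag-1 autocorrelation, the quantity that visual inspection cannot reliably "see":

```python
# Illustrative sketch only -- estimating lag-1 serial dependence in a series.

import numpy as np

def lag1_autocorrelation(series):
    """Correlation between the series and itself shifted by one occasion."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(0)
drifting = np.cumsum(rng.normal(0, 1, 30))  # each point drifts from the last
shuffled = rng.permutation(drifting)        # same values, temporal order destroyed

print(round(lag1_autocorrelation(drifting), 2))  # typically large and positive
print(round(lag1_autocorrelation(shuffled), 2))  # typically near zero
```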
There is one change in statistical evaluation in single-case designs that deserves special comment, namely, the increased use of meta-analysis. Meta-analysis is a way of reviewing and integrating empirical studies on a given topic by translating the results of diverse studies (e.g., changes on different outcome measures, differences among groups) to a common metric (effect size). Meta-analysis offers three major advantages. First, it allows the reviewer (meta-analyst) to draw conclusions about the findings in a given area and to quantify the strength of effects. Second, one can ask questions about the data from many studies combined that were not addressed in any one of the individual studies included in the meta-analysis. This is achieved by grouping studies or facets of studies and relating (coding) these grouping variables to effect size. Finally, and derived from the prior point, meta-analyses can identify deficits and underinvestigated areas of research. In drawing conclusions, it becomes clear that too few studies are available to address a specific question (e.g., dose of the intervention, responsiveness by sex or ethnicity when that may be critical).

Meta-analyses are pervasive in science in many disciplines. In intervention research, they are the common way to review and integrate an empirical literature. The use of meta-analysis for single-case experiments is not new (e.g., Burns, 2012; Scruggs et al., 1988; White et al., 1989). However, the approach has been slow to develop in relation to single-case designs, and for understandable reasons. First, in principle, meta-analysis is an even further step away from the data of individuals and the process of change than using inferential statistics. This would be philosophically a move in the "wrong" (unproductive) direction. That is, our complaint with meta-analysis would not only reflect concerns about group means (of individual studies) but essentially means of means (i.e., means of effect sizes). Second, there are practical challenges. These include converting visual inspection data to a metric (effect size) that can be used to combine different studies, making key decisions about what facet of the data (e.g., changes in means, trends) to evaluate, deciding how to evaluate those data (e.g., combining all A phases and all B phases or not), and managing the challenges of computing an effect size in different designs (e.g., what to compute from a multiple-baseline or changing-criterion design?). Even so, there have been several meta-analyses of single-case designs, illustrations of the different ways such analyses can be conducted, and aids in the form of software and web-based applications (e.g., Beretvas & Chung, 2008; Declercq et al., 2020; Maggin et al., 2011, 2017). Several options remain for how to compute effect sizes from single-case data as well as how to carry out the analyses once effect sizes are identified (e.g., Maggin & Chafouleas, 2013; Parker et al., 2011; Shadish, 2014a). And not all the options lead to the same conclusions. Novel variations continue to emerge (e.g., Tarlow & Brossart, 2018; Ugille et al., 2012).

The increased attention to meta-analysis in single-case research is critically important. Decisions made about what treatments or intervention practices are evidence-based are usually based on meta-analytic integration of many studies (Kazdin, 2018b). Single-case designs are routinely omitted from large reviews of interventions. Apart from the scholarly point of comprehensively including all pertinent research, there are practical issues as well. Conclusions reached from such reviews have implications for what interventions are regarded as effective (by government and nongovernment agencies) and likely to be reimbursed. In addition, meta-analysis is part of the common language or Esperanto of science, and if one does not speak the language or is not very fluent, one is a bit of a foreigner and an outsider. It is in the interest of any methodological approach (e.g., single-case designs, qualitative research, mixed-methods research) as well as any substantive topic to draw on meta-analysis and to be recognized.
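To make the effect-size step concrete, here is a minimal sketch in Python (my illustration; the data are hypothetical) of one nonoverlap index used with single-case data, Nonoverlap of All Pairs (NAP; Parker et al., 2011): the proportion of all baseline-intervention pairs of observations in which the intervention observation shows improvement, with ties counted as half.

    def nap(phase_a, phase_b):
        """Nonoverlap of All Pairs for one baseline (A) and one intervention (B) phase.

        Assumes higher scores mean improvement; ties count as half an overlap.
        """
        pairs = [(a, b) for a in phase_a for b in phase_b]
        improved = sum(1.0 for a, b in pairs if b > a)
        ties = sum(0.5 for a, b in pairs if b == a)
        return (improved + ties) / len(pairs)

    phase_a = [3, 4, 4, 5, 3]      # hypothetical baseline observations
    phase_b = [6, 7, 5, 8, 7, 9]   # hypothetical intervention observations
    print(f"NAP = {nap(phase_a, phase_b):.2f}")  # 0.5 = chance; 1.0 = no overlap

Computed per case (or per baseline in a multiple-baseline design), an index of this kind can then be pooled across studies, which is exactly where the decisions noted above, about what facet of the data to evaluate and how to combine phases, come in.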
Formal Guidelines for Single-Case Research

Another significant development in single-case designs has been the development of formal guidelines for conducting, reporting, and evaluating research. Research has a long history of ethical standards and guidelines to protect the rights of participants and to maintain the integrity of science (e.g., efforts to facilitate replication, managing conflict of interest, eliminating the selective reporting of results). Add to that the increased concern in recent years about the reproducibility of research findings and ensuring that studies are carefully described so that they can be replicated (Camerer et al., 2018; Francis, 2012; Hantula, 2019; Open Science Collaboration, 2015).

Most of the concerns about conducting and reporting research have emerged from group studies in the quantitative tradition. The concerns encompass research with incomplete information, checkered practices, biases in and selective reporting of results, and rather stark and disastrous instances of fraud (e.g., vaccinations as the putative cause of autism spectrum disorder; see Kazdin, 2017). All the issues are applicable to research, single-case and other traditions. For example, in applied single-case research, basic information often is omitted, such as exactly who the participants were (e.g., subject and demographic variables), how they were recruited, who administered the intervention, the extent of their training, whether the integrity or execution of the intervention was assessed, and key details about the data evaluation (e.g., Fryling et al., 2012; Kubina et al., 2017; Vannest et al., 2018; Tate, Perdices, Rosenkoetter, McDonald et al., 2016). Selective reporting has been identified as a problem as well. For example, a survey of single-case researchers found that between 4%-15% omit (i.e., delete) one or two cases from their data if the effect size is small and then submit the article without these cases for publication (Shadish et al., 2016). None of the characteristics leading to guidelines are especially characteristic or more characteristic of single-case designs.

A variety of formal guidelines have been in use and govern the conduct and reporting of studies, primarily for group studies. In the context of intervention research, originally in medicine but now in relation to diverse areas, the Consolidated Standards of Reporting Trials (CONSORT; Schultz et al., 2010) are the most familiar. These guidelines consist of a checklist of essential items that ought to be included in any randomized controlled trial of treatment. Beginning in the early 1990s, efforts began to make recommendations for reporting of studies, and from that the CONSORT guidelines emerged. They have been adopted by hundreds of professional journals encompassing many disciplines and countries (see www.consort-statement.org/). The CONSORT guidelines have been devised primarily for clinical trials in medical research but have extended well beyond that and are routinely used in clinical trials of psychosocial interventions. In most treatment trials of psychological interventions, journals require adherence to these guidelines or variations that are close to them. There are now many other guidelines that are in the same vein and extend well beyond intervention research (e.g., Aalbersberg et al., 2018; Appelbaum et al., 2018).

ClinicalTrials.gov (https://clinicaltrials.gov/) provides another model to guide research. This consists of preregistration of a study, which requires authors to convey their plan for conducting research and analyzing the data. The investigator may make changes during a study as issues emerge, so the investigator's plan is not necessarily set in stone. Changes require making clear the rationale and why a departure was warranted. Overall, preregistration allows the range of participants in research (investigators, peer reviewers, journal editors, funding agencies, policy makers, the public at large) to determine whether the research, when completed, has deviated from the preregistered plan, and if so, in what ways. Preregistration of research is now common across many funding agencies and journals (Nosek et al., 2018). ClinicalTrials.gov is a large database that includes privately and publicly funded studies of investigations throughout the world. Indeed, this is the largest clinical trials database; as of this writing over 350,000 studies are registered, including studies from all 50 states in the United States and 216 countries. When clinical trials compare interventions or an intervention against a control group, funding agencies (e.g., National Institutes of Health), international organizations (e.g., World Health Organization), and over 5,500 journals (the International Committee of Medical Journal Editors, http://icmje.org/journals-following-the-icmje-recommendations/) require investigators to register their clinical trials in advance of the study.
Although various guidelines have made provisions for reporting single-case research, they have not been adopted widely. As a rule, such guidelines do not recognize the multiple features of the designs, including fundamental issues such as the fact that decisions (e.g., changing phases, adding interventions) often are routinely made during the study rather than fixed in advance. Fluidity during the design would still permit preregistration, because how the decisions are made and the guidelines that were followed could be readily specified. Yet, the guidelines were not developed with the special features of these designs in mind. Indeed, when the guidelines address single-case designs, the focus usually is outside of the most common uses. For example, a specific extension of the CONSORT guidelines, referred to as the CONSORT Extension for N-of-1 Trials, emphasizes medical trials (e.g., focus on health, use of pharmacological interventions, washout [baseline-like] periods within the designs; e.g., Shamseer et al., 2015; Vohra et al., 2015). The extension met a clear need in medical research but was not intended to address the scope of single-case research designs and their more common use in behavior analysis.

Several guidelines and standards for conducting and reporting single-case designs have been published (e.g., Horner et al., 2005; Kratochwill et al., 2010; Wolery et al., 2011), and these include calls for preregistration (e.g., Johnson & Cook, 2019). Some of these focus on research in medicine, health care, and related areas (e.g., nutrition; Kravitz et al., 2014; Schork & Goetz, 2017); others focus on criteria for evaluating already completed studies (e.g., What Works Clearinghouse, 2020). No single set of guidelines has emerged or has been widely adopted for single-case research. A relatively new development is The Single-Case Reporting Guideline in Behavioral Interventions (SCRIBE), which was influenced by the CONSORT extension of separate guidelines for single-case designs (Tate, Perdices, Rosenkoetter, McDonald et al., 2016; Tate, Perdices, Rosenkoetter, Shadish et al., 2016). The process of developing the guidelines (e.g., multiple iterations with input of multiple experts, surveys, worldwide input) and the format (e.g., checklist of what investigators are to include) follow the CONSORT procedures. The guidelines focus on four single-case designs and in addition cover a variety of issues related to assessment and interobserver agreement, evaluation of treatment integrity, data presentation and analysis, potential conflict of interest, and others. Apart from the obvious significance of improving the consistency and quality of reporting, the presence of these guidelines increases the ways in which single-case designs retain their identity while remaining in keeping with standards in the quantitative tradition, where such guidelines have been in place for some time. Other guidelines I have mentioned for group research (CONSORT, ClinicalTrials.gov) have been adopted internationally, are well known by researchers, and are invoked by journals. It is too early to know if the SCRIBE guidelines will enjoy that status, but their development is a qualitative leap forward in the evolution of single-case designs.

General Comments

I have addressed some of the developments in the evolution of single-case designs. Since the appearance of Tactics in 1960, there has been a proliferation of designs, procedures for their implementation, and methods of data evaluation. Seemingly core issues (e.g., how to make decisions within the design, how to evaluate the data) have expanded.

What is especially noteworthy is that the evolution includes changes that will help integrate single-case research into the larger scientific enterprise. The use of inferential statistical tests, computation of effect sizes, integration of findings from single-case designs into meta-analyses, and formal guidelines for the conduct, evaluation, and reporting of research are pivotal changes. Perhaps the most salient point is that the traditions of single-case research and science more generally profit when all empirical findings can be integrated (e.g., via meta-analyses). Findings from single-case designs warrant inclusion in part because of the strength and diversity of interventions that have been developed based on such designs. As single-case designs evolve, their integration and greater acceptance are likely to increase.
Special Strengths of Single-Case Designs

Single-case experimental designs can draw strong causal inferences. This has proven to be the case in experimental and applied areas of research. Apart from this obvious point, the designs have special strengths that could make them enjoy much greater use.

Unacknowledged Need of Evaluation

In virtually every setting in everyday life, there are programs or interventions designed to help. In schools (elementary through university), interventions are aimed at improving academic performance and increasing participation in some activities (e.g., engaging in sports and volunteer activities) and reducing the occurrence of others (e.g., bullying, drug abuse, sexual harassment). In hospitals, interventions are aimed at reducing medical errors (e.g., surgical practices) and increasing safety practices (e.g., hand washing among staff) to reduce the spread of illness. In business, interventions are designed to improve health (via exercise programs), productivity, safety, and morale. In everyday life, new software (apps) emerges to improve everyday activities (e.g., reminders, sleep scheduling, exercise), happiness and positive well-being, as well as physical and mental health. Other interventions are directed to reduce racial biases and discrimination, improve compliance with the laws (e.g., speed limit, not texting while driving), and support broad social goals (e.g., not littering, behaving in ways that promote a sustainable environment).

Do any of these interventions have impact? We know well that an intervention can have three outcomes: no effect, a negative effect where people become worse, and a positive effect.[4] There are plenty of examples where well-intentioned and plausible intervention programs have made individuals worse, and these even include such seemingly innocuous interventions as meditation (e.g., Andersson et al., 2019; Cebolla et al., 2017; Crawford et al., 2016; Dodge et al., 2006; Petrosino et al., 2013). The overwhelming majority of programs in use simply have no empirical evaluation.

[4] The results of an intervention can have all three effects but with different individuals included in the group. Also, effects can be in degrees and more nuanced. For example, the program may lead to improvements among most or even all participants, but the changes might have no practical value.

Continuous or ongoing assessment from single-case designs can provide information as to whether change has occurred in any program and of course in what direction. In addition, the designs can evaluate the basis of that change in ways that are much more feasible than the usual designs considered to be essential for program evaluation. Specifically, randomized controlled trials are regarded as the "gold standard" for evaluating interventions in virtually all applied areas (e.g., education, counseling, clinical psychology, rehabilitation, medicine, and nursing). Typically, such trials (between-group studies) include random assignment of participants to intervention and control conditions (e.g., no-intervention control or some other comparison group), pre- and postintervention testing, multiple measures, rigorous control over the administration of the intervention, and holding constant or controlling factors that may interfere with drawing conclusions (Kazdin, 2017). Yet, obstacles to the frequent or routine use of randomized trials are plentiful and encompass: feasibility (e.g., usually we cannot reassign individuals in a setting randomly), ethical issues (e.g., withholding interventions from some but not others), methodological constraints (e.g., usually we do not have a sufficiently large sample to provide a statistically powerful test), and cost (e.g., personnel, time to complete the evaluation).[5]

[5] My own work, and that of many investigators like me, has consisted of a few decades of randomized controlled trials to develop evidence-based treatments. Each study takes 3-5 years to complete (not counting follow-up), is very expensive to mount (e.g., staffing, patient recruitment, subsidizing the cost of delivering services), and requires ongoing funding over extended periods (see Kazdin, 2018a).

Single-case designs provide viable alternatives. Repeated assessment over A and B phases (e.g., AB design) can be collected across different students, patients, hospital units, and departments within an organization. Of course, if the implementation of the intervention (B) is staggered so that everyone, unit, section, or setting does not receive the intervention at the same time, this becomes a multiple-baseline design (concurrent or nonconcurrent). Most settings have a group already in place that can serve as the participant; that group can be evaluated as if it were an individual. What is plotted in the data is the performance of the group as a whole (e.g., percentage of workers in the setting who engage in safety practices, rate of homework completion among students, number of homes in a community that use environmentally friendly lighting).

We sometimes consider evaluation a luxury. That is invariably true if mounting a randomized controlled trial is required. Yet evaluation is much more readily possible with single-case designs, and with no sacrifice of rigor if that is part of the goal too. Given the diverse outcomes of programs, two of which are not very helpful, and the resources that are used and possibly wasted (e.g., time, funds, and personnel), there are both ethical and practical reasons to evaluate when possible. Single-case designs have the ability and flexibility to evaluate a broad range of programs in diverse settings.
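As a sketch of the staggered arrangement just described (hypothetical data and unit names; a group's performance is summarized as if it were an individual), the data layout for a multiple-baseline design across three units might look like this in Python:

    # Each unit: (weekly % of members performing the target practice,
    # index of the first intervention session). The intervention (B) is
    # staggered so the units do not start at the same time; each unit's
    # baseline (A) serves as its own control.
    series = {
        "unit_1": ([40, 42, 38, 70, 75, 78, 80, 82], 3),
        "unit_2": ([35, 37, 36, 38, 72, 76, 79, 81], 4),
        "unit_3": ([45, 44, 46, 43, 45, 74, 80, 83], 5),
    }

    for unit, (scores, start) in series.items():
        a, b = scores[:start], scores[start:]
        print(f"{unit}: baseline mean {sum(a) / len(a):.1f}%, "
              f"intervention mean {sum(b) / len(b):.1f}% "
              f"(B introduced at session {start + 1})")

A change that appears in each series only when, and not before, the intervention is introduced for that series is the pattern that supports a causal inference.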
On-Going Feedback While the Intervention is in Effect

In single-case research, the continuous feedback from the data and fluid decision making while the intervention is in play have distinct advantages. The designs allow for evaluation of whether the intervention is achieving change and whether the change is at the level we desire while the intervention is in place. The repeated assessment during the intervention phase makes the designs quite user friendly to the investigator (teacher, doctor, or other person responsible for the intervention) and the client (person or group intended to benefit). If something is not working, or not working sufficiently well, or not working with all the participants, the investigator can make the change and continue to evaluate whether change comes about right away, without waiting to see mediocre or no impact at posttest assessment, as would commonly be the case in a randomized controlled trial. This is not trivial and indeed unites research and clinical goals.

As an illustration, in one project the goal was to train six boys and girls (6-7 years of age) not to play with handguns (Miltenberger et al., 2004). A real but disabled handgun was used; assessments were completed at home and at school while the child was left alone with the gun. Assessments were videotaped and later scored for the extent to which each child engaged in the appropriate behaviors on a 0- to 3-point scale in which 0 = touching the gun and 3 = not touching the gun, leaving the room, and telling an adult about the gun. In a multiple-baseline design across children, a behavioral skills training program was used that provided instructions, modeling, rehearsal, and feedback, all conducted in a simulated training situation rather than at home or at school. Ongoing assessment revealed that a few sessions were very effective in altering the behavior of three of the six children; the effects of training carried over to home or school. For the three who did not respond, an additional intervention (more intensive practice and rehearsal, in school and in situ training) improved two of them. A third condition (adding an incentive) was required for the final child to achieve the goal.

This demonstration illustrates a very special strength of single-case designs. Interventions were implemented and evaluated. Decisions were made based on the incoming data, and new interventions were added to achieve the desired outcome. Pre-post data from a between-group study might have shown that the first intervention (behavioral skills program) worked (a statistically significant difference if compared to a no-intervention control condition). Yet we see that the intervention would have left a significant proportion of people stranded, that is, with no change in the desired outcome. Decision making while data are incoming is an enormous advantage of single-case designs.[6]

[6] Group designs occasionally allow for flexibility and decision making during the investigation with an arrangement referred to as Adaptive Designs (Pallmann et al., 2018; US Food & Drug Administration, 2019). These designs are infrequently used in comparison to the fixed pre-posttest randomized controlled trial. Their delineation in group research is important in relation to this article because they recognize the importance of flexibility and efficiency in making decisions in response to incoming information. Of course, this feature is one of the strengths of single-case designs as they are routinely used.

Apart from the information provided, single-case designs allow for the gradual or small-scale implementation of the intervention. With one or a few cases, one can implement the intervention and see in a preliminary way whether it is having an effect. This allows the investigator to modify the intervention on a small scale if needed before applying the intervention to the entire class, school, or other larger-scale setting. If there is a strong intervention effect in the small-scale application with one or a few participants, this does not guarantee that the effect will extend across all baselines. But the point here is that first starting out on a modest scale (e.g., with one or two individuals, classrooms, or communities) helps the investigator preview the impact of the intervention as well as master implementation and some of the practical issues that may relate to its effectiveness. More broadly, sometimes interventions are conducted on a state, province, or national scale (e.g., to reduce cigarette smoking, obesity, and teen pregnancy and to improve nutrition or to increase physical activity; https://www.cdc.gov/winnablebattles/). Such programs often involve major efforts, resources, and a plethora of practical issues as they are implemented and crafted to fit different local conditions.
"Rolling out" such interventions in a staggered fashion with repeated baseline and intervention phases (multiple-baseline designs) expands the range of opportunities to evaluate and to make improvements before an intervention is extended widely. The way of implementing interventions in single-case designs and using incoming data to make changes as needed are very special strengths of the designs.

We Care About Individuals

For most questions that guide our everyday life, we care very much about the individual. It is interesting to us personally and intellectually to ask about the group data. For example, we hear about a new treatment (e.g., for obesity, diabetes, cancer, blood pressure, or hair loss). Does it work? Is a change in the treatment associated with real change, or is it another one of those bait-and-switch television ads that say, "clinical evidence shows"? But even if there is genuine evidence, does that apply to me, or is there some moderator that makes that unlikely? As individuals, we do not know where we stand in relation to the large group data. I note the obvious, namely, that we owe so many advances in clinical work (e.g., psychological and medical science) to between-group research methods, to help underscore the point about individuals. In our daily lives it is about individuals (single cases), ourselves and our loved ones and friends (human and nonhuman animals), and not about group data.

As an illustration, consider the real scenario of my annual physical exam. The exam could easily turn into a methodological brawl. (I take heavy medication and have my impersonal trainer with me in the waiting room to help me restrain myself.) Once in the room with my physician, but toward the end of my 9-min appointment (yes, I stretched it out with a couple of made-up questions), we have an amiable exchange that is pretty much like this group versus single-case discussion:

My Physician: "You probably ought to have that medical test in about five years, just to make sure you don't have ... [reader, insert your favorite serious disorder; my physician rotates several variations of cancers, heart disease, diabetes, and maybe dementia, I keep forgetting if he mentions that one]."

Me: "That sounds serious, maybe I should have the test now."

My P: "Actually the data (he means group data) show that you probably do not need that medical test because the rate of that problem is pretty low for most people your age."

Me: "Just speaking generally, is it possible that I am one of the cases in the group that gets that disease at my age now?"

My P: "Oh, yes of course, but not very likely."

Me: "I would really like the test, because what happened to the big group might not be what happens to me. Also, the test doesn't hurt (my medical record has my official diagnosis as Medical Coward), and the information could help me with one of my personal priorities (staying alive)."

My P: "Uh, you are probably fine without it, but in five years it would be pretty important. Well, I see that my next appointment is here. God willing, let's continue this discussion if you make it to your next annual physical."

The astute reader will note how restrained I was in challenging the group data. I held back on, "Were the findings replicated, what ethnic and cultural groups were included, were there any moderators, were the findings based on one- or two-tailed statistical tests, what precisely was the miss (false negative) rate, that is, not identifying people my age whose problem was not detected and who happen to die?" and so on.

On the other side, my physician is constrained by the health service. He is not free to authorize a test unless there is some special reason to do so, primarily because of cost, including money for the service for which he works, but also personal cost in terms of stress or side effects to me. Sometimes a moderator is known and serves as that special reason to justify a test sooner than usual. For example, if there is a family history of this or that disorder, the test may be authorized sooner or more often because people with that history are much more likely to have the problem. Flashes of honesty hold me back. I want to tell my doctor, "The fact is that my family history includes everything, so please let us do all the noninvasive tests earlier than the overall group data suggest." My family history statement can be supported conceptually and empirically. My family history (and yours too) goes back to the first parent in the world (and one could push for prior to humans), and it is likely that, with the almost infinite number of relatives from then to me now, many, most, or all the diseases occurred.
To return to the point: I respect the group data, some of my best friends even collect the stuff, but these data do not tell me whether I will be one of those cases and a person who would have profited from taking the medical test a little early. My story conveys the point. Critical questions, life and death questions, are often if not almost always about individuals.

Group studies could routinely look at individuals but rarely do. This is changing a bit in recognition that scrutiny of the individual can lead to insights, especially in relation to personalized medicine. Consider an example. In group studies of drugs, individuals who respond well are referred to as exceptional responders (Kaiser, 2013; Printz, 2015). These are individuals who in fact respond extremely well (e.g., tumors are gone, and the benefits of the treatment are maintained). From the standpoint of group research and comparison of treatment with some control conditions, treatment overall may have "failed." That is, the treatment group is not better off (statistically) than the comparison group. Yet, moving from the overall group results to those exceptional responders, that is, individuals, can lead to great insights about the problem, in this case tumors, and their treatment. What is it about these exceptions that made them respond well to a treatment that was ineffective for most people? Some factor (moderator) must work in conjunction with that otherwise ineffective treatment to make it very effective. In this example, a genetic variation was found in the tumor that characterized the exceptions. Consequently, the treatment was then applied to others with that variation, and treatment was effective (Iyer et al., 2012). Without that factor, treatment did not work very well, and with that factor it worked extremely well. This is a huge finding in relation to personalized medicine and treatment of individuals. We almost threw away an effective treatment from looking at the group comparisons (and statistical significance) because most people in the group did not respond. We still need treatment(s) for those individuals of course. Yet, studying exceptions (individuals) yielded important insights that affect many people.

While there is no standard definition of exceptions that could apply to all areas of intervention research, they reflect outliers that are extreme in some way in relation to group averages. In single-case research, with small sample sizes, it is more difficult to identify extremes in responding, although not all such research uses small samples. In any case, after identifying exceptions who respond well, we can now direct individuals to treatments from which they are likely to profit, and perhaps identify factors that may be altered to make more individuals responsive to treatment. More generally, the study of individuals can greatly advance our understanding of underlying processes that relate to the unexpected and expected outcomes. In most intervention-group studies, investigators report the group data without a careful analysis of individuals who respond in different ways.

The study of the individual is increasing in attention and importance. Personalized medicine and other areas (e.g., personalized nutrition, exercise, and psychotherapy) continue to be goals of refining applications of interventions. Add to that the increased availability of applications (apps) in everyday life that have the means of collecting data in real time, using those data to prompt interventions, and examining whether that intervention has been helpful on an individual-by-individual basis. All this can be fed back to a central source for monitoring and checking. There is the capacity to evaluate many different interventions for physical and mental health using technology that individuals have with them (e.g., smart watches, phones, and clothing). Single-case designs are obviously well suited to evaluate personalized interventions.

Extending the Reach of the Designs: Challenges, Obstacles, and Myths

Highlighting the strengths and special features of single-case designs argues for their more frequent use in research and program evaluation. Yet, several challenges, obstacles, and myths interfere with adoption and extensions of the designs. The challenges begin with ambiguity about what the designs entail, based on the many terms used to refer to them, as I mentioned earlier. There are of course others.

Associated Features
Repeated assessment over time and replication of intervention effects, usually within the same participant(s) over time, are essential features of single-case designs. Other characteristics that may seem to be essential actually are not. These characteristics are important to highlight because the designs may be rejected or may not achieve the breadth of use based on assumptions about what they must entail.

Focus on Overt Behavior

It may be believed that single-case research must evaluate overt behavior. The association of single-case research with assessment of overt behavior is easily understandable from a historical standpoint. Single-case research grew out of the research on the behavior of organisms (e.g., Skinner, 1938). Behavior was defined in experimental research as overt performance on such measures as frequency or rate of responding (e.g., number of times a lever was pressed). The lawfulness of relations with different experimental manipulations was easily seen in this laboratory paradigm. Yet it is important to note that Sidman (1990) explicitly acknowledged that the methodology was "not restricted to the study of behavior" (p. 187). Even so, this message may not have reached those who might view this as an impediment to adopting the designs.

As single-case designs were extended in applied settings (e.g., schools, hospitals, nursing homes, communities), assessment of overt behavior has continued to be associated with the methodology. Yet single-case research designs are not necessarily restricted to overt performance. Consider self-report measures and especially self-report of subjective experience. The use of such measures might seem heretical when mentioned in the same paragraph as single-case designs. Yet, two large bodies of research might be cited to allay concerns about self-report and subjective measures. First, there are strong empirical literatures on subjective experience (e.g., of stress, happiness, subjective well-being, loneliness) as measured by self-report. Many indices of subjective experience are related to the functioning of the immune system and gene expression and predict physical and mental health and longevity (e.g., Diener et al., 2017; Frey, 2011; Hawkley & Cacioppo, 2010; Slavich & Cole, 2013). Second, self-report measures that can be administered in an ongoing way to provide single-case data are available and have been thoroughly evaluated (e.g., for reliability and validity, with thousands of individuals). For example, in the context of psychotherapy, measures used for the individual client have been administered repeatedly over the course of treatment encompassing a variety of clinical problems (Goodman et al., 2013; Lambert, 2015; Lutz et al., 2015). Individualized assessment using such measures improves treatment outcome by providing data on progress that can guide decision making (Poston & Hanson, 2010; Simon et al., 2013).

In brief, measures that either are not behavior in the usual sense or that expand the scope of what "counts" as behavior (responses to questionnaires) can be readily used in single-case designs. Self-report is one example or type of measure. The designs require continuous assessment, measures that can reflect change, and measures that meet criteria for reliability and validity, as applicable to that type of measure. Single-case designs, like other designs, do not inherently include one type of focus or dependent measure, although the methodology may well favor one class over another or one specialty area of research (e.g., behavior analysis).

Data Evaluation by Visual Inspection

Single-case designs rely primarily on visual inspection rather than statistical analyses to draw inferences about the reliability of change. Yet, visual inspection is not an essential component. Experimental arrangements (the designs) do not automatically dictate the method of data analysis (e.g., statistical tests or visual inspection). One can readily see this in other contexts: qualitative research (the ways of approaching the subject matter) very often uses quantitative (statistical) evaluation to examine the data. Again, with single-case designs, if anyone considering the designs harbors an objection to visual inspection, that is not a reason not to use the designs.

A related concern might be the fact that novel statistical methods are often needed to evaluate data from single-case designs (Kazdin, 2021). However, this is a separate issue and presupposes that one has overcome the view that all data analyses are merely visual, without the putative rigor of statistical analyses. With the increased use of effect size measures and meta-analyses in single-case research, perhaps the concern that only visual inspection can be used will be surmounted.
Interventions Derived from Operant Conditioning

A possible challenge in disseminating single-case designs is the view that the designs are restricted to experimental manipulations or interventions derived from psychology and specifically from learning (operant conditioning). We know from Tactics as well as the broader history of behavior analysis that operant conditioning and single-case designs developed together, and the substantive content of the former was inextricably bound with the evaluative techniques of the latter (Kazdin, 1978). This connection has continued, and not by chance; it reflects a core interest in processes of individuals rather than means (average performance) of groups (Sidman, 1960). Also, in the domains of application, interventions drawn from operant conditioning have proven to be quite effective. Even so, there is no necessary connection between single-case designs and operant conditioning concepts or techniques. Several different types of interventions derived from clinical, social, and community psychology, medicine, pharmacology, business and industry, and other areas not central to or derived from operant conditioning have been included in single-case research (see Kazdin, 2021). Because the designs require evaluation of change on continuous measures, some dependent variables that do not respond relatively quickly (e.g., weight loss among obese individuals) might be less amenable to use, but this is not a unique issue for single-case research. Often interim or proxy measures (e.g., calories, food portions) are used. It is important to emphasize the breadth of applicability because the designs are widely relevant to situations in which the goal is to alter some facet of functioning. The designs do not dictate or limit the sources of interventions.

General Comments

Single-case designs in fact have been closely tied to each of the facets I have noted as not essential. One can readily extend and enjoy the benefits of the designs and their power to demonstrate causal relations without those other features many of us take for granted (e.g., interventions derived from behavior analyses, well-defined target behaviors, visual inspection, and so on). Happily, the designs have no such limits, and the associated features are just that: extremely valuable in their own right but not essential. I am not in any way advocating that these nonessential features be abandoned. Just the opposite. Use of these features has been part of major advances in both basic and applied research. In the applied domain, a very visible example is in the area of the treatment of autism spectrum disorder, where applied behavior analysis and single-case research (drawing on essential and nonessential features) have made extraordinary contributions (e.g., Eldevik et al., 2009; Matson, 2017; Wong et al., 2015). I mention the features as not essential because those trained in between-group research can easily identify one of the nonessential features and cast aside the entire methodology as not useful, rigorous, or indeed scientific. I would encourage adoption of the package of essential and nonessential components because of their enormous contribution in identifying interventions for diverse clients in diverse settings. However, I would not encourage any of the nonessential components if these dissuaded researchers from exploring the methodology or dissuaded administrators and program planners from adding evaluation to the many otherwise well-intentioned but routinely unevaluated intervention programs.

Generality of Findings

A concern about single-case research is whether the results will be generalizable beyond the one or few cases included in the study. The concern is based on the belief that group studies, with many more participants, are much more likely to produce results that are generalizable. Single-case research grew out of an experimental philosophy that attempts to discover lawful relations at the level of individuals rather than the performance of groups (Kazdin, 1978; Sidman, 1960). There is no implication in this goal to suggest that the processes that are studied, and the findings, will be idiosyncratic.

Basic laboratory research with human and nonhuman animals has a long history in behavior analysis focusing on behavior with single cases. We know from this research that fundamental processes are very generalizable. An excellent example derives from research on schedules of reinforcement (e.g., Ferster & Skinner, 1957).
Fundamental patterns of responding demonstrated from reinforcement schedules generalized rather dramatically across multiple nonhuman animal species (Skinner, 1956). These schedule effects extend to humans and even to groups of humans. On this latter score, the behavior of the US Congress shows these schedule effects in its performance of passing bills (Critchfield et al., 2003; Weisberg & Waldrop, 1972). Of course, generality of findings from single-case research and behavior analyses extends to many other areas beyond schedule effects (e.g., Honig & Staddon, 1977; Sidman, 2004). This is not to imply that all findings with nonhuman animals carry over to humans or that all species respond alike. Indeed, behavior analysis has helped to elaborate some of the differences among human and nonhuman animals (e.g., Galizio & Bruce, 2018; Perone et al., 1988). However, there are enough examples where basic processes generalize quite broadly.

One also could address the concern about generalizability by focusing on applied research. As an example, consider the use of the token economy to alter human behavior.[7] Pioneering applications were begun with hospitalized psychiatric patients in the late 1950s and early 1960s. By the mid-to-late 1970s and early 1980s, token economies had been applied across a wide age range (toddlers through older adults), in schools at all levels (e.g., preschool, elementary schools, colleges), for individuals with problems related to physical health, mental health, and intellectual disabilities, by various organizations (e.g., professional and amateur athletic teams, the military, small and large businesses), and for a variety of behaviors in the community (e.g., conservation, littering, improved driving; Ayllon & Azrin, 1968; Kazdin, 1977, 1982). Moreover, the intervention continues to be applied, and publications on its effects have accelerated (e.g., Hackenberg, 2018; Ivy et al., 2017). There are many group studies now, but the work began with single-case designs and the features I have outlined as its core characteristics. In terms of "generalizability," it would be difficult to imagine any psychological or psychosocial intervention that has been as widely and effectively applied across samples, settings, and target foci as has the token economy, and that has an evidence base! This one intervention, to me at least, handles any objection about generality of effects.

[7] A token economy is a reinforcement system in which tokens are earned for a variety of behaviors and are used to purchase a variety of back-up reinforcers. Poker chips, coins, tickets, stars, points, and check marks are commonly used as tokens. The tokens serve as a generalized conditioned reinforcer. They derive their value from being exchangeable for back-up reinforcers (privileges, trinkets, games, depending on the age of the clients and setting).

Let us go to the matter more directly, that is, in the context in which the concern emerges, in which the findings from group studies with many subjects are assumed to have greater generalizability. Group research does not necessarily yield generalizable findings, or more generalizable findings than single-case research, for no fewer than six reasons. First, given the way the results in group studies are analyzed (usually a comparison of means among groups), we have no idea how many individuals in the group showed a change or the effect (e.g., improvement or improvement to an important degree). In short, in group studies, we do not know how many participants changed or changed in a way that makes a difference, and the extent to which the mean for the group represents individual members of the group. Ambiguity about the generality of findings from between-group research is not inherent in this research approach. However, investigators rarely look at the individual subject data as well as the group data to make inferences about the generality of effects among subjects within a given condition.

Second, the generalizability of findings from group research has been challenged in both human and nonhuman animal research (e.g., Guthrie, 1997). Recent challenges have continued to make the point. Approximately 67% of psychology studies in the US rely on undergraduates as subjects (Arnett, 2008). College students as subjects have been referred to as WEIRD, an acronym for Western, Educated, Industrialized, Rich, and from Democratic cultures (Henrich et al., 2010a, 2010b). Findings obtained from WEIRDos, as they are called, do not represent individuals from other cultures in fundamental ways in such domains as attributions, memory, reasoning style, personality, perception, and others. These differences make any results from WEIRDos of questionable generalizability. In short, generality is not a matter of sample size or design (e.g., group studies, single-case designs). To imply that one type of research produces more generalizable results than another is a bit of a red herring. Generalizability is known to depend on many facets, design being low on the list.

Third, group research does study moderators, that is, those variables that influence the magnitude and direction of a given finding. Moderators can be direct tests of generality within a study. Group research can form subgroups and evaluate such variables in a way single-case designs cannot, unless a given single-case study draws on large numbers. However, group research does not necessarily do the evaluation of subgroups well enough to allow conclusions about generality. A large number of participants is needed to analyze who responds and the many variables that may account for that. Most studies do not include sufficient numbers of participants to do the requisite analyses (Kessler et al., 2019). In broad strokes (e.g., males versus females) one might be able to say which group responded better. Here "responded better" relates to generalizability because it suggests that the findings apply to some people more than others. Yet, the results again focus on group means, so when subgroup differences are found, we still do not know what proportion of individuals are represented by those means.
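The power problem can be made concrete with back-of-the-envelope arithmetic (my illustration, using the standard normal-approximation formula rather than any computation from the studies cited): the approximate number of participants needed per group to detect a difference of effect size d, at a two-tailed alpha of .05 with power of .80, is about 2 * ((1.96 + 0.84) / d)^2.

    Z_ALPHA = 1.96  # two-tailed alpha = .05
    Z_POWER = 0.84  # power = .80

    def n_per_group(d):
        """Approximate n per group for a two-group mean comparison."""
        return 2 * ((Z_ALPHA + Z_POWER) / d) ** 2

    for d in (0.8, 0.5, 0.3, 0.2):
        print(f"d = {d}: about {n_per_group(d):.0f} participants per group")

A moderate effect (d = 0.5) already requires roughly 63 participants per group; once a sample is divided into subgroups to test a moderator, cells of that size are uncommon, so the tests are weak.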
Fourth and relatedly, as researchers we are encouraged (and by major funding agencies, often required) to include individuals of different sexes, genders, ethnicities, and cultures. There are important scientific, social justice, and political reasons for this recommendation. Let us leave the weighty and important issues aside for now to continue the narrow focus on generalization of effects. The inclusion of diverse individuals in a study has fostered the view that any findings might be more generalizable than would be the case if only one (less diverse) group were included. This is not an informed view. Merely including more diverse subjects alone does not establish, test, or demonstrate generality. Typically, there are insufficient numbers of the various groups in the study to test generality, that is, whether diversity, ethnicity, or identity acts as a moderator of the intervention, leaving aside what those subgroup differences mean by not looking at individual data. Hence, merely including a diverse sample in a group study does not by that fact alone mean the results will be more generalizable across subject characteristics. Inclusion is not a sufficient condition to evaluate generality. Direct tests are needed that ask: does each group respond similarly? If they do, we have shed light on the generality of findings among the different groups. But the tests are rarely requested (by the funding agencies) or reported by investigators, and if reported often are weak tests (low power) because of subgroup sample sizes. In short, including a more diverse sample does not automatically make the results more generalizable.

Fifth, participants in between-group research are rarely sampled in such a way that they are random draws from a large population. Between-group studies in psychology often use random assignment of participants to groups but not random selection of the sample from the population (e.g., all college students, people from different parts of the country). Random assignment is not especially pertinent to generality of effects, although random selection is. There are exceptions in studies where random samples are drawn from a given country. For example, epidemiological studies randomly sample individuals throughout communities (e.g., in studying disease, eating patterns, psychiatric diagnoses) with the goal in mind to represent the population. Sometimes studies sample many geographical locations, even though these are not chosen randomly. Multisite intervention studies purposely carry out the study in several locations (e.g., a few regions of the country). Rarely do psychological and educational studies use random selection of cases or selection from diverse locations. Thus, the generality of group research too is in question.

Finally, between-group research often uses careful inclusion and exclusion criteria for selection of participants. For example, if one wishes to test an intervention for clinical depression, not everyone who is depressed is allowed to participate. Depression occurs in childhood, adolescence, and adulthood. Also, there are different types and facets of depression (e.g., seasonal affective disorder, bipolar disorder, and postpartum depression, with variations in both women and men). The group selected for a study is likely to be restricted in age (e.g., adults 20-45 years of age) and type of depression. Moreover, many people who are depressed have other psychiatric disorders (e.g., anxiety disorders, substance use disorder, personality disorder; Hasin et al., 2018). The presence of more than one disorder (referred to as comorbidity) can make individuals with a diagnosis of depression very different from each other. In our hypothetical study, we might select only those adults who have major depression without other psychiatric disorders. We want participants who can come to treatment for the 10 sessions we are planning (e.g., so we only take those who have transportation and who are not so depressed that they cannot leave their homes or need to be hospitalized or medicated). Some depressed patients are suicidal; it is likely we want to exclude those, too, and refer them to immediate care. This example could continue, to show that between-group studies routinely screen who participates. Indeed, it is very wise to do so, because the broader the range of sample characteristics, the greater the variability in the study. Variability in the sample can make it much more difficult to demonstrate an effect of the intervention, when there is an effect (Kazdin, 2017). That said, generality of any findings to "depressed people" would be highly suspect given the intricate screening.

In general, between-group research in schools, clinics, and other settings and for the purposes of education, treatment, rehabilitation, counseling, and prevention often selects samples with extreme care and excludes many individuals purposely. This practice, while methodologically prudent for providing a strong test of intervention effects, is not the path to producing generalizable findings. This is not a criticism of between-group research but rather a peek at the usually unexamined view that group studies produce findings that are generalizable, or more generalizable than findings from another research tradition (e.g., single-case or qualitative research).

As my comments convey in relation to group research, generalizability of findings is hardly something to flaunt in relation to single-case research. Single-case designs (as a methodology) do not inherently produce more or less generalizable effects. Findings obtained in single-case demonstrations appear to be highly generalizable because of the types of interventions that are commonly investigated and the frequent reliance on basic processes (e.g., reinforcement, functional analyses) that have wide application to human and nonhuman animal behavior. That said, whether a finding is generalizable is an empirical question that requires looking at the matter directly in individual and meta-analytic studies that test the extent to which an original finding applies to other people, situations, settings, and contexts. Tactics underscored the importance of replication (direct and systematic replication) with precisely this in mind. Sidman's (1960) points still stand as our guide.

Training and Educational Opportunities

By and large in psychology in the US, and more broadly across other areas of science, the quantitative approach to research is not only emphasized, it is often the only one taught in undergraduate and graduate training. Indeed, other approaches (single-case designs, qualitative research, and mixed-methods research) might not even be mentioned in a given graduate program. The neglect of other approaches is easily explained. If one is to train researchers, they will need to be skilled in quantitative research methods, the most commonly used approach. Also, quantitative methods, like other methodologies, are dynamic and evolving. Some of these changes encompass the designs themselves (e.g., adaptive clinical trials, randomized cluster designs, effectiveness-implementation hybrid designs), the means of improving causal inferences when true experiments are not possible (e.g., variations of nonrandomized designs, use of propensity score matching), the ways of obtaining samples (e.g., convenience vs. purposive sampling, use of crowdsourcing subject pools), the options for assessment methods (e.g., empirical experience sampling and ecological momentary assessment), and the available statistical methods (e.g., regression models, use of machine learning and artificial intelligence, means of handling "big data").
Given the limited degrees of freedom in courses that can be offered and taken in a graduate training program in psychology, for example, it is close to impossible to equip students with the basics and advances in the quantitative research approaches. With that as context, there is not usually even a curtsy or cameo appearance of other research methodologies.

My comments under the rubric of challenges in disseminating single-case designs harbor an assumption that is important to question. That is the view that the scientific community is very much aware of single-case designs but objects to using the designs for one or more reasons. Perhaps more likely, there may be no pervasive objections to the designs, since they are rarely included in training when research design, assessment, and data evaluation are taught. The strengths of single-case designs, highlighted earlier, are moot if the methodology is not even taught. Perhaps the greatest challenge is one of disseminating the methodology in multiple ways that can reach researchers in training but also their mentors. For example, occasionally, doctoral students will not be allowed to use a single-case design as part of a master's or dissertation thesis, or will have to take their case to the department's equivalent of the supreme court. Yes, methodological discrimination underscores the challenges of our task of disseminating the designs.

Summary and Conclusions

Single-case experimental designs have proven to be extremely useful in basic and applied research. The approach permits causal inferences as that term is used in science and makes no compromise in relation to rigor or the basic goals and tenets of empirical research. The methodological approach can be showcased by placing single-case designs in the broader context of different research approaches. First and most familiar is between-group research, which dominates the training of students and researchers in the social, biological, and natural sciences. Typically, this involves groups and null hypotheses and statistical significance tests, central to quantitative research. Second, as I have elaborated, is single-case research, in which groups are not essential, no null hypothesis testing is needed, and data are usually evaluated using criteria of visual inspection, increasingly complemented by statistical tests. Third is qualitative research, which consists of systematic, replicable, and rigorous ways of studying individuals and human experience much more intensively than either between-group or single-case research. Qualitative research often considers a small number of participants, evaluates their experience in rich detail, often with lengthy narrative descriptions, and may or may not use statistical techniques to evaluate the content. Finally, mixed-methods research is yet another approach and combines features of quantitative and qualitative research but is viewed as a distinct methodology.

The vast majority of research in the sciences falls within the quantitative between-group tradition. If the work is intervention research (treatment, prevention, education, services in medical or rehabilitation settings), randomized controlled trials are viewed as the epitome of this tradition. Moreover, obtaining funding for one's intervention research from major funding agencies and publishing the results in most high-tier journals are nearly impossible without proposing or completing a randomized controlled trial. Quantitative research, and randomized trials as its poster child, account for enormous gains in developing evidence-based interventions in medicine, nursing, education, counseling, psychotherapy, and rehabilitation.

The dominance of, and close to exclusive reliance on, the quantitative between-group tradition constrains our knowledge. The findings from a study are very much influenced by the methods we use. The level of analysis (individual, group, groups of studies), the types of measures and the number of occasions on which they are administered (e.g., self-report at pre- and post-; behavioral measures continuously assessed over time; in-depth narratives), and other features either essential to or correlated with different methodologies reveal different facets of a phenomenon. Even within a single methodological approach, findings may vary as a function of the methods that are used. For example, within the quantitative tradition, retrospective assessment, cross-sectional assessment, and longitudinal studies of a given (i.e., the same) phenomenon can yield different findings (e.g., Lac & Crano, 2009; Salthouse, 2010; Takayanagi et al., 2014). Multiple methodologies and perspectives are essential, without implying that one is better than the other, although each may be more or less well suited to a specific research question or goal.
Consider methods of research (different research approaches, assessments, and so on) as a set of lenses through which we study, view, and understand natural phenomena. The results we obtain depend very heavily on the lens we use. Consider the analogy with "real" lenses. For decades, the National Aeronautics and Space Administration (NASA, 2009) in the US, in collaboration with other countries, has had a Great Observatories Program that includes different telescopes in space. The different telescopes look at the full electromagnetic spectrum or wavelengths, including but also beyond the spectrum that is visible to humans (gamma rays, X-rays, and infrared). (The most familiar is the Hubble Space Telescope, launched in 1990, but the three others in the early program included the Compton Gamma Ray Observatory, Chandra X-Ray Observatory, and Spitzer Space Telescope.) This program is now decades old, and completely new telescopes are in use and others ready to be launched that have expanded capabilities. For the present discussion, the critical point is that observing the universe in different ways has yielded quite different findings. The telescopes, when pointed to the same objects, show different pictures and provide different information. Needless to say, no one view from one telescope is better or more accurate; each reveals the reality to which it is sensitive. Any one telescope would be limiting. And so it is with methodological approaches and practices. We want diversity of approaches precisely because what one sees depends on how one is looking and through what lens. We want all the methodological approaches brought to bear to understand and evaluate all facets that improve the human and nonhuman animal condition (along with plants and environments).

As scientists first, our criteria for research methods begin with such fundamental questions as:

1. Do the arrangements and central features permit one to draw inferences free or relatively free from the many biases that can enter into research?
2. Can we learn anything of theoretical or applied significance to advance our understanding of the world?
3. Are the effects we obtain replicable?

These, and no doubt related questions, not only welcome multiple methodological approaches but make salient the short-sightedness of having an "in" and "out" group of methodological approaches that are trained, used, and promoted. In this article, I have focused on single-case designs, not as an advocate of the designs, although I am happy to concede that I am, but as a promoter of science. Single-case designs have wide applicability. In addition, there is a new timeliness to their use.

First, in the past few decades, there has been a proliferation of evidence-based interventions in many disciplines (e.g., education, medicine and its many branches, psychology, social work, nursing, rehabilitation). Single-case designs have contributed to the development and identification of these interventions and no doubt will continue at this level. Yet, the designs also have another calling. Will evidence-based interventions (e.g., diverse forms of cognitive and behavioral therapies) exert the demonstrated effect when extended from group trials, often in special controlled settings, to the many different real-life settings where those interventions are applied? Will they work with a specific individual with whom I am working to address his or her clinical or educational problem? Single-case designs provide multiple options to address these matters.

Findings from group research are absolutely pivotal to advances in treatment of mental and physical disorders and in interventions in other areas I mentioned. Yet, it is critical not to lose sight of the fact that whatever research findings are obtained from whatever methodologies, we care about the individual. "Care" is not a loose or warm fuzzy clinical term, but rather care not to harm, care not to give the illusion of helping, care to provide one's best professionally, care to monitor and evaluate the effects, and care to utilize limited resources wisely. Whether an intervention (e.g., for a psychological, educational, or medical problem) is effective for any individual is not known in advance. Evaluation of the intervention is critical, and this is at the level of the individual. Single-case designs, apart from their methodological prowess in research, can improve the quality of care and services. Research methodology (e.g., assessment and evaluation) is not at odds with sensitive clinical care but in some way is the only way to ensure that type of care.

Second, there is keen interest in personalized or precision medicine, which consists of a fine-grained way of identifying interventions that can be tailored to individuals based on their predicted response (Personalized Medicine Coalition, 2019). Breakthroughs in technology, use of large databases, and assessment in relation to genetic, biochemical, physiological, environmental, and psychological profiles allow one to identify characteristics specific to individuals that make them likely to respond to interventions.
"Big data" is very useful in large part to apply powerful tools (e.g., machine learning, artificial intelligence, novel data analyses) to cull combinations of factors that will be useful to personalize and individualize treatment (e.g., Kessler et al., 2019; Khamsi, 2020; Luedtke et al., 2019). Once an algorithm is generated that conveys the best intervention, within the limits of any data-analytic method and with the variability (error), will that intervention work, or work well, with me? We need "tiny data" (my term) to check on the results. That is, designing interventions that map onto our unique profiles underscores the importance of evaluation with the single case (Lillie et al., 2011; Schork, 2015; Schork & Goetz, 2017). Single-case designs are remarkably well suited to evaluating interventions with the individual and in some ways are more applicable than ever as we move to more personalized interventions.

Single-case designs provide an enormous array of options that can be readily used to evaluate well-intentioned but not yet established interventions. The designs are not a panacea for the woes of programs based on good intentions and hope and sometimes theory. Yet they allow us to evaluate individual people, classrooms, hospitals, community programs, social policies, and more. More than "just" evaluate, the designs provide interim information along the way to see if the intervention program is on the right track and, if not, whether corrections are needed. The methodology of single-case designs is well worked out, continues to evolve, and is even more broadly applicable than its current uses suggest. We begin with Sidman's (1960) book on the approach, foundations, and philosophy of science to provide the underpinnings of the designs. There has now been a leap in the use of the designs and in their development and elaboration. Consider the leap from then to now in a different context.

When Leonardo da Vinci drew the equivalent of the first helicopter in 1493, it is unlikely that he stated to himself, "One day this will be used by tourists to see cities and other attractions, by the military to wage war and to rescue casualties, by flight health care professionals to help with emergency medical evacuations, by firefighters to extinguish fires, and of course by wealthy people to go to work or to their islands for a weekend getaway." We must assume that Leonardo could not have known.

Analogously, when Murray Sidman (1960) articulated the foundations of single-case designs, he may have known well that the designs would continue to generate breakthroughs in the experimental analysis of behavior, some of which were his own. But it is unlikely he stated to himself, "One day, these designs will be used to evaluate interventions and programs in multiple and varied contexts (e.g., education, multiple branches of medicine, community work, psychology, the military, and business and industry), will include an unimaginable array of participants (e.g., toddlers through the elderly, athletes, parents, teachers, nurses, doctors, prisoners, parole officers, soldiers, psychiatric patients, individuals who use and abuse drugs, and research methodologists), and will spread to multiple disciplines that encompass these domains." He could not have known, but we are indebted to him for setting the course. Even with the extraordinary use of the designs to evaluate interventions, a central feature of this article and homage to Sidman, the potential and benefits of the designs remain to be exploited. The designs are essential to establish whether so many of our efforts to effect change make a difference. We owe a great deal to Sidman for helping us reach this point.

References

Aalbersberg, I. J., Appleyard, T., Brookhart, S., Carpenter, T., Clarke, M., Curry, S., Dahl, J., DeHaven, A., Eich, E., Franco, M., Freedman, L., Graf, C., Grant, S., Hanson, B., Joseph, H., Kiermer, V., Kramer, B., Kraut, A., Karn, R. K., … Vazire, S. (2018, February 15). Making science transparent by default: Introducing the TOP Statement. https://doi.org/10.31219/osf.io/sm78t

Andersson, G., Titov, N., Dear, B. F., Rozental, A., & Carlbring, P. (2019). Int
ernet-deliveredpsychologicaltreatments:Frominnovationtoimplementation.WorldPsychiatry,18(1),20-28.https://doi.org/10.1002/wps.20610Appelbaum,M.,Cooper,H.,Kline,R.B.,Mayo-Wilson,E.,Nezu,A.M.,&Rao,S.M.(2018).Journalarticlereportingstandardsforquantitativeresearchinpsy-chology:TheAPAPublicationsandCommunicationsBoardTaskForceReport.AmericanPsychologist,73(1),3-25.https://doi.org/10.1037/amp0000389Arnett,J.J.(2008).Theneglected95%:WhyAmericanpsychologyneedstobecomelessAmerican.AmericanPsychologist,63(7),602-614.https://doi.org/10.1037/0003-066X.63.7.60279Single-CaseExperimentalDesigns
Artman,K.,Wolery,M.,&Yoder,P.(2012).Embracingourvisualinspectionandanalysistradition:Graphinginterobserveragreementdata.RemedialandSpecialEducation,33(2),71-77.https://doi.org/10.1177/0741932510381653Ayllon,T.,&Azrin,N.(1968).Thetokeneconomy:Amotiva-tionalsystemfortherapyandrehabilitation.Appleton-Cen-tury-Crofts.Baer,D.M.(1977).Perhapsitwouldbebetternottoknoweverything.JournalofAppliedBehaviorAnalysis,10,167-172.https://doi.org/10.1901/jaba.1977.10-167Bakker,M.,&Wicherts,J.M.(2011).The(mis)reportingofstatisticalresultsinpsychologyjournals.BehaviorResearchMethods,43(3),666–678.https://doi.org/10.3758/s13428-011-0089-5Barlow,D.H.,Nock,M.K.,&Hersen,M.(2009).Singlecasedesigns:Strategiesforstudyingbehaviorchange(3rded.).PearsonBaron,A.(1999).Statisticalinferenceinbehavioranalysis:Friendorfoe?TheBehaviorAnalyst,22,83–85.https://doi.org/10.1007/BF03391983Barton,E.E.,Lloyd,B.P.,Spriggs,A.D.,&Gast,D.L.(2018).Visualanalysisofgraphicdata.InJ.R.Ledford&D.L.Gast(Eds.),Singlecaseresearchmethod-ology:Applicationsinspecialeducationandbehavioralsci-ences(3rded.,pp.179-214).Routledge.Beretvas,S.N.,&Chung,H.(2008).Areviewofsingle-subjectexperimentaldesignmeta-analyses:Methodo-logicalissuesandpractice.Evidence-BasedCommunica-tionandAssessmentandIntervention,2,129–141.https://doi.org/10.1080/17489530802446302Burns,M.K.(2012).Meta-analysisofsingle-casedesignresearch:Introductiontothespecialissue.JournalofBehavioralEducation,21,175-184.https://doi.org/10.1007/s10864-012-9158-9Camerer,C.F.,Dreber,A.,Holzmeister,F.,Ho,T.H.,Huber,J.,Johannesson,M.,Kirchler,M.,Nave,G.,Nosek,B.A.,Pfeiffer,T.,Altmejd,A.,Buttrick,N.,Chan,T.,Chen,Y.,Forsell,E.,Gampa,A.,Heikensten,E.,Hummer,L.,Imai,T.…Wu,H.(2018).EvaluatingthereplicabilityofsocialscienceexperimentsinNatureandSciencebetween2010and2015.NatureHumanBehaviour,2(9),637-644.https://doi.org/10.1038/s41562-018-0399-zCaron,P.O.(2019).Multilevelanalysisofmatchingbehav-ior.JournaloftheExperimentalAnalysisofBehavior,111(2),183-191.https://doi.org/10.1002/jeab.510Cebolla,A.,Demarzo,M.,Martins,P.,Soler,J.,&Garcia-Campayo,J.(2017).Unwantedeffects:Isthereaneg-ativesideofmeditation?Amulticentresurvey.PloSOne,12(9),e0183137.https://doi.org/10.1371/journal.pone.0183137Collier-Meek,M.A.,Sanetti,L.M.,&Fallon,L.M.(2017).Incorporatingappliedbehavioranalysistoassessandsupporteducators’treatmentintegrity.PsychologyintheSchools,54(4),446-460.https://doi.org/10.1002/pits.22001Coon,J.C.,&Rapp,J.T.(2018).Applicationofmultiplebaselinedesignsinbehavioranalyticresearch:Evi-dencefortheinfluenceofnewguidelines.BehavioralInterventions,33(2),160-172.https://doi.org/10.1002/bin.1510Cooper,J.O.,Heron,T.E.,&Heward,W.L.(2020).Appliedbehavioranalysis(3rded.).PearsonPrenticeHall.Cox,B.S.,Cox,A.B.,&Cox,D.J.(2000).Motivatingsign-agepromptssafetybeltuseamongdriversexitingseniorcommunities.JournalofAppliedBehaviorAnalysis,33,635–638.https://doi.org/10.1901/jaba.2000.33-635Craig,A.R.,&Fisher,W.W.(2019).Randomizationtestsasalternativeanalysismethodsforbehavior-analyticdata.JournaloftheExperimentalAnalysisofBehavior,111(2),309–328.https://doi.org/10.1002/jeab.500Crawford,M.J.,Thana,L.,Farquharson,L.,Palmer,L.,Hancock,E.,Bassett,P.,Clarke,J.,&Parry,G.D.(2016).Patientexperienceofnegativeeffectsofpsy-chologicaltreatment:Resultsofanationalsurvey.TheBritishJournalofPsychiatry,208(3),260-265.https://doi.org/10.1192/bjp.bp.114.162628Critchfield,T.S.,Haley,R.,Sabo,B.,Colbert,J.,&Macropoulis,G.(2003).AhalfcenturyofscallopingintheworkhabitsoftheUnitedStatesCongress.Jour-nalofAppliedBehaviorAnalysis,36,465-486.https://doi.org/10.1
901/jaba.2003.36-465DeChesnay,M.(Ed.).(2017).Nursingresearchusingcasestudies:Qualitativedesignsandmethodsinnursing.Springer.Declercq,L.,Cools,W.,Beretvas,S.N.,Moeyaert,M.,Ferron,J.M.,&VandenNoortgate,W.(2020).Multi-SCED:Atoolfor(meta-)analyzingsingle-caseexperi-mentaldatawithmultilevelmodeling.BehaviorResearchMethods,52(1),177-192.https://doi.org/10.3758/s13428-019-01216-2Diener,E.,Heintzelman,S.J.,Kushlev,K.,Tay,L.,Wirtz,D.,Lutes,L.D.,&Oishi,S.(2017).Findingsallpsychologistsshouldknowfromthenewscienceonsubjectivewell-being.CanadianPsychology/PsychologieCanadienne,58(2),87-104.https://doi.org/10.1037/cap0000063Diller,J.W.,Barry,R.J.,&Gelino,B.W.(2016).Visualanalysisofdatainamultielementdesign.JournalofAppliedBehaviorAnalysis,49(4),980-985.https://doi.org/10.1002/jaba.325Dodge,K.A.,Dishion,T.J.,&Lansford,J.E.(Eds.).(2006).Deviantpeerinfluencesinprogramsforyouth:Problemsandsolutions.Guilford.Eldevik,S.,Hastings,R.P.,Hughes,J.C.,Jahr,E.,Eikeseth,S.,&Cross,S.(2009).Meta-analysisofearlyintensivebehavioralinterventionforchildrenwithautism.JournalofClinicalChild&AdolescentPsychology,38(3),439-450.https://doi.org/10.1080/15374410902851739Evans,J.J.,Gast,D.L.,Perdices,M.,&Manolov,R.(2014).Singlecaseexperimentaldesigns:Introduc-tiontoaspecialissueofNeuropsychologicalRehabili-tation.NeuropsychologicalRehabilitation,24(3-4),305–314.https://doi.org/10.1080/09602011.2014.903198.Ferster,C.B.,&Skinner,B.F.(1957).Schedulesofreinforce-ment.Appleton-Century-Crofts.Fiske,K.,&Delmolino,L.(2012).Useofdiscontinuousmethodsofdatacollectioninbehavioralintervention:Guidelinesforpractitioners.BehaviorAnalysisinPrac-tice,5(2),77-81.https://doi.org/10.1007/BF03391826Fournier,A.K.,Ehrhart,I.J.,Glindemann,K.E.,&Geller,E.S.(2004).Interveningtodecreasealcoholabuseatuniversityparties:DifferentialreinforcementAlanE.Kazdin80
ofintoxicationlevel.BehaviorModification,28,167–181.https://doi.org/10.1177/0145445503259406Francis,G.(2012).Thepsychologyofreplicationandrep-licationinpsychology.PerspectivesonPsychologicalSci-ence,7(6),585-594.https://doi.org/10.1177/1745691612459520Frey,B.S.(2011).Happypeoplelivelonger.Science,331,542-543.https://doi.org/10.1126/science.1201060Fryling,M.J.,Wallace,M.D.,&Yassine,J.N.(2012).Impactoftreatmentintegrityoninterventioneffec-tiveness.JournalofAppliedBehaviorAnalysis,45(2),449-453.https://doi.org/10.1901/jaba.2012.45-449Fuqua,R.W.(1990).Tacticsofscientificresearchat30:Somepersonalreflections.TheBehaviorAnalyst,13(2),179-181.https://doi.org/10.1007/BF03392536Galizio,M.,&Bruce,K.E.(2018).Abstraction,multipleexemplartrainingandthesearchforderivedstimulusrelationsinanimals.PerspectivesonBehaviorScience,41(1),45-67.https://doi.org/10.1007/s40614-017-0112-yGary,T.(2016).Howtodoyourcasestudy.SagePublications.Gast,D.L.,&Ledford,J.R.(2018).Singlecaseresearchmeth-odology(3rded.).Routledge.Gilroy,S.P.,Franck,C.T.,&Hantula,D.A.(2017).Thediscountingmodelselector:Statisticalsoftwarefordelaydiscountingapplications.JournaloftheExperimen-talAnalysisofBehavior,107(3),388-401.https://doi.org/10.1002/jeab.257Glass,G.V.(1997).Interruptedtimeseriesquasi-experiments:Complementarymethodsforresearchineducation(2nded.).AmericanEducationalResearchAssociation.Goodman,J.D.,McKay,J.R.,&DePhilippis,D.(2013).Progressmonitoringinmentalhealthandaddictiontreatment:Ameansofimprovingcare.ProfessionalPsy-chology:ResearchandPractice,44(4),231-246.https://doi.org/10.1037/a0032605Guthrie,R.V.(1997).Eventheratwaswhite:Ahistoricalviewofpsychology.Allyn&Bacon.Hackenberg,T.D.(2018).Tokenreinforcement:Transla-tionalresearchandapplication.JournalofAppliedBehaviorAnalysis,51(2),393-435.https://doi.org/10.1002/jaba.439Hantula,D.A.(2019).Editorial:Replicationandreliabilityinbehaviorscienceandbehavioranalysis:Acallforaconversation.PerspectivesinBehaviorScience,42,1–11.https://doi.org/10.1007/s40614-019-00194-2Hasin,D.S.,Sarvet,A.L.,Meyers,J.L.,Saha,T.D.,Ruan,W.J.,Stohl,M.,&Grant,B.F.(2018).Epidemi-ologyofadultDSM-5majordepressivedisorderanditsspecifiersintheUnitedStates.JAMAPsychiatry,75(4),336-346.https://doi.org/10.1001/jamapsychiatry.2017.4602Hawkley,L.C.,&Cacioppo,J.T.(2010).Lonelinessmat-ters:Atheoreticalandempiricalreviewofconse-quencesandmechanisms.AnnalsofBehavioralMedicine,40(2),218-227.https://doi.org/10.1007/s12160-010-9210-8Henrich,J.,Heine,S.J.,&Norenzayan,A.(2010a).MostpeoplearenotWEIRD.Nature,466(7302),29.https://doi.org/10.1038/466029aHenrich,J.,Heine,S.J.,&Norenzayan,A.(2010b).Theweirdestpeopleintheworld.BehavioralandBrainSci-ences,33(2-3),61-83.https://doi.org/10.1017/S0140525X0999152XHeyvaert,M.,&Onghena,P.(2014).Randomizationtestsforsingle-caseexperiments:Stateoftheart,stateofthescience,andstateoftheapplication.JournalofContextualBehavioralScience,3(1),51–64.https://doi.org/10.1016/j.jcbs.2013.10.002Honig,W.K.&Staddon,J.E.R.(Eds.)(1977).Handbookofoperantbehavior.Prentice-Hall.Hopkins,B.L.,Cole,B.L.,&Mason,T.L.(1998).Acri-tiqueoftheusefulnessofinferentialstatisticsinappliedbehavioranalysis.TheBehaviorAnalyst,21(1),125-137.https://doi.org/10.1007/BF03392787Horner,R.H.,Carr,E.G.,Halle,J.,McGee,G.,Odom,S.,&Wolery,M.(2005).Theuseofsingle-subjectresearchtoidentifyevidence-basedpracticeinspecialeducation.ExceptionalChildren,71,165–179.https://doi.org/10.1177/001440290507100203Howard,D.,Best,W.,&Nickels,L.(2015).Optimisingthedesignofinterventionstudies:Critiquesandwaysfor-ward.Aphasiology,29(5),526-562.https://doi.org
/10.1080/02687038.2014.985884Hyzy,R.C.(Ed.)(2017).Evidence-basedcriticalcare:Acasestudyapproach.SpringerInternational.Ivy,J.W.,Meindl,J.N.,Overley,E.,&Robson,K.M.(2017).Tokeneconomy:Asystematicreviewofproce-duraldescriptions.BehaviorModification,41(5),708-737.https://doi.org/10.1177/0145445517699559Iyer,G.,Hanrahan,A.J.,Milowsky,M.I.,Al-Ahmadie,H.,Scott,S.N.,Janakiraman,M.,Pirun,M.,Sander,C.,Socci,N.D.,Ostrovnaya,I.,Viale,A.,Heguy,A.,Peng,L.,Chan,T.A.,Bochner,B.,Bajorin,B.,Berger,M.F.,Taylor,B.S.,&Solit,D.B.(2012).Genomesequencingidentifiesabasisforeverolimussensitivity.Science,338(6104),221.https://doi.org/10.1126/science.1226344Johnson,A.H.,&Cook,B.G.(2019).Preregistrationinsingle-casedesignresearch.ExceptionalChildren,86(1),95-112.https://doi.org/10.1177/0014402919868529Kaiser,J.(2013).Rarecancersuccessesspawn“excep-tional”researchefforts.Science,340,263.https://doi.org/10.1126/science.340.6130.263Kazdin,A.E.(1977).Thetokeneconomy:Areviewandevalua-tion.Plenum.Kazdin,A.E.(1978).Historyofbehaviormodification:Experi-mentalfoundationsofcontemporaryresearch.UniversityParkPress.Kazdin,A.E.(1982).Thetokeneconomy:Adecadelater.JournalofAppliedBehaviorAnalysis,15(3),431-445.https://doi.org/10.1901/jaba.1982.15-431Kazdin,A.E.(2017).Researchdesigninclinicalpsychology(5thed.).Pearson.Kazdin,A.E.(2018a).Developingtreatmentsforantisocialbehavioramongchildren:Controlledtrialsanduncontrolledtribulations.PerspectivesonPsychologicalScience,13(5),634-650.https://doi.org/10.1177/1745691618767880Kazdin,A.E.(2018b).Innovationsinpsychosocialinterven-tionsandtheirdelivery:Leveragingcutting-edgesciencetoimprovetheworld’smentalhealth.OxfordUniversityPress.Kazdin,A.E.(2021).Single-caseresearchdesigns:Methodsforclinicalandappliedsettings(3rded.).OxfordUniversityPress.Kessler,R.C.,Bossarte,R.M.,Luedtke,A.,Zaslavsky,A.M.,&Zubizarreta,J.R.(2019).Machinelearningmethodsfordevelopingprecisiontreatment81Single-CaseExperimentalDesigns
ruleswithobservationaldata.BehaviourResearchandTherapy,120,103412.https://doi.org/10.1016/j.brat.2019.103412Khamsi,R.(2020).Computingcancer’sweakspots.Science,368(6496),1174-1177.https://doi.org/10.1126/science.368.6496.1174Killeen,P.R.(2011).Modelsoftracedecay,eligibilityforreinforcement,anddelayofreinforcementgradients,fromexponentialtohyperboloid.BehaviouralProcesses,87(1),57-63.https://doi.org/10.1016/j.beproc.2010.12.016Kratochwill,T.R.,Hitchcock,J.,Horner,R.H.,Levin,J.R.,Odom,S.L.,Rindskopf,D.M.,&Shadish,W.R.(2010).Single-casedesignstechnicaldocu-mentation.RetrievedfromWhatWorksClearinghouse.https://ies.ed.gov/ncee/wwc/Document/229Kratochwill,T.R.,&Levin,J.R.(Eds.).(2015).Single-caseresearchdesignandanalysis(psychologyrevivals):Newdirectionsforpsychologyandeducation.Routledge.Kravitz,R.L.,Duan,N.,Eslick,I.,Gabler,N.B.,&Kaplan,H.C.(2014).DesignandimplementationofN-of-1trials:Auser’sguide.AgencyforHealthcareResearchandQuality,USDepartmentofHealthandHumanServices.Krueger,T.K.,Rapp,J.T.,Ott,L.M.,Lood,E.A.,&Novotny,M.A.(2013).DetectingfalsepositivesinABdesigns:Potentialimplicationsforpractitioners.BehaviorModification,37(5),615-630.https://doi.org/10.1177/0145445512468754Kubina,R.M.,Kostewicz,D.E.,Brennan,K.M.,&King,S.A.(2017).Acriticalreviewoflinegraphsinbehavioranalyticjournals.EducationalPsychologyReview,29(3),583-598.https://doi.org/10.1007/s10648-015-9339-xLac,A.,&Crano,W.D.(2009).Monitoringmatters:Meta-analyticreviewrevealsthereliablelinkageofparentalmonitoringwithadolescentmarijuanause.PerspectivesonPsychologicalScience,4(6),578-586.https://doi.org/10.1111/j.1745-6924.2009.01166.xLambert,M.J.(2015).Outcomeresearch:Methodsforimprovingoutcomeinroutinecare.InO.C.G.Gelo,A.Pritz,&B.Rieken(Eds.),Psychotherapyresearch:Foundations,process,andoutcome(pp.593-610).Springer.Lane,J.D.,&Gast,D.L.(2014).Visualanalysisinsinglecaseexperimentaldesignstudies:Briefreviewandguidelines.NeuropsychologicalRehabilitation,24(3-4),445-463.https://doi.org/10.1080/09602011.2013.815636Ledford,J.R.,&Gast,D.L.(2018).Singlecaseresearchmeth-odology:Applicationsinspecialeducationandbehavioralsciences(3rded.).Routledge.Levin,J.R.,Kratochwill,T.R.,&Ferron,J.M.(2019).Ran-domizationproceduresinsingle-caseinterventionresearchcontexts:(Someof)“therestofthestory”.JournaloftheExperimentalAnalysisofBehavior,112(3),334-348.https://doi.org/10.1002/jeab.558Ledford,J.R.,Barton,E.E.,Severini,K.E.,Zimmerman,K.N.,&Pokorski,E.A.(2019).Visualdisplayofgraphicdatainsinglecasedesignstudies:Systematicreviewandexpertpreferenceanalysis.EducationandTraininginAutismandDevelopmentalDisabilities,54(4),315–327.Lillie,E.O.,Patay,B.,Diamant,J.,Issell,B.,Topol,E.J.,&Schork,N.J.(2011).Then-of-1clinicaltrial:Theulti-matestrategyforindividualizingmedicine?PersonalizedMedicine,8(2),161-173.https://doi.org/10.2217/pme.11.7Luedtke,A.,Sadikova,E.,&Kessler,R.C.(2019).Samplesizerequirementsformultivariatemodelstopredictbetween-patientdifferencesinbesttreatmentsofmajordepressivedisorder.ClinicalPsychologicalScience,7(3),445-461.https://doi.org/10.1177/2167702618815466Lutz,W.,DeJong,K.,&Rubel,J.(2015).Patient-focusedandfeedbackresearchinpsychotherapy:Whereareweandwheredowewanttogo?PsychotherapyResearch,25(6),625-632.https://doi.org/10.1080/10503307.2015.1079661Machalicek,W.,&Horner,R.H.(2018).Specialissueonadvancesinsingle-caseresearchdesignandanalysis.DevelopmentalNeurorehabilitation,21(4),209-211.https://doi.org/10.1080/17518423.2018.1468600Maggin,D.M.,&Chafouleas,S.M.(2013).Introductiontothespecialseries:Issuesandadvancesofsynthesiz-ingsingle-caseresearch.
RemedialandSpecialEduca-tion,34(1),3-8.https://doi.org/10.1177/0741932512466269Maggin,D.M.,Lane,K.L.,&Pustejovsky,J.E.(2017).Introductiontothespecialissueonsingle-casesystem-aticreviewsandmeta-analyses.RemedialandSpecialEducation,38(6),323-330.https://doi.org/10.1177/0741932517717043Maggin,D.M.,O’Keeffe,B.V.,&Johnson,A.H.(2011).Aquantitativesynthesisofmethodologyinthemeta-analysisofsingle-subjectresearchforstudentswithdisabilities:1985–2009.Exceptionality,19(2),109-135.https://doi.org/10.1080/09362835.2011.565725Manolov,R.,&Solanas,A.(2018).Analyticaloptionsforsingle-caseexperimentaldesigns:Reviewandapplica-tiontobrainimpairment.BrainImpairment,19(1),18-32.https://doi.org/10.1017/BrImp.2017.17Manolov,R.,&Vannest,K.J.(2019).Avisualaidandobjec-tiveruleencompassingthedatafeaturesofvisualanaly-sis.BehaviorModification.Advanceonlinepublication.https://doi.org/10.1177/0145445519854323Matson,J.L.(Ed.)(2017).Handbookoftreatmentsforautismspectrumdisorder.SpringerInternational.Mazur,J.E.(2006).Mathematicalmodelsandtheexperi-mentalanalysisofbehavior.JournaloftheExperimentalAnalysisofBehavior,85(2),275-291.https://doi.org/10.1901/jeab.2006.65-05McSweeney,A.J.(1978).Effectsofresponsecostonthebehaviorofamillionpersons:ChargingfordirectoryassistanceinCincinnati.JournalofAppliedBehaviorAnalysis,11,47–51.https://doi.org/10.1901/jaba.1978.11-47Mercer,S.H.,&Sterling,H.E.(2012).Theimpactofbaselinetrendcontrolonvisualanalysisofsingle-casedata.JournalofSchoolPsychology,50(3),403-419.https://doi.org/10.1016/j.jsp.2011.11.004Miltenberger,R.G.,Flessner,C.,Gatheridge,B.,Johnson,B.,Satterlund,M.,&Egemo,K.(2004).Eval-uationofbehavioralskillstrainingtopreventgunplayinchildren.JournalofAppliedBehaviorAnalysis,37,513–516.https://doi.org/10.1901/jaba.2004.37-513Moeyaert,M.,Rindskopf,D.,Onghena,P.,&VandenNoortgate,W.(2017).Multilevelmodelingofsingle-casedata:AcomparisonofmaximumlikelihoodandBayesianestimation.PsychologicalMethods,22(4),760–778.https://doi.org/10.1037/met0000136AlanE.Kazdin82
NationalAeronauticsandSpaceAdministration(2009,April).NASA’sGreatObservatories.www.nasa.gov/audience/forstudents/postsecondary/features/F_NASA_Great_Observatories_PS.htmlNinci,J.(2019).Single-casedataanalysis:Apractitionerguideforaccurateandreliabledecisions.BehaviorModification.https://doi.org/10.1177%2F0145445519867054Ninci,J.,Vannest,K.J.,Willson,V.,&Zhang,N.(2015).Interrateragreementbetweenvisualanalystsofsingle-casedata:Ameta-analysis.BehaviorModification,39(4),510-541.https://doi.org/10.1177/0145445515581327Normand,M.P.,&Bailey,J.S.(2006).Theeffectsofcel-erationlinesonvisualdataanalysis.BehaviorModifica-tion,39,295–314.https://doi.org/10.1177/0145445503262406Nosek,B.A.,Ebersole,C.R.,DeHaven,A.C.,&Mellor,D.T.(2018).Thepreregistrationrevolution.ProceedingsoftheNationalAcademyofSciences,115,2600–2606.https://doi.org/10.1073/pnas.1708274114OpenScienceCollaboration(2015).Estimatingtherepro-ducibilityofpsychologicalscience.–,349(6251),aac4716.https://doi.org/10.1126/science.aac4716Pallmann,P.,Bedding,A.W.,Choodari-Oskooei,B.,Dimairo,M.,Flight,L.,Hampson,L.V.,Holmes,J.,Mander,A.P.,Odondi,L.,Sydes,M.R.,Villar,S.S.,Wason,J.M.,Weir,C.J.,Wheeler,G.M.,Yap,C.,&Jaki,T.(2018).Adaptivedesignsinclinicaltrials:Whyusethem,andhowtorunandreportthem.BMCMedicine,16(1),1-15.https://doi.org/10.1186/s12916-018-1017-7Parker,R.I.,&Brossart,D.F.(2003).Evaluatingsingle-caseresearchdata:Acomparisonofsevenstatisticalmethods.BehaviorTherapy,34(2),189-211.https://doi.org/10.1016/S0005-7894(03)80013-8Parker,R.I.,Cryer,J.,&Byrns,G.(2006).Controllingbaselinetrendinsingle-caseresearch.SchoolPsychologyQuarterly,21,418–443.https://doi.org/10.1037/h0084131Parker,R.I.,Vannest,K.J.,&Davis,J.L.(2011).Effectsizeinsingle-caseresearch:Areviewofninenon-overlaptechniques.BehaviorModification,35(4),303–322.https://doi.org/10.1177/0145445511399147Parsons,M.B.,Schepis,M.M.,Reid,D.H.,McCarn,J.E.,&Green,C.W.(1987).Expandingtheimpactofbehavioralstaffmanagement:Alarge-scale,long-termapplicationinschoolsservingseverelyhandicappedstudents.JournalofAppliedBehaviorAnalysis,20,139–150.https://doi.org/10.1901/jaba.1987.20-139Parsonson,B.S.,&Baer,D.M.(1978).Theanalysisandpresentationofgraphicdata.InT.R.Kratochwill(Ed.),Single-subjectresearch:Strategiesforevaluatingchange(pp.101-165).AcademicPress.Parsonson,B.S.,&Baer,D.M.(1992).Thevisualanalysisofdataandcurrentresearchintothestimulicontrol-lingit.InT.R.Kratochwill&J.R.Levin(Eds.),Single-subjectresearchdesignandanalysis(pp.27-52).LawrenceErlbaumAssociates.Perone,M.(1999).Statisticalinferenceinbehavioranaly-sis:Experimentalcontrolisbetter.TheBehaviorAnalyst,22(2),109-116.https://doi.org/10.1007/BF03391988Perone,M.,Galizio,M.,&Baron,A.(1988).Therelevanceofanimal-basedprinciplesinthelaboratorystudyofhumanoperantconditioning.InG.Davey&C.Cullen(Eds.),Humanoperantconditioningandbehaviormodification(p.59-85).JohnWiley&Sons.PersonalizedMedicineCoalition(2019).Personalizedmedi-cine101improvingpatientcareinthe21stcentury.Per-sonalizedMedicineCoalition.www.personalizedmedicinecoalition.org/Userfiles/PMC-Corporate/file/Personalized_Medicine_101_fact-sheet.pdfPetrosino,A.,Turpin-Petrosino,C.,Hollis-Peel,M.E.,&Lavenberg,J.G.(2013).Scaredstraightandotherjuvenileawarenessprogramsforpreventingjuveniledelinquency:Asystematicreview.CampbellSystematicReviews,9(1),1-55.https://doi.org/10.1002/14651858.CD002796.pub2Poston,J.M.,&Hanson,W.E.(2010).Meta-analysisofpsychologicalassessmentasatherapeuticinterven-tion.PsychologicalAssessment,22(2),203-212.https://doi.org/10.1037/a0018679Printz,C.(2015).NCIlaunchesexceptionalre
spondersinitiative:Researcherswillattempttoidentifywhysomepatientsrespondtotreatmentsomuchbetterthanothers.Cancer,121(6),803-804.https://doi.org/10.1002/cncr.29311Risley,T.R.(1970).Behaviormodification:Anexperimental-therapeuticendeavor.InL.A.Hamerlynck,P.O.Davidson,&L.E.Acker(Eds.),Behaviormodificationandidealmentalhealthservices.Uni-versityofCalgaryPress.Salthouse,T.A.(2010).Influenceofageonpracticeeffectsinlongitudinalneurocognitivechange.Neuro-psychology,24(5),563-572.https://doi.org/10.1037/a0019026Schnelle,J.F.,Kirchner,R.E.,Macrae,J.W.,McNees,M.P.,Eck,R.H.,Snodgrass,S.,Casey,J.D.,&Uselton,P.H.,Jr.(1978).Policeevalu-ationresearch:Anexperimentalandcost-benefitanalysisofahelicopterpatrolinhigh-crimearea.Jour-nalofAppliedBehaviorAnalysis,11,11–21.https://doi.org/10.1901/jaba.1978.11-11Schork,N.J.(2015).Personalizedmedicine:Timeforone-persontrials.Nature,e520,609–611.https://doi.org/10.1038/520609aSchork,N.J.,&Goetz,L.H.(2017).Single-subjectstudiesintranslationalnutritionresearch.AnnualReviewofNutrition,37,395-422.https://doi.org/10.1146/annurev-nutr-071816-064717Schulz,K.F.,Altman,D.G.,&Moher,D.(2010).CON-SORT2010statement:Updatedguidelinesforreportingparallelgrouprandomizedtrials.AnnalsofInternalMedicine,152(11),726-732.https://doi.org/10.7326/0003-4819-152-11-201006010-00232Scruggs,T.E.,Mastropieri,M.A.,Forness,S.R.,&Kavale,K.A.(1988).Earlylanguageintervention:Aquantitativesynthesisofsingle-subjectresearch.Jour-nalofSpecialEducation,22(3),259-283.https://doi.org/10.1177/002246698802200301Shadish,W.R.(Ed.)(2014a)SpecialIssue:Analysisandmeta-analysisofsingle-casedesigns.JournalofSchoolPsychology,52(2),109-248.https://doi.org/10.1016/j.jsp.2013.11.009Shadish,W.R.(2014b).Statisticalanalysesofsingle-casedesigns:Theshapeofthingstocome.Current83Single-CaseExperimentalDesigns
DirectionsinPsychologicalScience,23(2),139-146.https://doi.org/10.1177/0963721414524773Shadish,W.R.,Zelinsky,N.A.,Vevea,J.L.,&Kratochwill,T.R.(2016).Asurveyofpublicationpracticesofsingle-casedesignresearcherswhentreat-mentshavesmallorlargeeffects.JournalofAppliedBehaviorAnalysis,49(3),656-673.https://doi.org/10.1002/jaba.308Sham,E.,&Smith,T.(2014).Publicationbiasinstudiesofanappliedbehavior-analyticintervention:Aninitialanalysis.JournalofAppliedBehaviorAnalysis,47(3),663-678.https://doi.org/10.1002/jaba.146Shamseer,L.,Moher,D.,Clarke,M.,Ghersi,D.,Liberati,A.,Petticrew,M.,Shekelle,P.,Stewart,L.A.,&thePrisma-PGroup(2015).Pre-ferredreportingitemsforsystematicreviewandmeta-analysisprotocols(PRISMA-P)2015:Elaborationandexplanation.BMJ,349,g7647.https://doi.org/10.1136/bmj.g7647Sidman,M.(1960).Tacticsofscientificresearch.Evaluatingexperimentaldatainpsychology.BasicBooks.Sidman,M.(1990).Tactics:Inreply.TheBehaviorAnalyst,13(2),187-197.https://doi.org/10.1007/BF03392538Sidman,M.(2004).Theanalysisofhumanbehaviorincontext.TheBehaviorAnalyst,27(2),189-195.https://doi.org/10.1007/BF03393179Simmons,J.P.,Nelson,L.D.,&Simonsohn,U.(2011).False-positivepsychology:Undisclosedflexibilityindatacollectionandanalysisallowspresentingany-thingassignificant.PsychologicalScience,22,1359–1366.https://doi.org/10.1177/0956797611417632Simon,W.,Lambert,M.J.,Busath,G.,Vazquez,A.,Berkeljon,A.,Hyer,K.,Granley,M.,&Berrett,M.(2013).Effectsofprovidingpatientprogressfeedbackandclinicalsupporttoolstopsychotherapistsinaninpatienteatingdisorderstreatmentprogram:Aran-domizedcontrolledstudy.PsychotherapyResearch,23(3),287-300.https://doi.org/10.1080/10503307.2013.787497Skinner,B.F.(1938).Thebehavioroforganisms:Anexperi-mentalanalysis.Appleton-Century-Crofts.Skinner,B.F.(1956).Acasehistoryinscientificmethod.AmericanPsychologist,11,221–233.https://doi.org/10.1037/h0047662Skinner,B.F.(1984).Amatterofconsequences:Partthreeofanautobiography.NewYorkUniversityPress.Slavich,G.M.,&Cole,S.W.(2013).Theemergingfieldofhumansocialgenomics.ClinicalPsychologicalScience,1,331-348.https://doi.org/10.1177/2167702613478594Spriggs,A.D.,Lane,J.D.,&Gast,D.L.(2018).Visualrep-resentationofdata.InJ.R.Ledford&D.L.Gast(Eds.),Single-caseresearchmethodology:Applicationsinspecialeducationandbehavioralsciences(3rded.,pp.157-178).Routledge.Takayanagi,Y.,Spira,A.P.,Roth,K.B.,Gallo,J.J.,Eaton,W.W.,&Mojtabai,R.(2014).Accuracyofreportsoflifetimementalandphysicaldisorders:ResultsfromtheBaltimoreEpidemiologicalCatch-mentAreastudy.JAMAPsychiatry,71(3),273-280.https://doi.org/10.1001/jamapsychiatry.2013.3579Tanious,R.,De,T.K.,&Onghena,P.(2019).Amultiplerandomizationtestingprocedureforlevel,trend,vari-ability,overlap,immediacy,andconsistencyinsingle-casephasedesigns.BehaviourResearchandTherapy,119,103414.https://doi.org/10.1016/j.brat.2019.103414Tarlow,K.R.,&Brossart,D.F.(2018).Acomprehensivemethodofsingle-casedataanalysis:InterruptedTime-SeriesSimulation(ITSSIM).SchoolPsychologyQuarterly,33(4),590–603.https://doi.org/10.1037/spq0000273Tate,R.L.,Perdices,M.,Rosenkoetter,U.,McDonald,S.,Togher,L.,Shadish,W.,Horner,R.,Kratochwill,T.,Barlow,D.,Kazdin,A.,Sampson,M.,Shamseer,L.,&Vohra,S.(2016).TheSingle-CaseReportingGuide-lineinBEhaviouralInterventions(SCRIBE):Explana-tionandelaboration.ArchivesofScientificPsychology,4(1),10-31.https://doi.org/10.1037/arc0000027Tate,R.L.,Perdices,M.,Rosenkoetter,U.,Shadish,W.,Vohra,S.,Barlow,D.H.,Horner,R.,Kazdin,A.,Kratochwill,T.,McDonald,S.,Sampson,M.,Shamseer,L.,Togher,L.,Albin,R.,Backman,C.,Douglas,J.,Evans,J.J.,Gast,D.,Manolov,R.,…Wilson,B.(
2016).TheSingle-CaseReportingGuide-lineinBEhaviouralInterventions(SCRIBE):State-ment.ArchivesofScientificPsychology,4(1),1-9.https://doi.org/10.1037/arc0000026TheBehaviorAnalyst(1999).Specialsectiononstatisticalinference,22(2).Ugille,M.,Moeyaert,M.,Beretvas,S.N.,Ferron,J.,&VandenNoortgate,W.(2012).Multilevelmeta-analysisofsingle-subjectexperimentaldesigns:Asimulationstudy.BehaviorResearchMethods,44(4),1244-1254.https://doi.org/10.3758/s13428-012-0213-1USFoodandDrugAdministration(2019,November).Adaptivedesignsforclinicaltrialsofdrugsandbio-logicsguidanceforindustry.USDepartmentofHealthandHumanServices,FederalRegistrar.https://www.fda.gov/media/78495/download.Vannest,K.J.,&Ninci,J.(2015).Evaluatinginterventioneffectsinsingle-caseresearchdesigns.JournalofCounseling&Development,93(4),403-411.https://doi.org/10.1002/jcad.12038Vannest,K.J.,Peltier,C.,&Haas,A.(2018).Resultsreportinginsinglecaseexperimentsandsinglecasemeta-analysis.ResearchinDevelopmentalDisabilities,79,10-18.https://doi.org/10.1016/j.ridd.2018.04.029Vohra,S.,Shamseer,L.,Sampson,M.,Bukutu,C.,Schmid,C.H.,Tate,R.,Nikles,J.,Zucker,D.R.,Kravitz,R.,Guyatt,G.,Altman,D.G.,Moher,D.,&theCENTgroup(2015).CONSORTextensionforreportingN-of-1trials(CENT)2015Statement.BMJ,350,h1738.https://doi.org/10.1136/bmj.h1738Weisberg,P.,&Waldrop,P.B.(1972).Fixed-intervalworkhabitsofCongress.JournalofAppliedBehaviorAnalysis,5,93-97.https://doi.org/10.1901/jaba.1972.5-93WhatWorksClearinghouse(2020,January).Standardshandbook(ver.4.1).https://ies.ed.gov/ncee/wwc/handbooksWhite,D.M.,Rusch,F.R.,Kazdin,A.E.,&Hartmann,D.P.(1989).Applicationsofmeta-analysisinindividual-subjectresearch.BehavioralAssessment,11(3),281–296.Wolery,M.,Dunlap,G.,&Ledford,J.R.(2011).Single-caseexperimentalmethods:Suggestionsforreporting.JournalofEarlyIntervention,3(2),103-109.https://doi.org/10.1177/1053815111418235Wolfe,K.,Seaman,M.A.,&Drasgow,E.(2016).InterrateragreementonthevisualanalysisofindividualtiersAlanE.Kazdin84
andfunctionalrelationsinmultiplebaselinedesigns.BehaviorModification,40(6),852-873.https://doi.org/10.1177/0145445516644699Wong,C.,Odom,S.L.,Hume,K.A.,Cox,A.W.,Fettig,A.,Kucharczyk,S.,Brock,M.E.,Plavnik,J.B.,Fleury,V.P.,&Schultz,T.R.(2015).Evidence-basedpracticesforchildren,youth,andyoungadultswithautismspectrumdisorder:Acomprehensivereview.JournalofAutismandDevelopmentalDisorders,45(7),1951-1966.https://doi.org/10.1007/s10803-014-2351-zXimenes,V.M.,Manolov,R.,Solanas,A.,&Quera,V.(2009).Factorsaffectingvisualanalysisinsingle-casedesigns.TheSpanishJournalofPsychology,12(2),823–832.https://hdl.handle.net/2445/29862Young,M.E.(2018).Aplaceforstatisticsinbehavioranal-ysis.BehaviorAnalysis:ResearchandPractice,18(2),193-202.https://doi.org/10.1037/bar0000099Young,M.E.(2019).Modernstatisticalpracticesintheexperimentalanalysisofbehavior:Anintroductiontothespecialissue.JournaloftheExperimentalAnalysisofBehavior,111(2),149-154.https://doi.org/10.1002/jeab.511Received:July7,2020FinalAcceptance:October5,2020EditorinChief:MarkGalizioAssociateEditor:MarkGalizio85Single-CaseExperimentalDesigns
What Works Clearinghouse Procedures and Standards Handbook, Version 5.0
WWC 2022008
U.S. Department of Education
A Publication of the National Center for Education Evaluation at IES
WHAT WORKS CLEARINGHOUSE PROCEDURES AND STANDARDS HANDBOOK, VERSION 5.0
March 2022

Statistical, technical, and analysis team (STAT) members
Jack Buckley, American Institutes for Research
John Ferron, University of South Florida
Michael S. Garet, American Institutes for Research
Russell Gersten, Instructional Research Group
Ben B. Hansen, University of Michigan
Fran Harmon, Development Services Group
Larry V. Hedges, Northwestern University
Wendy Machalicek, University of Oregon
Rebecca Maynard, University of Pennsylvania
Hiren Nisar, 2M Research
Terri Pigott, Georgia State University
Allan Porowski, Abt Associates
James Pustejovsky, University of Wisconsin-Madison
David Rindskopf, City University of New York
Jessaca Spybrook, Western Michigan University
Emily Tanner-Smith, University of Oregon
Elizabeth Tipton, Northwestern University
Jeffrey Valentine, University of Louisville
Elias Walsh, Mathematica
Vivian Wong, University of Virginia
Staff
Molly Cain, Sarah Caverly, Alicia Garcia, Natalya Gnedko-Berry, Daniel Hubbard, David Miller, Joshua Polanin, Jordan Rickles, Sarah Sahni, Lisa Shimmel, Joe Taylor, and Ryan Williams, American Institutes for Research
Danny Swan and Emily Tanner-Smith, University of Oregon
Jeffrey Valentine, University of Louisville

Project Officers
Erin Pollard, Betsy Wolf, and Jonathan Jacobson, Institute of Education Sciences

WWC 2022008
U.S. Department of Education
U.S. Department of Education
Miguel A. Cardona, Secretary

Institute of Education Sciences
Mark Schneider, Director

National Center for Education Evaluation and Regional Assistance
Matthew Soldner, Commissioner
Elizabeth Eisner, Associate Commissioner of Knowledge Use

This report was prepared for the Institute of Education Sciences under Contract 91990018C0019 by the American Institutes for Research. The mention of trade names, commercial products, or organizations does not imply endorsement by the U.S. Government. This report is in the public domain. Although permission to reprint this publication is not necessary, the citation should be as follows: What Works Clearinghouse (2022). What Works Clearinghouse Procedures and Standards Handbook, Version 5.0. Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance (NCEE). This report is available on the What Works Clearinghouse website at https://ies.ed.gov/ncee/wwc/Handbooks.

Alternate Formats. The Alternate Format Center (AFC) produces documents in specific alternate formats, such as braille, large print, and electronic format, for employees and members of the public with disabilities. These documents may include agendas, correspondence, course materials, regulations, and materials for public distribution. Contact the AFC for submission guidelines and turnaround times. Hours of operation are Monday through Friday, 8:00 a.m. until 4:00 p.m., Eastern Time. For information, contact the AFC, Tena Witherspoon, [email protected], (202) 260-0818, or Tracey Flythe, [email protected], (202) 260-0852.
CHAPTER VI. REVIEWING FINDINGS FROM SINGLE-CASE DESIGN STUDIES

Single-case designs (SCDs) are experimental designs, generally including a small number of participants, with the potential to demonstrate causal effects. SCD studies must broadly adhere to the same guidelines as other designs in terms of their outcomes and general eligibility requirements. However, SCD studies generate causal effect estimates differently than group designs do, so they require a different review process with their own specific standards. These standards guide WWC reviewers in identifying and evaluating evidence from SCDs.

If a study is eligible for review as an SCD, it is reviewed using the criteria described next to determine whether it receives a research rating of Meets WWC Standards Without Reservations, Meets WWC Standards With Reservations, or Does Not Meet WWC Standards. SCD studies may also contain more than one experiment, and each experiment should receive its own rating. See the section corresponding to each design subtype for guidance.

Additional eligibility requirements for SCDs

The eligibility criteria for a WWC review of SCDs are as described in Chapter II, Screening Studies for Eligibility. That is, the study must be made publicly available, released within the 20 years preceding the review, use eligible populations, examine eligible interventions, and have eligible outcomes. In addition, studies that are eligible for review as SCDs are identified by the following features:

1. An individual case is the unit of intervention administration and data analysis. A case is most commonly a single participant. It also may be a cluster of participants, such as a classroom or school.
2. Within the design, the case can provide its own control for purposes of comparison. For example, the case's series of repeated outcome measurements prior to the intervention is compared with the series of repeated outcome measurements during and after receiving the intervention.
3. The outcome variable is measured repeatedly within and across different conditions. These different conditions are frequently structured as phases, such as the first baseline phase, first intervention phase, second baseline phase, and second intervention phase.

Figure 15 displays the simplest form of an SCD: a single individual (case) with one baseline phase and one treatment phase. This simple design is sometimes referred to as an AB design. In SCDs, a phase typically refers to a set of data points from the same condition, observed across time without the interruption of data points from a different condition. When phases are referred to using a string of capital letters, each letter represents a phase from a different condition. For instance, an ABC design is a design with three phases: the A phase would typically be the baseline phase, the B phase an intervention phase, and the C phase another intervention phase, either in the form of a modified intervention, an alternative intervention condition, or a maintenance phase. An ABCABC design is a six-phase design with three conditions: it begins like the ABC design just described, but in the fourth phase it returns to the original baseline A condition, then transitions back to another B phase, and finally transitions once again to another C phase. Some designs deviate from this typical structure and rapidly alternate interventions within the same experimental phase.
See the alternating treatment design section for more information about these designs.
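To make the lettered phase notation concrete, the following minimal Python sketch (illustrative only; the function name and output format are assumptions, not part of the WWC Handbook) expands a phase string such as "AB", "ABC", or "ABCABC" into its ordered phases and distinct conditions.

```python
# Illustrative helper (not from the WWC Handbook): expand a phase string
# into an ordered list of phases and the set of distinct conditions.

def parse_phase_string(design: str) -> dict:
    """Return the ordered phases and distinct conditions of an SCD notation string."""
    phases = list(design)              # each capital letter is one phase
    conditions = sorted(set(phases))   # e.g., ['A', 'B', 'C']
    return {"phases": phases, "conditions": conditions, "n_phases": len(phases)}

print(parse_phase_string("ABCABC"))
# {'phases': ['A', 'B', 'C', 'A', 'B', 'C'], 'conditions': ['A', 'B', 'C'], 'n_phases': 6}
```

Under this reading, the phase string encodes only the order of conditions; the number and spacing of data points within each phase are separate design features discussed below.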
Figure 15. Basic single-case design

In the example in figure 15, the effect of the intervention can be conceptualized as the change in the outcome between the baseline phase (the A phase) and the treatment phase (the B phase). However, most SCD experts consider this simple form of the SCD to have weak internal validity, because an apparent effect of the intervention could be due to some other change that co-occurred with the intervention, such as developmental changes for the participant or changes in the classroom that were unrelated to the intervention. Ensuring that SCDs guard against these threats is an important component of the WWC's SCD standards.

WWC standards apply to a wide range of SCDs, including reversal/withdrawal designs, multiple baseline designs, alternating and simultaneous intervention designs, changing criterion designs, and variations of these core designs such as multiple probe designs. These designs, along with standards for combinations of these designs, are described in greater detail later.

Reviewing findings from SCDs according to WWC standards

The process for reviewing SCD studies that are found eligible for review is presented in figure 16. After a study is found eligible for a WWC review as an SCD, the next step is the same as in other designs: reviewing the study's outcome measures and checking for confounding factors. If none of the outcome measures are consistent with the WWC's requirements, or if the study contains a confounding factor, the study will receive a research rating of Does Not Meet WWC Standards and the review will stop.

Outcome measure requirements

The WWC's outcome measure requirements for SCDs are similar to those for randomized controlled trials (RCTs) and quasi-experimental designs (QEDs) described in Chapter III, Outcome measures: (1) face validity, (2) reliability, (3) not overaligned with the intervention, (4) consistent data collection procedures, and, in some instances, (5) independence of the outcome measure. Differences in these requirements for SCDs are described next.
Requirement 1: Face validity

The requirements for face validity are the same as those for group designs. To show evidence of face validity, an outcome measure must appear to measure what it claims to measure. To demonstrate face validity, a measure must have a clear definition of what it measures, such as a skill, an event, a condition, or an object, and must assess that skill or event. For instance, a measure described as a test of reading comprehension that only assesses reading fluency does not demonstrate face validity.

Requirement 2: Reliability

For a measure to demonstrate reliability, study authors must present evidence that the outcome has acceptably low levels of measurement error. In group designs, study authors typically report measures of internal consistency, temporal stability, or test-retest reliability. In SCDs, outcomes are most frequently direct observations of behavior. For these direct observation outcomes, the most applicable form of reliability is interassessor agreement, also known as inter-rater reliability or interobserver agreement. Although more than 20 statistical measures can represent interassessor agreement (for example, see Berk, 1979; Suen & Ary, 1989), commonly used measures include percentage or proportional agreement and Cohen's kappa coefficient, which adjusts for the expected rate of chance agreement (Hartmann et al., 2004). Minimum acceptable values of interassessor agreement are at least .80 if measured by percentage agreement and at least .60 if measured by Cohen's kappa (Hartmann et al., 2004). (A small computational sketch of these two statistics appears after figure 16.) To meet the WWC's interassessor agreement requirements for direct observation outcomes, the following criteria must be met:

1. The outcome variable must be measured systematically over time by more than one assessor for each case.
2. The study authors must collect interassessor agreement in each phase.
3. The study authors must collect interassessor agreement data for at least 20 percent of the data points.
4. The interassessor agreement must meet the minimum acceptable values for each outcome across all phases and cases (the interassessor agreement values are not required to meet minimum acceptable values separately for each case or phase).

The raw data from the secondary assessor that were gathered for the purposes of interassessor agreement do not need to be reported; summary measures of interassessor agreement are sufficient. If a study contains measures that are not direct observations of behavior, such as a test of an academic outcome, then the reliability requirements for these measures follow the guidelines in Chapter III, Outcome measures.

Requirement 3: Not overaligned

Overalignment occurs when an outcome measure contains content or materials provided to the cases in one condition but not another. This rule does not apply when material covered by an outcome measure must be explicitly taught, or when an outcome measure is broadly educationally relevant. Content experts can provide advice on whether an outcome has broad educational relevance. These two caveats to the overalignment requirement are particularly important to SCDs, which frequently focus on narrow, specific outcomes that may require explicit teaching, or on daily-living outcomes with educational relevance. The functional skills domain from the Study Review Protocol contains examples such as dressing, preparing and eating food, or hygiene, where the researcher might teach the participant a checklist or a set of steps to be repeated, and the outcome might be some measure of success at repeating the checklist or steps that were taught to the participant.

Figure 16. Single-case design review process for eligible study findings
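As noted under Requirement 2, percentage agreement and Cohen's kappa are the most commonly used interassessor agreement statistics. The following Python sketch is a minimal illustration of the two computations and the minimum values cited from Hartmann et al. (2004); the function names and the interval-by-interval data format are assumptions for illustration, not WWC procedures.

```python
# Illustrative sketch (not WWC software): percentage agreement and Cohen's
# kappa for two assessors coding the same observation intervals.
from collections import Counter

def percentage_agreement(a, b):
    """Proportion of observation intervals on which the two assessors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement adjusted for the rate of agreement expected by chance."""
    n = len(a)
    observed = percentage_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical data: two assessors coding 10 intervals (1 = behavior occurred).
rater1 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
rater2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
pa, kappa = percentage_agreement(rater1, rater2), cohens_kappa(rater1, rater2)
print(f"agreement = {pa:.2f} (minimum .80), kappa = {kappa:.2f} (minimum .60)")
```

In this hypothetical example, agreement is .90 and kappa is about .78, so both exceed the cited minimums; kappa is lower than raw agreement because it discounts the agreement expected by chance.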
Requirement 4: Consistent data collection procedures

Data must be collected in the same manner for the intervention and comparison conditions. If no information is provided, the WWC assumes that data were collected consistently. In the context of SCDs, the reviewer should ensure that the data collection procedures were similar across conditions for a given case. Reviewers should look for details indicating that data were collected in different modes, with different timing, or by different personnel in the different conditions. In terms of timing, the major concern is whether data collection takes place at a different time of day between conditions. For instance, if all baseline data points are collected in the morning but all intervention data points are collected in the afternoon, this would represent inconsistent data collection procedures. However, in many SCDs, the introduction of the intervention is staggered in a time-lagged fashion across participants in the design. Staggered introduction of an intervention that is an intentional element of the design does not represent an issue with inconsistent data collection procedures.

Additional consideration: Independence of outcome measure

The consideration for independence is unchanged for SCDs. That is, in some outcome domains as specified in the Study Review Protocol, the WWC will consider whether the measure is independent of the intervention.

Confounding factors

A confounding factor occurs when a component of the research design is perfectly aligned with either the intervention or comparison condition, across all cases or phases of the experiment. A factor that is aligned with a particular case is not considered a confounding factor, because any factor that is completely aligned with a single case will be present in all conditions of the study.

The interventionist is a confounding factor often observed in SCDs. Teachers, parents, or peers (collectively labeled interventionists) can administer the intervention to study cases. When study cases experience a different interventionist across the baseline and intervention phases of the study, the study has a potential confounding factor. Because it can sometimes be difficult to determine whether something is a confounding factor, the examples that follow describe situations in which the interventionist is and is not a confounding factor; in each, cases may have a different interventionist across the baseline and intervention phases. (A small programmatic restatement of this check appears after the data availability requirement below.)

• Example of a confounding factor. One teacher teaches all cases in the baseline condition, and a different teacher teaches all cases in the intervention condition.

          Baseline     Intervention
Case 1    Teacher 1    Teacher 2
Case 2    Teacher 1    Teacher 2
Case 3    Teacher 1    Teacher 2

• Example of a confounding factor. One teacher teaches all cases in the baseline condition, and that same teacher plus another teacher (or trainer) teaches all cases in the intervention condition.

          Baseline     Intervention
Case 1    Teacher 1    Teacher 1 + Teacher 2
Case 2    Teacher 1    Teacher 1 + Teacher 2
Case 3    Teacher 1    Teacher 1 + Teacher 2

There are similar-appearing circumstances that are not confounding factors.

• Nonexample of a confounding factor. One teacher teaches all cases in both phases.

          Baseline     Intervention
Case 1    Teacher 1    Teacher 1
Case 2    Teacher 1    Teacher 1
Case 3    Teacher 1    Teacher 1

• Nonexample of a confounding factor. Multiple teachers teach different cases, but each case keeps the same teacher across phases.

          Baseline     Intervention
Case 1    Teacher 1    Teacher 1
Case 2    Teacher 2    Teacher 2
Case 3    Teacher 3    Teacher 3

Additional confounding factors include contextual or procedural changes between conditions that might affect outcome measurement and participant responding and that are not a component of the intervention of interest. A nonexhaustive list of examples includes shifts in experimental session length, changes in the quantity or quality of reinforcement provided to the participant, changes in the environment or setting (for example, one condition takes place in the classroom while another takes place on the playground), or changes in the social context that affect opportunities to respond.

Evidence review process for eligible findings from SCDs

To be considered potential evidence of an intervention's effectiveness by the WWC, an SCD must meet four standards: data availability, researcher-manipulated independent variable, no residual treatment effects, and design assessment. To receive a research rating of Meets WWC Standards Without Reservations, findings based on some designs must also meet requirements for limiting sources of bias. These standards are summarized in figure 16 and detailed in the following sections.

Data availability

SCD study authors need to provide raw data in graphical or tabular format for their findings to meet WWC standards. Graphical or tabular data must present the raw data corresponding to the individual observation sessions. Summary data, such as the within-phase mean for each phase, are not sufficient to meet this requirement.
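Before continuing with data availability, the two reviewer checks introduced so far can be made concrete. The sketch below is illustrative only (the record formats and function name are assumptions, not WWC procedures): it shows session-level raw data of the kind the data availability standard requires, and restates the interventionist-alignment pattern from the tables above as a simple test.

```python
# Illustrative sketch (not WWC software). All names and formats are assumed.

# Raw data at the level of individual observation sessions, not summaries.
sessions = [
    {"case": "Case 1", "session": 1, "condition": "baseline", "value": 4},
    {"case": "Case 1", "session": 2, "condition": "baseline", "value": 5},
    {"case": "Case 1", "session": 3, "condition": "intervention", "value": 1},
]

# Interventionists for each case in each condition, as in the tables above.
# Sets allow the "Teacher 1 + Teacher 2" pattern to count as a change.
interventionists = {
    "Case 1": {"baseline": {"Teacher 1"}, "intervention": {"Teacher 2"}},
    "Case 2": {"baseline": {"Teacher 1"}, "intervention": {"Teacher 2"}},
    "Case 3": {"baseline": {"Teacher 1"}, "intervention": {"Teacher 2"}},
}

def interventionist_confound(assignments):
    """Flag the confounded pattern shown above: every case experiences a
    change in interventionists between baseline and intervention."""
    return all(conds["baseline"] != conds["intervention"]
               for conds in assignments.values())

print(interventionist_confound(interventionists))  # True: potential confound
```

In the nonexamples above, each case keeps the same interventionist across conditions, so the check returns False; the sketch captures only the worked examples, and borderline patterns would still require reviewer judgment.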
Data sharing in the form of plots is standard practice in SCD research. Sharing data allows other researchers to perform their own reanalysis in the form of visual analysis, and access to the raw data allows the WWC to assess whether the study meets WWC standards of internal validity for SCDs, as well as allowing effect size estimation when appropriate. If the data are not available in graphical or tabular format, then the study will receive a research rating of Does Not Meet WWC Standards.29

Researcher-manipulated independent variable

The researcher must determine the time at which an individual case transitions between phases or conditions. Researcher control over the timing of the intervention is a crucial element of why these designs can be considered experimental designs and are potentially eligible to receive the WWC's highest rating. To meet this requirement, there must be evidence that the independent variable was systematically manipulated by the researcher. Although the researcher may operate in consultation with other individuals involved in the conduct of the study, such as parents, teachers, or school administrators, the final choice of when the independent variable conditions change must rest with the researcher. Randomization designs or masked visual analysis designs are considered to have met this requirement, so long as the researcher made the final choices about the procedures or decision rules for phase transitions. If the study does not discuss who manipulated the independent variable, but there is no evidence that it was someone other than the researcher, then reviewers should assume that this standard was met. If there is evidence that someone other than the researcher manipulated the independent variable, then the finding will receive a research rating of Does Not Meet WWC Standards.

Residual treatment effects

Residual treatment effects are a potential confound in designs with more than one intervention. Alternating treatment designs and other SCDs with an intervening third condition are potentially subject to residual treatment effects. When there are two or more interventions in the intervention phase of an alternating treatment design, the reviewer must examine the study to ensure that there is limited risk of residual treatment effects. When a review team identifies an eligible alternating treatment design experiment that uses two or more interventions, the review team should ask the content expert to assess whether residual treatment effects are likely given the specific interventions or experimental conditions, the timing and length of observation sessions, the order of the interventions or experimental conditions, and the outcomes in the experiment. The review team can rely on previous approval of similar conditions and outcomes from the content expert, but the plausibility of residual effects should not be informed solely by the data reported in a given study. The review team will then assign the study for review and pass along the content expert's determination to the reviewers.

29 When data are available only in graphical format, the WWC will extract tabular data from the plots. The WWC intends to explore ways of making the extracted tabular data available for researchers interested in performing confirmatory analyses or syntheses of findings from studies reviewed by the WWC.
Residual treatment effect. Residual treatment effect refers to the form of carryover in which the effects of one intervention spill over into observation sessions for a separate intervention or experimental condition being observed within the same design.
Reviewers should raise any additional concerns they have about residual treatment effects as part of their reviews. Reviewers should focus on the plausibility of residual treatment effects based on theoretical and contextual considerations given the research design and intervention characteristics, but should not raise concerns based on data reported in the study. If the content expert and reviewer both agree that residual treatment effects likely exist, then the finding is rated Does Not Meet WWC Standards, because the measures of effectiveness cannot be attributed solely to the intervention. If the content expert and reviewer disagree, then review team leadership should revisit the issue. If the content expert and reviewer both agree that residual treatment effects are unlikely, then the reviewer should complete the review assuming there are no residual treatment effects.

Reversal/withdrawal, multiple baseline, and multiple probe designs generally have longer phases and a longer time between data points than alternating treatment designs. More time will pass between the noncontiguous phases that will be compared (for example, between the first B and the second A in an ABCAB reversal/withdrawal design); this feature may make residual treatment effects less important even if they are present. If the reviewer and content expert agree that residual treatment effects are unlikely, or are unlikely to be meaningful, then the reviewer should work with review team leadership and content experts to identify how best to proceed with the review, focusing only on the intervention of interest and the relevant comparison condition when assigning a research rating; that is, ignoring any third or fourth interventions. If a study finding is judged to have a reasonable likelihood of residual treatment effects, then the finding is rated Does Not Meet WWC Standards.

Research design requirements

The primary goal of the SCD research design requirements is to ensure that the study was designed in a way that allows for at least three demonstrations of an intervention effect at three different points in time, with reasonable certainty that the observed data are sufficient to capture important information about the pattern of responding. The pattern of responding includes information such as the within-phase mean, the within-phase variability, and any increasing or decreasing trend that might be present. The three demonstrations criterion is based on professional convention (Horner et al., 2012). In practice, this means that there must be at least three phase changes between the two conditions being compared within a review, occurring at three different points in time. For reversal/withdrawal designs, this means at least three phase changes within a case; three phase changes require that a case have at least four total phases. For a multiple baseline or multiple probe design, this means at least three tiers with phase changes at three different times. This standard includes design-specific conventions regarding the number of phases and data points per phase required to meet the three demonstrations criterion. Specific variations of SCDs have additional requirements. These requirements are intended to ensure that the study is designed in a way that supports at least three opportunities to demonstrate an intervention effect at three different points in time.
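As a minimal illustration of the three demonstrations convention described above, the Python sketch below counts phase changes between the two compared conditions in a single reversal/withdrawal case (multiple baseline designs instead count tiers). The function name is an assumption for illustration, not WWC review software.

```python
# Illustrative sketch of the three-demonstrations convention: count the
# transitions between the two compared conditions in an ordered phase list.

def count_phase_changes(phases, cond_a="A", cond_b="B"):
    """Count transitions between cond_a and cond_b, ignoring other conditions."""
    relevant = [p for p in phases if p in (cond_a, cond_b)]
    return sum(1 for prev, cur in zip(relevant, relevant[1:]) if prev != cur)

print(count_phase_changes(list("ABAB")))  # 3 -> meets the three-changes convention
print(count_phase_changes(list("ABA")))   # 2 -> does not
```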
When there are a sufficient number of opportunities to demonstrate an intervention effect but a limited number of data points, a study will receive a lower rating. However, in some cases there are a small number of data points but it is still possible to be reasonably certain that the observed data points capture all the necessary information about the phase. Any phase with three or more data points and no within-phase variability represents enough data that reviewers can be reasonably certain that additional data points would not provide additional information.

Figure 17 provides some example baseline phases with both zero and low variability. Although data points at the scale minimum or scale maximum are not the only contexts in which there might be zero within-phase variability, these are likely the most common scenarios in which zero within-phase variability will be observed. Panels A and B show plots with three data points at the scale minimum and scale maximum, respectively. There is no variability within each phase, and reviewers can be reasonably certain that if the researcher had gathered another data point, it would contain no additional information about the average level or variability within the phase. Panels C and D show examples where there is relatively little variability near the scale minimum or scale maximum. The data points within a phase are similar but do have some variability, so there can be less confidence about what the exact value of an additional data point might be and how much variability there is within each phase.

Throughout the rest of the WWC standards for SCDs, there are occasions where phases with three or more data points will allow a finding to be rated Meets WWC Standards Without Reservations despite otherwise not having enough data points to meet the described requirements. These occasions are described in the sections regarding treatment reversal/withdrawal designs and multiple baseline/multiple probe designs, and then in a later section on therapeutic baseline trends.

When the data are available as a table in the study or from the study authors, reviewers should examine the table to observe whether short phases of three or more data points have exactly the same value for all data points and therefore meet this requirement. When the raw data are not available as a table or from the study authors, a visual inspection that confirms unambiguous zero variance is sufficient. In other words, if the reviewer believes the authors intended to convey that all the data points within a phase had the exact same value, this is sufficient to meet the requirement.
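As a minimal sketch of the zero within-phase variability rule just described (the function name is hypothetical):

```python
# Hypothetical check: a phase with three or more data points and no
# within-phase variability captures all necessary information.

def phase_is_fully_informative(points: list[float]) -> bool:
    return len(points) >= 3 and min(points) == max(points)

assert phase_is_fully_informative([0, 0, 0])       # e.g., at the scale minimum
assert not phase_is_fully_informative([0, 0])      # too few data points
assert not phase_is_fully_informative([0, 1, 0])   # some within-phase variability
```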
Figure 17. Zero- and low-variability baseline examples

Reversal/withdrawal designs (ABk)
Phases. Findings from treatment reversal/withdrawal designs must have a minimum of two phases per condition to be eligible to be rated Meets WWC Standards Without Reservations or Meets WWC Standards With Reservations. In the simplest design that compares a baseline condition with an intervention condition, this will require four phases. Any case with fewer than four phases or fewer than two phases per condition will receive a research rating of Does Not Meet WWC Standards.

Data points per phase. For treatment reversal/withdrawal design findings to be eligible to be rated Meets WWC Standards Without Reservations, the first baseline phase must have at least six data points,30 and at least two phases per condition must have five or more data points per phase. In the simplest design that compares a baseline and intervention condition, this corresponds to an initial baseline phase with six or more observations, a treatment phase with five or more observations, a return-to-baseline or withdrawal phase with five or more observations, and a second treatment phase with five or more observations. Additionally, any phase with three or more data points and zero within-phase variability will also count toward the required phases for a finding to be eligible to be rated Meets WWC Standards Without Reservations, including the initial baseline phase.

30 A small group of applied and methodological SCD experts agreed that requiring more data in the baseline phase to ensure that reviewers could make accurate judgments was a reasonable requirement.
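The following is a minimal sketch, in hypothetical Python, of the reversal/withdrawal phase and data-point requirements described above, together with the Meets WWC Standards With Reservations thresholds described immediately below; it is an illustration, not the WWC's review procedure.

```python
# Hypothetical rule-of-thumb checker for reversal/withdrawal findings.
# A design is an ordered list of (condition, data points) pairs.

def counts_as_full(points: list[float], minimum: int) -> bool:
    # A phase meets a per-phase minimum directly, or via the zero
    # within-phase variability rule (three or more identical values).
    return len(points) >= minimum or (len(points) >= 3 and min(points) == max(points))

def reversal_withdrawal_rating(design: list[tuple[str, list[float]]]) -> str:
    conditions = {name for name, _ in design}
    groups = {c: [pts for name, pts in design if name == c] for c in conditions}

    def two_full_phases_per_condition(minimum: int) -> bool:
        return all(sum(counts_as_full(p, minimum) for p in group) >= 2
                   for group in groups.values())

    if len(design) >= 4 and counts_as_full(design[0][1], 6) and two_full_phases_per_condition(5):
        return "Meets WWC Standards Without Reservations"
    if len(design) >= 4 and two_full_phases_per_condition(3):
        return "Meets WWC Standards With Reservations"
    return "Does Not Meet WWC Standards"

# The simplest qualifying ABAB design: a six-point initial baseline and
# five or more observations in each remaining phase.
abab = [("A", [2, 3, 2, 3, 2, 3]), ("B", [7, 8, 7, 8, 7]),
        ("A", [3, 2, 3, 2, 3]), ("B", [8, 7, 8, 7, 8])]
assert reversal_withdrawal_rating(abab) == "Meets WWC Standards Without Reservations"
```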
For treatment reversal/withdrawal design findings to be eligible to be rated Meets WWC Standards With Reservations, two phases per condition must have three or more data points per phase. Findings that do not meet either set of requirements will receive a research rating of Does Not Meet WWC Standards.

Figure 18 provides a simple reversal/withdrawal design example eligible to receive a research rating of Meets WWC Standards Without Reservations because it meets the number of phases and number of data points per phase requirements of the reversal/withdrawal design standards. Each participant- or unit-outcome combination in a reversal/withdrawal design is a single experiment, and therefore each should receive a separate rating.

Figure 18. Reversal/withdrawal design example

Changing criterion designs. The reversal/withdrawal design standards can be applied to changing criterion designs with a small modification. A changing criterion design is similar in structure to the reversal/withdrawal design, except that each phase change after the initial baseline-to-treatment phase change represents a modified treatment phase with an incremental goal, or criterion, for the behavior of interest. Each baseline or intervention change or criterion change should be considered a phase change. As such, there should be at least three different criterion changes to establish three attempts to demonstrate an intervention effect. In some studies that use a changing criterion design, the researcher may reverse or change the criterion back to a prior level to further establish that the change in criterion was responsible for the outcomes observed on the dependent variable. This should be considered a phase change, as in the reversal/withdrawal design.

Multiple baseline/multiple probe designs
Phases. Multiple baseline designs must have a minimum of six phases split into two conditions for their findings to be rated Meets WWC Standards Without Reservations or Meets WWC Standards With Reservations. The simplest example of this design has three tiers stacked vertically, each made up of a baseline and a treatment phase (in other words, an A-to-B comparison). Additionally, transitions from the baseline phase to the intervention phase must have at least three unique timings to ensure that there are three opportunities to demonstrate the intervention effect at three different points in time.
Findings with fewer than six phases will be rated Does Not Meet WWC Standards.

Data points per phase. For findings from multiple baseline designs to be eligible to be rated Meets WWC Standards Without Reservations, the first baseline phase within each tier must have at least six data points. Additionally, all subsequent phases must have five or more data points per phase. Any phase with three or more data points and zero within-phase variability will also count toward the required phases for a finding to be eligible to be rated Meets WWC Standards Without Reservations, including the first baseline phase in each tier. For multiple baseline design findings to be eligible to be rated Meets WWC Standards With Reservations, three phases per condition must have three or more data points per phase. Findings from multiple baseline designs that do not meet either set of requirements will be rated Does Not Meet WWC Standards.

Concurrence. The timing of the design's implementation requires a degree of concurrence across the design. The concurrence requirement encompasses three elements:
1. Tiers must be organized to allow for vertical comparison. This means that time 1 for each tier must represent approximately the same point in time, time 2 for each tier must represent approximately the same point in time, and so on. Reviewers should assume this standard is met unless authors provide evidence of nonconcurrence, such as describing the design as a nonconcurrent multiple baseline or graphing data in a way that suggests nonconcurrence.
2. All tiers must have data collected in the baseline phase prior to the introduction of the intervention to any case.
3. Cases that have not yet received the intervention must have data at or after the time another case enters the intervention.
   o If appropriate for the design, training phase data must be present. Some interventions require that the participant be trained in the intervention. The requirement for training will be discussed by the authors if training is necessary; studies that do not discuss training need not meet the training data requirements. If the effect of the intervention is expected to be immediate at the onset of training, then data for the training phases must be present for every tier and can be considered part of the intervention. If the intervention effect is not expected until after the completion of the training, then tiers still in the baseline phase must continue baseline measurement at or after the time point when a preceding tier has its first intervention probe after completing training. This process prevents an overlap in the training/intervention phase for any two tiers and allows cases that have begun to receive the full effect of the intervention to be compared vertically with those cases still in the baseline.

Tiers in multiple baseline designs
A tier refers to a single row in a set of stacked rows in a multiple baseline design. In Figure 19, tiers are cases, and the comparisons between baseline and intervention can be made within and across cases. Tiers might also be outcomes or contexts within a single individual or case, in which case effect replication takes place purely within a single case.
Findings from any multiple baseline or multiple probe design that fail to meet the concurrence requirement will be rated Does Not Meet WWC Standards.

Figure 19 provides an example of a multiple baseline design that is eligible to receive a research rating of Meets WWC Standards Without Reservations because it meets the number of phases, number of data points per phase, and concurrence requirements for multiple baseline designs. Although the x-axis of the plot does not specify the exact date or time of the observations, the WWC generally assumes that authors have aligned their displays in a way that allows for vertical comparison, absent any evidence to the contrary. Figure 20 provides an example with evidence to the contrary. In multiple baseline and multiple probe designs, each stacked plot generally represents a single experiment, and each combination of stacked plot and outcome should receive a separate rating.

Figure 19. Multiple baseline design example

Figure 20 displays an example of a multiple baseline design that does not allow for vertical comparison and fails the requirement that all cases must have data in the baseline phase prior to the introduction of the intervention to any case. Although the first data points for Yolanda, Andre, and Amelia are all arranged in a stacked fashion, the actual timing of those data points is offset in five-day intervals. Additionally, data points for Amelia do not begin until halfway through Yolanda's treatment phase. This example would be rated Does Not Meet WWC Standards.

Figure 20. Example violations of the first and second concurrence requirements
Figure 21 displays an example of a multiple baseline design that would not meet the third concurrence requirement. In this example, Andre's baseline ends after time 6, prior to the onset of the intervention for Yolanda. No data allow for a vertical comparison to ensure that there is no change in Andre's responses prior to the onset of intervention. The final data points in the baseline phase prior to the onset of the intervention are important for judging any change in the trajectory of the outcome data points, including the WWC's baseline trend requirement. This example would be rated Does Not Meet WWC Standards.

Figure 21. Example violation of the third concurrence requirement

Figure 22 displays an example of a multiple baseline design with empty training phases. Empty training phases are not appropriate for interventions where the training is quick and the impact is expected to be immediate; for those types of interventions, the training data would represent the beginning of the treatment phase from the WWC's perspective and therefore are important to include as part of the impact estimate. In those cases, this design would be rated Does Not Meet WWC Standards. For interventions that require a longer training period to have an impact, empty training phases are acceptable, and the requirements for concurrence should ignore the training phase and focus on overlap between the baseline and treatment phases. In cases where the longer training is appropriate, this design would be eligible to receive a rating of Meets WWC Standards With Reservations. However, in cases where the training phases last as long as or longer than the longest treatment phase in a design, the reviewer should consult a content expert to ensure that the training phases do not constitute an extra condition for which data should be available.

Figure 22. Example of empty training phases
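As an illustration, here is a minimal sketch of the second and third concurrence elements listed earlier, assuming a hypothetical representation in which each tier records its observation sessions and the session at which the intervention begins; the first element (vertical alignment of timing) is taken on the authors' display, and training-phase adjustments are ignored.

```python
# Hypothetical concurrence check for a multiple baseline design.
# tiers maps a tier name to (sessions with data, intervention start session).

def meets_concurrence(tiers: dict[str, tuple[list[int], int]]) -> bool:
    earliest_start = min(start for _, start in tiers.values())
    for sessions, start in tiers.values():
        # Element 2: baseline data before the intervention reaches any case.
        if not any(s < earliest_start for s in sessions):
            return False
        # Element 3: while still in baseline, data at or after each other
        # case's entry into the intervention.
        for _, other_start in tiers.values():
            if other_start < start and not any(other_start <= s < start for s in sessions):
                return False
    return True

# Three tiers observed in every session, entering the intervention at
# staggered times, satisfy both elements.
staggered = {"tier 1": (list(range(1, 21)), 6),
             "tier 2": (list(range(1, 21)), 11),
             "tier 3": (list(range(1, 21)), 16)}
assert meets_concurrence(staggered)
```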
Multiple probe design requirements. These designs are a special case of multiple baseline designs. Planned missing data is a key element of the multiple probe design and is the major difference between a multiple baseline design and a multiple probe design. Multiple probe designs must meet all the multiple baseline design requirements, plus additional criteria because baseline data points are intentionally missing.31 For multiple probe designs to meet WWC standards, the following must be true:
• Initial preintervention data collection sessions must overlap. For findings to receive a research rating of Meets WWC Standards Without Reservations, each tier must have three data points in the first three sessions. For findings to receive a research rating of Meets WWC Standards With Reservations, there must be at least one session within the first three sessions where probe points overlap vertically for all tiers in the design.
• Probe points must be available just prior to introducing the independent variable. Within the three sessions just prior to introducing the independent variable, the design must include three consecutive probe points for each case to be rated Meets WWC Standards Without Reservations, and at least one probe point immediately preceding the onset of intervention for each case to be rated Meets WWC Standards With Reservations.
• Each case not receiving the intervention must have a probe point in a session where another case either first receives the intervention or reaches a prespecified intervention criterion described by the researchers.
  o For designs with a training phase, when impacts are expected only after complete delivery of training, the "first receives the intervention" language should be interpreted as the time point when a case has its first intervention probe after completing training.

Findings from multiple probe designs that fail to meet any of these requirements, in addition to the general multiple baseline design requirements, will receive a research rating of Does Not Meet WWC Standards.

Figure 23 displays an example of a multiple probe design that would be eligible to be rated Meets WWC Standards Without Reservations. The initial three data collection sessions have data for all cases. All cases have at least three data points directly before the intervention begins for any other case. Each case still in the baseline has a data point when the other case first receives the intervention. The baseline phases all have zero variability, and as a consequence three data points are enough for the finding to be rated Meets WWC Standards Without Reservations.

31 Multiple baseline designs with unintentional missing data should not be reviewed under the multiple probe requirements. Reviewers should note any unplanned missing data in the study review guide.
Figure 23. Multiple probe design, example 1

Note that data collection after the onset of the intervention in a multiple probe design may be intermittent or continuous. The WWC has no specific requirements for the intervention phase other than the requirements for a minimum number of data points per phase.

Figure 24 displays an example of a multiple probe design that would potentially be eligible for a research rating of Meets WWC Standards With Reservations. Each case has a single overlapping data point at time 1 and at time 3, but the study does not have a data point at time 2 for Andrew or Katherine. Each case has at least one data point in the three sessions prior to Thanaa receiving the intervention. Andrew and Katherine each have at least one data point in the three sessions prior to Andrew receiving the intervention. Katherine has at least one data point in the three sessions prior to receiving the intervention. Each case still in the baseline phase has a data point when another case enters the intervention.

Figure 24. Multiple probe design, example 2
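The bulleted multiple probe requirements can be sketched at the Meets WWC Standards With Reservations level as follows. This hypothetical check assumes sessions are numbered from 1, ignores the prespecified-criterion and training-phase variants, and is an illustration rather than the WWC's procedure.

```python
# Hypothetical multiple probe check at the With Reservations level.
# tiers maps a tier name to (set of probe sessions, intervention start session).

def probe_meets_with_reservations(tiers: dict[str, tuple[set[int], int]]) -> bool:
    # At least one of the first three sessions has a probe in every tier.
    if not any(all(s in probes for probes, _ in tiers.values()) for s in (1, 2, 3)):
        return False
    for probes, start in tiers.values():
        # A probe must immediately precede the onset of the intervention.
        if start - 1 not in probes:
            return False
        # Each case still in baseline needs a probe in the session where
        # another case first receives the intervention.
        for _, other_start in tiers.values():
            if other_start < start and other_start not in probes:
                return False
    return True

# Three staggered tiers with sparse but well-placed probes.
probes = {"T1": ({1, 2, 3, 4}, 5),
          "T2": ({1, 3, 5, 8}, 9),
          "T3": ({1, 2, 5, 9, 12}, 13)}
assert probe_meets_with_reservations(probes)
```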
Alternating treatment designs
Figure 25 displays an alternating treatment design example that would be eligible to be rated Meets WWC Standards Without Reservations because it meets the data points per condition and contiguous data points requirements for alternating treatment designs.

Figure 25. Alternating treatment design example

Some alternating treatment designs will contain both a baseline phase and a phase that rapidly alternates between two or more conditions; other designs will contain only a set of rapidly alternating conditions, as seen in Figure 25. Reviews should ignore initial baseline phases included prior to the introduction of the experimental phase in alternating treatment designs, because a comparison between an initial baseline phase and one experimental condition from the rapidly alternating phase will not allow for three demonstrations of the intervention effect at three different points in time.

An important consideration exists when designs include multiple intervention comparisons (for example, A versus B, A versus C, C versus B). The WWC considers each comparison between conditions a separate contrast; accordingly, each contrast should be reviewed for eligibility and research rating separately. Although the design refers to "alternating treatments," the rapidly alternating phase can contain a business-as-usual condition. Contrasts containing a business-as-usual condition will most frequently be the findings of interest for the WWC.

Data points per condition. Findings from alternating treatment designs must have at least five data points per condition to be rated Meets WWC Standards Without Reservations. Designs must have at least four data points per condition for their findings to be rated Meets WWC Standards With Reservations. Any findings based on fewer data points will receive a research rating of Does Not Meet WWC Standards.

Contiguous data points. Within a phase involving the rapid alternation of treatments, there should be a maximum of two sequential data points of the same intervention condition without the interruption of another condition. Any comparison with more than two contiguous data points without the interruption of another condition will receive a research rating of Does Not Meet WWC Standards.
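A minimal sketch of the contiguous data points rule, before applying the two exceptions described just below, using a hypothetical sequence of condition labels in session order:

```python
from itertools import groupby

# Hypothetical check: no condition may appear more than twice in a row
# within the rapidly alternating phase.

def violates_contiguity(conditions_in_order: list[str], max_run: int = 2) -> bool:
    return any(sum(1 for _ in run) > max_run for _, run in groupby(conditions_in_order))

assert not violates_contiguity(["A", "B", "B", "A", "B", "A"])
assert violates_contiguity(["A", "B", "B", "B", "A"])  # three contiguous B sessions
```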
There are two exceptions to the contiguous data points requirement. First, some designs will continue to gather data on the intervention or condition deemed most successful after the completion of the rapid alternation; more than two contiguous data points examining the most successful intervention after the rapid alternation ends will not be considered a violation of this requirement. Second, designs that use an unrestricted randomization procedure to assign condition order are exempt from the contiguous data points requirement. The design will still need to allow for three demonstrations of the intervention effect at three different points in time in order to be eligible to be rated Meets WWC Standards Without Reservations or Meets WWC Standards With Reservations.

Other SCD designs
SCDs that use methodology not currently described in the Handbook can still meet the WWC research design requirements. Review team leadership must document and use published professional conventions for the research design under review. For an SCD not currently described in the Handbook to be rated Meets WWC Standards Without Reservations or Meets WWC Standards With Reservations, it must contain three attempts to demonstrate an intervention effect at three different points in time.

Limited risk of bias
This section is relevant for designs where the primary comparison is the pattern of responding between separate phases, such as baseline and intervention phases. Of the designs for which the WWC has explicit standards, this includes treatment reversal/withdrawal designs, changing criterion designs, multiple baseline designs, and multiple probe designs. The only design with explicit standards that is not subject to the assessment of limited risk of bias is the alternating treatment design, because its contrast of interest is within a rapidly alternating intervention phase rather than between separate phases. Any other design not explicitly listed in the design standards but reviewed under a set of published professional conventions will be subject to an assessment of limited risk of bias if the design's primary contrast of interest is a comparison between separate phases.

This section is relevant only to those designs that are potentially eligible to receive a research rating of Meets WWC Standards Without Reservations after being reviewed under the data availability, independent variable, residual treatment effects, and design assessment standards. Designs that limit the risk of bias are eligible to receive a research rating of Meets WWC Standards Without Reservations; designs with a potential risk of bias are eligible to be rated Meets WWC Standards With Reservations. Presently, the potential risk of bias that the WWC assesses relates to therapeutic baseline trend and a lack of reversibility.

Baseline trend is an important consideration in designs where the primary comparison is between baseline phases and treatment phases, such as reversal/withdrawal designs, multiple baseline designs, or multiple probe designs. The presence of a trend in the direction of the expected treatment effect in the initial baseline phase(s), that is, a therapeutic trend, is of particular concern.
If there is notable improvement in the outcome across the initial baseline phase, or in the data points just prior to the onset of an intervention phase for a case (or, in designs like the multiple baseline design, just prior to the onset of an intervention for a preceding case), then there is some ambiguity about whether intervention effects can be attributed solely to the intervention rather than to some intervening factor, such as individual maturation or other events within a classroom or intervention context.
Reversibility is an important consideration in reversal/withdrawal designs where several baseline or business-as-usual phases are alternated with treatment phases. In the most credible forms of these designs, both the intervention and outcome must allow for the outcome to return to initial baseline levels when the intervention is withdrawn. If some form of learning takes place during the intervention, then outcomes assessed in the return-to-baseline phases may not fully return to initial baseline levels, and the contrasts involving those return-to-baseline phases will be attenuated. Incomplete reversibility does not mean that a study cannot serve as evidence for an intervention's effectiveness; it simply may not be the highest quality evidence, and therefore Meets WWC Standards With Reservations is the highest research rating that a finding with incomplete reversibility can receive. The WWC has identified the use of a quantitative nonoverlap measure as an appropriate method to assist with judgments of baseline trend and reversibility.

Nonoverlap measures
Nonoverlap measures were created to help researchers describe the proportion of data in an intervention phase that demonstrates improvement over a baseline phase. Research has shown that nonoverlap measures are broadly consistent with visual analytic judgments (Parker et al., 2014), so their integration is intended to allow for a review process that is broadly consistent with visual analysis. Assessing nonoverlap involves examining the distribution of data points across phases: nonoverlap of data points provides evidence that a change has occurred. For example, if behavior counts range from 1 to 4 in a baseline phase and from 5 to 9 in an intervention phase, then there is 100 percent nonoverlap in data points across phases. This result may be taken as evidence that behavior changed from baseline to intervention in the context of the SCD research designs previously described.

While nonoverlap measures have traditionally been used to describe the effects of interventions, they can also be used to describe the similarity between two sets of data points. For instance, if there is no therapeutic trend in a baseline phase, then the data points at the end of the phase should be similar to the data points at the beginning of the phase. If a behavior is reversible, the pattern of responding in any withdrawal or return-to-baseline phases in a treatment reversal/withdrawal design should be similar to the initial baseline phase. Higher levels of nonoverlap are associated with less consistency between the data points at the beginning of the phase and the data points at the end of the phase, implying the presence of a trend in the data or incomplete reversibility.

The WWC selected the nonoverlap of all pairs (Parker & Vannest, 2009) to help reviewers make judgments regarding baseline trend and reversibility because, unlike many other nonoverlap indices, the magnitude of the index is not a function of the number of data points in a phase (Pustejovsky, 2019). The SCD standards contain benchmark values of the nonoverlap of all pairs for WWC reviewers to identify problematic instances of baseline trend and reversibility. These benchmarks represent the maximum acceptable values of the nonoverlap of all pairs. Lower values represent a larger degree of overlap, where overlap refers to data points with values opposite the intended direction of the effect.
The sections on individual design types that use the nonoverlap of all pairs contain general guidance on its use. Details of the calculation can be found in appendix I in the technical appendices.
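For orientation, the following sketch assumes the standard definition of the nonoverlap of all pairs from Parker and Vannest (2009), the proportion of all cross-phase pairs showing improvement with ties counted as half; the WWC's exact procedure is the one given in appendix I.

```python
# Nonoverlap of all pairs under the standard Parker and Vannest (2009)
# definition: compare every data point in one phase with every data
# point in the other; ties count as half an improving pair.

def nonoverlap_of_all_pairs(phase_a: list[float], phase_b: list[float],
                            increase_is_improvement: bool = True) -> float:
    improving = ties = 0
    for a in phase_a:
        for b in phase_b:
            diff = (b - a) if increase_is_improvement else (a - b)
            if diff > 0:
                improving += 1
            elif diff == 0:
                ties += 1
    return (improving + 0.5 * ties) / (len(phase_a) * len(phase_b))

# The worked example above: baseline counts of 1 to 4 and intervention
# counts of 5 to 9 show complete nonoverlap.
assert nonoverlap_of_all_pairs([1, 2, 3, 4], [5, 6, 7, 8, 9]) == 1.0

# Used for similarity, as in the baseline trend check described in the
# next section: compare the last three baseline points against all
# earlier ones. This stable baseline falls well under the .85 benchmark.
baseline = [2, 3, 2, 3, 2, 3]
assert nonoverlap_of_all_pairs(baseline[:-3], baseline[-3:]) <= 0.85
```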
Minimal therapeutic baseline trend
For research designs in which the finding contrasts baseline phases and intervention phases, and that have findings eligible to receive a rating of Meets WWC Standards Without Reservations, reviewers should assess any initial baseline phases to ensure that there is minimal therapeutic trend. Designs with more than one baseline phase (such as a treatment reversal design) do not require assessment of baseline phases after the first. Reviewers should assess baseline trends by comparing the last three data points with all other data points within the initial baseline phase. A nonoverlap of all pairs of .85 or smaller will be considered evidence of minimal baseline trend. Any baseline phase with at least three data points and zero within-phase variability will be assumed to have met this requirement. Any finding that fails to meet this requirement is still eligible to receive a research rating of Meets WWC Standards With Reservations. For multiple baseline designs or multiple probe designs, all baselines within the design will be subject to this requirement, and failure will cause the entire design to be rated Meets WWC Standards With Reservations. Designs with more than the minimum number of cases might still be eligible to be rated Meets WWC Standards Without Reservations if an eligible subset meets the baseline trend requirement, as described in the section below regarding designs with extra cases.

Evidence of reversibility
For research designs with return-to-baseline or withdrawal phases and findings that are eligible to receive a rating of Meets WWC Standards Without Reservations, reviewers should assess any return-to-baseline or withdrawal phases against the initial baseline phase to ensure that minimal reversibility was achieved. Simple multiple baseline designs or multiple probe designs without embedded reversals are not subject to this requirement, nor are alternating treatment designs. Reviewers should assess the reversibility of the outcomes by using the nonoverlap of all pairs to compare the initial baseline with any return to baseline. A nonoverlap of all pairs of .85 or less will be taken as evidence of achieving at least minimal reversibility. Any finding that fails to meet this requirement is still eligible to receive a research rating of Meets WWC Standards With Reservations.

Designs with extra cases/phases, combination designs, or designs with extra conditions
Extra cases or phases. Reversal/withdrawal, multiple baseline, and multiple probe designs may have more than the minimum number of phases, cases, or tiers required to meet standards. For example, a reversal/withdrawal design could have six phases (ABABAB), or a multiple baseline design could have four cases where each case has two phases. In general, as long as there are enough phases, cases or tiers, and data points to meet the minimum requirements for the design, additional phases, cases, or tiers with fewer than the required number of data points will not cause a study finding to receive a lower rating. In addition, any finding should receive the highest rating that any subset of its design is eligible for, and those subsets need not be sequential to qualify. There are two important caveats. First, the subset must still contain three opportunities to demonstrate an intervention effect at three different points in time.
Second, nonsequential phases within a case should not be compared with each other, and so may not constitute a demonstration of an intervention’s effectiveness. Design subsets also must meet any other design-specific requirements for the design type contained in that subset.
Figure 26 displays an ABABAB reversal/withdrawal example with more than the minimum number of phases needed to meet standards. The first A phase contains six data points and the subsequent three phases contain five data points each, but the final two phases (the last AB pair) contain only two data points each. This study finding would still be eligible to receive a research rating of Meets WWC Standards Without Reservations because it contains enough information to potentially demonstrate the intervention effect at three different points in time within the first four phases.

Figure 26. Treatment reversal design with extra phases that is rated Meets WWC Standards Without Reservations

Figure 27 displays another example treatment reversal design with six phases. In contrast to the previous example, this finding would be rated Does Not Meet WWC Standards. Although it has four phases that meet the data point requirements for a rating of Meets WWC Standards Without Reservations, the two phases in the middle cannot be used as part of a phase transition. Nonsequential phases cannot serve as demonstrations of an intervention's effect. Therefore, only two potential demonstrations of the intervention effect have a sufficient number of data points: one between the first and second phases and one between the fifth and sixth phases.

Figure 27. Treatment reversal design with extra phases that is rated Does Not Meet WWC Standards
Figure 28 displays a multiple baseline design with four cases that all receive an intervention at staggered times. The first, second, and fourth cases all have a sufficient number of data points to be rated Meets WWC Standards With Reservations. However, the third case dropped out after only two intervention data points were gathered. This finding would still be eligible to be rated Meets WWC Standards With Reservations because it contains enough information to demonstrate the intervention effect at three different points in time.

Figure 28. Multiple baseline design with four cases rated Meets WWC Standards With Reservations
Combination designs. Findings from combination designs (such as a multiple baseline with embedded reversals) should receive the highest possible rating that any subset of their design is eligible for. Figure 29 displays a multiple baseline design with three cases. The second and third cases contain only the traditional baseline and treatment phases, but the first case also contains an embedded reversal/withdrawal design. The second pair of phases in the first case is brief: these two phases contain only three data points each and therefore are at best eligible to receive a research rating of Meets WWC Standards With Reservations under the reversal/withdrawal design requirements. However, given that the subset of initial baseline and treatment phases for all three cases, when considered as a multiple baseline design, would be eligible to receive a research rating of Meets WWC Standards Without Reservations, the finding from this combination design should be rated Meets WWC Standards Without Reservations.

Figure 29. Combination multiple baseline design with reversals that is rated Meets WWC Standards Without Reservations

Extra conditions. Treatment reversal/withdrawal designs, multiple baseline designs, and alternating treatment designs may have more than one intervention condition. Unless otherwise specified by review team leadership, reviews should focus on contrasts comparing two conditions rather than reviewing three or more conditions at once.