Glossary: Research & Analysis Terminology
Achievement Metric
The educational outcome (e.g., test scores, end-of-course grades, graduation rate) of interest in an intervention or evaluation.
Any criterion, outcome, or response variable used to measure an educational outcome of interest. Outcomes may involve typical cognitive measures (e.g., test scores, gradebook data), noncognitive measures (e.g., self-esteem, critical thinking, 21st century skills, persistence), or alternative educational outcomes such as attendance, course retention, and graduation rate. The achievement metric used for an educational intervention should be the educational outcome that the intervention is purported to improve, and is the ultimate outcome of interest for any given study.
Baseline Equivalence
The degree to which two or more groups are similar for the purpose of comparison.
A testable assumption about the comparability of the treatment group and the control group. When comparing the effects of an intervention on two or more groups, it is assumed that the groups are equivalent prior to the intervention (i.e., at baseline) so that any subsequent differences can be attributed to the intervention. Establishing baseline equivalence enables one to isolate the effect of the treatment, while ruling out confounding factors and extraneous influences. An example of a factor that could contribute to a lack of baseline equivalence would be an imbalance across study groups of low and high performers.
Cluster Analysis
A method of dividing subjects (e.g., students, schools) into meaningful groups based on the degree to which they share common characteristics (e.g., product usage).
Cluster analysis (or clustering) is a common method of dividing data into conceptually meaningful groups that share common characteristics. This statistical technique is used to group a set of units (e.g., students, schools) such that units in the same cluster (or group) are more similar to each other with regard to an identified criterion (e.g., usage) than they are to those in other clusters.
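To make the grouping idea concrete, the sketch below clusters students on a single usage measure with a tiny one-dimensional k-means; the data, the function name, and the choice of three clusters are illustrative assumptions, not part of the glossary.

```python
def kmeans_1d(values, k, iters=20):
    """Minimal 1-D k-means sketch: partition values around k centroids."""
    srt = sorted(values)
    # Deterministic init: centroids spread evenly across the sorted values.
    centroids = [srt[j * (len(srt) - 1) // (k - 1)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Hypothetical weekly product-usage minutes for ten students.
usage = [5, 8, 7, 6, 42, 45, 40, 90, 95, 88]
low, medium, high = kmeans_1d(usage, k=3)
```

With usage levels this well separated, the three clusters recover the light, moderate, and heavy users, i.e., units within a cluster are more similar to each other than to units in other clusters.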
Comparison Group
The group of subjects in a study who either do not receive a treatment or receive a different treatment, and are thus compared to the group of subjects who did receive the treatment (i.e., the treatment group).
The group of subjects in a study or intervention who either do not receive the treatment or receive an alternative treatment. This group is then compared to the group who receives the treatment in order to evaluate the impact of the intervention. (See related term: control group.)
Condition
The study group (e.g., treatment group, control group) to which the subject belongs.
The study group to which the subject belongs, which is typically either the treatment group (or experimental group) or the control group (or comparison group). Experimental conditions can also take the form of multiple levels of the treatment, in which case the different levels are examined to determine the extent to which each influences the outcome(s) of interest.
Confidence Interval
A range of values within which one can conclude, with a specified level of confidence, that the true value lies, which indicates the level of certainty one can have in the results.
A range of values estimated from a sample that is likely to contain the true value from the population, which indicates the amount of certainty associated with the sample estimate. For example, if the estimated effect in the sample is between 0.40 and 0.60 with a 90% level of confidence, then one can conclude that 9 out of 10 intervals from similar samples will contain the true population value.
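As a sketch of the idea using only the Python standard library (the gain scores below are invented for illustration), a 90% confidence interval for a sample mean can be computed as:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical gain scores from a sample of ten students.
scores = [0.52, 0.47, 0.61, 0.38, 0.55, 0.49, 0.58, 0.44, 0.51, 0.45]

n = len(scores)
m = mean(scores)
se = stdev(scores) / sqrt(n)      # standard error of the mean
z = NormalDist().inv_cdf(0.95)    # two-sided 90% confidence
low, high = m - z * se, m + z * se
```

For a sample this small, a t-based interval would be slightly wider; the z-based version is used here only to keep the sketch short.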
Confidence Level
The degree of certainty one can have that the estimated confidence interval contains the true population value.
The frequency with which observed confidence intervals contain the true population parameter. Calculated as 1 minus the significance criterion (α), and set by the researcher, the confidence level indicates the percentage of all possible samples from a population that can be expected to include the true population parameter. If confidence intervals are estimated in multiple experiments on the same population, the confidence level would be the proportion of those intervals that contain the true population value. For example, a 90% confidence level would indicate that 90% of the confidence intervals from a given set of samples would include the true population parameter.
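This frequency interpretation can be checked by simulation; the sketch below (population parameters are arbitrary) builds a 90% interval from each of 1,000 samples and counts how often the true mean is captured:

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(42)
z = NormalDist().inv_cdf(0.95)        # 90% confidence level
true_mean, trials, n, hits = 100.0, 1000, 30, 0

for _ in range(trials):
    sample = [rng.gauss(true_mean, 15.0) for _ in range(n)]
    m, se = mean(sample), stdev(sample) / sqrt(n)
    if m - z * se <= true_mean <= m + z * se:
        hits += 1

coverage = hits / trials              # close to 0.90, as the definition predicts
```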
Confounding Variable
An extraneous factor that influences the variables of interest in a study, and therefore must be accounted for to ensure validity.
A factor or variable in a study that accounts for at least some of the relationship between the independent and dependent variable. Failure to control for or eliminate the influence of potential confounding variables limits the reliability and validity of the study. For example, when examining whether students who use a product (i.e., the treatment group) demonstrate greater gains in achievement than students who do not use a product (i.e., the control group), a confounding factor might be the average student ability of each group.
Control Group
The group of subjects from whom the treatment is withheld so that they can be compared to the treatment group to determine whether the intervention had an effect.
The group of subjects in a study or intervention who, by design, do not receive the intervention. This group is then compared to a treatment group to determine the impact of the intervention. Typically, subjects are randomly assigned to the control group, whereas randomization is not required to form comparison groups. (See related term: comparison group.)
Controlled Study
A study in which a treatment group is compared to a control group to determine the efficacy of an intervention.
A type of study in which observations made during the intervention are compared to a standard, called the control. The control may be observations of subjects in the same study for whom the intervention was withheld, or it may consist of observations from outside the study (e.g., from an earlier trial).
Correlation Coefficient
A measure of the relationship (magnitude and direction) between two or more variables.
A statistical index of the interdependence between multiple variables of interest — the extent to which one variable changes in relation to another variable. The correlation coefficient indexes the direction and the magnitude of the statistical relationship between two or more variables. The index can range from -1.00 (inverse relationship) to +1.00 (positive relationship), with 0 being indicative of no relationship.
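The coefficient can be computed directly from its definition; in this sketch the usage minutes and scores are invented solely to illustrate a strong positive relationship:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical weekly usage minutes and assessment scores for five students.
minutes = [30, 45, 60, 75, 90]
scores = [70, 74, 79, 83, 88]
r = pearson_r(minutes, scores)    # near +1.00: strong positive relationship
```

A perfectly inverse pattern, e.g. `pearson_r([1, 2, 3], [3, 2, 1])`, returns a value at the other end of the range, -1.00.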
Cost-Effectiveness Analysis
The relative effectiveness of a treatment or intervention compared to its cost.
Cost-effectiveness analysis consists of a set of techniques that allows one to compare the costs (direct and indirect) of a treatment or intervention to its effectiveness. Typically, a cost-effectiveness analysis results in a value that can be placed along a continuum ranging from “high cost, low effectiveness” to “low cost, high effectiveness.”
Covariate
A variable that may have an impact on the study’s outcome.
Covariates are factors that have the potential to influence the outcomes of a study. A common assumption is that covariate levels are identical for units (e.g., students) in the study groups (e.g., treatment group and control group), which allows one to infer that differences between groups are attributable to the treatment or intervention. When covariates are of interest, subgroup analysis can be conducted to determine their effect on the study outcomes. When covariates are extraneous factors, statistically controlling for them allows one to hold constant (or remove) their influence and thus rule out possible confounding factors. Common covariates include socioeconomic status, gender, grade, prior achievement, and school locale.
Dependent Variable
A variable that is explained by a study’s independent variable(s) and therefore represents the outcome of interest.
A factor or phenomenon that represents the outcome under investigation, and whose variation is being studied to determine whether predictors account for that variation. Dependent variables are also referred to as response variables, outcome variables, or explained variables.
Educational Evaluation
An estimation of the extent to which an educational intervention met its objectives or had its intended effect.
The systematic investigation of an ongoing or completed educational intervention, with the aim of determining the extent to which the objectives of the intervention were accomplished. Specific methodologies can help determine the effectiveness and impact of an intervention, as well as its utility and cost-effectiveness.
Effect Size
A statistical estimate of the difference between two or more groups, which is often presented in a standardized way so that one can determine the magnitude of the difference.
A statistical technique for quantifying the difference between two or more groups. The effect size provides evidence about the impact of a given intervention by indexing the magnitude of the treatment’s effect in a standardized unit, which allows the results to be compared across numerous contexts. Effect sizes can take multiple forms, with a common form being the standardized mean difference (e.g., Hedges’ g) between the contrasted groups. When examining a single group, a different form of effect size (e.g., correlation, regression) is used, which indexes the interdependency between the explanatory variable (e.g., edtech usage) and the response variable (e.g., academic achievement).
Factorial Design
A design that allows one to study the effect of multiple factors, and the interaction among those factors, on an outcome of interest.
A study design in which study groups receive one of several combinations of interventions. Factorial designs allow one to study the effect of multiple factors on the educational outcome of interest, as well as the effects of interactions between factors on the outcome. For example, one can investigate the extent to which combinations of products or suites of products are effective at improving an educational outcome of interest.
Hedges’ g
A specific measure of effect size based on the standardized mean difference between two or more groups (e.g., treatment and control groups).
A specific measure of effect size that quantifies the difference between two or more groups. Hedges’ g is a standardized metric that provides evidence about the impact of a given intervention by indexing the magnitude of the treatment’s effect. Hedges’ g is a bias-corrected form of the standardized mean difference (i.e., Cohen’s d) and is more robust to the effects of smaller samples.
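A minimal sketch of the computation (the treatment and control scores are fabricated for illustration): the standardized mean difference is multiplied by the small-sample correction factor 1 - 3 / (4(n1 + n2) - 9).

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(treatment, control):
    """Hedges' g: standardized mean difference with a small-sample correction."""
    n1, n2 = len(treatment), len(control)
    pooled_sd = sqrt(((n1 - 1) * variance(treatment) +
                      (n2 - 1) * variance(control)) / (n1 + n2 - 2))
    d = (mean(treatment) - mean(control)) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # shrinks d slightly for small samples
    return d * correction

# Hypothetical posttest scores for two groups of eight students.
treat = [78, 82, 85, 80, 84, 79, 83, 81]
ctrl = [74, 76, 79, 73, 77, 75, 78, 72]
g = hedges_g(treat, ctrl)
```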
Independent Variable
A variable that is examined to determine the extent to which it influences the dependent variable (i.e., the study’s outcome).
A factor or phenomenon that is theorized to influence another associated factor or phenomenon (i.e., the dependent variable). Typically the independent variable is manipulated to determine the extent to which different levels of the independent variable predict, or relate to, change in the dependent variable. Independent variables are also referred to as predictor variables or explanatory variables.
Intervention
The process of applying a treatment (e.g., educational technology, new curriculum) to study whether the treatment is effective.
The treatment, process, or action that is the focus of a study. Interventions may include educational technologies, classroom activities, digital learning tools, pedagogical approaches, teaching practices, and other stimuli or phenomena that are under investigation.
Locale
The geographic location of the school, particularly with regard to where it falls on the spectrum from urban to rural.
An urban-centric locale code system that classifies schools into four major types: city, suburban, town, or rural.
Margin of Error
A value that is subtracted from and added to an estimated effect, which provides some leeway so that one can be confident in the effect while accounting for sampling error or measurement error.
A statistical approximation of the amount of sampling error in a study’s estimated effect, which indicates the likelihood that the estimated effect from a sample accurately represents the true effect one would find if the entire population were examined. The larger the margin of error, the less confidence one can have that the study’s estimated effect is close to the population’s true effect. The margin of error is half the width (i.e., the radius) of the confidence interval, which is the set of values on either side of the sample estimate, and is calculated using the sample size, the sample proportion, and a z-value derived from an established confidence level.
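Following the calculation described above (the pass rate and sample size are hypothetical), the margin of error for a sample proportion can be sketched as:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

# Hypothetical survey: 60% of 400 sampled students passed the assessment.
moe = margin_of_error(p=0.60, n=400)
interval = (0.60 - moe, 0.60 + moe)   # the margin of error is the interval's radius
```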
Matching
Designing study groups to be equivalent for the sake of valid comparison.
A set of statistical techniques that allows one to utilize a given set of covariates to identify matched sets of subjects from the study groups. Subjects are matched (and subsequently compared) when they have roughly equal profiles on the characteristics or attributes measured by the covariates (e.g., gender, ethnicity, previous performance). Theoretically, the matching procedure will result in study groups that are approximately equivalent, which lessens the likelihood that extraneous or confounding factors are causing the treatment effects.
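As a simplified sketch of the idea (real matching procedures typically combine many covariates, e.g., via propensity scores; here a single hypothetical pretest score stands in), each treated subject is greedily paired with the closest unmatched control:

```python
def greedy_match(treated, controls):
    """Pair each treated subject with the unmatched control whose
    covariate value (a pretest score here) is closest."""
    available = dict(controls)              # subject id -> pretest score
    pairs = []
    for t_id, t_score in treated:
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]                 # each control is matched at most once
    return pairs

treated = [("t1", 72), ("t2", 85), ("t3", 64)]
controls = [("c1", 90), ("c2", 70), ("c3", 66), ("c4", 84)]
pairs = greedy_match(treated, controls)
```

The resulting pairs have roughly equal pretest profiles, approximating baseline equivalence between the matched groups.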
Multilevel Models
Statistical models that examine data at multiple levels to determine how factors from each unit of analysis (e.g., student, class, school) relate to the outcome of interest.
Statistical models that account for nested or clustered data structures by running analyses at multiple levels. Multilevel models are particularly appropriate for studies that have unit-level data (e.g., students) organized into groups (e.g., classes or schools). Multilevel models allow one to tease apart the various factors that impact a given outcome. For example, when examining student achievement, multilevel modeling would allow one to examine student-level (e.g., demographics, educational aptitude), class-level (e.g., instructor quality, class morale), and school-level (e.g., school leadership, technology infrastructure) factors that may influence student performance on the achievement outcome. Further, in the previous example, the analysis would allow one to determine what percentage of the variance in the achievement outcome is attributable to each level (e.g., student characteristics account for 70% of the variance while the other 30% is split evenly between classroom and school factors). Multilevel models are also referred to as hierarchical models, nested data models, mixed models, or random-effects models.
Posttest
Scores on the study’s outcome that are taken after the start of the intervention.
Scores on the outcome variable (or achievement metric, response variable, dependent variable, etc.) that are taken after the start of the intervention. A single posttest measure is needed for a summative evaluation (i.e., evaluation focused on the outcome of an intervention), but multiple consecutive posttest scores are required for a formative evaluation (i.e., evaluation focused on development throughout the intervention).
Statistical Power
The ability of a study to detect a true effect (e.g., a difference between groups or a correlation between variables); for example, if at the end of an intervention the treatment group differs from the control group, statistical power is the probability that the study is able to correctly detect that difference.
The probability of correctly rejecting the null hypothesis in favor of the alternative hypothesis; that is, the ability of a study to detect an effect when a real effect actually exists. When statistical power is high, the study is more sensitive to the impacts of an intervention. Alternatively, when statistical power is low, the study may be unable to detect the effects of an intervention. Statistical power is influenced by (a) the level of confidence one wants to have in the estimate, (b) the magnitude of the effect one is trying to detect (larger effects are easier to detect), and (c) the sample size. Power ranges from 0 to 1.00, with experts suggesting .80 as a standard for sufficient power.
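The interplay of effect size, sample size, and significance criterion can be sketched with a two-sample z-test approximation (the negligible opposite-tail term is ignored, and the numbers are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sample z-test for a standardized mean difference."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return nd.cdf(noncentrality - z_crit)

# A medium effect (0.5 SD) detected with 30 vs. 64 subjects per group.
small_n = power_two_group(effect_size=0.5, n_per_group=30)
large_n = power_two_group(effect_size=0.5, n_per_group=64)   # near the .80 standard
```

Holding the effect size fixed, increasing the sample size raises the power, consistent with point (c) above.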
Pretest
Scores on the study’s outcome taken prior to the start of the intervention.
Scores on the outcome variable (or achievement metric, response variable, dependent variable, etc.) that are taken prior to the start of the intervention. The pretest provides a baseline to which subsequent scores are compared in order to determine the influence of the intervention.
Random Assignment
Assigning subjects to study conditions using randomization, such that each subject has an equal chance of being assigned to the control or treatment group.
Technique for assigning subjects to study conditions (e.g., treatment and control groups) using randomization, such that each subject has an equal chance of being in a given study condition. Random assignment is a necessary condition for a true experimental design (e.g., randomized controlled trial), and it increases internal validity of the study by helping assure that different study groups are equivalent prior to the intervention (i.e., baseline equivalence). When random assignment is not feasible, specific measures can be taken to test for baseline equivalence and help remove the influence of group differences.
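A minimal sketch of the mechanics, assuming an even two-group split of a hypothetical roster:

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects, then split the pool evenly into two study conditions."""
    pool = list(subjects)
    random.Random(seed).shuffle(pool)   # every subject has an equal chance
    half = len(pool) // 2
    return pool[:half], pool[half:]     # (treatment, control)

students = [f"student_{i}" for i in range(20)]
treatment, control = randomly_assign(students, seed=7)
```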
Randomized Controlled Trial
A study in which subjects are randomly assigned to treatment and control groups; the effect of the intervention is determined based on post-intervention differences between the groups.
A study design in which subjects (e.g., students, teachers) are randomly assigned to a treatment group or control group. The treatment is administered to the treatment group and the treatment is withheld from the control group. In examining the results of the study or intervention, the only expected difference between the groups is the outcome variable of interest (e.g., the achievement metric or educational outcome). A key element of randomized controlled trials is the randomization, which allows one to assume groups are equivalent on all variables except for the treatment.
Recommended Dosage
The amount of exposure to a given treatment that is suggested in order for it to be effective.
The exposure (amount or frequency) to a treatment that is suggested in order for it to be efficacious. For example, EdTech Product A may recommend 10 modules completed per week, or EdTech Product B may recommend 50 minutes of product engagement per day. Recommended dosage is also sometimes referred to as recommended usage, prescribed dosage/usage, or dosage/usage recommendation.
Quantile
A group that results from dividing a sample into roughly equal subgroups; for example, quartiles would be four roughly equal groups and quintiles would be five roughly equal groups.
Values that partition a finite set of values into groups that are roughly equal in size. Any number of quantiles can be determined for a set of values, with common quantiles being terciles (three groups), quartiles (four groups), and quintiles (five groups). For example, partitioning a set of values into quintiles would result in five roughly equal groups, where approximately 20% of the full set of values falls within each quintile.
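The Python standard library can compute the cut points directly; the scores below are invented, and `quintile_of` is an illustrative helper, not a built-in:

```python
from statistics import quantiles

# Hypothetical assessment scores for fifteen students.
scores = [55, 61, 64, 68, 70, 73, 75, 78, 80, 82, 85, 88, 90, 93, 97]
cuts = quantiles(scores, n=5)         # four cut points define five quintiles

def quintile_of(value, cuts):
    """Return which quintile (1-5) a value falls into."""
    return 1 + sum(value > c for c in cuts)
```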
Quasi-Experimental Design
A research design that satisfies most conditions of an experimental design, but that lacks random assignment of subjects to treatment and control groups.
A research design with methods and procedures that satisfy most conditions of a true experimental design, but lacks random assignment of subjects to study conditions. Because random assignment is often impractical and sometimes impossible, quasi-experimental designs are implemented frequently in scientific research. With the appropriate methodology, quasi-experimental designs are highly effective at addressing research questions. Multiple types of designs fall under the category of quasi-experimental research, including the nonequivalent groups design and the regression discontinuity design.
Sample Size
The number of subjects (e.g., students, teachers) included in your study or intervention.
The number of units (e.g., students, teachers) in your pool of subjects that are included in your study. In the event that you have multiple subsamples, the total sample size is the sum of the subsamples — for example, if you have 250 in your control group (nC = 250) and 250 in your treatment group (nT = 250), then your total sample size is 500 (N = 500). Sample size is an important aspect of external validity and generalizability, and is therefore a key feature of any study in which the goal is to make inferences about the population from which the sample was drawn.
Sampling
The process of selecting subjects (e.g., students, teachers) to participate in your study or intervention.
The process of selecting subjects (e.g., students, educators, schools) from a population of interest. The degree to which one can generalize results from a study depends on how representative the sample is of the population from which it was drawn. There are different types of sampling methods, with probability sampling (e.g., simple random sampling, stratified random sampling, systematic random sampling) being more rigorous than nonprobability sampling (e.g., convenience sampling, purposive sampling).
Treatment Group
The group of subjects in a study who receive a treatment, and are thus compared to the group of subjects who did not receive the treatment.
The group of subjects in a study or intervention who receive the treatment, and are thus compared to a group (or groups) of subjects who either do not receive the intervention (i.e., the control group) or receive a different intervention. Also referred to as the experimental group or intervention group.
Usage Metric
A measure of the extent to which the subjects used the product or intervention.
A measure of the degree to which the subject (e.g., student) used or was exposed to the treatment or intervention. For example, the usage metric for an educational technology product could refer to the number of times the student logged in, the time spent using the product, the number of modules completed, or the percentage of the syllabus completed.
Variable
Any measurable factor or phenomenon, such as a characteristic, attribute, or educational outcome.
Any factor or phenomenon that can take on a logical set of values, with each value representing an attribute of the subject being measured. For example, the variable “gender” can take on multiple values (e.g., male, female, other). There are many types of variables, including independent variables, dependent variables, mediational variables, and moderating variables.