A recent study by Wolf and Harbatkin examined how intervention effect sizes differ by the type of outcome measure used. To be included in the review, studies had to evaluate interventions in reading, STEM, or behavior, use a randomized or quasi-experimental design, and appear in the What Works Clearinghouse repository. In total, 373 studies contributing 1,553 effect sizes were included.
The measures used in each study were then classified into four categories:
- Independent broad: the measure was not created by the researchers who conducted the study or by the developers of the program, and it was intended to assess student achievement across a subject area.
- Independent narrow: also externally created, but intended to assess specific elements of a subject area.
- Non-independent developer: when the measure was created by the developer of the program under evaluation.
- Non-independent researcher: when the measure was created by the authors of the study.
Independent narrow measures were the most common (42%), followed by non-independent researcher (30%), independent broad (22%), and non-independent developer (5%). The results showed a large gap between the effect sizes of independent and non-independent measures: independent broad measures had a mean effect size of +0.10, independent narrow measures +0.17, researcher-made measures +0.38, and developer-made measures +0.41. The differences held even when the analysis was restricted to studies that included at least one independent and one non-independent measure.
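For readers unfamiliar with the metric, effect sizes of this kind are typically standardized mean differences: the gap between the treatment and control group means, divided by a pooled standard deviation. The sketch below is illustrative only; the score data are made up, and the simple pooled-variance formula (Cohen's d) is an assumption about how such an effect size might be computed, not the study's exact procedure.

```python
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference: (mean_t - mean_c) / pooled SD (Cohen's d)."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    # Pool the two sample variances, weighting by degrees of freedom
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical test scores, for illustration only
treatment_scores = [82, 85, 78, 90, 88, 84]
control_scores = [80, 79, 75, 83, 81, 77]
print(round(cohens_d(treatment_scores, control_scores), 2))
```

On this scale, the difference between a mean effect size of +0.10 for independent broad measures and +0.41 for developer-made measures is roughly a fourfold difference in apparent impact for the same kinds of interventions.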
One explanation the authors offer is that non-independent measures capture constructs that overlap only minimally with those captured by independent measures. A researcher who creates a measure to evaluate a program's efficacy can align it closely with the program's content, whereas standardized tests are independent of the intervention. The authors concluded that effectiveness studies meant to inform practitioners and policymakers should use independent measures.
Source: Wolf, B., & Harbatkin, E. (2022). Making sense of effect sizes: Systematic differences in intervention effect sizes by outcome measure type. Journal of Research on Educational Effectiveness, 0(0), 1–28. https://doi.org/10.1080/19345747.2022.2071364