There are different reasons why experimenters may want to randomize an experiment at the region level. In some cases, treatments cannot be turned on or off at the individual level, requiring randomization at a group level, for which regions are a natural candidate. In other cases, experimenters may worry about network effects or other spillovers within a geographic area and opt to randomize at the region level to contain them.
These experiments often call for randomization and analysis methods that can account for a relatively small set of randomization units (geo-locations or other clusters of individual IDs) that are highly heterogeneous. Stratification within the randomization procedure and Synthetic Difference-in-Differences in the analysis can help ensure that treated and control groups are comparable, a criterion that is not necessarily met when using traditional analyses like ANCOVA or covariate-adjusted regressions in these settings. However, geo-randomized experiments vary substantially in both heterogeneity and number of clusters.
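As a rough illustration of the stratification step, the sketch below bins regions into quantile strata of similar pre-period outcome levels and randomizes within each stratum. This is a minimal sketch, not the paper's procedure: the `pre_period_mean` column name, the quantile-based strata, and the two-arm split are all assumptions.

```python
import numpy as np
import pandas as pd

def stratified_assignment(regions: pd.DataFrame,
                          n_strata: int = 4,
                          seed: int = 0) -> pd.Series:
    """Randomize regions to treatment/control within strata of similar
    pre-period outcome levels, so heterogeneous regions end up balanced
    across arms.

    `regions` is assumed to have one row per region and a hypothetical
    `pre_period_mean` column summarizing the outcome before the experiment.
    """
    rng = np.random.default_rng(seed)
    # Bin regions into quantile-based strata on the pre-period outcome.
    strata = pd.qcut(regions["pre_period_mean"], q=n_strata, labels=False)

    assignment = pd.Series(index=regions.index, dtype=object)
    for s in strata.unique():
        members = regions.index[strata == s]
        # Shuffle within the stratum and split it roughly in half.
        shuffled = rng.permutation(members)
        half = len(shuffled) // 2
        assignment.loc[shuffled[:half]] = "treatment"
        assignment.loc[shuffled[half:]] = "control"
    return assignment
```

Randomizing within strata rather than over the full pool keeps each arm representative of every outcome tier, which matters most when the number of regions is small.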
To this end, we build a simulation approach to compare the performance of Difference-in-Differences (DID) and Synthetic Difference-in-Differences (SDID) estimators across experiment settings in terms of bias, mean squared error, and standard errors. We construct an imbalance metric based on mean deviations from parallel trends between the control and treatment groups across time. This allows us to gauge how the estimators perform on the metrics of interest as the observed imbalance in trends increases, while holding all other experimental features constant (i.e., number of units, outcome metric, experimental dates).
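The exact functional form of the imbalance metric is not spelled out here, but one plausible reading of "mean deviations from parallel trends" is sketched below: compute the treatment-control gap in mean outcomes for each pre-treatment period, center it on its pre-period average (under exactly parallel trends the gap is constant), and summarize the deviations. The long-format column names are assumptions.

```python
import pandas as pd

def trend_imbalance(pre_panel: pd.DataFrame) -> float:
    """One possible imbalance metric: the mean absolute deviation from
    parallel trends over the pre-treatment period.

    `pre_panel` is assumed to be a long DataFrame restricted to
    pre-treatment periods, with hypothetical columns `period`,
    `group` ("treatment"/"control"), and outcome `y`.
    """
    # Per-period gap between treatment and control group means.
    gap = (
        pre_panel.groupby(["period", "group"])["y"].mean()
        .unstack("group")
        .pipe(lambda df: df["treatment"] - df["control"])
    )
    # Under parallel trends the gap is constant across periods, so
    # deviations from the average gap measure the trend imbalance.
    return float((gap - gap.mean()).abs().mean())
```

A metric of this shape yields zero for perfectly parallel pre-trends and grows as the groups drift apart, which is what lets the simulations order experiment settings along an imbalance spectrum.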
Our findings suggest that although SDID exhibits lower variance, with coefficients associated with lower standard errors (and correspondingly higher power), it produces biased results in a set of experiment settings, both in A/A tests and under homogeneous 1% and 5% treatment effects. In other data settings, SDID outperforms DID, yielding coefficients centered around the 'true' estimates across the observed imbalance spectrum, while the DID estimates exhibit higher variance that grows as the observed imbalance increases. Experimenters can use the simulation framework to assess whether SDID exhibits bias in their specific experiment setting. In such cases, we suggest a data-driven approach for defining the regularization parameter for the time weights in the SDID as a potential solution.
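As one hypothetical shape such a data-driven approach could take, the sketch below grid-searches the time-weight regularization parameter using placebo estimates on pre-treatment data, where the true effect is zero by construction, and keeps the value with the smallest mean absolute placebo estimate. The `sdid_estimate` callable is an assumed wrapper around an SDID implementation, not an API from any particular library.

```python
import numpy as np
import pandas as pd

def select_time_weight_zeta(pre_panel: pd.DataFrame,
                            zetas: list,
                            sdid_estimate,
                            n_placebos: int = 20,
                            seed: int = 0):
    """Choose the SDID time-weight regularization parameter via placebo
    A/A tests on pre-treatment data.

    `sdid_estimate(panel, treat_start, zeta)` is a hypothetical wrapper
    that returns the SDID effect estimate for a fake treatment start
    date and a given regularization strength. Since `pre_panel` contains
    no real treatment, every placebo's true effect is zero, so the zeta
    with the smallest mean absolute placebo estimate is the least
    biased choice in this setting.
    """
    rng = np.random.default_rng(seed)
    periods = np.sort(pre_panel["period"].unique())
    # Keep enough periods on both sides of each fake treatment date.
    q = max(1, len(periods) // 4)
    candidates = periods[q:-q]

    scores = {}
    for zeta in zetas:
        fake_starts = rng.choice(candidates, size=n_placebos, replace=True)
        estimates = [sdid_estimate(pre_panel, start, zeta)
                     for start in fake_starts]
        scores[zeta] = float(np.mean(np.abs(estimates)))
    return min(scores, key=scores.get)
```

Scoring candidate values on placebo bias rather than on pre-period fit alone targets the failure mode the simulations surface: an SDID whose time weights overfit the pre-period can look well calibrated in-sample while remaining biased in A/A tests.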
Performance of synthetic diff-in-diff models for geo-randomized experiments
2024