Variance Reduction via Machine Learning Imputation with Auxiliary Data (Job Market Paper)
Randomized controlled trials, also known as A/B tests, provide clean identification and unbiased estimates of causal effects. However, when the experimental sample is small, the estimates can suffer from large sampling variance and statistical tests may lack power. This paper provides a method to improve estimation efficiency when an auxiliary non-experimental sample of observable characteristics from the same target population is available. The proposed estimator attains the semiparametric efficiency bound, newly derived in this paper for the two-sample setup, so no regular estimator can achieve a lower asymptotic variance. The variance reduction increases with 1) the size of the auxiliary sample and 2) how well the observable characteristics predict the potential outcomes. The latter motivates the use of high-dimensional data and machine learning tools in our efficient estimator. Following recent developments in debiased machine learning, our estimator is root-n consistent and asymptotically normal, allowing construction of confidence intervals and hypothesis tests. Simulation results show that the estimator performs well in finite samples.
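As a rough illustration of the idea only, and not necessarily the estimator proposed in the paper, the sketch below combines cross-fitted outcome predictions with an auxiliary covariate sample. It assumes a binary treatment with known assignment probability p, and it uses ordinary least squares as a stand-in for a machine learning predictor. The function names (fit_ols, ate_with_auxiliary) and all simulated numbers are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; a stand-in for any ML regressor."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta

def ate_with_auxiliary(y, d, x_exp, x_aux, p=0.5, n_folds=2, seed=0):
    """Cross-fitted ATE estimate (illustrative sketch, assumptions as stated above).

    The imputed effect mu1(X) - mu0(X) is averaged over the pooled
    (experimental + auxiliary) covariates, and the inverse-probability-weighted
    residuals from the experiment correct the prediction error.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = rng.permutation(n) % n_folds      # balanced random fold labels
    imput_exp = np.empty(n)                   # imputed effects, experimental units
    resid = np.empty(n)                       # weighted residual corrections
    imput_aux = np.zeros(len(x_aux))          # imputed effects, auxiliary units
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit outcome models for each arm on the other folds only
        mu1 = fit_ols(x_exp[train & (d == 1)], y[train & (d == 1)])
        mu0 = fit_ols(x_exp[train & (d == 0)], y[train & (d == 0)])
        m1, m0 = mu1(x_exp[test]), mu0(x_exp[test])
        imput_exp[test] = m1 - m0
        resid[test] = (d[test] * (y[test] - m1) / p
                       - (1 - d[test]) * (y[test] - m0) / (1 - p))
        # Auxiliary units are independent of every fold: average the fold models
        imput_aux += (mu1(x_aux) - mu0(x_aux)) / n_folds
    pooled_imputation = np.concatenate([imput_exp, imput_aux]).mean()
    return pooled_imputation + resid.mean()

# Simulated illustration (all numbers hypothetical)
rng = np.random.default_rng(1)
n, m, true_ate = 400, 5000, 2.0
x_exp = rng.normal(size=(n, 3))               # covariates in the experiment
x_aux = rng.normal(size=(m, 3))               # covariates only, same population
d = rng.integers(0, 2, size=n)                # randomized treatment, p = 0.5
y = x_exp @ np.array([1.0, -1.0, 0.5]) + true_ate * d + rng.normal(size=n)
estimate = ate_with_auxiliary(y, d, x_exp, x_aux, p=0.5)
```

In this sketch the auxiliary sample enters only through the averaged imputations, so its contribution grows with how much the predicted effect varies across units and with how well the covariates predict the outcomes.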
Work in Progress
Characterizing Complier under Multi-valued Treatment
In applied microeconomic studies with instrumental variables, different instruments often lead to different results. This project aims to explain these differences and provide insights into treatment effect heterogeneity. The study revisits the interpretation of the 2SLS (IV) estimator when the treatment is multi-valued and provides a method to identify and estimate the distribution of observable characteristics of the relevant complier groups.
Quantile Treatment Effect Estimation with High-dimensional Data (with Nengchieh Chang)
This project provides a doubly robust extension (Chernozhukov et al. 2018) of the semiparametric quantile treatment effect estimation discussed in Firpo (2007). Our proposed estimator allows researchers to use a rich set of machine learning methods in the first-step estimation while still obtaining valid inference. Researchers can include as many control variables as they consider necessary without worrying about the over-fitting problem that arises with traditional estimation methods. This paper complements Belloni et al. (2017), which provides a very general framework for estimation and inference on many different treatment effects when researchers apply machine learning methods.