In this paper, I argue that in the presence of interactions among members of a population, statistical inference for heterogeneous treatment effects (HTEs) across pre-treatment variables is confounded, since treatment effects may vary by pre-treatment variables, by post-treatment variables that measure exposure to peers' treatment statuses, or by both.

For instance, in a job search assistance program, comparing the treatment effects of high school graduates with those of college graduates without controlling for plausible differences in the fraction of treated persons in each group could lead to incorrect inference about HTEs.

Figure 1 (on the right) from the paper is a graphical illustration of the possible sources of variation in treatment effects in a simple setting where the post-treatment variable is binary (Π = {π1, π2}) and the conditional average treatment effect is linear in a continuous pre-treatment variable X.

Motivated by this fact, I propose tests of the null hypotheses of (a) constant treatment effect (CTE) by treatment assignment for all values of the pre-treatment variables; and (b) CTE by the pre-treatment variables for all treatment assignment vectors. In addition, I recommend the multiple testing algorithm due to Holm (1979) to disentangle the source of heterogeneity in the treatment effect. The proposed test statistics are sums of weighted L1-norms of the differences in consistent non-parametric kernel estimators of the conditional average treatment effects that characterize the null hypotheses.
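As a rough illustration of how Holm's (1979) step-down procedure could be applied to the p-values of tests (a) and (b), here is a minimal sketch. The function name `holm_reject` and the example p-values are hypothetical and are not taken from the paper or its replication code; only the step-down rule itself is standard.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm (1979) step-down procedure.

    Returns a list of booleans, in the original order of p_values,
    that is True where the corresponding null hypothesis is rejected
    while controlling the family-wise error rate at level alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1) = alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return reject


# Hypothetical p-values for null (a) and null (b), for illustration only.
decisions = holm_reject([0.01, 0.20], alpha=0.05)
print(decisions)  # [True, False]
```

With these illustrative inputs, only the first null would be rejected, which would point to heterogeneity along the dimension tested in (a) but not in (b).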

Using a modern Poissonization technique, I show that the test statistics are asymptotically standard normal under the null hypotheses and that the tests have valid asymptotic sizes. Other theoretical results in the paper show that the tests are consistent against a broad range of fixed and local alternatives. Furthermore, I recommend a bootstrap method for small sample sizes and prove that it works.

To corroborate my theoretical results, I design Monte Carlo experiments to compute the empirical rejection probabilities of the proposed tests. Figures 2-3 on the right (from the paper) present the empirical rejection probabilities of the tests (at three conventional levels of significance) when the null is true (i.e., β2 = 0) and when it is false.
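For readers unfamiliar with the term, an empirical rejection probability is simply the share of simulated data sets in which a test rejects. The sketch below shows the mechanics with a deliberately simple stand-in: a two-sided z-test on the mean of normal data with known unit variance. It is not the paper's test statistic or simulation design; the names `rejects`, `empirical_rejection_probability`, and the parameter `effect` (which loosely plays the role of the departure from the null) are assumptions made for illustration.

```python
import math
import random


def rejects(sample, critical_value):
    """Stand-in test: two-sided z-test of mean zero with known unit variance."""
    n = len(sample)
    z = math.sqrt(n) * (sum(sample) / n)
    return abs(z) > critical_value


def empirical_rejection_probability(effect, n=200, reps=2000,
                                    critical_value=1.96, seed=0):
    """Fraction of Monte Carlo replications in which the stand-in test rejects.

    effect = 0 corresponds to the null being true; effect != 0 to it being false.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        rejections += rejects(sample, critical_value)
    return rejections / reps


size = empirical_rejection_probability(effect=0.0)   # should be near 0.05
power = empirical_rejection_probability(effect=0.5)  # should be near 1
print(size, power)
```

Under the null, the rejection frequency should sit close to the nominal level (here 5%), and under the alternative it should rise toward one as the sample size or the departure from the null grows.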

Finally, to illustrate the usage of the proposed tests, I revisit the Chinese weather insurance policy data set in Cai et al. (2015) and test for CTE by a post-treatment variable (the fraction of treated units per village) and a pre-treatment variable (the fraction of household income from rice production). I find no evidence of heterogeneous treatment effects by either the pre-treatment or the post-treatment variable.

[Paper, Slides, Replication Code]