Evaluating model selection criteria for nuisance models in causal inference: (when the true models are finite mixtures)
Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
Alternative title
Utvärdering av modellselektionskriterier för skattning av störningsparametrar inom kausal inferens: (När de sanna modellerna består av en blandning av ett ändligt antal fördelningar) (Swedish)
When using inverse probability weighting (IPW) and doubly robust (DR) estimators to estimate a causal effect, we need nuisance models for the propensity scores and, for the DR estimator, also for the outcome regression. These nuisance models are often built from a selection of covariates, higher-order terms and interaction terms, which requires a model selection criterion. In this paper we study selection criteria and their performance in fitting nuisance models for semi-parametric estimation of the causal effect. This is done as a simulation study. Apart from the usual simulation design, which uses known parametric models for the true outcomes and propensity scores, we also simulate from finite mixture models. From that we can evaluate how changing the design changes the characteristics of the nuisance models selected by the different criteria, and whether we can still recover the causal effect using them. Across all simulations, the Bayesian Information Criterion generally produced the nuisance models with the fewest estimated parameters, especially for the propensity scores, where it rarely selected more than one covariate. As a result, the IPW estimates of the causal effect were generally more biased for that criterion than for the other criteria. With DR estimators, the robustness of the estimator led all criteria to produce equally unbiased estimates with low variance. Using LASSO for model selection was generally best, as it gave the estimates with the lowest bias and variance. When finite mixture models are used in the simulation design, the differences between the estimates corresponding to the different criteria disappear. When we use mixtures only for the propensity scores we can still find the true causal effect, even with the IPW estimators, which use only the propensity scores as a nuisance parameter.
When applying finite mixture models to both the propensity scores and the outcome regressions, we overestimate the causal effect for all criteria, and we see little to no difference between the estimators obtained with the different criteria. In that case we also could not find the true causal effect using a non-parametric matching estimator.
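To make the two estimators discussed in the abstract concrete, the following is a minimal sketch of an IPW and a DR (augmented IPW) estimate of an average causal effect on simulated data. It is not the simulation design of the thesis: the data-generating process, the coefficient values, the true effect of 2.0 and all function names (`fit_logistic`, `fit_ols`, `ipw_ate`, `dr_ate`, `simulate`) are assumptions made purely for illustration. Both nuisance models are correctly specified here and no model selection criterion is applied.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (t - p)              # score of the log-likelihood
        hess = X.T @ (X * w[:, None])     # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ipw_ate(y, t, e):
    """IPW estimate: only the propensity scores e are used as nuisance input."""
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))


def dr_ate(y, t, e, m1, m0):
    """DR (augmented IPW) estimate: combines propensity scores with
    outcome-regression predictions m1 (treated) and m0 (control)."""
    return np.mean(m1 - m0
                   + t * (y - m1) / e
                   - (1 - t) * (y - m0) / (1 - e))


def simulate(n=5000, true_effect=2.0, seed=0):
    """Hypothetical design: two covariates, logistic treatment assignment,
    linear outcome with a constant treatment effect."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))
    X = np.column_stack([np.ones(n), x])
    e_true = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))
    t = rng.binomial(1, e_true).astype(float)
    y = 1.0 + x[:, 0] + x[:, 1] + true_effect * t + rng.normal(size=n)
    return X, t, y


X, t, y = simulate()

# Nuisance model 1: propensity scores from a logistic regression.
e_hat = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

# Nuisance model 2: outcome regressions fitted separately per treatment arm.
m1_hat = X @ fit_ols(X[t == 1], y[t == 1])
m0_hat = X @ fit_ols(X[t == 0], y[t == 0])

ipw_estimate = ipw_ate(y, t, e_hat)
dr_estimate = dr_ate(y, t, e_hat, m1_hat, m0_hat)
print(f"IPW estimate: {ipw_estimate:.3f}")
print(f"DR estimate:  {dr_estimate:.3f}")
```

In a study of selection criteria, the fixed covariate set passed to `fit_logistic` and `fit_ols` would be replaced by whichever covariates and higher-order terms each criterion selects, and the estimates would be compared over many simulated data sets.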
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-122686
OAI: oai:DiVA.org:umu-122686
DiVA: diva2:940544