Optimal sampling for design-based estimators of regression models in two-phase designs

Methodology
Authors

Tong Chen

Published

1 September 2022

Publication details

PhD Thesis, University of Auckland


A two-phase design collects additional information on a subsample selected from the study cohort. It is a cost-effective sampling method when the covariates of interest are too expensive to measure for every individual in the cohort. With careful choices of stratification and phase-two sampling strategy, a two-phase design will be more efficient than simple random sampling. At the design stage, it is desirable to sample according to the optimal design, which yields the most efficient estimates. We develop the optimal design for analysis via the inverse probability weighting (IPW) estimator. To approximate the optimal design, we propose a multiwave sampling framework that incorporates the whole-cohort information. We show that design efficiency can be further improved by using multiwave sampling with informative priors. Generalized raking estimators form a more efficient class of design-based estimators. We derive the optimal design for analysis via generalized raking estimators and compare it with the optimal design for analysis via the IPW estimator and with other two-phase designs in measurement-error settings. We show that the optimal design for analysis via the IPW estimator is not optimal for generalized raking estimation but typically gives near-optimal efficiency. It has previously been shown that semiparametric efficiency under two-phase sampling is not robust to contiguous model misspecification if the target of inference is defined by a hypothetical analysis of complete data. In two-phase studies, the optimal design for the efficient estimator is often very different from that for design-based estimators. We show that this design optimality can also be sensitive to contiguous model misspecification.
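
As a rough illustration of the ideas named in the abstract, the sketch below pairs an IPW estimate of a total with textbook Neyman allocation, which assigns each stratum a phase-two sample size proportional to its size times its within-stratum standard deviation. This is only a minimal sketch under stated assumptions, not the thesis's derivation: the function names `neyman_allocation` and `ipw_total` are hypothetical, the allocation is left unrounded and is not capped at the stratum size, and the quantity whose standard deviation is supplied is chosen by the user.

```python
from typing import List, Sequence


def neyman_allocation(
    stratum_sizes: Sequence[int],
    stratum_sds: Sequence[float],
    n_total: float,
) -> List[float]:
    """Textbook Neyman allocation: n_h proportional to N_h * S_h.

    Returns real-valued (unrounded) stratum sample sizes that sum to n_total.
    Assumption: no capping at N_h and no minimum per-stratum size is applied.
    """
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all strata have zero size or zero variability")
    return [n_total * w / total_weight for w in weights]


def ipw_total(values: Sequence[float], sampling_probs: Sequence[float]) -> float:
    """IPW (Horvitz-Thompson) estimate of a total.

    Each sampled value is weighted by the inverse of its sampling probability.
    """
    return sum(y / p for y, p in zip(values, sampling_probs))


# Hypothetical usage: two strata, a phase-two budget of 80 units.
allocation = neyman_allocation([100, 300], [3.0, 1.0], 80)

# Hypothetical usage: two sampled values with known sampling probabilities.
estimate = ipw_total([2.0, 4.0], [0.5, 0.25])
```

In this toy example the smaller stratum receives as many phase-two units as the larger one because its assumed variability is three times higher, which is the intuition behind allocating more sampling effort where the data are noisier.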