Adjusting for selective non-participation with re-contact data in the FINRISK 2012 survey


Aims: A common objective of epidemiological surveys is to provide population-level estimates of health indicators. Survey results tend to be biased under selective non-participation. One approach to bias reduction is to collect information about non-participants by contacting them again and asking them to fill in a questionnaire. This information is called re-contact data, and it allows the estimates to be adjusted for non-participation. Methods: We analyse data from the FINRISK 2012 survey, where re-contact data were collected. We assume that, given their available background information, the respondents of the re-contact survey are similar to the remaining non-participants with respect to health. The validity of this assumption is evaluated using hospitalisation data obtained through record linkage of survey data to administrative registers. Using this assumption and multiple imputation, we estimate the prevalences of daily smoking and heavy alcohol consumption and compare them to estimates obtained under the commonly used assumption that the participants represent the entire target group. Results: Adjusting for non-participation using re-contact data yielded higher prevalence estimates than those based on participants only. Among men, the smoking prevalence estimate was 28.5% (23.2% for participants) and the heavy alcohol consumption prevalence was 9.4% (6.8% for participants). Among women, smoking prevalence was 19% (16.5% for participants) and heavy alcohol consumption was 4.8% (3% for participants). Conclusions: Using re-contact data is a useful way to adjust for non-participation bias in population estimates from epidemiological surveys.
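The adjustment idea described in the Methods can be illustrated with a small sketch. This is not the model used in the paper. It is a simplified illustration, assuming a single categorical background variable (a hypothetical `stratum`), a binary outcome such as daily smoking, and a Beta-Bernoulli imputation model with a uniform prior. The function name, argument names, and toy counts below are all hypothetical and are not FINRISK data. Within each stratum, outcomes for the remaining non-participants are imputed from the distribution observed among re-contact respondents, and the prevalence estimates from the completed data sets are averaged.

```python
import random


def impute_prevalence(participants, recontact, n_missing_by_stratum, m=20, seed=0):
    """Pooled prevalence estimate from m imputed data sets (illustrative only).

    participants, recontact: lists of (stratum, outcome) pairs, outcome in {0, 1}.
    n_missing_by_stratum: dict mapping stratum -> number of non-participants
        whose outcome was never observed.
    Assumption: within a stratum, the remaining non-participants resemble
    the re-contact respondents.
    """
    rng = random.Random(seed)

    # Count positives and totals among re-contact respondents per stratum.
    counts = {}
    for stratum, y in recontact:
        pos, n = counts.get(stratum, (0, 0))
        counts[stratum] = (pos + y, n + 1)

    observed = [y for _, y in participants] + [y for _, y in recontact]
    estimates = []
    for _ in range(m):
        imputed = []
        for stratum, n_missing in n_missing_by_stratum.items():
            pos, n = counts.get(stratum, (0, 0))
            # Draw a stratum prevalence from its Beta posterior (uniform prior),
            # so that parameter uncertainty carries over into the imputations.
            p = rng.betavariate(1 + pos, 1 + n - pos)
            imputed.extend(1 if rng.random() < p else 0 for _ in range(n_missing))
        completed = observed + imputed
        estimates.append(sum(completed) / len(completed))

    # Pooled point estimate: mean over the imputed data sets.
    return sum(estimates) / m


# Hypothetical toy counts, not taken from the survey.
participants = ([("young", 1)] * 8 + [("young", 0)] * 32
                + [("old", 1)] * 9 + [("old", 0)] * 51)
recontact = ([("young", 1)] * 4 + [("young", 0)] * 6
             + [("old", 1)] * 3 + [("old", 0)] * 7)
n_missing = {"young": 30, "old": 20}

naive = sum(y for _, y in participants) / len(participants)
adjusted = impute_prevalence(participants, recontact, n_missing)
print(f"participants only: {naive:.3f}, adjusted: {adjusted:.3f}")
```

In this toy setup the re-contact respondents have a higher prevalence than the participants, so the adjusted estimate comes out above the participants-only estimate. A full analysis would also pool the between- and within-imputation variances and use a richer imputation model than one stratum variable.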

Scandinavian Journal of Public Health, 46(7), pp. 758-766


selection bias, smoking, alcohol consumption, missing data
Juho Kopra
Post doc of Statistics

My research interests include Bayesian statistical methods and applied statistics for problems with high societal impact.