Imputation

Imputation is used to fill in data points that are missing from the survey data as a result of item nonresponse, that is, because a respondent did not answer a particular question. For instance, respondents will not always share information about their income and wealth. Missing data can be handled in two ways: either one ignores the problem and evaluates only the answers of those households that completed the questionnaire in full, or one replaces the missing observations with imputed data.

Ignoring the problem biases the estimates, because the observations missing for individual respondents do not follow a random pattern. For example, international evidence shows that more affluent households are more likely to refuse to answer questions on wealth than less affluent households (see e.g. Barceló, 2006). If the results are not adjusted for item nonresponse, they will not be representative: the average wealth of households will be underestimated because more affluent households are underrepresented among those who answered. Therefore, we imputed missing observations for the HFCS using a procedure called multivariate imputation by chained equations.
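The direction of this bias can be illustrated with a small simulation. The sketch below is not based on HFCS data: the log-normal wealth distribution, the response probabilities (50% for the top wealth decile, 90% for everyone else) and the function name simulate_complete_case_bias are assumptions chosen purely for illustration.

```python
import random


def simulate_complete_case_bias(n=20000, seed=0):
    """Compare the true mean wealth with the mean over respondents only.

    All numbers are hypothetical; they are not estimates from any survey.
    """
    rng = random.Random(seed)

    # Hypothetical right-skewed wealth distribution.
    wealth = [rng.lognormvariate(11.0, 1.0) for _ in range(n)]

    # Threshold separating the top wealth decile from the rest.
    cutoff = sorted(wealth)[int(0.9 * n)]

    # Assumed response behaviour: affluent households answer less often.
    observed = [
        w for w in wealth
        if rng.random() < (0.5 if w >= cutoff else 0.9)
    ]

    true_mean = sum(wealth) / n
    complete_case_mean = sum(observed) / len(observed)
    return true_mean, complete_case_mean


true_mean, complete_case_mean = simulate_complete_case_bias()
print(f"true mean wealth:          {true_mean:,.0f}")
print(f"mean over respondents only: {complete_case_mean:,.0f}")
```

Under these assumptions the mean computed from respondents alone falls below the true mean, because the households that hold a large share of total wealth are the ones most likely to be missing.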

Imputation means that missing values are replaced with predicted values. The predictions are based on regression models. To give an example, we can use the outstanding loans of households with a comparable degree of educational attainment, age, income and housing conditions to estimate the size of loans whose amount was not reported. To identify the characteristics that are suitable for estimating missing data, we relied on a range of statistical criteria.
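The general idea of chained equations can be sketched in a few lines. The code below is a deliberately simplified illustration and not the procedure used for the HFCS: it assumes purely numeric variables, uses ordinary linear regressions for every variable, and produces a single completed dataset, whereas procedures of this kind are usually run several times to obtain multiple completed datasets. The function names solve, ols and chained_imputation are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Return least-squares coefficients (intercept first) and residual std dev."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma = (sum(e * e for e in resid) / max(len(y) - k, 1)) ** 0.5
    return beta, sigma


def others(row, j):
    """All values of a row except column j (the predictors for column j)."""
    return [v for c, v in enumerate(row) if c != j]


def chained_imputation(data, n_iter=10, seed=0):
    """Fill None entries in `data` (list of numeric rows); returns a new list."""
    rng = random.Random(seed)
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]

    # Step 1: start from the mean of the observed values in each column.
    for j in range(n_cols):
        obs = [row[j] for row in data if row[j] is not None]
        mean = sum(obs) / len(obs)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = mean

    # Step 2: cycle through the variables, re-estimating each one from the
    # current values of all the others, and redraw its missing entries.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            train = [r for i, r in enumerate(filled) if not missing[i][j]]
            beta, sigma = ols([others(r, j) for r in train], [r[j] for r in train])
            for i, r in enumerate(filled):
                if missing[i][j]:
                    pred = beta[0] + sum(b * v for b, v in zip(beta[1:], others(r, j)))
                    # Add noise so imputed values keep a realistic spread.
                    r[j] = pred + rng.gauss(0.0, sigma)
    return filled
```

A call such as chained_imputation([[35.0, 2400.0, None], [52.0, None, 9100.0], ...]) would treat each row as one household, with columns for, say, age, income and outstanding loans; observed values are left untouched and only the None entries are replaced.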

For detailed information on the imputation procedure used for the HFCS in Austria, see chapter 5 of the methodological notes on the HFCS for Austria.
