Methods

A framework for assessing accuracy

The concepts of coverage error and measurement error provide a framework for assessing the accuracy of data sources (Zhang, 2012).

Coverage describes the relationship between the ideal target population and the actual set of people present in a dataset. For each variable discussed here, the target population of interest is New Zealand residents of all ages.
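As a rough illustration, coverage can be thought of as a comparison between two sets of people. The sketch below is only a minimal example under stated assumptions: the identifiers are invented, the function name `coverage_summary` is hypothetical, and the labels "undercoverage" and "overcoverage" are used here for people missing from the dataset and people present but outside the target population.

```python
def coverage_summary(target_ids, dataset_ids):
    """Compare a target population with the people present in a dataset.

    Both arguments are iterables of person identifiers (hypothetical here).
    """
    target = set(target_ids)
    dataset = set(dataset_ids)
    covered = target & dataset
    return {
        # People in the target population who appear in the dataset.
        "covered": len(covered),
        # People in the target population who are missing from the dataset.
        "undercoverage": len(target - dataset),
        # People in the dataset who are outside the target population.
        "overcoverage": len(dataset - target),
        "coverage_rate": len(covered) / len(target) if target else 0.0,
    }


# Invented identifiers: four residents, one of whom is missing from the
# dataset, plus one dataset record for someone who is not a resident.
summary = coverage_summary(["p1", "p2", "p3", "p4"], ["p2", "p3", "p4", "p9"])
print(summary)
```

In practice the true target population is not observed directly, so a comparison like this needs a benchmark that stands in for it.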

Measurement errors cause a recorded response to differ from its true value. If these errors are not random, they may result in systematic bias. Measurement error may occur when administrative definitions, concepts, or questions do not align well with the statistical concept being measured. Measurement errors in both the census and administrative data may also arise from errors within the respective collection and processing systems, and may result in missing or incorrect information.

The ability to integrate information with other sources through linking the same units also affects accuracy. Linkage errors are of two types: links may be missed (eg if a person's name is recorded differently on different files); or two different people may be wrongly linked (eg if their names and dates of birth are very similar). Linkage errors may reduce the coverage of an administrative source (no information is available if links are not made when they should be), or they may introduce measurement error if the wrong people are linked together.
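The two types of linkage error can be sketched as set differences between the links that were made and the links that should have been made. This is a minimal sketch that assumes the true matches are known, which is rarely the case outside a clerically reviewed sample. The record identifiers and the function name `linkage_errors` are hypothetical.

```python
def linkage_errors(made_links, true_links):
    """Classify linkage errors, given (admin_id, census_id) pairs.

    made_links: pairs that the linking process actually produced.
    true_links: pairs that refer to the same person (assumed known here).
    """
    made = set(made_links)
    true = set(true_links)
    return {
        # Links that should have been made but were not.
        "missed_links": true - made,
        # Links that join records belonging to two different people.
        "false_links": made - true,
        "correct_links": made & true,
    }


true_links = [("a1", "c1"), ("a2", "c2"), ("a3", "c3")]
made_links = [("a1", "c1"), ("a3", "c9")]  # a2 never linked; a3 linked wrongly

errors = linkage_errors(made_links, true_links)
print(errors)
```

Here the missed links reduce coverage (no census information is available for a2), while the false link for a3 would attach another person's values and so introduce measurement error.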

Evaluating the quality of administrative sources

We now describe the methods used to evaluate the quality of the information in administrative sources.

A brief description of each data source provides basic information on population coverage. The concepts and definitions used in the administrative data collections are compared with the relevant national standard and related classification. Ideally concepts and definitions should be consistent across collections and consistent with the standard.

For each variable we then compare the administrative data with 2013 Census data. A linked Census-IDI dataset (described below) provides the basis for all the data analysis. A high linkage rate provides a good basis for this comparison. No adjustment is made for any remaining bias due to differential linkage rates.

Results summarise coverage for the administrative source compared with the New Zealand resident population. We gain insights into measurement error from aggregate and individual-level analysis.

For the aggregate analysis we compare the total responses in the administrative source to total responses in the census. As coverage of each data source varies considerably, we restrict the comparisons to responses in each administrative data source that were linked to usual residents in the census.
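The restriction described above can be sketched as a filter followed by a count per category in each source. The record layout, field names, and category labels below are assumptions made for illustration and do not reflect the structure of the linked Census-IDI dataset.

```python
from collections import Counter


def aggregate_comparison(records):
    """Count responses per category in each source.

    Only records linked to a census usual resident are kept, so both sets
    of totals describe the same group of people.
    """
    in_scope = [r for r in records if r["linked"] and r["census_usual_resident"]]
    admin_counts = Counter(
        r["admin_value"] for r in in_scope if r["admin_value"] is not None
    )
    census_counts = Counter(
        r["census_value"] for r in in_scope if r["census_value"] is not None
    )
    categories = sorted(set(admin_counts) | set(census_counts))
    # Each category maps to (administrative total, census total).
    return {c: (admin_counts[c], census_counts[c]) for c in categories}


# Invented records: three linked usual residents and one unlinked record.
records = [
    {"linked": True, "census_usual_resident": True, "admin_value": "A", "census_value": "A"},
    {"linked": True, "census_usual_resident": True, "admin_value": "A", "census_value": "B"},
    {"linked": True, "census_usual_resident": True, "admin_value": "B", "census_value": "A"},
    {"linked": False, "census_usual_resident": False, "admin_value": "B", "census_value": None},
]

table = aggregate_comparison(records)
print(table)
```

In this invented example the totals for each category agree exactly, even though two of the three people have different values in the two sources.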

Even close results at the aggregate level may be a result of classification differences balancing out. The linked dataset allows us to compare the values recorded for an individual in the administrative sources against those recorded for the same individual in the census. The individual-level analysis helps to show what is driving the results seen in the aggregate comparisons. The individual-level comparisons are made for the group of people who had records in both the IDI and the census, whose records were linked together, and for whom a value was recorded for that variable.
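An individual-level comparison can be sketched as a cross-tabulation of paired values, with an agreement rate taken from the diagonal. This is a minimal sketch with invented data. It assumes a person is included only when a value is recorded in both sources, and the function name `individual_comparison` is hypothetical.

```python
from collections import Counter


def individual_comparison(pairs):
    """Cross-tabulate (admin_value, census_value) pairs for linked people.

    None marks a missing value; pairs with a missing value are excluded.
    Returns the cross-tabulation and the share of people whose values agree.
    """
    usable = [(a, c) for a, c in pairs if a is not None and c is not None]
    crosstab = Counter(usable)
    agree = sum(n for (a, c), n in crosstab.items() if a == c)
    rate = agree / len(usable) if usable else 0.0
    return crosstab, rate


# Invented data: each source records two people in A and two in B among the
# usable pairs, so the aggregate totals match, yet only half of the
# individuals agree.
pairs = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"), ("A", None)]

crosstab, rate = individual_comparison(pairs)
print(crosstab)
print(rate)
```

The off-diagonal cells of the cross-tabulation show which categories are being exchanged between the sources, which is the detail that aggregate totals can hide.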
