Methods for examining the quality of ethnicity information

Understanding causes of error in ethnicity data

The concepts of coverage error and measurement error provide a framework for assessing the accuracy of data sources (Zhang, 2011).

Coverage describes the relationship between the ideal target population and the actual set of people present in a dataset. For the census ethnicity variable, the population of interest is all New Zealand residents. Aggregate-level comparisons are most useful in providing insight into differences in coverage.

Measurement errors cause a recorded response to differ from its true value. If these errors are not random, they may result in systematic bias. Measurement error may occur when administrative definitions, concepts, or questions do not align well with the statistical concept being measured. Measurement errors in both the census and administrative data may also be due to errors within the respective collection and processing systems, and may result in missing or incorrect information. Individual-level comparisons can inform our understanding of measurement error.

The ability to integrate information with other sources through linking the same units also affects accuracy. Linkage errors are of two types: links may be missed (eg if a person's name is recorded differently on different files); or two different people may be wrongly linked (eg if their names and dates of birth are very similar). Linkage errors may reduce the coverage of an administrative source (no information is available if links are not made when they should be), or they may introduce measurement error if the wrong people are linked together.
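The two types of linkage error can be illustrated with a minimal sketch. It assumes a reference set of true matches is available, which is rarely the case in practice, where such rates can only be estimated. The function name, record identifiers, and data below are hypothetical and are not drawn from the IDI.

```python
def linkage_error_rates(true_pairs, linked_pairs):
    """Return (missed_link_rate, false_link_rate).

    true_pairs:   pairs of records that really belong to the same person
                  (hypothetical reference set, assumed known here).
    linked_pairs: pairs that the linkage process actually joined.
    """
    true_pairs = set(true_pairs)
    linked_pairs = set(linked_pairs)

    missed = true_pairs - linked_pairs   # links that should exist but were not made
    false = linked_pairs - true_pairs    # links made between different people

    missed_rate = len(missed) / len(true_pairs) if true_pairs else 0.0
    false_rate = len(false) / len(linked_pairs) if linked_pairs else 0.0
    return missed_rate, false_rate


# Hypothetical (census_id, admin_id) pairs for illustration only.
true_pairs = {("c1", "a1"), ("c2", "a2"), ("c3", "a3"), ("c4", "a4")}
linked_pairs = {("c1", "a1"), ("c2", "a2"), ("c3", "a9")}

missed_rate, false_rate = linkage_error_rates(true_pairs, linked_pairs)
print(f"missed links: {missed_rate:.2f}, false links: {false_rate:.2f}")
```

In this sketch, missed links reduce coverage (two of the four people have no correct link), while the false link attaches another person's information to c3 and so acts as measurement error.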

Evaluating the quality of administrative sources of ethnicity

This investigation uses the following methods to evaluate the quality of the ethnicity information in the IDI.

Comparison of concepts and definitions

The concepts and definitions of ethnicity used in the IDI and its individual data collections are compared to the statistical standard for ethnicity. Ideally the concepts and definitions should be consistent across collections and consistent with the standard.

Comparison of aggregate counts

Aggregate comparisons are used to examine the coverage of the administrative sources, and to compare total responses for each administrative source with the census. Analysis is restricted to those individuals in the linked census-IDI dataset.
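As a rough sketch of an aggregate comparison, the example below computes the coverage of one administrative source within a linked dataset and the ratio of its total responses to census total responses for each ethnic group. It assumes total response counting, where a person is counted once in every group they are recorded in. The function names, person identifiers, and data are hypothetical, and the census input is assumed to be non-empty.

```python
from collections import Counter


def total_response_counts(records):
    """Count each person once in every ethnic group they are recorded in."""
    counts = Counter()
    for ethnicities in records.values():
        counts.update(set(ethnicities))
    return counts


def compare_aggregates(census, admin):
    """Return (coverage, ratios) for one administrative source.

    census, admin: dicts mapping a person id to a set of ethnic groups.
    coverage: share of linked people with an ethnicity in the admin source.
    ratios:   admin total responses divided by census total responses.
    """
    linked_people = census.keys()
    covered = [p for p in linked_people if admin.get(p)]
    coverage = len(covered) / len(linked_people)

    census_counts = total_response_counts(census)
    admin_counts = total_response_counts({p: admin[p] for p in covered})
    # A Counter returns 0 for groups that never appear in the admin source.
    ratios = {g: admin_counts[g] / n for g, n in census_counts.items()}
    return coverage, ratios


# Hypothetical linked records for illustration only.
census = {
    "p1": {"European"},
    "p2": {"Māori", "European"},
    "p3": {"Asian"},
    "p4": {"Pacific peoples"},
}
admin = {
    "p1": {"European"},
    "p2": {"Māori"},
    "p3": {"Asian"},
    # p4 has no ethnicity recorded in this source
}

coverage, ratios = compare_aggregates(census, admin)
print(coverage, ratios)
```

A ratio well below 1 for a group can reflect either lower coverage of that group or differences in how multiple ethnicities are recorded, so aggregate results are best read alongside individual-level comparisons.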

Comparison of individual-level information

The ethnicities recorded for an individual in the IDI are compared against those recorded for the same individual in the census. These comparisons can only be made for people whose IDI and census records were linked together, and for whom an ethnicity was recorded in both the administrative source in the IDI and the census.

Close agreement between responses in administrative data and the census strongly suggests that measurement in both sources is good. However, when responses differ, it is harder to determine which is likely to be correct. There are several reasons why an individual might record different ethnicity responses in the census and the IDI, and not all indicate errors in one source. People can identify with different ethnic groups over time, or in different contexts. Questions on different administrative forms can also differ slightly, which may prompt different responses from the same person, all of which are correct from their point of view.
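One way to summarise individual-level agreement is sketched below. For each ethnic group, it takes the people recorded in that group in the census and computes the share who are also recorded in that group in the administrative source. It also reports the share of people whose full set of ethnicities matches exactly. The measures, function names, and data are illustrative assumptions and not necessarily those used in this investigation.

```python
def agreement_by_group(census, admin):
    """Share of people in each census ethnic group who are recorded in the
    same group in the administrative source.

    Only people with an ethnicity in both sources are compared.
    """
    both = census.keys() & admin.keys()
    groups = set().union(*(census[p] for p in both))
    result = {}
    for group in sorted(groups):
        in_census = [p for p in both if group in census[p]]
        agree = sum(1 for p in in_census if group in admin[p])
        result[group] = agree / len(in_census)
    return result


def exact_match_rate(census, admin):
    """Share of compared people whose full set of ethnicities is identical."""
    both = census.keys() & admin.keys()
    if not both:
        return 0.0
    return sum(1 for p in both if census[p] == admin[p]) / len(both)


# Hypothetical linked records for illustration only.
census = {
    "p1": {"European"},
    "p2": {"Māori", "European"},
    "p3": {"Asian"},
    "p4": {"Māori"},
}
admin = {
    "p1": {"European"},
    "p2": {"Māori"},
    "p3": {"Asian"},
    "p4": {"European"},
    "p5": {"Pacific peoples"},  # no census ethnicity, so excluded
}

by_group = agreement_by_group(census, admin)
exact = exact_match_rate(census, admin)
print(by_group, exact)
```

As noted above, a disagreement in such a comparison does not by itself show which source is wrong, or that either is.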

While erroneous linkages are kept to a minimum, linkage errors could explain a small proportion of cases where ethnicity information is found to be different between the census and the administrative sources in the IDI. Apart from birth registrations, which form part of the spine, two linkages are involved in the comparison of census ethnicity and ethnicity in administrative sources: the linkage between the census and the IDI spine, and between administrative sources and the IDI spine.

Treatment of ‘New Zealander’ response

For comparability with the estimated resident population and administrative sources, the ‘New Zealander’ response has been included in the ‘European’ category in this investigation.

In the standard classification ‘New Zealander’ is coded to ‘Other ethnicity’, and this approach is used in the 2013 Census. However, the official estimated resident population series codes the ‘New Zealander’ response to ‘European’.

On the whole, administrative sources in the IDI do not have ‘New Zealander’ as a response. Current usual practice in the health sector is to code ‘New Zealander’ to the ‘European’ category (Cormack & McLeod, 2010). Practice is likely to be similar in other administrative collections.
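The recoding described in this section can be sketched as follows. The function name and the category labels passed as strings are illustrative assumptions. The default target follows the treatment used in this investigation, and the alternative target shows the standard classification treatment.

```python
def recode_new_zealander(ethnicities, target="European"):
    """Return a person's ethnic groups with 'New Zealander' recoded to target.

    Because the result is a set, a person recorded as both 'New Zealander'
    and the target group is counted in the target group only once.
    """
    return {target if e == "New Zealander" else e for e in ethnicities}


# Treatment used in this investigation: 'New Zealander' counted as 'European'.
print(recode_new_zealander({"New Zealander", "Māori"}))

# Standard classification treatment: 'New Zealander' coded to 'Other ethnicity'.
print(recode_new_zealander({"New Zealander"}, target="Other ethnicity"))
```

Applying the same recoding to every source before comparison keeps differences in the treatment of ‘New Zealander’ from appearing as disagreement between the census and administrative data.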
