Data sources

This chapter describes the data sources used in this investigation: the New Zealand Census of Population and Dwellings, the relevant administrative sources in the IDI, and the linked Census-IDI data.

New Zealand Census of Population and Dwellings

The Census of Population and Dwellings is the official count of people and dwellings in New Zealand. It provides a snapshot of our society at a point in time and tells the story of social and economic change in New Zealand. Census has a wide range of uses within and outside government. The latest census was held in March 2013.

The census aims to count everyone who is in New Zealand on census night. Overseas visitors are included in the census, while New Zealand residents who are not in New Zealand on census night are not included.

For this investigation we are only interested in New Zealand residents, not those visiting New Zealand temporarily on census night.

In the 2013 Census the net undercount varied by ethnic group. The highest undercount was for Māori (6.1 percent), followed by Pacific (4.8 percent), Asian (3.0 percent), and European (1.9 percent) (Statistics New Zealand 2014a).

Ethnicity information in the census

The census uses the statistical standard and classification for ethnicity described above.

Ethnicity is a ‘foremost’ variable in the census, which means that it is managed to produce information of the highest quality. The non-response rate for ethnicity among usual residents who returned a form in the 2013 Census was 0.7 percent. If the substitute forms created to account for people who did not fill out a form are included, the non-response rate for ethnicity was 5.4 percent.

Figure 1 shows the ethnicity question for the 2013 Census. Up to six responses per person are recorded.

Figure 1

The ethnicity question in the 2013 Census


Integrated Data Infrastructure (IDI)

Statistics NZ developed the IDI as an environment in which to link multiple data sources in a systematic and secure way. It was developed to produce official statistics outputs and to allow Statistics NZ staff and external researchers to conduct policy evaluation and research on people’s transitions and outcomes. The IDI contains administrative and survey datasets, linked at the individual level. The IDI continues to change as new datasets are added. This section describes the structure and content of the IDI in May 2015.

The structure of the IDI is shown in figure 2, and can be described as a central ‘spine’ to which a series of data collections are linked. The target population for the spine is all individuals who have ever been residents of New Zealand.

Three data sources are linked together probabilistically to create the spine:

  • a list of all IRD numbers that have been issued by Inland Revenue (IR)
  • a list of all births registered in New Zealand since 1920
  • a list of all visas granted to migrants from 1997 (excluding visitor and transit visas).

Other data sources, covering a wide range of subject areas, are linked to the IDI spine. Statistics New Zealand (2014b) describes the linking methodology. Priority is placed on achieving high precision, that is, on minimising erroneous links, with the trade-off that some correct links may be missed. In practice, linkages are designed so that fewer than 2 percent of the links made are erroneous.
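The trade-off between precision and missed links can be illustrated with a minimal Python sketch. It assumes each candidate link carries a numeric match score and that the true status of a small reviewed sample is known; the `CandidateLink` class, the scores, and the thresholds are all hypothetical and are not taken from the IDI linking methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateLink:
    score: float      # hypothetical match score; higher means stronger agreement
    is_correct: bool  # true status, assumed known for a reviewed sample only


def precision_and_recall(candidates, threshold):
    """Accept links scoring at or above the threshold; return (precision, recall)."""
    accepted = [c for c in candidates if c.score >= threshold]
    true_total = sum(c.is_correct for c in candidates)
    true_accepted = sum(c.is_correct for c in accepted)
    precision = true_accepted / len(accepted) if accepted else 1.0
    recall = true_accepted / true_total if true_total else 1.0
    return precision, recall


# Invented sample of reviewed candidate links.
sample = [
    CandidateLink(9.5, True), CandidateLink(8.0, True), CandidateLink(7.2, True),
    CandidateLink(6.1, False), CandidateLink(5.5, True), CandidateLink(4.0, False),
]

low_threshold = precision_and_recall(sample, 4.0)   # accepts every candidate
high_threshold = precision_and_recall(sample, 7.0)  # accepts only the strongest
```

In this invented sample, raising the threshold removes the erroneous links but also drops one correct link, which is the kind of trade-off the text describes.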

The IDI also contains summary tables that provide core information about individuals (age, sex, ethnicity, geographic location) summarised from across the available data sources.

Figure 2

Structure of the Integrated Data Infrastructure (IDI) in May 2015


Ethnicity information in the IDI

Ethnicity information in the IDI is contained within data collections from several government agencies. The dataset descriptions below are primarily based on Cormack (2010) and Cormack & McLeod (2010), which provide a thorough background to official collections of ethnicity.

Accident Compensation Corporation (ACC)

ACC is a Crown entity set up to deliver New Zealand's personal no-fault injury insurance scheme as set out in the Accident Compensation Act 2001.

Ethnicity data collected by ACC is used to produce injury statistics and to monitor access to the services provided by ACC. It has been collected since 1997, although at that time only one ethnicity was recorded, and only for a limited number of claims. Since 2001, ACC has recorded up to three ethnicities, using level 2 of the standard classification of ethnicity from the 1996 ethnic standard. The question asked on the form is not standard, and includes a tick box option for ‘I’d prefer not to say’, which approximately 7 percent of respondents tick (Cormack & McLeod, 2010).

The ACC data within the IDI at May 2015 includes only claims made for work-related injuries.

Ministry of Education (MoE)

MoE collects information on ethnicity from providers of early childhood, primary, secondary, and tertiary education. This information is generally collected on enrolment forms and is used to produce a range of information and statistics (eg student participation for different ethnicities).

MoE uses Statistics NZ's definition of ethnicity, and since 2007, has recorded ethnicity as a numeric code using level 3 of the standard classification of ethnicity from the 2005 Ethnic Standard. All enrolment forms should allow students to identify with up to three ethnic groups; however, the Ministry requires some data providers to report a student as being in one ethnic group only. MoE uses Statistics NZ’s prioritisation method outlined in the 1996 ethnic standard to decide which ethnic group to use when a student identifies with more than one ethnic group.
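Prioritisation reduces multiple responses to a single ethnic group by picking the group that ranks highest in a fixed order. The Python sketch below shows the idea at a broad level only. The priority order used here (Māori, Pacific, Asian, Other, European) and the group labels are assumptions made for illustration; the 1996 ethnic standard should be consulted for the actual rules, which operate on more detailed codes.

```python
# Assumed priority order, highest first. Illustrative only; not copied from
# the 1996 ethnic standard.
PRIORITY = ["Māori", "Pacific", "Asian", "Other", "European"]


def prioritise(groups):
    """Return the highest-priority group among the responses, or None."""
    for group in PRIORITY:
        if group in groups:
            return group
    return None


# A student who identifies as both European and Māori is reported as Māori
# under this assumed order.
reported = prioritise({"European", "Māori"})
```

Prioritisation discards information: the student in the example is no longer counted in the European group, which is one reason single-ethnicity outputs can differ from sources that keep all responses.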

The ethnicity question(s) on enrolment forms can differ between providers. Although the Ministry provides guidelines on its website, the questions are likely to be inconsistent with each other and with the census question.

Ministry of Health (MoH)

Information about ethnicity has been collected for several years in the health sector, with varying degrees of standardisation and completeness. Several key collections hold ethnicity, including the National Health Index (NHI), and several registries/databases such as the New Zealand Cancer Registry (NZCR), and the National Minimum Dataset (NMD). Ethnicity information is usually collected during contact with a health service or health provider, which can affect the quality and completeness of ethnicity information in the key collections and databases.

Since 1996, MoH has aligned its collection of ethnicity with Statistics NZ’s approach: the key collections hold at least one ethnicity for each individual (the mandatory ‘principal’ ethnicity) and can hold up to three ethnicities. Before 1996, only one ethnicity was recorded. The introduction of the Ethnicity Data Protocols for the Health and Disability Sector in 2004 was a significant development for the health sector. The protocols provided guidance for standardising data collection and outputs across the health and disability sector.

Since 2008, MoH has aligned the health sector with the Statistics NZ standard classification of ethnicity from the 2005 ethnic standard, including the use of consistent level 1 codes. Ethnicity data in the NHI collection is recorded at level 2 of the Statistics NZ standard classification of ethnicity from the 2005 ethnic standard, and up to three ethnic groups are recorded per individual.

The MoH data in the IDI includes several different tables holding ethnicity information. In this study we used the combined NHI dataset, which is a unified national person list compiled by MoH. Te Rōpū Rangahau Hauora a Eru Pōmare, based at the University of Otago, Wellington, has published a series of reports and discussion papers about ethnicity data in New Zealand, with a particular focus on the health sector (see Publications, Te Rōpū Rangahau Hauora a Eru Pōmare). These give a good summary of the different ethnicity collections in the health sector and the quality of that information.

Ministry of Social Development (MSD)

MSD collects ethnicity information for individuals obtaining Work and Income services (benefits). Ethnicity can be collected on application forms, or through other interactions with Work and Income (eg in person, online, or through call centres). It is not a compulsory field, because it is not related to entitlement or eligibility for assistance. Ethnicity information is nonetheless needed to understand how access to benefits and social welfare is related to disability, access to health care, and health outcomes for Māori.

MSD has collected ethnicity data since 1991, but has used several different systems and classifications over that time. Information in the IDI is available from 1993. Since the late 1990s, the collection of ethnicity information has been more consistent, mainly because Work and Income introduced the SOLO system. This system records information about job seekers and the provision of employment services, and allows individuals to identify with up to three ethnic groups at level 3 of the classification.

The different collections across MSD vary in their adherence to the statistical standard. For example, the question used to collect ethnicity on application forms for financial assistance (benefits) varies – both in the question and the categories used for responses. The voluntary nature of the question and the variability of questions are likely to affect the quality of the ethnicity data collected by MSD.

Department of Internal Affairs (DIA)

The Department of Internal Affairs is responsible for birth registrations, and records go back as far as the 19th century. Until 1962, separate registers were kept for Māori births. The Māori birth register included tribe, residence, and iwi details completed by the parents; however, for the most part these fields have not been digitised.

Between 1962 and September 1995, information was collected on “the degree of Māori or Pacific Island blood and the tribe or island of the newborn's mother and father” (Statistics NZ, 2015). Parents who were not of Māori or Pacific Island descent were not asked to provide any ethnicity information. A new birth registration form was introduced in September 1995. It included an ethnicity question consistent with the concept of ethnic self-identification. The registration form includes ethnicity questions for the mother, father, and child. This form has since been updated to align with the 2005 ethnicity standard.

Since 1998, birth registrations have been recorded digitally and considerable effort has been put into improving response rates and data quality.

Death registrations also contain ethnicity information, but they were not part of this study.

Statistics NZ survey collections

Some of Statistics NZ’s household survey collections are also included in the IDI. Ethnicity information in these surveys uses the ethnicity standard and is typically of very good quality, but the number of people covered by these surveys is relatively small compared with the administrative sources. For this reason, they were not investigated as individual sources in this paper.

Personal details table

Within the IDI, business rules are applied to standardise the ethnic codes received from each agency.

The six level 1 categories from each selected administrative source are summarised in a ‘personal details table’ for each individual. Ministry of Justice data is excluded due to quality concerns. Some individuals do not have any ethnicity recorded in the IDI – for example, if they have not interacted with an agency that collects ethnicity.

As a result of this process, an individual’s ethnicity information in the personal details table is a combination of the original responses given to separate agencies, coded to level 1 of the 2005 ethnic standard. An ethnic group is included wherever it is captured by any agency, at any point in time (ever-recorded). It is not possible to directly identify the source(s) of ethnicity, or the date it was captured, in the personal details table, but ethnicity responses for each dataset can be examined individually.
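The ‘ever-recorded’ rule amounts to taking the union of level 1 groups across all sources and dates. The Python sketch below illustrates this under stated assumptions: the record layout, the source names in the example, and the `ever_recorded` function are hypothetical, and the level 1 labels are written out here for readability rather than taken from the IDI tables.

```python
# Level 1 group labels, assumed here for illustration.
LEVEL1 = ("European", "Māori", "Pacific Peoples", "Asian", "MELAA", "Other")


def ever_recorded(records):
    """Combine level 1 groups across sources and dates.

    records: iterable of (source, year, groups) tuples for one individual.
    Returns a dict mapping each level 1 group to True if any source ever
    recorded it. Source and year are deliberately dropped, mirroring the
    loss of provenance described in the text.
    """
    seen = set()
    for _source, _year, groups in records:
        seen |= set(groups)
    return {group: group in seen for group in LEVEL1}


# Invented records for one individual from two hypothetical extracts.
person = [
    ("MoH", 2009, {"European"}),
    ("MoE", 2012, {"Māori", "European"}),
]
summary = ever_recorded(person)
```

Because the result keeps only the flags, it cannot say which agency recorded Māori for this person or when; that information is only available by going back to the individual datasets.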

Linking the census to the IDI

To enable individual-level comparisons between the ethnicity information in the IDI and the ethnicity information in the census, the census has been linked to the IDI at the individual level. This link was created by Census Transformation in May 2015. The linking was done to better understand the coverage and quality of census information in the IDI, and the linked data was only available to approved Statistics NZ staff working on the Census Transformation programme.

The census was linked to the May 2015 version of the IDI spine. Linking was completed in Quality Stage using probabilistic matching techniques. The variables used in the linkage process were full name, date of birth, sex, meshblock of usual residence, and country of birth.
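The text does not describe how the matching software scores record pairs. One common formulation of probabilistic matching is the Fellegi-Sunter approach, sketched below in Python as an illustration rather than as the method actually used. The m and u probabilities (the chance that a field agrees for a true match and for a non-match) are invented, and the field names are simply shorthand for the linkage variables listed above.

```python
import math

# Hypothetical (m, u) probabilities per linkage variable:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "full_name": (0.95, 0.001),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "meshblock": (0.80, 0.0001),
    "country_of_birth": (0.98, 0.3),
}


def match_weight(a, b):
    """Sum log2 agreement or disagreement weights over the linkage variables."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Invented records: the person has moved, so only the meshblock differs.
census = {"full_name": "aroha smith", "date_of_birth": "1980-02-14", "sex": "F",
          "meshblock": "0123400", "country_of_birth": "NZ"}
spine = dict(census, meshblock="0567800")

weight_same = match_weight(census, census)
weight_moved = match_weight(census, spine)
```

Pairs whose total weight exceeds a chosen threshold are accepted as links. Agreement on a rare value such as a full name adds far more weight than agreement on sex, which is why a pair can still link when one variable disagrees.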

Overall, 3,920,364 census usual residents (92 percent) were linked to the IDI. Of most interest for this paper, 95 percent of census records for New Zealand residents in households where forms were returned were linked to the IDI. The match rate was much higher for individuals who had used electronic forms (98 percent linked) than for those who had used paper forms (93 percent linked). There were around 250,000 individuals in the census who provided an ethnicity but for whom a link could not be found in the IDI.

The links in this dataset have an estimated false positive rate of less than 1 percent (a false positive is when an incorrect link has been made between two different individuals).
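As a rough worked example, the reported figures can be turned into approximate counts. The Python sketch below uses only numbers stated in this section; because the 92 percent link rate is rounded and the false positive rate is given only as an upper bound, the results are indicative rather than exact.

```python
LINKED = 3_920_364         # census usual residents linked to the IDI
LINK_RATE = 0.92           # reported link rate, rounded to a whole percent
MAX_FALSE_POSITIVE = 0.01  # reported upper bound on the false positive rate


def implied_population(linked, rate):
    """Approximate number of census usual residents implied by the link rate."""
    return linked / rate


def max_false_links(linked, rate):
    """Upper bound on the number of incorrect links among the linked records."""
    return linked * rate


population = implied_population(LINKED, LINK_RATE)
false_links = max_false_links(LINKED, MAX_FALSE_POSITIVE)

print(f"Implied census usual residents: about {population:,.0f}")
print(f"Incorrect links: fewer than about {false_links:,.0f}")
```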
