Data sources

This section describes the data sources used in this investigation: the New Zealand Census of Population and Dwellings (the census), the main administrative sources for the four variables, how these administrative sources are brought together in Statistics NZ’s Integrated Data Infrastructure (IDI), and the linked Census-IDI dataset.

New Zealand Census of Population and Dwellings

The census is the official count of people and dwellings in New Zealand. It provides a snapshot of our society at a point in time and tells the story of social and economic change in New Zealand. The census has a wide range of uses within and outside government. The latest census was held in March 2013.

The census aims to count everyone who is in New Zealand on census night. Overseas visitors are included in the census, while New Zealand residents who are not in New Zealand on census night are not included. For this investigation, we are only interested in New Zealand residents, not those visiting New Zealand on census night.

Census coverage and missing data

The 2013 Census usual resident population count is 4,242,048 people. The census count includes 4.8 percent (203,052) substitute records (Statistics NZ, 2014a). A substitute is a census record created (imputed) where there is sufficient evidence received during the collection process that a person exists, or a dwelling was occupied, but we obtained no corresponding census form. As such, substitutes are part of census non-response. While the census imputes values for age and sex, there is no imputation in published census outputs for the variables considered in this paper.

Coverage in the census is measured by the Post-Enumeration Survey (PES) (Statistics NZ, 2014b). Net census undercount for the 2013 Census was estimated at 2.4 percent. Younger adults aged 15–29 years (4.8 percent) had a higher percentage undercount than other age groups. Net undercount also varies by ethnicity: the percentage undercount for Māori (6.1 percent) and Pacific peoples (4.8 percent), both of which have young age structures, was higher than for the Asian (3.0 percent) and European (1.9 percent) ethnic groups.

The estimated resident population

The estimated resident population (ERP) of New Zealand is an estimate of all people who usually live in New Zealand at a given date (Statistics NZ Standard for population terms). New Zealand’s ERP is derived by adjusting the census usual resident population count for net census undercount (as estimated by the PES) and for the estimated number of residents temporarily overseas on census night. To obtain the ERP at a given date after census night, updates are made for natural increase (births less deaths) and net migration (arrivals less departures) between census night and the given date. The official ERP series provides the best measure of who is living in New Zealand at a given date.
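
The ERP derivation described above can be sketched as simple arithmetic. This is an illustration only: the function and class names are invented for the example, and every figure other than the census count is a made-up placeholder, not a published Statistics NZ value.

```python
from dataclasses import dataclass


@dataclass
class PopulationChange:
    """Components of change between census night and the reference date."""
    births: int
    deaths: int
    arrivals: int
    departures: int


def erp_on_census_night(census_count: int, net_undercount: int,
                        residents_overseas: int) -> int:
    """Base ERP: census usual resident count plus the two adjustments."""
    return census_count + net_undercount + residents_overseas


def erp_at_date(base_erp: int, change: PopulationChange) -> int:
    """Roll the base ERP forward by natural increase and net migration."""
    natural_increase = change.births - change.deaths
    net_migration = change.arrivals - change.departures
    return base_erp + natural_increase + net_migration


# Only the census count (4,242,048) comes from the text.
# All other numbers are hypothetical placeholders.
base = erp_on_census_night(4_242_048, net_undercount=100_000,
                           residents_overseas=80_000)
later = erp_at_date(base, PopulationChange(births=15_000, deaths=7_000,
                                           arrivals=25_000, departures=20_000))
print(base, later)
```

Keeping the census-night adjustments separate from the later updates mirrors the two stages in the description: the first stage corrects the census count, the second carries it forward in time.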

Variables in the census

The census uses the statistical standard and classification for the four variables considered here. We summarise the main census results and the relationships between ethnicity, descent, and iwi.

See QuickStats about Māori (Statistics NZ, 2013) for further detail about census results.

Māori ethnicity

In the 2013 Census, 598,605 people usually living in New Zealand identified with the Māori ethnic group. Almost half these people (278,196 or 46.5 percent) identified Māori as their only ethnicity.

Ethnicity is a ‘foremost’ variable in the census, which means it is managed to produce information of the highest quality. The non-response rate for ethnicity among those who return a census form is low (0.7 percent in the 2013 Census). However, the overall non-response to ethnicity, including substitute forms, is 5.4 percent. The census reports Māori as making up 14.9 percent of the population, while the ERP (which adjusts for non-response) reports 15.6 percent.
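
The percentages quoted above can be checked against the counts with a few lines of arithmetic. The sketch below assumes, and this is an assumption rather than something stated in the text, that the published 14.9 percent uses only people with a stated ethnicity as its denominator. The variable names are illustrative.

```python
CENSUS_COUNT = 4_242_048       # 2013 usual resident population count
MAORI_ETHNIC = 598_605         # identified with the Māori ethnic group
MAORI_ONLY = 278_196           # gave Māori as their only ethnicity
ETHNICITY_NONRESPONSE = 0.054  # overall non-response, including substitutes


def pct(numerator: float, denominator: float) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Assumption: the denominator excludes records with no ethnicity response.
stated = CENSUS_COUNT * (1 - ETHNICITY_NONRESPONSE)

share_of_stated = pct(MAORI_ETHNIC, stated)      # share among stated responses
share_of_all = pct(MAORI_ETHNIC, CENSUS_COUNT)   # share of the full census count
maori_only_share = pct(MAORI_ONLY, MAORI_ETHNIC)

print(share_of_stated, share_of_all, maori_only_share)
```

Under that assumption the first figure reproduces the 14.9 percent quoted above, while dividing by the full census count gives a lower share. The gap shows how much non-response affects the headline proportion.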

Māori descent

The census Māori descent question asks respondents: “Are you descended from a Māori (that is, did you have a Māori birth parent, grandparent or great-grandparent, etc.)?”

In 2013, 16 percent (668,724 people) answered ‘yes’ to the Māori descent question and 72 percent answered ‘no’; 2 percent answered ‘don’t know’, and 10 percent did not respond.

Māori descent and Māori ethnicity are closely related concepts, but census results demonstrate that people do respond differently. Of the 2013 Census respondents who said they were of Māori descent, 16 percent (107,391) said they were not ethnically Māori. A smaller group (4,212 people) identified as being of Māori ethnicity while stating they had no Māori descent.

Iwi affiliation

Only those who selected ‘Yes’ to Māori descent are asked to provide information about their iwi. The question is: “Do you know the name(s) of your iwi (tribe or tribes)?” Respondents are able to state up to five iwi or rohe (region).

Iwi information collected in the census is subject to some processing and quality issues. The census question on iwi required a written-in response. A team of specialist process operators was employed to ensure responses were coded as accurately as possible.

While 80 percent of those with Māori descent provided at least one valid iwi in the 2013 Census, 17 percent said that they did not know their iwi, and a further 3 percent of responses could not be coded. At the same time, 14,000 people gave a valid iwi in the 2013 Census but did not respond to the Māori descent question. These were not included in either the descent or iwi census counts.

Te reo Māori

Information about people’s ability to speak te reo Māori is collected from the general ‘languages spoken’ question in the Census. The 2013 Census asked respondents to identify the languages in which they could “have a conversation about a lot of everyday things”. Māori is a tick box response option.

The non-response rate for the 2013 language question was 6.3 percent, most of which was due to substitute records.

Administrative sources

Several government agencies and Māori organisations collect information about the four variables. Table 1 summarises these sources by the variables available. The remainder of this section describes key features of the sources available for each variable.

Table 1
Administrative sources for the four variables

Source                                              Māori ethnicity  Māori descent  Iwi  Te reo
Department of Internal Affairs (Births and Deaths)  ✓                ✓              ..   ..
Ministry of Health                                  ✓                ..             ..   ..
Ministry of Education                               ✓                ..             ✓    ✓
Ministry of Social Development                      ✓                ..             ..   ..
Accident Compensation Corporation                   ✓                ..             ..   ..
Electoral Commission                                ..               ✓              ..   ..
Iwi registers                                       ..               ✓              ✓    ✓
Tūhono Trust                                        ..               ✓              ✓    ..

Symbols: ✓ available; .. not available

Administrative sources for ethnicity

The main government agencies that collect ethnicity are: Department of Internal Affairs (DIA) Birth and Death registrations, Ministry of Health and health service providers, the Ministry of Education, and the Ministry of Social Development. Accident Compensation Corporation data is also available.

Reid et al (2016) describe the ethnicity information collected by each of these agencies, primarily based on Cormack (2010) and Cormack & McLeod (2010). Most agencies apply the standard concepts of cultural affiliation and self-identification (where possible), and allow people to belong to more than one ethnic group.

Government agencies vary in the collection mode and questionnaire used to collect ethnicity information. Some forms, such as the birth registration form used since 1995, align very closely with the statistical standard, including having a nearly identical question to the census. Some other agencies use a question that is conceptually in line with the standard but differs in wording or presentation.

Response coding differs. Some agencies code to a higher (less detailed) level than the full level 4 classification, and others deal differently with multiple responses. Older data is often limited, and not consistent with the current standard.

The level of quality controls in place also varies. DIA works closely with Statistics NZ, which processes the data for publication of official statistics and closely monitors quality. Other agencies have few external checks on their data collection.

Administrative sources for Māori descent

Relatively few government administrative sources collect information on Māori descent. Only birth and death registrations and electoral enrolments ask about Māori descent directly. Birth and death registrations are available in the IDI. Individual-level data for electoral enrolments are not available to Statistics NZ, due to restrictions in the Electoral Act 1993, although aggregate tables can be compared with census results.

DIA is responsible for birth registrations, and records go back to the 19th century. Until 1962, the agency kept separate registers for Māori births. The Māori birth register included tribe, residence and iwi details completed by the parents; however, for the most part these fields have not been digitised. Between 1962 and September 1995, information was collected on the degree of Māori or Pacific Island blood and the tribe or island of the newborn's mother and father. While this definition is not consistent with the measurement of ethnicity, it is consistent with a measure of descent for Māori.

DIA introduced a new birth registration form in September 1995 that included an ethnic question consistent with the concept of ethnic self-identification. In addition the form included a question on Māori descent. The registration form includes ethnicity and Māori descent questions for the mother, father, and child. Since 1998, birth and death records have been recorded digitally.

Under the Electoral Act 1993, all people eligible to vote are required to enrol with the Electoral Commission. As part of registering to vote, people must answer the question “Are you a New Zealand Māori or a descendant of a New Zealand Māori?” Tick boxes for ‘yes’ and ‘no’ are provided (Electoral Commission, 2015). Responses to this question are used to determine the number of Māori electorates. Only people answering ‘yes’ to this question are eligible for the Māori electoral roll and to vote in the Māori electorates. The collection processes and the quality of the data collected are generally good, since the data is crucial for running the electoral system. The major limitation of the electoral roll for this investigation is that only people 18 years or older are eligible to vote.

As a result of a concerted cross-government effort in the 1990s, Māori descent information from birth and death registrations, electoral enrolment data, and the census is well standardised. Each of these sources has the advantage of being a single, centralised collection that covers the entire country and is handled by a single agency. Birth and death registrations and electoral enrolment are also important legal processes.

Administrative sources for iwi

The main government source of iwi information is the Ministry of Education (MoE). Iwi information is now collected from all sectors of the education system (Education Counts).

See Iwi data: collection and use for more details.

Tertiary providers have been required to provide iwi affiliation of all first-year Māori students from 2002, although many providers have also provided comprehensive information on Māori students who had first enrolled in previous years.

All School Roll Returns have included iwi from 2007, and systematic collection of iwi for early childhood providers began in 2014. Coverage for iwi is therefore limited to younger age groups. Information is collected during the enrolment process at each early childhood centre, school, or tertiary institution.

The MoE provides guidelines for collecting iwi information on enrolment forms. Iwi affiliation is based on self-identification, and forms should allow for up to three iwi. Statistics NZ’s iwi classification is used, and the iwi codes are organised into regional groupings for reporting. However, the wording of the question, and response options, varies widely across different schools and tertiary institutions.

Other government agencies have collected some iwi information, including through the Student Loans and Allowances scheme, by the Ministry of Health, by DIA in birth registrations, and by the Department of Corrections. However, due to the limited amount of data, these sources are not considered further.

Māori organisations also collect iwi information, and by default they also collect Māori descent. Most iwi have established their own registers of enrolled members – either as a precursor to, or a condition of, Treaty of Waitangi settlements. Unlike the census and most government agency data sources, iwi membership is not based on self-identification but on acknowledgement of whakapapa (genealogy), endorsed by the iwi or hapū kaumātua (elders) (Walling, Small-Rodriguez, & Kukutai, 2009). The registration process depends on the iwi’s own protocol and its position in the settlement process.

Iwi registers vary in completeness and quality, depending on the success of Māori- or iwi-driven initiatives and, to some extent, the iwi’s position in the Treaty settlement process. Walling et al (2009) note the main sources of error on the Waikato-Tainui register are duplicate records, invalid applications, and deceased members being retained. Registers will only include those who register as members of the iwi, which is likely to be a subset of the iwi-affiliated population provided in the census. These issues are likely to result in some under-coverage and some over-coverage in iwi registers compared with the census.

No data for iwi registers was available in the IDI for this investigation.

The Tūhono Trust is an important pan-tribal iwi organisation. Through a 2003 amendment to the Electoral Act, the trust has a legislated role as Kaitiaki (guardian) of the iwi affiliation of Māori who are registered to vote. Secure systems allow Tūhono to facilitate sharing of information about Māori registered to vote with their affiliated iwi. Other pan-tribal organisations, such as urban Māori authorities and Māori business entities, also have an interest in the iwi affiliation of their members.

Administrative sources for te reo

Government data relating to te reo Māori is limited. MoE data about enrolment in kura kaupapa Māori (Māori-medium schools) or te reo courses may provide some information about its uptake, but studying a language is not the same as being able to have a conversation in that language. MoE data would not capture people who learned te reo at home or overseas, or who completed their education before the early 1990s.

MoE information is also available about te reo teachers. Since 2014, MoE has also collected information on ‘language(s) spoken at home’ from early childhood establishments. Some iwi have collected measures of te reo proficiency. While these sources might provide an indication of language proficiency, their coverage of the population is low.

The limited population coverage of these data sources, and the conceptual differences between them and the census, make replacing census Māori language information with administrative data unlikely at present. We do not consider te reo further in this paper.

Integrated Data Infrastructure

Statistics NZ developed the Integrated Data Infrastructure (IDI) as an environment in which to link multiple data sources in a systematic and secure way. It was developed to produce official statistics outputs and to allow Statistics NZ staff and external researchers to conduct policy evaluation and research on people’s transitions and outcomes. The IDI contains administrative and survey datasets, linked at the individual level. The IDI continues to change as new datasets are added.

This section describes the structure and content of the IDI in May 2015.

The structure of the IDI is shown in figure 3 (appendix). The structure can be described as a central ‘spine’ to which a series of data collections are linked. The target population for the spine is all individuals who have ever been residents of New Zealand. Three data sources are linked together probabilistically to create the spine: a list of all IRD numbers issued by Inland Revenue; a list of all births registered in New Zealand since 1920; and a list of all visas granted to migrants from 1997 (excluding visitor and transit visas). Other datasets are linked to the IDI spine and include a wide range of subject areas.

Statistics NZ (2014c) describes the linking methodology. Priority is placed on obtaining a high precision rate (that is, on minimising erroneous links), with the trade-off that more correct links may be missed. In practice, linkages are designed so that under 2 percent of links made are erroneous.
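
The trade-off between precision and missed links can be illustrated with a small sketch. The scored pairs, their true-match labels, and the thresholds below are all hypothetical; they are not drawn from the IDI or from Statistics NZ (2014c).

```python
def precision_recall(pairs, threshold):
    """Precision and recall when linking every pair scoring at or above threshold.

    pairs: iterable of (match_score, is_true_match) for candidate record pairs.
    """
    linked = [is_match for score, is_match in pairs if score >= threshold]
    true_links = sum(linked)
    total_true = sum(is_match for _, is_match in pairs)
    precision = true_links / len(linked) if linked else 1.0
    recall = true_links / total_true if total_true else 1.0
    return precision, recall


# Hypothetical candidate pairs: (score, whether the pair is really one person).
pairs = [
    (9.5, True), (8.7, True), (7.9, True), (6.4, True),
    (6.1, False), (5.2, True), (4.8, False), (3.0, False),
]

loose = precision_recall(pairs, threshold=5.0)   # accepts one false link
strict = precision_recall(pairs, threshold=6.2)  # no false links, one true link missed
print(loose, strict)
```

In this toy data, raising the threshold removes the erroneous link but also drops a correct one. That is the kind of trade-off the paragraph describes: an erroneous-link rate under 2 percent corresponds to precision above 98 percent.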

The IDI contains summary tables that provide core information about individuals (age, sex, ethnicity, and geographic location) summarised from across the available data sources.

For ethnicity, IDI business rules are applied to standardise the ethnic codes received from each agency. Ethnic information for each individual is combined in the IDI Personal Details table. In the process applied in 2015, ethnicity in the Personal Details table is a combination of the original responses given to separate agencies, coded to level one of the 2005 Ethnic Standard. An ethnic group is recorded wherever it is captured by any agency, at any point in time (ever-recorded).
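
A minimal sketch of the ‘ever-recorded’ rule follows. It assumes each agency record has already been standardised to level-one groups. The record layout, identifiers, and function name are invented for illustration and do not reflect the actual structure of the IDI Personal Details table.

```python
from collections import defaultdict

# Hypothetical agency records: (person_id, agency, year, level-one ethnic groups).
records = [
    (1, "DIA", 1996, {"Māori", "European"}),
    (1, "MoH", 2010, {"European"}),
    (1, "MoE", 2012, {"Māori"}),
    (2, "MSD", 2008, {"Asian"}),
    (2, "ACC", 2014, {"Asian"}),
    (3, "MoH", 2011, {"European"}),
]


def ever_recorded(records):
    """Union of every ethnic group any agency has recorded for each person."""
    combined = defaultdict(set)
    for person_id, _agency, _year, groups in records:
        combined[person_id] |= groups
    return dict(combined)


personal_details = ever_recorded(records)
maori_ever = sorted(p for p, groups in personal_details.items() if "Māori" in groups)
print(personal_details, maori_ever)
```

Because the rule takes a union across agencies and time, a group stays attached to a person once any source has recorded it, even if later responses omit it.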

Of the administrative sources described above, most were available in the May 2015 IDI. The exceptions are electoral roll data, Māori-owned sources such as iwi registers, and data from pan-iwi organisations.

Linking the census to the IDI

The 2013 Census has been linked to the IDI spine to enable comparisons between the information available for an individual in the administrative sources, and the responses provided by the same person in the census. This linked Census-IDI dataset was created by Census Transformation in May 2015. The linking was done to better understand the coverage and quality of census information in the IDI; the linked data was only available to approved Statistics NZ staff working on the Census Transformation programme.

Linking was completed in Quality Stage using probabilistic matching techniques. The variables full name, date of birth, sex, meshblock of usual residence, and country of birth were used in the linkage process. Overall, 3,920,364 census records were linked to the IDI (92.4 percent of the census count). Of most interest for this paper, 95.4 percent of census records for New Zealand residents in households where forms were returned (non-substitute households) were linked to the IDI. The linkage rate was better for individuals who had used e-forms (98 percent linked) than for paper forms (93 percent linked). The links in this dataset have an estimated false positive rate of less than 1 percent (where an incorrect link is made between two different individuals).
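
The text states only that probabilistic matching was used; it does not describe the scoring model. As a general illustration of the idea, the sketch below scores a candidate pair in the Fellegi-Sunter style, using the five linking variables named above. The m and u probabilities, the threshold, and the example records are all invented, and this is not the configuration used for the Census-IDI link.

```python
import math

# Hypothetical probabilities per linking variable:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELDS = {
    "full_name": (0.95, 0.001),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "meshblock": (0.80, 0.0005),
    "country_of_birth": (0.98, 0.3),
}
THRESHOLD = 10.0  # hypothetical cut-off for accepting a link


def match_weight(census_rec, idi_rec):
    """Sum of log2 agreement or disagreement weights across the linking fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if census_rec[field] == idi_rec[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


census = {"full_name": "aroha example", "date_of_birth": "1985-04-12",
          "sex": "F", "meshblock": "0123400", "country_of_birth": "NZ"}
# Same person who has moved: everything agrees except meshblock.
idi_same = dict(census, meshblock="0567800")
# Different person who happens to share sex and country of birth.
idi_other = dict(census, full_name="mere sample",
                 date_of_birth="1990-11-02", meshblock="0999900")

for candidate in (idi_same, idi_other):
    weight = match_weight(census, candidate)
    print(round(weight, 1), "link" if weight >= THRESHOLD else "no link")
```

Fields that rarely agree by chance, such as full name and date of birth, carry most of the weight. Agreement on sex or country of birth adds little, because unrelated people often share those values.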

