Creating a usually resident population from the IDI 

The IDI spine contains more than 9 million individuals – far more than the 2013 New Zealand usual resident population of approximately 4.5 million. Many individuals in the IDI spine are former usual residents of New Zealand who have since left or died. It is therefore necessary to restrict the IDI spine population to the subset of individuals who were usual residents of New Zealand at a given date. 

The method used to select the resident population at a given date relies on identifying activity in New Zealand administrative systems that indicates an individual’s presence in New Zealand over a period prior to the reference date. We then remove individuals who left the population by death or outmigration prior to the reference date.

In theory, we can remove migrants by observing their travel patterns. However, one problem is that border movement data is only available from 1998. Another is that it is difficult to determine a change in residency status by matching inward and outward journeys. There is currently no standard definition of how long a person must be out of the country before they are no longer considered a New Zealand resident, and the travel patterns of residents and visitors are complex.

The rule applied here is a somewhat pragmatic choice. It is relatively straightforward to apply and strikes a balance between retaining New Zealand residents who spend periods of time overseas, and removing genuine external migrants. Nevertheless, some short-term visitors may be incorrectly retained in the IDI resident population.

Specifically, the method used to identify the IDI resident population for a given date was as follows:

Inclusion: retain individuals whose presence is indicated by activity

  • For ages five years and over, the spine population was restricted to those individuals who had activity in one of the following IDI datasets in the 12 months prior to the reference date:
    – ACC claims
    – Inland Revenue tax (employer monthly summary of tax paid at source, or annual tax return data; receipt of taxable benefit payments is included)
    – Health (pharmaceutical prescriptions, GP enrolment and attendance, hospital admissions, non-admission hospital visits)
    – Education (school enrolment, tertiary enrolment or attainment).
  • For ages under five years, having a record in the spine was sufficient for inclusion in the population. For these ages there was no additional requirement of activity in the previous 12 months.

Exclusion: remove those who have left the population

  • Linked death records were used to identify individuals with a date of death prior to the reference date.
  • Linked migration data were used to identify individuals who had moved overseas. Individuals were classified as having moved overseas if the total length of time spent overseas was at least 10 of the 12 months spanning the reference date (that is, the six months either side of the reference date).

The resulting resident population derived from the IDI is called the IDI-ERP.
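
The inclusion and exclusion rules above can be sketched in code. This is a minimal illustration only, not the actual IDI implementation: the Person record layout, the field names, and the 304-day stand-in for "10 of the 12 months" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Assumption: "at least 10 of the 12 months" is approximated as 304 days.
OVERSEAS_DAYS_THRESHOLD = 304


@dataclass
class Person:
    # Hypothetical record layout; real IDI tables are structured differently.
    birth_date: date
    # Dates of any ACC, tax, health, or education activity.
    activity_dates: list[date] = field(default_factory=list)
    death_date: Optional[date] = None
    # Days spent overseas in the 12 months spanning the reference date.
    days_overseas_in_window: int = 0


def age_in_years(birth: date, ref: date) -> int:
    """Completed years of age at the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def is_idi_resident(p: Person, ref: date) -> bool:
    # Assumption (not stated in the source): people born after the
    # reference date are not part of the population at that date.
    if p.birth_date > ref:
        return False
    # Exclusion: date of death prior to the reference date.
    if p.death_date is not None and p.death_date < ref:
        return False
    # Exclusion: overseas for at least 10 of the 12 months spanning the date.
    if p.days_overseas_in_window >= OVERSEAS_DAYS_THRESHOLD:
        return False
    # Inclusion, ages under five: a spine record is sufficient.
    if age_in_years(p.birth_date, ref) < 5:
        return True
    # Inclusion, ages five and over: activity in the previous 12 months.
    window_start = ref - timedelta(days=365)
    return any(window_start <= d < ref for d in p.activity_dates)
```

For example, with a reference date of 30 June 2013, an adult whose only recorded activity was in 2011 would be dropped, while a three-year-old with no recorded activity would be retained.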

Figure 2 shows a simple diagram (not to scale) of the administrative population derived from the IDI (the IDI-ERP) as a subset of the IDI spine.

Figure 2: The IDI-ERP shown as a subset of the IDI spine

Methods for assessing the coverage of the IDI resident population

The rules for inclusion and exclusion described above are used to create a list of the resident population from the IDI. This section describes the methods used to assess how well the rules are performing.

Aggregate comparison against the ERP and the quality standards

The administrative population derived from the IDI (the IDI-ERP) can be compared with the official estimated resident population (the ERP) at an aggregate level, by age and sex. This comparison reveals net overcoverage or undercoverage in each age-sex group.
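
As a rough sketch of this aggregate comparison, net coverage can be expressed as the percentage difference between the IDI-ERP count and the ERP count in each age-sex cell. The function name and the dictionary layout below are assumptions made for illustration; any counts passed in would come from the two population series.

```python
def net_coverage_pct(idi_erp: dict, erp: dict) -> dict:
    """Percentage difference of IDI-ERP from ERP for each (sex, age group) cell.

    Positive values indicate net overcoverage, negative values net
    undercoverage. Cells missing from idi_erp are treated as a count of zero.
    """
    return {
        cell: 100.0 * (idi_erp.get(cell, 0) - erp_count) / erp_count
        for cell, erp_count in erp.items()
    }
```

A cell with an ERP count of 100,000 and an IDI-ERP count of 102,000 would show net overcoverage of 2 percent. The figure is net: it says nothing about how many individuals were wrongly included or wrongly omitted to arrive at it.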

Census Transformation has developed a set of quality standards to assess the quality of population estimates produced from alternatives to the current census model (McNally and Bycroft, 2015). The quality standards reflect a series of discussions held with core customers of population statistics. They provide a measure of the lowest acceptable accuracy for users of the estimated resident population data series.

The relevant standards for this paper are those for the national population, applied to an administrative census model that produces independent estimates each year. According to the quality standards, the total national population estimate should be within 0.5 percent of the ERP. National population estimates by sex and five-year age group should all be within 5 percent of the ERP, and 90 percent of them should be within 1.5 percent of the ERP.
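
The three national-level thresholds can be checked mechanically. The sketch below is illustrative only: the function name and input layout are assumptions, both inputs are assumed to share the same age-sex cells, and "within" is assumed to include the boundary value.

```python
def check_quality_standards(idi_erp: dict, erp: dict) -> dict:
    """Check counts by (sex, five-year age group) against the three standards."""
    erp_total = sum(erp.values())
    total_diff_pct = 100.0 * abs(sum(idi_erp.values()) - erp_total) / erp_total

    # Absolute percentage difference from the ERP in each age-sex cell.
    cell_diffs = [100.0 * abs(idi_erp[c] - erp[c]) / erp[c] for c in erp]
    n_within_1_5 = sum(d <= 1.5 for d in cell_diffs)

    return {
        # Total national estimate within 0.5 percent of the ERP.
        "total_within_0.5_pct": total_diff_pct <= 0.5,
        # Every age-sex cell within 5 percent of the ERP.
        "all_cells_within_5_pct": all(d <= 5.0 for d in cell_diffs),
        # At least 90 percent of cells within 1.5 percent of the ERP
        # (integer comparison avoids floating-point rounding at the boundary).
        "90_pct_of_cells_within_1.5_pct": 10 * n_within_1_5 >= 9 * len(cell_diffs),
    }
```

Returning the three checks separately, rather than a single pass or fail, makes it easier to see which standard an initial count falls short of.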

It should be noted that these quality standards apply to the final population estimates. The estimates presented in this paper are likely to be initial counts, which would be further improved by coverage adjustments and estimation methods. Nonetheless, they are a useful guide for evaluating the broad quality of the population counts in this paper.

Individual comparison against census

There are limitations to the aggregate comparison. Comparing populations at the aggregate level cannot reveal which individuals are missing from the population (undercoverage) and which are erroneously included (overcoverage). Furthermore, aggregate comparisons may obscure patches of undercoverage and overcoverage that balance out to produce good net coverage.

To address these limitations of the aggregate-level comparisons, we can compare the IDI-ERP at the individual level against another list of the population – in this case, the census. The census is not a completely accurate list of the New Zealand resident population: it contains undercoverage and overcoverage and, by design, does not include residents who were overseas on census night. Even so, an individual-level comparison against the census may reveal patterns of overcoverage and undercoverage that were not apparent in the aggregate-level analysis. This information could be used to understand and improve the IDI-ERP coverage of the resident population.

To enable a fair comparison of the IDI-ERP and census populations, we made the following adjustments:

  • overseas visitors were removed from the census population
  • New Zealand residents who were recorded in migration data as being overseas on census night (residents temporarily overseas) were removed from the IDI-ERP population
  • babies born in March 2013 were removed from both populations (only month and year of birth were available in IDI, so it was not possible to distinguish babies born before 5 March from those born after).

The linked Census-IDI dataset is used for these individual-level comparisons. 
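
The adjustments and the individual-level comparison can be sketched with set operations on linked person identifiers. This is an illustration under stated assumptions, not the method applied to the linked Census-IDI dataset: the function, its argument names, and the idea of a single shared identifier are all hypothetical.

```python
def compare_lists(
    idi_erp_ids: set,
    census_ids: set,
    census_overseas_visitors: set,
    residents_overseas_on_census_night: set,
    born_march_2013: set,
) -> dict:
    """Apply the three adjustments, then split people by which list they appear on."""
    # Overseas visitors are removed from the census population.
    census = census_ids - census_overseas_visitors - born_march_2013
    # Residents temporarily overseas on census night are removed from the IDI-ERP.
    idi = idi_erp_ids - residents_overseas_on_census_night - born_march_2013
    return {
        "in_both": idi & census,
        # On the IDI-ERP only: possible IDI-ERP overcoverage, or census undercoverage.
        "idi_only": idi - census,
        # On the census only: possible IDI-ERP undercoverage, or census overcoverage.
        "census_only": census - idi,
    }
```

Because the census is itself an imperfect list, the two mismatch groups are candidates for investigation rather than confirmed errors. The sizes of both groups are visible here even when they largely cancel out in the net difference between the two population counts.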
