

A New Zealand resident population (the IDI-ERP) has been derived from the linked administrative sources in the IDI. The IDI-ERP population is 2 percent larger than the official ERP population estimate. The overall pattern of the national age-sex distribution is similar to the ERP distribution, suggesting that the approach taken to deriving the IDI-ERP works reasonably well. However, coverage patterns vary by age and sex, with high net overcoverage in early adult ages (20–34 years), especially for males. 

A key finding is that the accuracy of linkages becomes critical when we wish to count populations using linked data sources. It is just as vital to minimise missed links (false negatives) as it is to avoid linking records that belong to two different individuals (false positives).

Coverage errors

Net coverage typically conceals underlying undercoverage and overcoverage, which may partly offset each other. Linking the census to the IDI has provided some insight into the individuals who may be wrongly included in, or wrongly excluded from, the IDI-ERP. However, our initial estimates of these errors are inflated by linkage errors in the Census-IDI link and by census non-response.
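The point that a net figure hides gross errors can be made concrete with set arithmetic. The sketch below is illustrative only: it assumes a known "true" resident set, which is never available in practice (the census link is an imperfect proxy, affected by the linkage errors and non-response noted above), and all identifiers are hypothetical.

```python
def coverage_components(admin_ids: set, true_ids: set) -> dict:
    """Split the net coverage error of an administrative count into gross parts.

    admin_ids: identifiers counted in the administrative population (hypothetical)
    true_ids:  identifiers of the true resident population (unknown in practice)
    """
    overcoverage = len(admin_ids - true_ids)   # erroneous inclusions, including duplicates
    undercoverage = len(true_ids - admin_ids)  # residents who were missed
    # The net error equals len(admin_ids) - len(true_ids); it is all that a
    # comparison of totals can reveal, and it hides the two gross components.
    return {
        "overcoverage": overcoverage,
        "undercoverage": undercoverage,
        "net": overcoverage - undercoverage,
    }


# Hypothetical example: two residents missed, one visitor included,
# and one resident counted twice under a second identifier.
true_population = {"p1", "p2", "p3", "p4", "p5"}
admin_population = {"p1", "p2", "p3", "visitor", "p1_dup"}
print(coverage_components(admin_population, true_population))
# {'overcoverage': 2, 'undercoverage': 2, 'net': 0}
```

Here the totals agree exactly, yet four individuals are misclassified, which is why a matching net figure alone says little about the quality of the population.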

Coverage errors in the IDI-ERP may be due to the rules we have used to define New Zealand residents, and to linkage errors in the construction of the IDI.

Sources of overcoverage in the IDI-ERP include:

  • People who are not part of the resident population but were erroneously included in the IDI-ERP, for example short- or medium-term migrants.
  • False negative links between component datasets of the IDI spine. IDI datasets are linked together probabilistically and are subject to linking error in the same way as the Census-IDI link. False negative links in the spine lead to an individual appearing twice, and therefore contribute to overcoverage in the IDI-ERP.

Sources of undercoverage in the IDI-ERP include:

  • People who are part of the resident population but were not selected into the IDI-ERP because they did not have recent activity in the administrative data sources used here.
  • People who are part of the resident population but do not appear in the IDI spine. For example, people born overseas who were granted their visa before 1997 (or who do not require a visa) and who have no tax records.
  • False positive links between component datasets of the IDI spine. False positive links lead to two individuals being counted as one and therefore contribute to undercoverage in the IDI-ERP.
  • False negative links between the IDI spine and activity data sources. People may have been recently active, but a failure to link any record of activity to the spine would mean they are not included in the IDI-ERP.
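Both kinds of spine linkage error listed above can be illustrated with a small record-clustering sketch. It assumes, for illustration only, that accepted links are grouped into individuals by taking connected components (here with a union-find structure); the record identifiers and datasets are hypothetical, and this is not a description of how the IDI spine is actually built.

```python
class DisjointSet:
    """Union-find over record identifiers; each final set is treated as one person."""

    def __init__(self, records):
        self.parent = {r: r for r in records}

    def find(self, r):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]
            r = self.parent[r]
        return r

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def count_people(records, accepted_links):
    """Count distinct individuals implied by a set of accepted record links."""
    ds = DisjointSet(records)
    for a, b in accepted_links:
        ds.union(a, b)
    return len({ds.find(r) for r in records})


# Hypothetical records from two component datasets, "a" and "b".
# Truth: a1/b1 are one person, a2/b2 are another, a3 is a third.
records = ["a1", "b1", "a2", "b2", "a3"]
true_links = [("a1", "b1"), ("a2", "b2")]

print(count_people(records, true_links))                   # 3: correct count
print(count_people(records, [("a2", "b2")]))               # 4: missed link, one person counted twice
print(count_people(records, true_links + [("b2", "a3")]))  # 2: false link, two people merged
```

A single missed link adds one person to the count and a single false link removes one, so the two error types push the population total in opposite directions.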

At the aggregate level, there appears to be considerable overcoverage in the IDI-ERP, suggesting that we are erroneously including individuals who are not New Zealand residents. Many of these erroneous inclusions are young adult males. They may be due to linkage error within the IDI spine (resulting in duplicate records for an individual) or to short-term visitors to New Zealand who are not identified as such from migration data. Errors in identifying migrants may result from the rules we have used to identify migrants, or from linkage errors involving the border movements data in the IDI.

In addition, the rules are failing to select some people who are usual residents. In particular, there is a group of individuals who are in the IDI and the census but are not being selected into the IDI-ERP. Many of these individuals are in the ages leading up to retirement. They may not have activity in any of the relevant datasets in the IDI (for example, because they have retired early and do not visit a doctor regularly). Alternatively, they may have been active, but their records were not linked in the IDI. Or they may be absent from the spine altogether, for example if they migrated to New Zealand before 1997 (or do not require a visa to live in New Zealand) and have not worked or received a taxable benefit.
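A simplified version of the kind of selection rule discussed in this section can be written as a filter over spine records. Everything in this sketch is an assumption made for illustration: the field names, the one-year activity window, and the 183-day presence threshold are hypothetical and are not the rules used to derive the IDI-ERP.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class SpinePerson:
    """A person on the spine, with hypothetical derived attributes."""
    person_id: str
    # Dates of any linked activity record (e.g. tax, health, education).
    activity_dates: List[date] = field(default_factory=list)
    # Days spent in New Zealand over the past year, derived from border
    # movements. With no linked border records, the person is assumed present.
    days_in_nz_past_year: int = 365
    death_date: Optional[date] = None


def is_resident(person, ref_date, activity_window_days=365, min_days_in_nz=183):
    """Hypothetical rule: alive, recently active, and mostly in New Zealand."""
    if person.death_date is not None and person.death_date <= ref_date:
        return False
    window_start = ref_date - timedelta(days=activity_window_days)
    recently_active = any(window_start <= d <= ref_date for d in person.activity_dates)
    mostly_in_nz = person.days_in_nz_past_year >= min_days_in_nz
    return recently_active and mostly_in_nz


ref = date(2013, 6, 30)
people = [
    SpinePerson("p1", [date(2013, 4, 1)]),                            # worker
    SpinePerson("p2", [date(2010, 1, 15)]),                           # early retiree, no recent activity
    SpinePerson("p3", [date(2013, 2, 1)], days_in_nz_past_year=100),  # visitor identified from border data
    SpinePerson("p4", [date(2013, 5, 1)]),                            # visitor whose departures did not link
]
selected = [p.person_id for p in people if is_resident(p, ref)]
print(selected)  # ['p1', 'p4']
```

Person p2 shows how a resident with no recent activity is missed (undercoverage), while p4, whose border departures are assumed not to have linked, is counted even though they may only be a short-term visitor (overcoverage). People absent from the spine cannot appear here at all, since a rule of this kind only ever sees spine records.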

Further work

This work has been undertaken in the context of Census Transformation. The IDI-ERP administrative population estimates presented in this paper are likely to serve as initial counts, which would be further improved by coverage adjustments and estimation methods so that they fully meet the quality standards. Some level of coverage error in the IDI-ERP seems inevitable. However, larger discrepancies will require a larger coverage survey and greater reliance on models in the final estimation, with consequently higher costs and higher levels of uncertainty.

We anticipate that a method for identifying New Zealand residents at a given time will also be useful more generally for research using the IDI.

In conclusion, the structure of the linked administrative data available in the IDI, with a spine that targets those 'ever resident' in New Zealand, linked to records of activity in health, taxation, and education, as well as to international border movements and deaths, provides a solid basis for identifying a New Zealand resident population at a given time. However, further work is needed to understand the causes of the undercoverage and erroneous inclusions that are apparent from this study.
