Discussion

This paper examined the quality of geographic information in the IDI by comparing it with the geographic information contained in the 2013 Census.

Summary of main findings

Coverage of geographic information in the IDI was high, with 99 percent of individuals in the resident population having a meshblock recorded in at least one IDI data source.

Comparison of the meshblocks recorded in IDI against those recorded in the census revealed that different administrative data sources have different levels of agreement with the census, with health having the highest and MSD’s working age benefits the lowest. Combining the geographic information from these individual sources produced geographic information that was more accurate than any single source alone. When the most recently updated meshblock from any source was selected, 79 percent of people had the same meshblock recorded in the IDI and the census, 82 percent had the same area unit, and 94 percent had the same territorial authority. Individuals in the young adult ages (15-30), and particularly males, were least likely to have agreement between IDI and census meshblocks.
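The "most recently updated meshblock" rule and the agreement rate can be illustrated with a minimal sketch. It is not the code used for this paper: the function names, the record layout, the source label "other", and all data values are assumptions made up for illustration.

```python
from datetime import date

def latest_meshblock(records):
    """Return the meshblock from the most recently updated record.

    records: list of (source, meshblock, updated_on) tuples for one person.
    Returns None when the person has no records.
    """
    if not records:
        return None
    return max(records, key=lambda r: r[2])[1]

def agreement_rate(idi, census):
    """Share of people present in both inputs whose meshblocks match."""
    linked = [p for p in census if idi.get(p) is not None]
    if not linked:
        return 0.0
    return sum(idi[p] == census[p] for p in linked) / len(linked)

# Hypothetical administrative records (made-up identifiers and dates).
admin_records = {
    "p1": [("health", "MB0001", date(2012, 11, 1)),
           ("msd", "MB0009", date(2010, 5, 3))],
    "p2": [("other", "MB0002", date(2011, 2, 14))],
    "p3": [("health", "MB0003", date(2009, 7, 30)),
           ("other", "MB0004", date(2012, 6, 1))],
}
# Hypothetical census meshblocks for the same people.
census_meshblocks = {"p1": "MB0001", "p2": "MB0002", "p3": "MB0003"}

selected = {p: latest_meshblock(r) for p, r in admin_records.items()}
rate = agreement_rate(selected, census_meshblocks)
print(f"Meshblock agreement: {rate:.0%}")
```

In this toy data, person p3 shows how the rule can fail: the newest record points to a different meshblock from the one reported in the census. The same comparison could be repeated at area unit or territorial authority level by mapping each meshblock to the larger geography first.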

The quality of geographic location information was also tested by using it to create households, and then comparing household size and composition against that recorded in the census. Agreement between households in IDI and census was lower than for individual geographic information. Overall, 55 percent of census households had the same household size in the IDI, and 48 percent contained exactly the same set of household members.
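The two household measures reported above (same size, and exactly the same members) can be sketched as follows. This is an illustrative sketch only. It assumes that households are formed by grouping people who share an address identifier, and that census and IDI households can be compared at the same identifier. The paper does not spell out its method here, and all names and data are hypothetical.

```python
from collections import defaultdict

def build_households(address_of):
    """Group people into households by a shared address identifier."""
    households = defaultdict(set)
    for person, address in address_of.items():
        households[address].add(person)
    return dict(households)

def compare_households(census_addr, idi_addr):
    """Return (share with same size, share with identical members),
    measured over census households."""
    census_hh = build_households(census_addr)
    idi_hh = build_households(idi_addr)
    same_size = same_members = 0
    for address, members in census_hh.items():
        idi_members = idi_hh.get(address, set())
        same_size += len(members) == len(idi_members)
        same_members += members == idi_members
    total = len(census_hh)
    return same_size / total, same_members / total

# Hypothetical address identifiers for six people in each source.
census_addr = {"p1": "A1", "p2": "A1", "p3": "A2",
               "p4": "A3", "p5": "A3", "p6": "A3"}
idi_addr = {"p1": "A1", "p2": "A1", "p3": "A9",
            "p4": "A2", "p5": "A3", "p6": "A3"}

size_rate, member_rate = compare_households(census_addr, idi_addr)
print(f"Same size: {size_rate:.0%}, same members: {member_rate:.0%}")
```

Identical membership implies identical size, so the membership rate can never exceed the size rate. A single misplaced person can also break agreement for two households at once, which is one reason household agreement tends to be lower than individual agreement.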

There are several possible reasons why the location information recorded about an individual in the IDI does not agree with that recorded in the census. Some individuals do not have frequent interactions with data providers, and may not update their address with the data provider when they move house. In addition, recording or geocoding errors may result in an individual’s address being coded to the wrong meshblock. Finally, simple comparisons between IDI and census location information do not reflect the complex reality for some people. People who live across multiple residences may report different addresses in different sources. Some portion of the disagreement between IDI and census addresses may reflect these complex situations, rather than errors or outdated addresses.

Some agencies have less operational imperative than others to update address information, particularly as the majority of services move to being offered online. This may be one explanation for the finding that different data sources have different levels of agreement with census geographic information. An additional possibility is that some data sources cover population groups that are more likely to have outdated addresses. For example, individuals receiving welfare benefits may be more mobile than other groups, and this may explain the lower agreement between MSD and census addresses.

Limitations

There are some limitations to these results.

The analyses in this paper were restricted to individuals whose IDI and census records could be linked. The linking of IDI and census records relied in part on meshblock of usual residence, which was used as a blocking variable. Records that disagree on meshblock are therefore less likely to be linked and so less likely to appear in the comparison, which means the results reported in this paper may overestimate the level of agreement between census and IDI geographic information. It should be noted, however, that the link rate of census to IDI was high (94 percent) and the estimated rate of false positive errors was low (0.7 percent), suggesting that the linking was of high quality and the results in this paper are likely to be close to the ‘true’ levels of agreement.
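A small sketch may help show why blocking on meshblock can inflate agreement. It assumes a single linking pass that compares only records sharing the same meshblock and then matches on name. The actual linking method is not described in this section, and the function, field names, and data below are hypothetical.

```python
from collections import defaultdict

def candidate_pairs(census, idi, block_key):
    """Yield (census_id, idi_id) pairs that share the same blocking value."""
    blocks = defaultdict(list)
    for idi_id, record in idi.items():
        blocks[record[block_key]].append(idi_id)
    for census_id, record in census.items():
        for idi_id in blocks.get(record[block_key], []):
            yield census_id, idi_id

# Hypothetical records: "ben" has an outdated meshblock in the IDI.
census = {
    "c1": {"name": "ana", "meshblock": "MB1"},
    "c2": {"name": "ben", "meshblock": "MB2"},
}
idi = {
    "i1": {"name": "ana", "meshblock": "MB1"},
    "i2": {"name": "ben", "meshblock": "MB7"},
}

pairs = list(candidate_pairs(census, idi, "meshblock"))
links = [(c, i) for c, i in pairs if census[c]["name"] == idi[i]["name"]]
print(links)
```

In this pass the two "ben" records are never compared, so the only link produced is one where the meshblocks already agree. Because the paper states that meshblock was used only in part, such records may still be linked by other means, which limits the size of this effect.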

The comparisons between IDI and census geographic information in this paper could only be made for census day (5 March 2013). It is possible that the level of agreement between IDI and census geographic information has changed since census day; however, this cannot be tested until the next census in 2018.

Improving the quality of location information

While the quality of geographic location information in the IDI varies by data source, it is possible to combine these sources in a way that provides accurate information for around 80 percent of people. Although this result is promising, it leaves around 20 percent of individuals with an incorrect address. Given the importance of accurate location information to a range of analyses, attention should be given to improving the quality of the location information available in the IDI.

One strategy for improving the quality of address information in IDI is to refine the method for selecting a meshblock from multiple available meshblocks in the IDI. However, the analyses in this paper suggest that, at the present time, this strategy would only result in a small improvement in address quality.

There may be some small improvement in address quality when more up-to-date PHO address updates become available in the IDI. At the time the analyses in this paper were conducted, PHO address updates were only available to the end of November 2012. If address updates were available right up to census night, additional updates could be captured and the quality of the location information improved.

Another strategy for improving the quality of geographic information in the IDI would be to improve the method for geocoding addresses. This is currently under investigation.

The strategies mentioned here are likely to result in only small improvements in address quality. It is likely that greater gains would come from strategies to improve the quality of address information at source agencies. It is not mandatory for source agencies to collect residential addresses, and many agencies do not have a need to collect accurate and up-to-date location information. However, improvements in address quality could still be obtained by ensuring that addresses:

  • are collected according to common standards 
  • include enough information to be accurately geocoded 
  • are updated regularly.

In particular, improving address quality for groups of individuals who are known to have poor quality addresses, such as tertiary students and other young adults, could be worthwhile. Some improvements are already in place. An improvement to Ministry of Health geocoding processes in 2013, for example, is likely to result in improved quality for newer health addresses.

Even with improvements in address quality, some individuals are likely to have an incorrect address recorded. There may be a role for modelling approaches that identify and correct likely address misclassifications. Imputing the small number of missing meshblocks in the IDI-ERP could also be useful in improving the quality of geographic information.

Accurate information about where individuals live is key to producing official statistics, and is central to many policy and research questions. If individuals are not placed in the right location, these errors will flow through to all regional statistics and analyses, including regional breakdowns of households and families, incomes, and educational achievement.

Further work is being undertaken as part of the Census Transformation programme to develop our understanding of the quality of geographic location information in administrative data sources. This includes examining the predictors of errors in geographic location, the impact of these errors on subnational population distributions, and a more in-depth investigation into the quality of household information in administrative data.
