How to add a dataset to the IDI or LBD

This page contains information for researchers who want to add data to the Integrated Data Infrastructure (IDI) or Longitudinal Business Database (LBD).

See Upcoming datasets for the IDI and LBD to find out which new datasets will be added to the IDI in future and which datasets have been added recently.

Benefits of linking data

Linking data to the IDI or LBD can be used to:

  • evaluate the longitudinal outcomes or performance of a programme
  • locate a specific subpopulation that is difficult to find within the IDI/LBD
  • gain a broader understanding of the population of your dataset
  • research a topic or population not currently possible in the IDI/LBD.

The IDI contains person-centred microdata from a range of government agencies, Stats NZ surveys including the 2013 Census, and non-government organisations. For more information about data in the IDI, see Data in the IDI.

The LBD contains business-centred microdata from a range of Stats NZ surveys and government agencies. For more information about data in the LBD, see Longitudinal Business Database.

Application process

To link data to the IDI or LBD, you need to go through an application process. Figure 1 shows the six steps in the process, and approximately how long each step takes. Each step is explained in more detail below.

[Figure 1: diagram of the six steps in the application process]

Step 1 – Submit application to load data

Fill in the form ‘Application to add a dataset to the Integrated Data Infrastructure or Longitudinal Business Database’ (see ‘Available files’ box on the right-hand side of this page) if you want to link data to the IDI or LBD. The form includes questions about:

  • purpose and benefits of adding the data
  • dataset coverage and quality of linking variables
  • privacy and consent considerations.

Email your completed application form to access2microdata@stats.govt.nz.

Step 2 – Application reviewed for feasibility

The Integrated Data team will assess the practicalities, benefits, and risks of linking the data, then recommend the most suitable linking process.

It usually takes us up to 15 working days to review new applications. We will contact you during this time to arrange a discussion about your application, so please make sure you are available.

Step 3 – Prioritising applications

We prioritise reviewed applications based on the scale of the linking project.

Full integration

A full integration is where a dataset containing a new population is requested for inclusion in the IDI. We will link the new population to the IDI using existing unique identifiers as well as personal identifying information (eg full name and date of birth). Projects intended for full integration are usually updated on a regular basis as part of the quarterly IDI refresh process.

The Integrated Data Advisory Group meets monthly to prioritise dataset requests that require full integration, and on an ad hoc basis when the request is urgent.

See The Integrated Data Advisory Group for more information about the group.

Ad hoc load

When a dataset contains a unique identifier that exists in the IDI, we can perform an ‘ad hoc load.’ An ad hoc load is where we use deterministic linking to link a new dataset to an existing population in the IDI. Projects in this category are typically a one-off supply intended for a specific purpose.
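As an illustration only, deterministic linking can be sketched as an exact match on a shared unique identifier. This is a minimal sketch, not Stats NZ's actual linking process: the function name, the `identifier` field, and the record layout are all hypothetical.

```python
def deterministic_link(new_records, idi_population, key="identifier"):
    """Link records by exact match on a shared unique identifier.

    Hypothetical sketch: `key` and the dict-based record layout are
    assumptions for illustration, not the real IDI structure.
    """
    # Index the existing population by its unique identifier.
    index = {person[key]: person for person in idi_population}

    linked, unlinked = [], []
    for record in new_records:
        match = index.get(record[key])
        if match is None:
            # No exact match: the record cannot be linked deterministically.
            unlinked.append(record)
        else:
            # Exact match: combine the existing and the new fields.
            linked.append({**match, **record})
    return linked, unlinked
```

Records with no exact match are returned separately rather than guessed at, which is what distinguishes this approach from linking on personal identifying information such as names and dates of birth.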

We prioritise ad hoc load requests on a monthly basis.

See Privacy impact assessments for the IDI for the unique identifiers in the IDI that can be used for ad hoc loads.

We prioritise datasets based on the following criteria.

  • How important is the policy/research area the data would contribute to?
  • How important is the data to resolving the policy/research issue?
  • By using the data, what potential value would it create?
  • Will the data be a one-off supply, or ongoing?
  • What is the breadth of use of the data?
  • How do the potential privacy risks associated with integrating the data compare with the potential value it would create?
  • Is there data in the IDI that could be used as a proxy for the proposed dataset?

Step 4 – Integrated Data team schedules linking projects

Upcoming datasets for the IDI and LBD shows which datasets are scheduled or being considered for inclusion in the IDI and LBD.

We prioritise datasets according to the criteria in step 3, but to schedule the integration/load of the data we need to consider the following.

  • Is the data readily available?
  • Has the data supplier been engaged and are they on board?
  • What is the quality of the variables used for linking?
  • How complete are the agreements and documentation required for integration?

IDI linking projects

New IDI linking projects are included as part of the quarterly refresh cycle. We determine which projects can be completed in the upcoming quarter with available resources. The highest priority projects, as determined by the prioritisation process in step 3, will be given preference. When we receive new applications, we will prioritise and schedule the request based on the above criteria – this process may result in a reordering of already scheduled requests.

We will let you know the outcome of the prioritisation process for your application, and whether or not we have scheduled your dataset for inclusion in the upcoming quarter. This information will be shared on Upcoming datasets for the IDI and LBD.

Note: Ad hoc loads are not included as part of the quarterly refresh cycle. We prioritise ad hoc loads monthly and then schedule them for loading into the IDI according to the criteria in step 3 and depending on the resources available. Ad hoc loads cannot, however, be processed while an IDI refresh is in progress.

LBD linking projects

New LBD linking projects are added as part of the ongoing yearly loading process. We determine how soon projects can be completed with available resources. The highest priority projects, as determined by the prioritisation process in step 3, will be given preference.

We will let you know the outcome of the prioritisation process for your application, and an estimate of when your data is scheduled for inclusion. This information will be shared on Upcoming datasets for the IDI and LBD.

Estimated times for inclusion of data may change as new applications are prioritised. We may contact you closer to the time of your data’s inclusion to check that your application is still required.

Step 5 – Data linking preparation and documentation

When a dataset is scheduled for integration in an upcoming refresh or an ad hoc load, we begin preparations for linking and complete important documentation. Requirements and timeframes will vary depending on the nature of the project but can include:

  • a business case
  • a memorandum of understanding
  • a privacy impact assessment (PIA)
  • technical specifications
  • metadata.

Ad hoc loads

We can process ad hoc load requests faster if:

  • the dataset contains a population already covered by an existing privacy impact assessment (see Privacy impact assessments for existing IDI PIAs)
  • you have obtained consent from the dataset population permitting the use of their data for research purposes.

Step 6 – Data is linked and made available for research

Before we make your dataset available for researchers, we remove personal identifying information such as names, addresses (if these have been supplied), and the day from the date of birth/death. We encrypt (ie replace with another unrelated number) identifiers such as IRD and NHI numbers. We will let you know when your dataset is available for research in the IDI/LBD.
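The de-identification described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the method Stats NZ uses: all field names are hypothetical, and the replacement numbers are drawn at random and remembered in a lookup table so that the same identifier always maps to the same unrelated number.

```python
import secrets
from datetime import date

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = ("name", "address")
DATE_FIELDS = ("date_of_birth", "date_of_death")
ID_FIELDS = ("ird_number", "nhi_number")


def deidentify(record, id_map):
    """Return a copy of `record` with identifying details removed.

    `id_map` is a dict shared across calls so that a given identifier
    is always replaced by the same unrelated number.
    """
    # Drop names and addresses entirely.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Keep only the year and month of each date, discarding the day.
    for field in DATE_FIELDS:
        value = out.pop(field, None)
        if value is not None:
            out[field + "_year_month"] = (value.year, value.month)

    # Replace each identifier with an unrelated random number.
    # A production system would also guarantee the replacements are unique.
    for field in ID_FIELDS:
        if field in out:
            lookup_key = (field, out[field])
            if lookup_key not in id_map:
                id_map[lookup_key] = secrets.randbelow(10**12)
            out[field] = id_map[lookup_key]
    return out
```

Reusing one `id_map` across all records keeps the replaced identifiers consistent, so records for the same person can still be linked to each other after the original numbers are gone.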

For ad hoc loads, steps 5 and 6 can take up to three months depending on demand. IDI projects intended for full integration are dependent on the quarterly refresh, so steps 5 and 6 may take up to nine months. The timeframe for large-scale LBD projects will depend on how your project is prioritised.

To access your dataset you will need to complete either:

  • a new Data Lab project application form (usually takes 15 working days)

or

  • a variation request for an existing Data Lab project (usually takes 10 working days).

See Access microdata in the Data Lab for more details about the Data Lab project application process. 

The Integrated Data Advisory Group

The Integrated Data Advisory Group (IDAG), established in 2017, replaced the previous governance structure. The group’s vision is to ‘enable the use of integrated data to inform decision-makers for improved outcomes for New Zealanders’. More specifically, it aims to:

  • get the highest priority datasets into the IDI and LBD
  • improve access to the IDI and LBD
  • encourage visibility of analytical outputs and real-world outcomes
  • demonstrate transparency around governance and dataset prioritisation.

The IDAG comprises people from government and non-government sectors with a mix of knowledge, understanding, and experience of different fields.

See Upcoming datasets for the IDI and LBD for the datasets that have been prioritised and scheduled for inclusion in the IDI and LBD.

See the Integrated Data Advisory Group – Terms of Reference for more information about the IDAG, including its members and goals.

Schedule of meetings for 2018

  • 24 January
  • 28 February 
  • 22 March 
  • 18 April
  • 17 May
  • 20 June
  • 19 July
  • 15 August
  • 20 September
  • 17 October
  • 22 November
  • 19 December

 

Updated 8 February 2018
