LEED processes

A high-level summary of the processes used to create and maintain the LEED dataset is shown below. The following sections briefly describe the key steps undertaken during each of the broad process stages.

LEED Process Flow

Diagram, LEED process flow.

Receive data

An extract of data is received from Inland Revenue each month. This extract contains all EMS records that have been processed or updated since the previous extract, along with any updates to core reference tables.
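The way a monthly extract of new and updated records might be folded into the data already held can be sketched as follows. This is a minimal illustration only: the `record_id` key and the `gross` field are hypothetical names, and the source does not describe how Inland Revenue or Statistics NZ actually key or apply updates.

```python
def apply_extract(store, extract):
    """Return a new store with the extract's records added or replaced.

    store   -- dict mapping a (hypothetical) record_id to a record dict
    extract -- iterable of record dicts, each carrying a "record_id"
    """
    merged = dict(store)  # copy, so the previous month's store is untouched
    for record in extract:
        # A record seen before is an update; an unseen one is new.
        merged[record["record_id"]] = record
    return merged


store = {1: {"record_id": 1, "gross": 100}}
extract = [{"record_id": 1, "gross": 120}, {"record_id": 2, "gross": 50}]
merged = apply_extract(store, extract)
```

Here `merged` holds the updated version of record 1 and the new record 2, while `store` still holds the earlier version.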

Clean data

The base data received is of high overall quality, but cleaning and transformation processes are still required. Both datasets are of a high standard for the purposes for which they were collected. However, merging them requires editing and imputation so that robust official statistics on the labour market can be produced.

Missing or unknown variables are imputed, where possible, using standard Statistics NZ imputation processes. Other variables are derived using information in the tax data.

Employees are matched to their employers to create a job record for a specific month. Some breaks can occur in these job records over time that are not true breaks in employment. It is necessary to identify these so that accurate job histories can be produced, and job and worker flow statistics are not upwardly biased. Employer identification is an example of this type of transformation process.
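As a rough illustration of how monthly job records can be grouped into continuous spells, and how a break shows up as a new spell, consider the sketch below. It assumes each record is an `(employer_id, employee_id, month)` tuple with the month expressed as a running integer; these names and this layout are hypothetical and are not taken from the LEED system.

```python
def job_spells(records):
    """Group monthly job records into runs of consecutive months.

    records -- iterable of (employer_id, employee_id, month) tuples,
               where month is a running integer (an assumption).
    Returns {(employer_id, employee_id): [(first_month, last_month), ...]}.
    """
    months = {}
    for employer, employee, month in records:
        months.setdefault((employer, employee), set()).add(month)

    spells = {}
    for job, job_months in months.items():
        runs = []
        for m in sorted(job_months):
            if runs and m == runs[-1][1] + 1:
                runs[-1] = (runs[-1][0], m)  # extend the current spell
            else:
                runs.append((m, m))          # a gap starts a new spell
        spells[job] = runs
    return spells


records = [("E1", "p1", 1), ("E1", "p1", 2), ("E1", "p1", 4), ("E2", "p1", 3)]
spells = job_spells(records)
```

In this example the employee has two spells with employer `E1`, separated by one month recorded against `E2`. Whether such a break is a true break in employment is exactly what the transformation processes need to decide; the sketch only exposes the candidates.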

Employer identification

It is important to identify when employers in different periods are actually the same enterprise using a different IRD number. In general, a new business appears in the data as a new IRD number, while an IRD number that is no longer used indicates a ceased business. However, a change in IRD number may also occur with a change in legal status or ownership. If IRD number changes made for purely administrative reasons are not identified, some continuing businesses will be incorrectly classified as firm births or deaths. This will upwardly bias job and worker flow statistics, and adversely affect the ability to produce job histories and measures of employment tenure.

One method of employer identification is to recognise potential links between employing firms by tracking common employees who move from one firm to the next. A second method recognises births on the Statistics NZ BF that are really a transfer of ownership of an existing enterprise.
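The first method, tracking common employees, can be sketched as below. This is an illustrative heuristic only: the record layout, the requirement that the new IRD number first appears in the month after the old one last appears, and the `min_shared` and `min_share` thresholds are all assumptions, not the rules used in LEED.

```python
from collections import defaultdict


def candidate_links(jobs, min_shared=5, min_share=0.8):
    """Flag pairs of employer ids that may be the same enterprise.

    jobs -- iterable of (employer_id, employee_id, month) tuples, with
            month as a running integer (an assumption).
    A pair (old, new) is flagged when `new` first appears in the month
    after `old` last appears and most of old's final-month employees
    are present in new's first month.
    """
    staff = defaultdict(set)   # (employer, month) -> employees
    active = defaultdict(set)  # employer -> months with any job
    for employer, employee, month in jobs:
        staff[(employer, month)].add(employee)
        active[employer].add(month)

    first = {e: min(ms) for e, ms in active.items()}
    last = {e: max(ms) for e, ms in active.items()}

    links = []
    for old, old_last in last.items():
        leavers = staff[(old, old_last)]
        for new, new_first in first.items():
            if new == old or new_first != old_last + 1:
                continue
            shared = leavers & staff[(new, new_first)]
            if len(shared) >= min_shared and len(shared) / len(leavers) >= min_share:
                links.append((old, new, len(shared)))
    return sorted(links)


jobs = []
for month in (1, 2):
    jobs += [("A", f"e{i}", month) for i in range(1, 7)]           # e1..e6
for month in (3, 4):
    jobs += [("B", f"e{i}", month) for i in (1, 2, 3, 4, 5, 7)]    # 5 of A's staff
for month in (1, 2, 3, 4):
    jobs += [("C", f"x{i}", month) for i in range(1, 4)]           # unrelated firm
links = candidate_links(jobs)
```

Here five of the six employees last seen at `A` reappear at `B` in the following month, so `("A", "B", 5)` is returned as a candidate link. A flagged pair is a potential link, not a confirmed one.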

Integrate data

Integration of the tax and BF data has resulted in the development of a longitudinal version of the BF (the LBF). The LBF stores information on New Zealand businesses as they change over time and enables users to access this data. A number of imputations and repairs are applied to improve the quality of the data. These include employer identification information recognised during transformation processing, and the addition of employers found in LEED that are not on the BF.

Using the LBF, LEED is able to allocate jobs from an IRD number on an EMS to a geographic unit. This is a simple process when an employer IRD number links to a BF enterprise that has one geographic unit. However, the process becomes more complex as the enterprise structure becomes more complicated.

For enterprises with complex structures, including multiple geographic units, the allocation of jobs must take into account several factors. The employment count figures for each geographic unit associated with the enterprise on the LBF are used as target figures. The jobs to allocate can then be divided across the geographic units using these target figures. This is done using an algorithm that minimises the travel distance between an individual’s location and the employer’s geographic location while aiming to keep the employment counts in proportion to the targets. A second algorithm aims to keep continuing employees at the same geographic unit.
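The general idea of allocating jobs to geographic units can be sketched with a simple greedy heuristic. This is not the algorithm used in LEED, which the source does not specify. The coordinates, the function names, the largest-remainder scaling of targets, and the combination of both steps in one function are all assumptions made for illustration.

```python
import math


def quotas(targets, n_jobs):
    """Scale target employment counts to n_jobs (largest-remainder rounding).

    Assumes the targets sum to a positive number.
    """
    total = sum(targets.values())
    raw = {u: n_jobs * t / total for u, t in targets.items()}
    q = {u: math.floor(r) for u, r in raw.items()}
    leftover = n_jobs - sum(q.values())
    by_remainder = sorted(raw, key=lambda u: raw[u] - q[u], reverse=True)
    for u in by_remainder[:leftover]:
        q[u] += 1
    return q


def allocate(workers, units, targets, previous=None):
    """Greedily assign each worker to a geographic unit.

    workers  -- worker id -> (x, y) location
    units    -- unit id -> (x, y) location (same keys as targets)
    targets  -- unit id -> target employment count
    previous -- worker id -> unit id in the prior period (optional)
    """
    previous = previous or {}
    remaining = quotas(targets, len(workers))
    assignment = {}

    # Step 1: continuing employees stay where they were, capacity permitting.
    for w in workers:
        u = previous.get(w)
        if u in remaining and remaining[u] > 0:
            assignment[w] = u
            remaining[u] -= 1

    # Step 2: fill remaining capacity, shortest distances first.
    pairs = sorted(
        (math.dist(workers[w], units[u]), w, u)
        for w in workers if w not in assignment
        for u in units
    )
    for _, w, u in pairs:
        if w not in assignment and remaining[u] > 0:
            assignment[w] = u
            remaining[u] -= 1
    return assignment


units = {"north": (0, 0), "south": (10, 0)}
targets = {"north": 30, "south": 10}
workers = {"w1": (1, 0), "w2": (2, 0), "w3": (9, 0), "w4": (3, 0)}
fresh = allocate(workers, units, targets)
sticky = allocate(workers, units, targets, previous={"w1": "south"})
```

With four jobs and targets of 30 and 10, the quotas are three jobs for `north` and one for `south`. In `fresh`, the worker nearest `south` is placed there. In `sticky`, `w1` keeps its previous unit even though it is further away, which pushes `w3` to `north`. A greedy pass like this does not guarantee the minimum total distance; an exact approach would treat the problem as an assignment problem.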

Produce outputs

Once the transformation and integration processes are completed, a series of tables are created or updated. These tables form the basis for the outputs.

The outputs are generated and analysed, with checks made for:

  • coherency with other available statistics
  • consistency with analysts’ expectations
  • confidentiality of results.

During the development process, techniques that might allow maximum release of information while preserving the confidentiality of individual units, such as the perturbation of data (adding ‘noise’), were considered. It was found that applying these techniques did not allow many more cells to be released without introducing a level of noise that limited the value of the data. The method used in LEED is instead to collapse the output categories for each dimension so that the confidentiality of individuals and businesses is protected.
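Collapsing output categories can be illustrated with a small sketch. The category names, the parent mapping, and the minimum cell count of 3 are hypothetical; the source does not state the confidentiality rules or thresholds applied in LEED.

```python
from collections import Counter


def unsafe_cells(cells, min_count):
    """Return the categories whose non-zero count is below min_count."""
    return sorted(c for c, n in cells.items() if 0 < n < min_count)


def collapse(cells, parent_of):
    """Sum counts into broader categories.

    cells     -- fine category -> count
    parent_of -- fine category -> broader category; categories that are
                 not listed are kept as they are
    """
    out = Counter()
    for category, count in cells.items():
        out[parent_of.get(category, category)] += count
    return dict(out)


cells = {"Dairy farming": 2, "Sheep farming": 40, "Retail": 120}
parent_of = {"Dairy farming": "Agriculture", "Sheep farming": "Agriculture"}

before = unsafe_cells(cells, min_count=3)
collapsed = collapse(cells, parent_of)
after = unsafe_cells(collapsed, min_count=3)
```

Before collapsing, the "Dairy farming" cell falls below the assumed threshold. After the two farming categories are combined, no cell does, at the cost of losing the finer breakdown.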

The issues associated with maintaining the confidentiality of industry and regional-based statistics are a key constraint on the release of data from LEED. This reflects a small economy that, in many areas, is characterised by a small number of dominant industry participants.

Release

Customised data enquiries

LEED has the capacity to produce customised data, but user demands for detailed statistics must be balanced against the requirement to maintain confidentiality.

The issue of confidentiality and privacy is further complicated as LEED produces longitudinal and multi-dimensional views of the data. Cross-table comparisons have the potential to result in the inadvertent release of sensitive information.

To protect confidentiality, all possible views of the available measures which maintain the confidentiality of individuals and businesses have been released. The categories of the data released were established by a rigorous testing process. Consequently, requests for further breakdowns are unlikely to be met, as confidentiality could be compromised. Customised data requests may be considered if the request is for a measure not currently available.

Micro-data access

Legislative barriers limit access to LEED unit record micro-data for researchers outside of Statistics NZ. All LEED research so far has been undertaken by Statistics NZ employees, or other government agency researchers who have been seconded into Statistics NZ for this purpose.
