Stats NZ has a new website.

For new releases go to

www.stats.govt.nz

As we transition to our new site, you'll still find some Stats NZ information here on this archive site.

Privacy, security, and confidentiality of information supplied to Statistics NZ
Safeguarding confidentiality

Overall aims for confidentiality

When publishing data, Statistics New Zealand is legally required to protect the information of individuals and businesses who have been surveyed. This page provides an overview of how we adjust our data to make sure that individual responses remain confidential and how the data may be affected by these adjustments.

This page covers the techniques we use for two types of output: tables and microdata. Each type of data carries its own confidentiality risks, so we apply different methods depending on the data behind a particular set of statistics. These risks and methods are discussed below. All examples presented in this document are entirely fictional.

Census tables

We have special requirements for the Census of Population and Dwellings. We take care to protect cells with small numbers of respondents.

Census tables are often count tables, which count the number of records with certain properties: for example, the number of people by age group and the region they live in.

We protect information in census count tables by random rounding to base 3 (RR3) and by suppressing small counts in some tables.

A full list of the methods we use for census statistics is given in 2013 Census confidentiality rules and how they are applied.

Random rounding

What does this method do?

Counts are randomly rounded to base 3. The aim is to disguise small counts, but every cell in the table, including totals, is randomly rounded.

Counts that are already a multiple of 3 are left unchanged. Other counts are rounded to one of the two nearest multiples of 3; for example, a 1 is rounded to either 0 or 3. Each value in the table is rounded independently. As a result, counts may not sum to the published totals, but every published value, including each total, is within two of its original value. For the same reason, percentages calculated from a randomly rounded table may not sum to 100.
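The rounding rule above can be sketched in Python. This is a minimal illustration, not Statistics NZ's production code: it assumes the commonly used RR3 probabilities (the nearer multiple of 3 with probability 2/3, the farther one with probability 1/3), which keep the expected rounded value equal to the original count. This page does not state the probabilities actually used, and the function names `rr3` and `rr3_table` are hypothetical.

```python
import random


def rr3(value: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to base 3.

    Assumption: the count is rounded up with probability remainder / 3, so the
    nearer multiple of 3 is chosen with probability 2/3 and the expected result
    equals the original value.
    """
    remainder = value % 3
    if remainder == 0:
        return value  # multiples of 3 are left unchanged
    lower = value - remainder
    if rng.random() < remainder / 3:
        return lower + 3
    return lower


def rr3_table(rows: dict[str, list[int]], seed: int = 0) -> dict[str, list[int]]:
    """Round every cell, totals included, independently of the others."""
    rng = random.Random(seed)
    return {name: [rr3(v, rng) for v in cells] for name, cells in rows.items()}


unrounded = {"Clowns": [2, 4, 6], "Jugglers": [0, 2, 2], "Lion tamers": [7, 1, 8]}
print(rr3_table(unrounded))
```

The example rounds the unrounded values from Table 2. A different seed can give a different, equally valid result, and because each cell is rounded on its own, the rounded cells in a row need not add up to the rounded total.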

Table 1 illustrates how we randomly round values in a table.

Table 1

Random rounding to base 3

Value in the table    Rounded value could be
1                     0 or 3
2                     0 or 3
29                    27 or 30
103                   102 or 105

Table 2 shows an example of random rounding.

Table 2

Number of people by occupation and income (before and after random rounding)

               Less than $50,000      $50,000 or more        Total
Occupation     Unrounded  Rounded     Unrounded  Rounded     Unrounded  Rounded
Clowns         2          3           4          6           6          6
Jugglers       0          0           2          3           2          3
Lion tamers    7          6           1          3           8          9

How does this affect the final data?

For small numbers, where the risk of identifying individuals is greatest, random rounding causes larger percentage changes than it does for larger numbers. For example, a cell with a 1 changed to a 3 has been changed by 200 percent, but a cell with 1,001 changed to 1,002 has been changed by only 0.1 percent. Because small counts should be treated with caution in any analysis anyway, the larger percentage changes in these cells do not cause a problem.


Social collections

Tables of data collected from social surveys are often count tables, which count the number of records with certain properties. For example:

  • the number of people by age group and the region they live in
  • the number of businesses by industry and the size of the workforce.

Table 3

Age group by income bracket, people aged 15 years and older from town X

               Income bracket
Age group      Low     Medium   High    Total
15–29          0       3        0       3
30–39          1       0        1       2
40–49          1       0        8       9
50–59          3       2        2       7
60+            0       4        0       4
Total          5       9        11      25

Detection of sensitive cells in a count table

Published tables combine individual responses to produce national, regional, demographic, or industry totals. Small counts of zero, one, or two, like several in Table 3, can disclose information about actual respondents, because somebody may be able to recognise one or more of them.
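As an illustration only, the check below flags small cells in Table 3. The rule is a hypothetical one chosen for the example (a cell is flagged if it holds fewer than 3 respondents, with zero cells flagged only on request); Statistics NZ's actual detection rules are not given on this page.

```python
# Interior cells of Table 3 (totals excluded). Hyphens stand in for the en dashes.
TABLE_3 = {
    "15-29": {"Low": 0, "Medium": 3, "High": 0},
    "30-39": {"Low": 1, "Medium": 0, "High": 1},
    "40-49": {"Low": 1, "Medium": 0, "High": 8},
    "50-59": {"Low": 3, "Medium": 2, "High": 2},
    "60+": {"Low": 0, "Medium": 4, "High": 0},
}


def sensitive_cells(
    table: dict[str, dict[str, int]], threshold: int = 3, flag_zeros: bool = False
) -> list[tuple[str, str, int]]:
    """Return (row, column, count) for each cell below the hypothetical threshold.

    Zero cells are skipped unless flag_zeros is True.
    """
    flagged = []
    for row, cells in table.items():
        for column, count in cells.items():
            if count < threshold and (count > 0 or flag_zeros):
                flagged.append((row, column, count))
    return flagged


for cell in sensitive_cells(TABLE_3):
    print(cell)
```

With the default settings this flags the five cells holding a 1 or a 2; passing `flag_zeros=True` also flags the six zero cells.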

We protect the confidentiality of these responses by:

  • collapsing/aggregating rows or columns
  • modifying the cell values, for example by random rounding
  • suppressing cells.

Each of these methods affects the data in a different way. In the next section, we discuss the different processes used to adjust a count table and how this affects the published data. In many of our tables, we use a combination of these methods to make sure that individual responses remain confidential.

Collapsing of categories (aggregation)

What does this method do?

This method combines two or more groups into one new representative group. Aggregation is often used for industries with very few businesses.

Tables 4a and 4b show an example of aggregation.

Table 4a (before aggregation)

Number of businesses by turnover and industry in a small area

                     Turnover
Industry             Less than $50,000   $50,000–$100,000   More than $100,000
Carrot farmers       1                   3                  1
Beetroot farmers     3                   1                  2
Corn farmers         10                  5                  3

Table 4b (after aggregation)

Number of businesses by turnover and industry in a small area

                              Turnover
Industry                      Less than $50,000   $50,000–$100,000   More than $100,000
Carrot & Beetroot farmers     4                   4                  3
Corn farmers                  10                  5                  3

How does this affect the final data?

Aggregation reduces the level of detail in our published data.
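A minimal sketch of the aggregation in Tables 4a and 4b: the rows being collapsed are replaced by one row holding their column sums. The helper name `collapse_rows` is hypothetical.

```python
TABLE_4A = {
    "Carrot farmers": [1, 3, 1],
    "Beetroot farmers": [3, 1, 2],
    "Corn farmers": [10, 5, 3],
}


def collapse_rows(
    table: dict[str, list[int]], members: list[str], new_label: str
) -> dict[str, list[int]]:
    """Replace the rows named in `members` with one row of their column sums.

    The merged row takes the position of the first member row.
    """
    merged = [sum(column) for column in zip(*(table[m] for m in members))]
    result: dict[str, list[int]] = {}
    for label, cells in table.items():
        if label in members:
            if new_label not in result:
                result[new_label] = merged
        else:
            result[label] = list(cells)
    return result


table_4b = collapse_rows(
    TABLE_4A, ["Carrot farmers", "Beetroot farmers"], "Carrot & Beetroot farmers"
)
print(table_4b)
# {'Carrot & Beetroot farmers': [4, 4, 3], 'Corn farmers': [10, 5, 3]}
```

The published cells stay exact; what is lost is the ability to tell carrot and beetroot farmers apart.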

Cell suppression

Where necessary, sensitive cells have their values suppressed and are not released. Primary cell suppression protects a sensitive cell by blanking it out. But if row and column totals are included in a table, blanking out the sensitive cell alone will not protect the respondent, because its value can be worked out from the totals. We therefore also suppress one or more non-sensitive cells to protect the sensitive cell; this is referred to as secondary cell suppression. The number of cells suppressed depends on the size of the table.

Table 5 shows the process we use when suppressing cells in our tables, using a one-way table as the example. First, we suppress the sensitive cell (primary suppression). Because its value can still be worked out from the total and the remaining values, we then suppress another cell in the table (secondary suppression).

Table 5

Farms by industry

Type of farm       Original data   After primary suppression   After secondary suppression
Dry stock farms    23              23                          23
Orchards           2               S                           S
Organic farms      17              17                          S
Worm farms         30              30                          30
Total              72              72                          72

Symbol:
S suppressed for confidentiality reasons.

After secondary suppression, it is not possible to work out the exact numbers in the suppressed cells.

When applying the secondary suppression technique we make a judgement about which cell should be suppressed to minimise the loss of information. This decision will depend on how the table will be used. Often cells that represent fewer respondents will be suppressed.
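The sketch below reproduces Table 5 for a one-way table with a published total. It is simplified and hypothetical: a cell is treated as sensitive if it is below an assumed threshold of 3, and the secondary cell is the non-sensitive cell with the smallest value, following the idea that cells representing fewer respondents are often chosen. It does not handle two-way tables, or the case where several suppressed cells together are still disclosive.

```python
from typing import Optional

FARMS = {"Dry stock farms": 23, "Orchards": 2, "Organic farms": 17, "Worm farms": 30}
TOTAL = 72


def suppress(table: dict[str, int], threshold: int = 3) -> dict[str, Optional[int]]:
    """Primary then secondary suppression for a one-way table with a published total.

    `threshold` is hypothetical: cells below it are treated as sensitive.
    None stands for a suppressed ("S") cell.
    """
    primary = {k for k, v in table.items() if v < threshold}
    published: dict[str, Optional[int]] = {
        k: (None if k in primary else v) for k, v in table.items()
    }
    candidates = [k for k in table if k not in primary]
    if len(primary) == 1 and candidates:
        # A single hidden cell equals the total minus the visible cells, so also
        # hide the non-sensitive cell with the fewest units (secondary suppression).
        secondary = min(candidates, key=lambda k: table[k])
        published[secondary] = None
    return published


def what_a_reader_can_deduce(
    published: dict[str, Optional[int]], total: int
) -> tuple[list[str], int]:
    """Return the hidden cells and the combined value they must add up to."""
    hidden = [k for k, v in published.items() if v is None]
    combined = total - sum(v for v in published.values() if v is not None)
    return hidden, combined


published = suppress(FARMS)
print(published)
print(what_a_reader_can_deduce(published, TOTAL))
```

After primary suppression alone, the deduction returns the orchards' exact value of 2. After secondary suppression it returns only the combined value of 19 for orchards and organic farms, which is why the individual values can no longer be worked out.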


Business collections

Tables of data collected from business surveys are often magnitude tables. These group members of the population into categories and then sum a numerical variable, for example GST sales, for the businesses that meet certain criteria. As another example, we would use a magnitude table to show the total number of employees across all businesses, by region and industry.

There are two sorts of magnitude tables: value magnitude tables of numerical measures, such as a business's turnover, and count magnitude tables of numerical counts of units within a business, such as the number of sheep on a farm. A table may contain both types of magnitude as well as count data, as shown in Table 6.
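To show how the three kinds of column in Table 6 differ, the sketch below builds them from a few unit records. The records and their layout are hypothetical: the enterprise column counts records, the employee column sums a count magnitude, and the turnover column sums a value magnitude.

```python
from collections import defaultdict

# Hypothetical unit records: (business type, employee count, turnover in $000).
RECORDS = [
    ("Partnership", 4, 350),
    ("Partnership", 2, 120),
    ("Individual proprietorship", 1, 80),
]


def magnitude_table(records: list[tuple[str, int, int]]) -> dict[str, dict[str, int]]:
    """Group records by business type and build the three kinds of column."""
    table = defaultdict(lambda: {"enterprises": 0, "employees": 0, "turnover": 0})
    for business_type, employees, turnover in records:
        row = table[business_type]
        row["enterprises"] += 1        # count: number of records in the cell
        row["employees"] += employees  # count magnitude: summed counts of units
        row["turnover"] += turnover    # value magnitude: summed dollar values
    return dict(table)


print(magnitude_table(RECORDS))
```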

Table 6

Enterprise count, employee count and total turnover by business type

Business type                Enterprises   Employee count       Total turnover
(classification variable)    (count)       (count magnitude)    (value magnitude)
                             Number        Number               $(000)
Individual proprietorship    79,407        46,010               9,000,010
Partnership                  52,607        77,805               22,000,000

The p percent rule for value magnitudes

For value magnitude tables, we use the p percent rule to determine sensitive cells. A cell is sensitive if any contributor to it can use the published value to estimate another individual's or business's response to within p percent. The value of p is not disclosed. For example, the table below shows that business B can estimate business A's contribution only to within 75 percent.

Example: a cell in a table contains sales in $ million and has four contributors.

Calculation of p percent

Contributor    Sales $(million)
Business A     40
Business B     30
Business C     20
Business D     10
Total          100

In the worst case, business B can estimate business A's value like this:

  1. B's estimate of A = total - B's own value = 100 - 30 = 70.
  2. B's error = B's estimate of A - A's actual value = 70 - 40 = 30.
  3. B's error as a percentage of A's value = B's error / A's value = 30 / 40 = 75 percent.
  4. This is the p percent value for the cell: the cell is sensitive if it is less than p.
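The worked example can be expressed as a short function. This is a sketch of the calculation in steps 1 to 3, with the second-largest contributor estimating the largest. `p` is a parameter because its actual value is not disclosed, so any value passed to `is_sensitive` is purely hypothetical. The sketch assumes positive contributions.

```python
def p_percent_error(contributions: list[float]) -> float:
    """Worst-case relative error when the second-largest contributor estimates
    the largest contributor's value by subtracting its own value from the total.

    Returns the error as a fraction of the largest value (0.75 means 75 percent).
    """
    if not contributions:
        raise ValueError("a cell needs at least one contributor")
    values = sorted(contributions, reverse=True)
    if len(values) == 1:
        return 0.0  # a sole contributor's value is the cell total itself
    largest, second = values[0], values[1]
    estimate = sum(values) - second  # step 1: B's estimate of A
    error = estimate - largest       # step 2: equals the sum of all other contributors
    return error / largest           # step 3: error as a share of A's value


def is_sensitive(contributions: list[float], p: float) -> bool:
    """Flag the cell if the worst-case estimate is within p percent."""
    return p_percent_error(contributions) < p / 100


print(p_percent_error([40, 30, 20, 10]))  # 0.75
```

With a hypothetical p of 80, this cell would be flagged as sensitive; with a hypothetical p of 60, it would not.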


Graduated random rounding

What does this method do?

This method is similar to random rounding to base 3. With graduated random rounding (GRR), cells of different sizes are rounded to different bases: the rounding base increases as the cell size does, so the ratio of the rounding base to the cell size stays reasonably constant. An example is shown in Table 7.

Table 7

Graduated random rounding

Original cell size    Rounding base
0–99                  5
100–1,000             10
1,000+                100

How does this affect the final data?

This method can slightly reduce the accuracy of the reported data. Totals are rounded independently of the cells that make them up, so the rounded cells may not sum to their stated total.
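A sketch of graduated random rounding using the bases in Table 7. Two assumptions are labelled in the code. First, the rounding direction is random, rounding up with probability equal to the remainder divided by the base, so the expected value is unchanged; the page does not state the probabilities used. Second, a cell of exactly 1,000 is placed in the top band; this makes no difference for 1,000 itself, because it is a multiple of both 10 and 100 and is left unchanged either way. The function names are hypothetical.

```python
import random


def grr_base(value: int) -> int:
    """Rounding base for a cell, following the bands in Table 7.

    Assumption: 1,000 is placed in the top band; as a multiple of both 10 and
    100 it is left unchanged under either band.
    """
    if value < 100:
        return 5
    if value < 1000:
        return 10
    return 100


def random_round(value: int, base: int, rng: random.Random) -> int:
    """Round to a neighbouring multiple of `base`, up with probability remainder/base."""
    remainder = value % base
    if remainder == 0:
        return value
    lower = value - remainder
    return lower + base if rng.random() < remainder / base else lower


def grr(value: int, rng: random.Random) -> int:
    """Graduated random rounding: the base depends on the original cell size."""
    return random_round(value, grr_base(value), rng)


rng = random.Random(0)
print([grr(v, rng) for v in (7, 42, 358, 4321)])
```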


Microdata

Statistics NZ also provides researchers with access, under specific conditions, to anonymised unit record datasets, known as microdata. Each unit (person, enterprise, event, etc.) has one record in the dataset.

Statistics NZ treats microdata datasets with extreme care and only allows access under specific conditions that meet the requirements of the Statistics Act 1975. Currently researchers can access microdata using:

  • the Data Laboratory
  • Confidentialised unit record files.

All microdata is anonymised by removing direct identifiers, such as name, address, and telephone number. When needed, we use a range of additional confidentiality techniques to ensure the confidentiality of respondents.


Data Laboratory (Data Lab)

Approved researchers can work with anonymised microdata in secure Data Labs, located in our Wellington, Auckland, and Christchurch offices. These are carefully controlled environments where researchers perform analysis on the datasets they are allowed to access. More detail about microdata access protocols is available on our website.

Confidentialised unit record files (CURFs)

Since 2004, Statistics NZ has been producing confidentialised unit record files (CURFs). A CURF is a highly confidentialised version of microdata provided to a researcher on a compact disc. Researchers must apply to access a CURF. Unit record files can be confidentialised by removing variables, collapsing categories, modifying unusual unit records, or swapping the values of variables between records, and several of these methods may be combined to confidentialise a single file. More detail about microdata access is available on our website.
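A minimal sketch of three of the techniques listed above: removing variables (here, direct identifiers), collapsing categories (single years of age into 10-year bands), and swapping a variable's values between records. The field names, the band width, and swapping the region of one randomly chosen pair of records are all hypothetical choices for illustration; they do not describe how any particular CURF was produced.

```python
import random

# Hypothetical set of direct identifiers to remove.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def drop_identifiers(record: dict) -> dict:
    """Remove variables: return a copy without direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def band_age(age: int, width: int = 10) -> str:
    """Collapse categories: map a single year of age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def swap_values(records: list[dict], variable: str, rng: random.Random) -> None:
    """Swap one variable between a randomly chosen pair of records, in place."""
    if len(records) < 2:
        return
    i, j = rng.sample(range(len(records)), 2)
    records[i][variable], records[j][variable] = records[j][variable], records[i][variable]


def confidentialise(records: list[dict], seed: int = 0) -> list[dict]:
    """Apply the three techniques to copies of the records; inputs are not modified."""
    rng = random.Random(seed)
    output = []
    for record in records:
        safe = drop_identifiers(record)
        safe["age"] = band_age(safe["age"])
        output.append(safe)
    swap_values(output, "region", rng)
    return output


people = [
    {"name": "Person A", "address": "1 Example St", "phone": "000", "age": 34, "region": "Otago"},
    {"name": "Person B", "address": "2 Example St", "phone": "001", "age": 67, "region": "Waikato"},
    {"name": "Person C", "address": "3 Example St", "phone": "002", "age": 41, "region": "Canterbury"},
]
for row in confidentialise(people):
    print(row)
```

Swapping keeps each variable's overall distribution intact (the same set of regions appears in the output) while breaking the link between a region and the rest of a record.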

A researcher may access a CURF if:

  • the CURF is only used for bona fide statistical purposes
  • Statistics NZ is assured that the researcher and his/her institution have a proven track record of keeping data secure and meeting the terms of the CURF agreements.

Future methods

Methods for providing researchers with access to microdata are developing rapidly, and Statistics NZ is keeping up to date with these developments.

Published: 29 October 2015
