
2013 Census confidentiality rules and how they are applied

This page explains the 2013 Census confidentiality rules and how they are applied. Example Excel tables and flowcharts can be found in the ‘Available files’ box.

Confidentiality rules improved for 2013 Census

Statistics NZ has extensively reviewed the 2006 Census confidentiality rules in consultation with a range of users. As a result, we have refined and improved the rules for the 2013 Census.

The updated confidentiality rules apply to all new requests for census data made after the first release of 2013 Census data on 3 December 2013. This includes requests for census data from 2001 and 2006. Census data from 1996 and earlier will be confidentialised by random rounding to base 3 only.

There are eight confidentiality rules, supported by the overarching confidentiality principles of the Statistics Act 1975. Each rule is explained on this page. The explanations include examples of situations that would pass and fail the 2013 rules.

The most notable changes to the confidentiality rules from 2006 include:

  • removal of the 2006 income rule
  • extension of the rescue by threshold rule (large cells in any table are now rescued from suppression), meaning that any table can now be released to users, although in some very detailed tables with small counts in every cell, all the counts may be suppressed
  • better definition of the rules for proportions and measures (which previously existed in practice)
  • extension of the release under licence of less confidentialised counts.

See ‘2013 Census confidentiality rules – summary of changes from 2006’ for more detail.

Important information about applying the rules

There are eight rules. The first four rules are for tables of counts. The remaining four rules apply to proportions, ratios, percentages, and measures (such as means, medians, and quartiles) created from tables of counts.

The first four rules must be applied in order from Rule 1 to Rule 4. Rules 1 and 2 are decision rules that determine whether Rule 3 (suppression of counts < 6 in ‘sensitive’ tables) should be applied. Rule 4 (random rounding) is always applied.

Rule 5 is applied to the calculation of proportions, ratios, and percentages. Rule 6 decides when to suppress a measure, while Rule 7 controls the calculation of measures. Finally, Rule 8 controls the conventions for the rounding of measures.

The overall effect of these rules is that all tables can be released, but some of them (those deemed ‘sensitive’ by Rule 1 or Rule 2) will have counts below six suppressed. All remaining counts for release are randomly rounded.

The rules are applied separately to the table for each geographic area. ‘Geographic area’ is a generic term used to describe the various standard Statistics NZ output geographies, such as meshblocks and area units.

The rules must be applied to all census data released by Statistics NZ, and have been automated in the data output systems where possible. Exemptions to the rules must be documented and approved by the General Manager Census.

The 2013 Census confidentiality rules are for use with tables of counts from census data and the proportions and measures derived from them. They do not fully cover the requirements for protecting unit record data (microdata). Access to census microdata can be granted to researchers through the Statistics NZ Data Lab. Data Lab processes protect the microdata, and regulate the output in ways that are equivalent to these rules.

See more information about accessing our microdata 

Rule 1: Meshblock rule

The purpose of this rule is to provide targeted population protection for data about meshblocks, because meshblocks can be small in both population and physical size. The rule dates from the 2001 Census.

The rule outlines how to determine whether meshblock tables are sensitive, and therefore require suppression of small numbers by Rule 3 (discussed below).

If the geographic unit for the table is a meshblock, these two questions are asked:

  • Does the table use one variable, and is it at a level more detailed than its highest level? Eg the highest level for income has seven categories (‘grouped’ income) whereas the more detailed level has 17 categories.
  • Does the table use two or more variables? Eg sex and age.

If the answer to either question is yes, the table is deemed to be sensitive and Rule 3 is applied. If the answer to both questions is no, then the table is not sensitive under Rule 1, but it may be under Rule 2.

A table in which the geographic area is a non-standard grouping of meshblocks is treated as if it were a meshblock. Standard groupings of meshblocks into higher geographies are not subject to the meshblock rule, but still need to be tested by Rule 2. Standard groupings include those boundaries maintained by Statistics NZ for external users, eg the health domiciles used in health funding, and police areas, districts, and stations.
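The two Rule 1 questions can be expressed as a small decision function. This is a minimal sketch, not Statistics NZ's production logic: the `TableSpec` structure and its field names are hypothetical, and `meshblock_level` is assumed to be True both for meshblocks and for non-standard groupings of meshblocks.

```python
from dataclasses import dataclass


@dataclass
class TableSpec:
    """Hypothetical description of a requested census table."""
    meshblock_level: bool       # meshblock, or a non-standard grouping of meshblocks
    num_variables: int          # categorical variables used besides geography
    uses_detailed_level: bool   # single variable used below its highest (least detailed) level


def fails_meshblock_rule(table: TableSpec) -> bool:
    """Return True if Rule 1 deems the table sensitive, so Rule 3 must be applied."""
    if not table.meshblock_level:
        # Standard higher geographies are not subject to Rule 1 (Rule 2 still applies).
        return False
    one_detailed_variable = table.num_variables == 1 and table.uses_detailed_level
    two_or_more_variables = table.num_variables >= 2
    return one_detailed_variable or two_or_more_variables


# Grouped income at meshblock: passes. Detailed income, or sex by age: fails.
print(fails_meshblock_rule(TableSpec(True, 1, False)))
print(fails_meshblock_rule(TableSpec(True, 1, True)))
print(fails_meshblock_rule(TableSpec(True, 2, False)))
```

A failing result only flags the table as sensitive; the suppression itself happens under Rule 3.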

See more information on geographic hierarchy

Please see the Excel files in the ‘Available files’ box for the following examples:

  • Table 1: example that passes Rule 1 because it does not use a detailed variable
  • Table 2: example that fails Rule 1 because it uses a detailed variable at meshblock level, and is suppressed by Rule 3.
  • Table 3: example that fails Rule 1 because it uses two variables at meshblock level, and is suppressed by Rule 3.

Rule 2: Mean cell size rule

The purpose of this rule is to provide targeted protection for the small counts in tables that are likely to have sparse areas in them. This rule continues to allow access to counts in tables that are less likely to have sparse areas.

The mean cell size is applied separately at each geographic level and is calculated like this:

[Image: 2013 calculation for mean cell size rule]

The table for a geographic unit is deemed sensitive if its mean cell size is less than or equal to 2. Rule 3 is then applied to each sensitive geographic unit.
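The calculation image is not reproduced in this archived copy. Assuming the conventional definition of a mean cell size (the total count divided by the number of cells), the test for a geographic unit g would read:

```latex
\text{mean cell size}_g = \frac{\text{total count for geographic unit } g}{\text{number of cells in the table for } g},
\qquad
\text{table for } g \text{ is sensitive if } \text{mean cell size}_g \le 2 .
```

For example, a sex by five age groups table has 2 × 5 = 10 cells; if its total count is 18, the mean cell size is 18 / 10 = 1.8, so that unit would be treated as sensitive.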

A table may use a second geographic variable (for example, workplace address) as one of its categorical variables. Such a table is always deemed to be sensitive, and Rule 3 must be applied. A table with two or more geographic variables will therefore always have counts less than six suppressed, eg a usual residence by workplace address table. This does not apply to a table containing two classifications of the same geographic variable (eg using usual residence at area unit and regional council levels). Such a table would still be tested by the rules, like any other table.

Exceptions to Rules 1 and 2

A table may contain the total count for its subject population by geographic area only, and not use any other variables. Such a table is not subject to Rule 1 or Rule 2, and is treated as not sensitive. Rule 3 is therefore not required, and only Rule 4 needs to be applied.
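Rule 2 and its exceptions can be sketched for a single geographic unit as follows. This is an illustration under stated assumptions: the mean cell size is taken to be the total count divided by the number of cells (the original formula image is not available), and the function and parameter names are hypothetical. `geographic_variables` counts distinct geographic variables, so two classifications of the same variable (eg area unit and regional council) count as one.

```python
def fails_mean_cell_size_rule(
    total_count: int,
    num_cells: int,
    geographic_variables: int = 1,
    total_only: bool = False,
) -> bool:
    """Return True if Rule 2 deems the table for one geographic unit sensitive.

    Assumes mean cell size = total count / number of cells for that unit.
    """
    if total_only:
        # Total count by geographic area only: exempt from Rules 1 and 2.
        return False
    if geographic_variables >= 2:
        # Eg usual residence by workplace address: always sensitive.
        return True
    if num_cells <= 0:
        raise ValueError("a table must have at least one cell")
    return total_count / num_cells <= 2


print(fails_mean_cell_size_rule(18, 10))                           # mean 1.8: sensitive
print(fails_mean_cell_size_rule(250, 10))                          # mean 25: not sensitive
print(fails_mean_cell_size_rule(250, 10, geographic_variables=2))  # two geographies: sensitive
```

Because the test runs per geographic unit, one table can end up with suppression in some areas and none in others, as in Tables 4 and 5.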

Please see the Excel files in the ‘Available files’ box for the following examples:

  • Table 4: example of geographic areas passing and failing Rule 2 (table using one variable), showing no suppression for the former, and suppression and random rounding to base 3 for the latter.
  • Table 5: example of geographic areas passing and failing Rule 2 (table using two variables). This table demonstrates how suppression can occur depending on how cross-tabulated variables interact.

Rule 3: Threshold rule

The purpose of this rule is to enhance both utility and safety. The larger and more useful counts of six or more can always be released. The smaller and more disclosive counts are protected, where tables are likely to be sparse or related to small subject populations.

The rule outlines when counts should be suppressed: when the table for a geographic unit is deemed sensitive by either Rule 1 or Rule 2, all counts of less than six are suppressed and replaced by a ‘..C’ symbol.
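The threshold rule is a simple substitution once a geographic unit has been flagged by Rule 1 or Rule 2. A minimal sketch, with a hypothetical function name and counts passed as a flat list:

```python
from typing import List, Union

SUPPRESSED = "..C"  # symbol used in released tables for suppressed counts


def apply_threshold_rule(counts: List[int], sensitive: bool) -> List[Union[int, str]]:
    """Suppress counts below six when the table for a geographic unit is sensitive."""
    if not sensitive:
        return list(counts)
    return [SUPPRESSED if c < 6 else c for c in counts]


print(apply_threshold_rule([0, 4, 6, 12], sensitive=True))   # small counts replaced by ..C
print(apply_threshold_rule([0, 4, 6, 12], sensitive=False))  # unchanged; Rule 4 still applies
```

Counts of six or more always survive this step, which is what lets every table be released.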

Please see the Excel files in the ‘Available files’ box for the following examples:

  • Tables 2 to 5: examples of suppressed sensitive cells that have failed Rule 1 or 2.

Rule 4: Random rounding rule

Random rounding to base 3 involves randomly rounding every unsuppressed count in a table to a number divisible by three. Statistics NZ applies random rounding to base 3 to census outputs by rounding values to:

  • the nearest multiple of three with a probability of two-thirds (applied approximately two-thirds of the time)
  • the second closest multiple of three with a probability of one-third (applied approximately one-third of the time).

Values that are already multiples of three are left unchanged.

The reasons for randomly rounding all counts, large and small, to base 3 include:

  • disguising the small counts of 0, 1, 2, and 3, which have the highest disclosure risk
  • protecting against the recalculation of small counts from differencing large counts
  • retaining almost all of the statistical properties of the table by adding only a little noise to the larger counts.

Each value in a table is rounded independently, including the totals. This means that the marginal totals can differ slightly from the corresponding sum of the rows or columns, ie if the columns or rows in a table are added, they will not always equal the total given. Almost all the statistical properties of the table are retained, as a rounded value is never more than 2 above or below the original count.
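Random rounding to base 3 as described above can be sketched as below. This is an illustration, not Statistics NZ's implementation: the page does not describe the random mechanism used, so Python's `random.Random` with a fixed seed is an assumption made only so the example is repeatable.

```python
import random


def random_round_base3(value: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to a multiple of 3."""
    remainder = value % 3
    if remainder == 0:
        return value  # multiples of three are left unchanged
    lower, upper = value - remainder, value - remainder + 3
    # Nearest multiple is `lower` when the remainder is 1, `upper` when it is 2.
    nearest, second = (lower, upper) if remainder == 1 else (upper, lower)
    # Nearest multiple with probability 2/3, second closest with probability 1/3.
    return nearest if rng.random() < 2 / 3 else second


rng = random.Random(2013)  # seeded only so the example is repeatable
cells = [7, 11, 1]
total = sum(cells)  # 19

# Every cell and the total are rounded independently, so they may not add up.
rounded_cells = [random_round_base3(c, rng) for c in cells]
rounded_total = random_round_base3(total, rng)
print(rounded_cells, rounded_total, sum(rounded_cells))
```

Since base 3 is odd there is never a tie for the nearest multiple, and the largest possible change to any count is 2 (1 rounded up to 3, or 2 rounded down to 0).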

See the ‘Available files’ box for a flowchart showing how a table of counts is tested and treated by the confidentiality rules.

Rule 5: Proportions from counts rule

The purpose of this rule is to prevent the reconstruction of the original unrounded counts.

All proportions, percentages, and ratios (not totals) that come from counts must be calculated using fully confidentialised data. A proportion is only available for release if neither of the counts it uses has been suppressed.
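A minimal sketch of Rule 5, assuming the inputs have already been through Rules 3 and 4 (suppressed cells carry the `..C` symbol and the rest are randomly rounded). The function name is hypothetical, and returning `None` for a zero denominator is an assumption, since the page does not cover that case.

```python
from typing import Optional, Union

SUPPRESSED = "..C"
Count = Union[int, str]


def percentage_from_counts(numerator: Count, denominator: Count) -> Optional[float]:
    """Percentage from fully confidentialised counts; None means do not release."""
    if numerator == SUPPRESSED or denominator == SUPPRESSED:
        return None  # a proportion needs both counts to be unsuppressed
    if denominator == 0:
        return None  # assumption: an undefined percentage is simply not released
    return 100 * numerator / denominator


print(percentage_from_counts(12, 48))     # 25.0
print(percentage_from_counts("..C", 48))  # None
```

Working from the rounded counts, rather than the originals, is what stops a released percentage from revealing the unrounded values.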

Rules 6, 7, and 8: Rules for measures

Rules 1–5 cover the confidentiality of tables of counts. Census tables can also contain medians and other measures from the distribution of numerical variables (or variables in which the categories have been assigned numerical values, eg annual income). The purpose here is to maximise the quantity of values released, while protecting particulars about individuals. The three rules are as follows.

Rule 6: Suppression for measures rule

A measure is suppressed if the total unrounded count of individuals contributing to the measure is less than the threshold value for that type of measure. For each measure there is a threshold value:

  • for means and medians this threshold value is six, so a mean or median is not shown when fewer than six individuals, households, families, etc contribute to it
  • for quartiles the threshold is 12
  • for quintiles it is 15
  • for deciles it is 30.

If other measures are requested, the census confidentiality assessment team will decide the threshold value.
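The Rule 6 thresholds fit naturally in a lookup table. In this sketch (function and dictionary names are hypothetical) any measure not in the list raises an error, mirroring the page's statement that other thresholds are set by the assessment team rather than by a default.

```python
# Minimum unrounded number of contributors for each measure under Rule 6.
MEASURE_THRESHOLDS = {"mean": 6, "median": 6, "quartile": 12, "quintile": 15, "decile": 30}


def measure_is_suppressed(measure: str, unrounded_count: int) -> bool:
    """Return True if the measure must be suppressed under Rule 6."""
    if measure not in MEASURE_THRESHOLDS:
        raise ValueError(f"no standard threshold for {measure!r}; refer to the assessment team")
    return unrounded_count < MEASURE_THRESHOLDS[measure]


print(measure_is_suppressed("median", 5))   # True: fewer than six contributors
print(measure_is_suppressed("decile", 30))  # False: threshold met
```

Note that the test uses the unrounded count, even though Rule 7 requires the measure itself to be calculated from rounded counts.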

Rule 7: Use of rounded counts for calculating measures rule

All measures need to be calculated from randomly rounded counts.

Rule 8: Rounding of measures rule

All measures have simple conventional rounding applied. Different variables require different levels of rounding:

  • measures from annual income are rounded to the nearest $100
  • measures from weekly rent paid are rounded to the nearest $10
  • measures for age are rounded to one decimal place
  • measures from whole number count variables are rounded to one decimal place.
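Rules 7 and 8 can be combined in a short sketch: a mean computed from randomly rounded frequency counts, then rounded conventionally. Two assumptions are made here: "simple conventional rounding" is taken to mean round-half-up (Python's built-in `round` rounds halves to even, hence the use of `decimal`), and the frequency counts below are hypothetical values for a whole-number count variable.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import List


def mean_from_rounded_counts(values: List[int], rounded_counts: List[int]) -> float:
    """Rule 7: mean of a variable computed from randomly rounded frequency counts."""
    total = sum(rounded_counts)
    if total == 0:
        raise ValueError("no contributors after rounding")
    return sum(v * c for v, c in zip(values, rounded_counts)) / total


def round_half_up(x: float, step: str) -> Decimal:
    """Rule 8: conventional (half-up) rounding to a multiple of step, eg '100', '10', '0.1'."""
    q = Decimal(step)
    return (Decimal(str(x)) / q).quantize(Decimal("1"), rounding=ROUND_HALF_UP) * q


# Hypothetical rounded counts for a whole-number count variable with values 0 to 3.
mean = mean_from_rounded_counts([0, 1, 2, 3], [12, 9, 6, 3])
print(round_half_up(mean, "0.1"))         # one decimal place
print(round_half_up(41250.0, "100"))      # annual income: nearest $100
print(round_half_up(125.0, "10"))         # weekly rent: nearest $10
```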

See the ‘Available files’ box for a flowchart showing how proportions and measures are tested and treated by the confidentiality rules.

Release under licence of less confidentialised counts

We have extended the licensed release of less-confidentialised counts. We may offer licence agreements to institutions for specific projects, meaning that:

  • the institution agrees to work within the licence agreement
  • all individual users within the institution sign the user undertaking
  • our count data is released to the institution with only random rounding applied to the counts.

A Statistics NZ confidentiality assessment team assesses a request for such an agreement. The team retains the right to decide whether the project justifies the release of less-confidentialised tables. The final decision rests with the General Manager Census.

In a release under licence agreement, the confidentiality protection shifts from suppressing data (as done for public release) to controlling the context in which the data is used. The tables will be released, but only under licence conditions, and strictly for internal use within the institution.

Licence agreements enable fuller use of census data by key users, for statistical purposes only. The new agreement expands on the one created for 2006 Census data, and is consistent with that used for basic confidentialised unit record files (CURFs).

Confidentiality principle

Under the Statistics Act 1975, Statistics NZ employees are required to withhold any output that may identify the characteristics of a particular person or undertaking. If, despite Rules 1–8, there is any reason to suspect that an output may identify the characteristics of a particular person or undertaking, the data should not be released and advice must be sought from the General Manager Census.

More information

2013 Census confidentiality rules – summary of changes from 2006 

Read more about the privacy, security, and confidentiality of information supplied to Statistics NZ 

Contact our Information Centre for specific queries 

Request customised data 

Use NZ.Stat to explore our data 

Use DataInfo+ to explore our metadata 

Published 3 September 2013
