2013 Census variable quality rating scale

2013 Census information by variable – Information that helps with understanding our census data, covering matters such as non-response rates, comparability over time and data quality.

The table below explains the ratings used to describe the quality of variables output from the 2013 Census. It is a guide for data users.

A number of factors can affect the quality of a variable:

  • The quality management strategy for the 2013 Census assigns a level of priority to each variable. Age and sex, for example, are considered foremost variables. Foremost variables are core census variables that have the highest priority in terms of quality, time, and resources applied across all phases of the 2013 Census.
  • More complex questions may be harder for a respondent to answer, which can affect data quality.
  • Non-response to questions can vary, and the level of non-response helps determine the overall quality level of the variable.

Quality rating

Very high quality data

Characteristics of the data in the 2013 Census dataset

Fit for use

Data has either no data quality issues or a few very minor data quality issues that have very little effect on the data.

What does this mean in practice?
  • Any issues with the variable appear in a very low number of cases (typically fewer than a hundred).
  • Non-response or the amount of unclassifiable data is very low (or data is imputed so the data does not have this category).
  • Time series is consistent.  
Example

Age is a very high quality variable, and is a foremost variable. If this question is not answered, age is imputed, so there is no non-response.

High quality data

Characteristics of the data in the 2013 Census dataset

Fit for use

Data has only minor data quality issues.

What does this mean in practice? 
  • Any issues with the variable appear in a low number of cases (typically in the low hundreds).
  • Non-response or the amount of unclassifiable data is acceptably low (typically, non-response rates of around 5 percent or less are considered acceptable).
  • Data looks sensible and reflects reality.
Example

Number of motor vehicles is a high-quality variable. This variable has low non-response and a consistent time series.

Moderate quality data

Characteristics of the data in the 2013 Census dataset

Fit for use

Data has various data quality issues involving several categories or aspects of the data, or an entire level of a hierarchical classification.

What does this mean in practice?
  • These issues can include undercounts/overcounts for some categories, inconsistencies with other related data, and problems with the classification or coding of data.
  • Non-response or the amount of unclassifiable data may be higher than is desirable (typically, non-response rates of around 5 percent or less are considered acceptable). The higher the non-response rate, the more likely that non-response can affect data quality, particularly if non-response is higher among certain groups.
  • Some limitations on use and interpretation of the data.
Examples

In 2013, non-response for total personal income is around 10 percent, with non-response much higher among some ethnic and age groups. Despite this higher-than-desirable non-response rate, time series data is consistent. Age and sex distribution of income is also consistent.

For some variables with a write-in response category – such as occupation, qualifications, and subject studied – vague responses result in coding issues. Some responses cannot be coded to a category, which increases the size of the ‘not specified’ category. For paper forms, write-in responses that were illegible also caused difficulties. In general, data quality on Internet forms is better because responses are typed rather than handwritten.
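
The non-response guideline mentioned above (around 5 percent or less) can be checked with simple arithmetic. The Python sketch below is illustrative only: the function names and the counts are assumptions made for this example and are not part of any Stats NZ tool or dataset. It also treats "around 5 percent" as a hard cut-off, which is a simplification. A variable's rating depends on other factors as well, such as time series consistency and coding issues.

```python
# Illustrative sketch only: names and counts are hypothetical,
# not taken from Stats NZ systems or from 2013 Census data.

# "Around 5 percent or less" guideline, treated here as a fixed threshold.
ACCEPTABLE_NON_RESPONSE_PERCENT = 5.0


def non_response_percent(non_response_count: int, total_count: int) -> float:
    """Return non-response as a percentage of all records for a variable."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= non_response_count <= total_count:
        raise ValueError("non_response_count must be between 0 and total_count")
    return 100.0 * non_response_count / total_count


def within_guideline(rate_percent: float,
                     threshold: float = ACCEPTABLE_NON_RESPONSE_PERCENT) -> bool:
    """Return True if the rate is at or below the guideline threshold."""
    return rate_percent <= threshold


if __name__ == "__main__":
    # Made-up counts giving a 10 percent rate, similar in size to the
    # total personal income example.
    rate = non_response_percent(non_response_count=100, total_count=1000)
    print(f"{rate:.1f}% non-response; within guideline: {within_guideline(rate)}")
```

A rate above the guideline does not by itself make a variable poor quality. As the income example shows, a variable with about 10 percent non-response can still be rated moderate if its time series and distributions are consistent.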

Poor quality data

Characteristics of the data in the 2013 Census dataset

Fit for use – with caution

Significant data quality issues emerged during evaluation. Data should be used with care.

What does this mean in practice?
  • The data may contain one or more of the following issues:
    • high non-response rate that could lead to bias in the data – for example, extended family income has a very high non-response rate
    • significant coding and classification problems.
  • Data appears inconsistent over time or when compared with other data sources. This can sometimes be due to changes in the questionnaire or to external circumstances.
Examples

Changes to the 1996 Census questionnaire resulted in responses to the ethnicity question being inconsistent with 1991 and 2001 data. Another example is the change to the national qualifications framework, which makes it difficult to compare qualifications data for 2001 with data for 2006 and 2013.

Very poor quality data

Characteristics of the data in the 2013 Census dataset

Not fit for use

Data has major quality problems that mean it is not fit for use.

What does this mean in practice?

Poor data quality due to:

  • respondent misinterpretation, major coding problems, or extremely high non-response
  • data that does not reflect reality and should not be used at all – this is a very rare situation.
Example

The civil union category of legally registered relationship status is completely inconsistent with administrative data. This inconsistency may stem from a general misunderstanding of the term ‘civil union’.

Published 3 December 2013
