Small Area Estimation of Unemployment: From Feasibility to Implementation

Soon Song
Statistics New Zealand 


Statistics New Zealand is exploring model-based approaches to producing unemployment rates at the territorial authority (TA) level. This responds to demand for small area statistics to support planning, decision making, and service delivery at a local level. The Household Labour Force Survey (HLFS) is the main source of national and regional information on the labour market. However, Statistics NZ does not publish survey direct estimates of unemployment-related statistics at TA level, because the HLFS sample size at that level is too small.

Since 2003, Statistics NZ has undertaken various research projects to produce TA-level unemployment rates from HLFS sample data. In 2009, we investigated the usability of a model developed for a Eurostat-funded research programme, Enhancing Small Area Estimation Techniques to meet European Needs (EURAREA). The results supported producing unemployment rates from HLFS sample data. In 2010, we proposed producing an experimental series of TA-level model-based quarterly unemployment rates, using HLFS sample data and the empirical best linear unbiased prediction (EBLUP) models in EURAREA.

Currently, we use national-level quarterly population estimates as the HLFS benchmarks, and these benchmarks are incorporated into the weighting system. We do not have quarterly population estimates at TA level to use as TA-level benchmarks, so in this paper we propose producing them with a ratio method that combines two sources: TA-level yearly population estimates and national-level quarterly population estimates. First, for each sex-by-age group, we calculate each TA's proportion of the national total for that group.

Second, we multiply these proportions by the national-level quarterly population estimates to produce TA-level quarterly population estimates. Using these TA-level quarterly population estimates, we propose three options for producing TA-level weights:

  • using the original final weight without alteration
  • applying direct post-stratification
  • adjusting the final weight.
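The two-step ratio method for TA-level quarterly population estimates can be sketched in Python as follows. This is a minimal illustration, not Statistics NZ's production code: the function name and the dictionary layout are hypothetical, and it assumes the national total for each sex-by-age group can be taken as the sum of the TA yearly estimates (that is, that the TAs partition the country).

```python
from collections import defaultdict

def ta_quarterly_population(ta_annual, national_quarterly):
    """Ratio method (hypothetical helper): share national quarterly totals out to TAs.

    ta_annual:          {(ta, sex, age_group): yearly TA population estimate}
    national_quarterly: {(sex, age_group): quarterly national population estimate}
    """
    # Step 1: national total per sex-by-age group, taken here as the sum of
    # the TA yearly estimates (assumes the TAs partition the country).
    group_totals = defaultdict(float)
    for (ta, sex, age), pop in ta_annual.items():
        group_totals[(sex, age)] += pop

    # Step 2: each TA's proportion of the group times the national quarterly estimate.
    estimates = {}
    for (ta, sex, age), pop in ta_annual.items():
        proportion = pop / group_totals[(sex, age)]
        estimates[(ta, sex, age)] = proportion * national_quarterly[(sex, age)]
    return estimates

# Usage with made-up numbers for two TAs and one sex-by-age group.
ta_annual = {("A", "F", "15-24"): 300.0, ("B", "F", "15-24"): 700.0}
national_quarterly = {("F", "15-24"): 1100.0}
quarterly = ta_quarterly_population(ta_annual, national_quarterly)
```

Here TA A receives 30% of the national quarterly figure (330) and TA B 70% (770). By construction, the TA-level estimates in each sex-by-age group add back up to the national quarterly estimate, which keeps the TA benchmarks consistent with the existing national benchmarks.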

We decided to adjust the final weight. Using the TA-level quarterly population estimates and the adjusted final weight, we could produce count and rate estimates at TA level for the unemployed, the employed, and those not in the labour force.
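The paper does not spell out how the final weight is adjusted. One plausible form, assumed here purely for illustration, rescales each respondent's final weight so that, within each TA by sex by age cell, the weights sum to the TA-level quarterly benchmark. The sketch below does that and then derives weighted counts and the unemployment rate for one TA; the function names, record fields, and status labels are all hypothetical.

```python
from collections import defaultdict

def adjust_final_weights(records, ta_benchmarks):
    """Scale final weights so they sum to the TA benchmark in each
    (ta, sex, age_group) cell. This ratio adjustment is an assumed form."""
    cell = lambda r: (r["ta"], r["sex"], r["age"])
    weight_sums = defaultdict(float)
    for r in records:
        weight_sums[cell(r)] += r["final_weight"]
    adjusted = []
    for r in records:
        factor = ta_benchmarks[cell(r)] / weight_sums[cell(r)]
        adjusted.append({**r, "adj_weight": r["final_weight"] * factor})
    return adjusted

def ta_labour_force_estimates(records, ta):
    """Weighted counts by labour force status, and the unemployment rate, for one TA."""
    counts = defaultdict(float)
    for r in records:
        if r["ta"] == ta:
            counts[r["status"]] += r["adj_weight"]
    labour_force = counts["employed"] + counts["unemployed"]
    return dict(counts), counts["unemployed"] / labour_force

# Usage with made-up respondents in a single TA cell whose benchmark is 800.
records = [
    {"ta": "A", "sex": "F", "age": "15-24", "final_weight": 100.0, "status": "unemployed"},
    {"ta": "A", "sex": "F", "age": "15-24", "final_weight": 100.0, "status": "employed"},
    {"ta": "A", "sex": "F", "age": "15-24", "final_weight": 200.0, "status": "not_in_lf"},
]
adjusted = adjust_final_weights(records, {("A", "F", "15-24"): 800.0})
counts, rate = ta_labour_force_estimates(adjusted, "A")
```

In this toy example the weights in the cell sum to 400, so each is doubled; the unemployment rate is 200 / (200 + 200) = 0.5. Cells with a benchmark but no respondents are not handled here and would need collapsing in practice.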

In the 2009 project, we tested all the models in EURAREA and recommended using the EBLUP models with sex, age, and benefit recipients as covariates. Following that recommendation, we tested EBLUP models with these covariates and also with ethnicity added. To identify the best model, we took the following steps.

First, we identified significant covariates independently for two target variables, unemployment and employment. Because EURAREA lacks this functionality, we used the SAS PROC MIXED procedure to identify the significant individual variables for the model covariates: sex, age, ethnicity, and benefit recipients. Second, we produced mean square errors (MSE) of the unemployment proportions for several combinations of covariates, using MSE as the model fit indicator for EBLUPA and EBLUPB.

The combinations of covariates were ‘sex and age’, ‘sex, age, and ethnicity’, and ‘sex, age, ethnicity, and benefit recipients’. We found that EBLUPA produced a smaller MSE than EBLUPB when all covariates were used. Note that we did not try interaction terms between covariates because of the complexity of organising the model input datasets.
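To make the MSE comparison concrete, here is a minimal sketch of an area-level EBLUP (the kind of model EBLUPB is) and of ranking covariate combinations by average estimated MSE. It is a textbook Fay-Herriot model, not EURAREA's code: it assumes the sampling variances `psi` of the direct estimates are known, and it swaps in the Prasad-Rao moment estimator of the area-effect variance together with the Prasad-Rao MSE approximation; EURAREA may estimate these differently (for example by REML). All function names and the data are hypothetical.

```python
def _inverse(a):
    """Gauss-Jordan inverse of a small square matrix given as lists."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def _dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def _quad(x, a):
    """x' A x for a vector x and square matrix A."""
    return sum(x[i] * a[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

def _wls(X, y, w):
    """Weighted least squares: returns beta and (X' W X)^{-1}."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(p)] for r in range(p)]
    xtwy = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)) for r in range(p)]
    inv = _inverse(xtwx)
    return [_dot(row, xtwy) for row in inv], inv

def fay_herriot_eblup(y, psi, X):
    """Area-level EBLUP with Prasad-Rao variance and MSE estimators.

    y: direct estimates per area; psi: their sampling variances (treated as
    known); X: covariate rows per area, including a leading 1 for the intercept.
    """
    m, p = len(y), len(X[0])
    # Prasad-Rao moment estimator of the area-effect variance, from an OLS fit.
    beta_ols, inv_ols = _wls(X, y, [1.0] * m)
    rss = sum((yi - _dot(xi, beta_ols)) ** 2 for yi, xi in zip(y, X))
    adj = sum(ps * (1.0 - _quad(xi, inv_ols)) for ps, xi in zip(psi, X))
    s2u = max(0.0, (rss - adj) / (m - p))
    # GLS fit of the regression coefficients with weights 1 / (s2u + psi_i).
    beta, inv = _wls(X, y, [1.0 / (s2u + ps) for ps in psi])
    var_s2u = 2.0 * sum((s2u + ps) ** 2 for ps in psi) / m ** 2
    estimates, mses = [], []
    for yi, ps, xi in zip(y, psi, X):
        gamma = s2u / (s2u + ps)  # weight on the direct estimate
        estimates.append(gamma * yi + (1.0 - gamma) * _dot(xi, beta))
        g1 = gamma * ps
        g2 = (1.0 - gamma) ** 2 * _quad(xi, inv)
        g3 = ps ** 2 * var_s2u / (s2u + ps) ** 3
        mses.append(g1 + g2 + 2.0 * g3)
    return estimates, mses

def mean_mse(y, psi, X, columns):
    """Average estimated MSE when only the given covariate columns are used."""
    sub = [[row[c] for c in columns] for row in X]
    return sum(fay_herriot_eblup(y, psi, sub)[1]) / len(y)

# Made-up data: 8 areas, an intercept and two covariate columns.
y = [0.05, 0.08, 0.06, 0.10, 0.04, 0.07, 0.09, 0.05]
psi = [0.0004] * 8
pairs = [(0.10, 0.3), (0.30, 0.5), (0.20, 0.2), (0.40, 0.6),
         (0.10, 0.1), (0.20, 0.4), (0.35, 0.5), (0.15, 0.2)]
X = [[1.0, a, b] for a, b in pairs]
est, mse = fay_herriot_eblup(y, psi, X)
candidates = {"x1": [0, 1], "x1 + x2": [0, 1, 2]}
scores = {name: mean_mse(y, psi, X, cols) for name, cols in candidates.items()}
```

A smaller entry in `scores` points to a better-fitting covariate set, which mirrors the MSE-based comparison described above; each combination needs more areas than covariates for the moment estimator to be defined.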

Third, using the four significant covariates, we produced model-based estimates from EBLUPA (the EBLUP unit-level model) and EBLUPB (the EBLUP area-level model). We also produced two direct estimates at TA level, one using the final weight and one using the adjusted weight. We compared the four estimates on a time series graph for each sampled TA.

For presentation purposes, we selected eight sampled TAs based on their sample sizes. The comparison of the four estimates showed that:

  • for TAs with few sampled primary sampling units (PSUs), the direct estimates fluctuated more over time than the model-based estimates
  • the EBLUPA estimates were slightly higher than the EBLUPB estimates
  • direct estimates using the final weight differed little from those using the adjusted final weight
  • as sample size increased, the gap between the model-based and direct estimates narrowed.
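The last finding is what a unit-level EBLUP predicts: the weight on an area's own sample data grows with its sample size. The sketch below assumes a linear unit-level model of the Battese-Harter-Fuller type, with the regression coefficients and variance components already estimated elsewhere, and uses the unweighted sample mean as a stand-in for the direct estimate; the function name and all numbers are hypothetical.

```python
def unit_level_eblup(ybar, xbar_sample, xbar_pop, beta, s2u, s2e, n):
    """Area-mean predictor under an assumed linear unit-level model.

    ybar, xbar_sample: sample means of y and of the covariates in the area;
    xbar_pop: population covariate means; beta, s2u, s2e: regression
    coefficients and variance components, assumed estimated elsewhere.
    """
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    gamma = s2u / (s2u + s2e / n)  # weight on the area's own data grows with n
    return dot(xbar_pop, beta) + gamma * (ybar - dot(xbar_sample, beta)), gamma

# Made-up inputs: the direct estimate is 0.10 and the regression part is 0.06.
gaps, gammas = [], []
for n in (10, 100, 1000):
    est, gamma = unit_level_eblup(0.10, [1.0], [1.0], [0.06], s2u=0.0004, s2e=0.05, n=n)
    gaps.append(abs(est - 0.10))
    gammas.append(gamma)
```

With these inputs the weight on the sample data rises from roughly 0.07 to 0.44 to 0.89 as n goes from 10 to 1000, so the gap between the model-based and direct estimates shrinks, in line with the comparison above.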

Fourth, we tested the EBLUP time series model and found much larger model errors than for the EBLUPA and EBLUPB models. Twelve quarters of test data may be insufficient for a robust time series analysis, so we did not investigate the EBLUP time series model further.

Lastly, we produced two sets of regional-level estimates: direct estimates using the original final weight, and estimates obtained by summing the TA-level model-based estimates, separately for EBLUPA and EBLUPB. We compared the EBLUPA and EBLUPB estimates separately with the regional-level direct estimates to check whether they were similar. We checked:

  • time series of the EBLUPA, EBLUPB, and direct estimates
  • bias of the EBLUPA and EBLUPB estimates, assuming that the regional-level direct estimates were unbiased
  • coverage diagnostics for the two model-based estimates against the direct estimates.

We found that EBLUPB estimates were closer to the direct estimates than EBLUPA estimates.
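The regional checks can be sketched as follows. The paper does not define its bias and coverage diagnostics precisely, so this is an assumed, simplified version: bias is the mean difference between the model-based and direct series (treating the direct estimates as unbiased), and coverage is the share of quarters in which the model-based estimate falls inside the direct estimate's interval of plus or minus z standard errors. Function names and inputs are hypothetical.

```python
def aggregate_to_region(ta_counts, ta_to_region):
    """Sum TA-level model-based counts into regional totals."""
    totals = {}
    for ta, count in ta_counts.items():
        region = ta_to_region[ta]
        totals[region] = totals.get(region, 0.0) + count
    return totals

def bias_and_coverage(model, direct, direct_se, z=1.96):
    """Compare a regional model-based series with the direct series.

    Returns the mean difference (bias, treating the direct estimates as
    unbiased) and the share of quarters in which the model-based estimate
    lies inside the direct estimate's +/- z * standard error interval.
    """
    n = len(model)
    bias = sum(m - d for m, d in zip(model, direct)) / n
    inside = sum(abs(m - d) <= z * se for m, d, se in zip(model, direct, direct_se))
    return bias, inside / n

# Usage with made-up TA counts and a two-quarter regional series.
regions = aggregate_to_region({"A": 10.0, "B": 20.0, "C": 5.0},
                              {"A": "R1", "B": "R1", "C": "R2"})
bias, coverage = bias_and_coverage([1.0, 2.0], [1.5, 2.0], [0.1, 0.1])
```

Running the same two functions on the EBLUPA and EBLUPB series for each region gives a like-for-like comparison against the direct estimates.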

We have described several methods for identifying a suitable model: identifying significant covariates, comparing MSE across combinations of covariates, testing the EBLUP time series model, and comparing regional-level estimates. However, none of these led to one conclusive model. In the end, we reached the practical conclusion of using the average of the EBLUPA and EBLUPB estimates as our final model.
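Written out, the final estimate for TA $i$ in quarter $t$ is the simple average of the two model-based estimates (the notation is introduced here for illustration):

```latex
\hat{\theta}_{i,t} = \tfrac{1}{2}\left(\hat{\theta}^{\mathrm{EBLUPA}}_{i,t} + \hat{\theta}^{\mathrm{EBLUPB}}_{i,t}\right)
```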

Please contact the Information Centre if you would like a copy of this conference paper (PDF, 1.15 MB).
