5 – Concepts and compilation challenges

Section 4 provided an overview of what productivity is, and briefly covered what the main compilation challenges are. This section discusses these challenges in more detail, as well as covering the other challenges that have come up during the feasibility study.

This section starts with an overview of what is internationally considered to be best practice in measuring change in government output and productivity. It then discusses all of the challenges for measuring output and productivity (including the main challenges already set out in section 4): there is one sub-section for more general challenges, and another sub-section for those challenges that are specific to measuring government services.

5.1 Developments in best practice

Much progress has been made recently around the globe in improving measures of non-market output. Little specific attention has been paid to non-market inputs, due to the fact that markets for the inputs do exist (although there is a debate about the impact on labour markets of monopsony employers, for example) and prices which are more-or-less market prices are available. Accordingly, the measurement issues for constructing estimates of non-market inputs are no more difficult than for the market sector.

Over recent years, various publications have incrementally improved the guidance available to those wishing to construct estimates of non-market output. Table 2 presents a list of the publications with international guidance on the measurement of non-market output and productivity.


Table 2 International guidance on measuring non-market output and productivity

| Publication | Organisation(s) responsible | Type of guidance on measurement of government output | Status |
|---|---|---|---|
| System of National Accounts, SNA (1993), new version SNA 2008 under preparation | UN, OECD, World Bank, IMF, and European Commission. Document prepared by Inter-secretariat Working Group on National Accounts. Approved by UN Statistics Commission | High level guidance | International standard |
| European System of Accounts ESA (1995) | | Fully consistent with SNA 1993, more focused on the circumstances and data needs of the European Union | A legal basis to ensure strict application, providing harmonised statistics |
| Eurostat Handbook on Price and Volume Measures in National Accounts (2001 edition) | | Expansion of ESA 1995 guidance distinguishing activities, output and outcomes. Introduces A/B/C score for methods of Member States | Develops ESA 1995 to ensure harmonised price and volume data, now legally mandated |
| OECD Manual Measuring Productivity (2001) | | Comprehensive guide to productivity measurement | No formal status, but indicates desirable properties of productivity measures |
| Atkinson Review: Final Report Measurement of Government Output and Productivity for the National Accounts (2005) | Sir Tony Atkinson | Comprehensive guide to measuring output and productivity for non-market government services | Accepted by the UK’s National Statistician; the basis for Eurostat and OECD thinking on how to measure non-market output |

Source: modified from Table 3.1 of Atkinson Review: Final Report


The OECD has been writing a manual on the measurement of non-market health care and education output, which is due to be published in 2010. It is understood that this manual draws on all of the existing guidance listed in table 2, and is evolutionary rather than revolutionary. In particular, on the vexed question of quality measurement, the manual is expected to conclude that there is as yet no international consensus on how quality change in health care and education should be measured, or on how such measures should be combined with the existing quantity estimates of output.

5.2 Scope of government productivity measures

This section is repeated from section 4.4.1 so that section 5 comprehensively covers all challenges and issues.

A key question which needs to be addressed concerns the scope, or coverage, of government productivity measures for health care and education. There are a number of perspectives from which productivity performance is of interest, and from each perspective the question, and therefore the answer, is not necessarily the same.

From the National Accounts perspective, as well as the economy-wide productivity performance perspective, the question would be, ‘how much does the health care or education industry contribute to total economic output?’ Here the scope is defined by industry (according to the ANZSIC classification). A first step should be to address the industry perspective, to provide estimates of government productivity that are consistent with Statistics NZ’s existing market sector productivity estimates.

From the perspective of those in charge of public sector service provision, one of the economic questions might be, ‘how do publicly-owned parts of the health care and education systems contribute to the economy, and how is the associated productivity changing over time?’ Here the scope is defined by whether the production is carried out by the public or private sector.

From the perspective of taxpayers, the question might be, ‘how well are taxpayer funds, or government controlled funds, being used in delivering health care and education?’ Here the scope is defined by the source of financing. Variations on a theme are provided by whether this question is narrowly defined to cover only Ministry of Health or Ministry of Education funding, or other parts of the public sector – such as the Accident Compensation Corporation, Ministry of Social Development, prisons and the armed forces – incurring expenditure on health care and education.

These scoping questions matter, as the information requirements differ and perhaps more importantly the end results will also differ.

This feasibility study does not offer a single answer to these particular scoping questions, but rather sets out how sources and methods can be combined for whichever of the different measures is required. Scoping issues specific to health care and education are discussed in greater detail in sections 6 and 7.

Recommendation G1

Any implementation of this study should be clear about the question(s) that any requested productivity measure is intended to answer, with particular emphasis on the perspective of the measure.


Recommendation G2

A first step in implementing this study should be to address the industry perspective, to provide estimates of government productivity that are consistent with Statistics NZ’s existing market sector productivity estimates.

5.3 General issues relating to output, inputs, and productivity measurement

This section sets out and discusses the particular issues associated with the measurement of output, inputs and hence productivity, and suggests possible solutions. Its main focus is on the measurement of output, because this is the topic which presents the most difficulty: there are no prices, and what is produced is almost exclusively services, which are inherently more difficult to define and measure than goods. The measurement of government inputs, in contrast, is little different from the measurement of market inputs.

5.3.1 Terminology: quantity and quality

The terminology used in this report is consistent with that used in other major reports on the topic, for example the SNA and the Atkinson Review.

Extra care is taken to distinguish between the quantity and quality components of the volume of output (and inputs, for that matter) to help avoid ambiguity and confusion, and ensure consistency in use:

  • Quantity relates to the number of units being measured; for example, number of hospital operations or GP appointments; and
  • Quality relates to change in the set of characteristics of the units being measured; for example, that the hospital operations are more effective or the GP appointments are more convenient.

Recommendation G3

Consistent terminology should be adopted and consistently used to avoid ambiguity and confusion. In this feasibility study, the term ‘quantity’ refers to the number of units being measured whereas ‘quality’ refers to change in the set of characteristics of the units being measured.

5.3.2 Combining distinct measures of quantity and quality

Measuring changes in quantities of output using a system of disaggregation and differential weighting, such as is used in casemix weighting, means that elements of quality change are captured implicitly (although not all: changes in quality within any of the casemix groupings will still not be captured). The term casemix refers to the blend of different types of treatment provided in hospital. In New Zealand, as in many other countries, the classification used to identify changes in casemix is the Diagnosis Related Groups (DRG) classification. For each type of treatment in the DRG classification, an average cost is calculated which can be used as a weight. Differentiation between various types of output is the National Accounts’ main tool for incorporating quality change alongside quantity change.

Two further techniques are available:

  • adjusting the existing measures of quantity change using a measure of quality change; and
  • defining the measure of output in terms of quality.

The difference between the two techniques can be seen through an illustration. If the quantity measure of output is number of hospital operations, and the associated quality measure is success of those operations, then a model which combined the two measures (for example, a multiplicative model which valued proportionate change in both quantity and quality equally) is an example of the first technique.

If the unit of output is taken to be success of hospital operations, then this is an example of the second technique (note that this latter technique could incorporate a quantity element if not limited to, for example, ‘average’ degree of success but total success across all patients).
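The two techniques can be illustrated with a minimal sketch. All figures and function names below are hypothetical, and the multiplicative model is only one of several possible ways of combining quantity and quality; it is shown for illustration, not as a recommended method.

```python
def adjusted_output_growth(quantity_ratio: float, quality_ratio: float) -> float:
    """Technique 1: adjust the quantity ratio by a quality ratio (multiplicative model)."""
    return quantity_ratio * quality_ratio


def successful_operations(operations: float, success_rate: float) -> float:
    """Technique 2: define the unit of output as successful operations (total success)."""
    return operations * success_rate


# Hypothetical figures for two consecutive years.
ops_prev, ops_curr = 1000.0, 1050.0
success_prev, success_curr = 0.90, 0.92

technique_1 = adjusted_output_growth(ops_curr / ops_prev, success_curr / success_prev)
technique_2 = (successful_operations(ops_curr, success_curr)
               / successful_operations(ops_prev, success_prev))
```

In this simplest case, where the second technique counts total success across all patients and the first values quantity and quality equally, the two give the same growth ratio. They diverge once the model weights quantity and quality differently, or once success is measured as an average rather than a total.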

Countries which have been examining the extent to which existing quantity measures of non-market output can be improved using methods for quality adjustment have adopted a cautious approach, given the lack of a consensus at the international level on how best to carry out this adjustment (see sections 6.2.1 and 7.2.1).

Recommendation G4

A cautious approach should be taken in combining measures of quantity and quality change in health care and education output, with wide and transparent discussion of options and careful building of a consensus before decisions on methods are adopted. Until then, quality change should not be incorporated into measures of quantity change in output.

5.3.3 Level of disaggregation for the measure of output

What constitutes a suitable level of disaggregation for the measure of output? The level of disaggregation can matter a great deal. For example, an increase implemented in 2004 in the level of disaggregation used in the UK’s health care output measure (designed to capture changes in casemix better) impacted directly on estimates of total UK GDP growth. This change, along with a few other improvements to the health care output methodology, had such a large impact that it altered the official estimate of total UK economic output (which grew by an extra 0.1 percentage points in 2002 and 2003).

In deciding on the level of disaggregation, there are two criteria. The practical criterion is that the level of disaggregation should not be too fine: too many categories and calculation becomes burdensome. Also, with increasing levels of disaggregation, the risk of having categories with no activity count also increases: a zero in the current year means an infinite decrease (which is calculable but undesirable) and a zero in the base year means an infinite increase (which is not calculable at all). As an aside, the method for dealing with new output is for it to be subsumed within existing categories until it is important enough to be separately identified.
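The zero-count problem, and the practice of subsuming new output within an existing category, can be sketched as follows. The category names and counts are hypothetical, and adding raw counts together assumes that the two activities are similar enough to be treated as one category.

```python
from typing import Dict, Optional


def growth_ratio(prev: float, curr: float) -> Optional[float]:
    """Return curr / prev, or None when the base-year count is zero."""
    if prev == 0:
        return None  # growth from a zero base cannot be calculated
    return curr / prev


def subsume(counts: Dict[str, float], new: str, existing: str) -> Dict[str, float]:
    """Fold a new category into an existing one until it merits separate treatment."""
    merged = dict(counts)
    merged[existing] = merged.get(existing, 0.0) + merged.pop(new, 0.0)
    return merged


# Hypothetical activity counts: a new treatment appears in the current year.
base_year = {"open_surgery": 100.0, "keyhole_surgery": 0.0}
current_year = {"open_surgery": 95.0, "keyhole_surgery": 10.0}

undefined = growth_ratio(base_year["keyhole_surgery"], current_year["keyhole_surgery"])
base_merged = subsume(base_year, "keyhole_surgery", "open_surgery")
current_merged = subsume(current_year, "keyhole_surgery", "open_surgery")
merged_growth = growth_ratio(base_merged["open_surgery"], current_merged["open_surgery"])
```

With the new treatment kept separate, its growth is undefined; once it is subsumed, the combined category has a well-defined growth ratio.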

The conceptual criterion is that the taxonomy should distinguish between homogeneous and heterogeneous activities: activities within a category of the classification should be similar in terms of the characteristics which are of value to the consumer, and there should be as many categories as there are different combinations of characteristics. When considering what the characteristics that consumers value are when purchasing a car, the answer may include: speed, colour, fuel efficiency, sound insulation, equipment specification, brand, durability, length of warranty, and many other factors besides. The answer is no simpler in the field of health care, where the characteristics may include, for example: diagnosis, treatment, complexity, comorbidity, severity, speed of access, convenience, cleanliness of premises, availability of choice, and so on.

The idea of substitutability should ideally enter into the definition of what constitutes homogeneous output, whereby treatments that have the same outcome for patients should appear in the same category. For example, grouping together psychotherapy and drug therapy where they are substitutes should mean that any gains in efficiency through substitution will be captured in the productivity index. However, such grouping on the basis of substitution has not been implemented anywhere in the world as yet, although there are specific examples of its impact in limited areas. See, for example, Price indexes for the treatment of depression (Frank 1999), Measuring the value of cataract surgery (Shapiro 2001), and Measuring health care output in the UK: a diagnosis based approach (ONS 2004).

A further complication concerns whether or not to take into account differences in the mix of people going through the health care or education system. For example, if in one year the cohort of children beginning school has a higher starting point in terms of educational status, all other things being equal the school might need to do less to get the same exam scores. Or the school might provide the same level of service as for the previous cohort, but the children achieve better exam scores simply because of their different starting point. Ideally, such differences in the mix of patients, schoolchildren, and so on, ought to be taken into account in the measure of the school’s or hospital’s output. In practice, this would be very data intensive.

Typically, the decision on the level of disaggregation is based on what information and classifications are already available, rather than on purity of concept. DRG-style classifications are the usual choice in countries which have such classifications available, and this is currently the approach in New Zealand. It would be worth comparing the results from different levels of disaggregation and use of different classifications to understand what impact casemix and other factors have on output estimates.

The Ministry of Health’s existing health care productivity methodology uses the DRG classification to differentiate between different types of hospital activity, whereas the Ministry of Health’s system for reimbursing hospitals for their activity also uses other factors, for example length of stay, as well as DRG. This reflects different purposes: financing versus productivity measurement. The financing purpose requires the disaggregation method to distinguish on the basis of different costs, while the productivity purpose requires the disaggregation method to distinguish between types of activity. What is appropriate for one purpose may not be appropriate for the other.

Recommendation G5

Statistics NZ and the Ministries of Health and Education should explore further what level of disaggregation is most suitable in the New Zealand context, to understand the impact on estimates of output and productivity, and to inform the choice of this level. The choice of which to adopt should be reached after wide discussion and consensus building.

5.3.4 Cost versus value weights

The index number methodology (which is discussed later in this report, see section 5.3.8) requires that the growth rates of different types of output are weighted together in a way that reflects their relative importance.

In a perfect market equilibrium situation, marginal cost equals the market price, with this market-clearing price reflecting the fact that both consumer and producer place equal value on the good or service at the margin. This equilibrium price is the measure of relative importance to be used as weights for combining the growth rates of different types of output.

In reality, markets are not perfect for a number of different reasons, including for example the fact that consumers and producers do not have perfect information (about differential price levels, product specifications and so on), and the amount paid by the consumer is not the amount received by the producer due to the existence of taxes and subsidies and so on. In a non-market situation, there is no market-clearing mechanism, and it cannot be taken for granted that the consumer and producer both place the same value on any particular good or service. Indeed, in the health care and education sectors, there are very few meaningful prices at all: where payments are made, these are usually heavily subsidised.

A main purpose of subsidising some types of goods and services is to boost demand: more people will consume goods and services at relatively lower prices. Therefore, the amounts paid by the consumer for subsidised goods and services cannot be seen as reliable estimates of the value of those goods and services relative to other things that consumers choose to spend their money on. Total cost – the sum of the costs borne by the consumer and those costs borne by the institution paying out the subsidy – is a more reliable valuation of the goods and services, and can be interpreted as the value from the perspective of society as a whole (including both government and individual consumer).

An alternative to using costs as the means for measuring relative importance is to assign value on the basis of final outcomes, such as changes in life expectancy associated with a health care treatment or changes in educational status. An immediate problem with using this type of valuation is that it is not monetary (as used elsewhere in economic accounting): the units of measurement would be in terms of health or educational status, which cannot be added together directly. Any conversion would mean placing a money value on health and education status, for which there is little experience in economic accounting. One (future) means of arriving at a monetary valuation might be provided by ongoing work to measure ‘human capital’. The idea is that in order to live full and happy lives, people need to invest in different aspects of life, including health care and education. This perspective lends itself to capital-style accounting, and hence the term human capital. Human capital is understood to have many components, and there is as yet no consensus on definitions and measurement.

A further problem is that final health and education outcomes are influenced by factors other than the services provided by either the health care or educational system. These factors are many. To improve their health, people may begin going to the gym regularly, quit smoking, eat more healthily, and so on. To increase the chances of their children getting the best out of their education, parents may spend more time helping their children with homework. Increases in health and education status due to external factors such as these should not be included as part of the output of hospitals and schools.

Ideally, when using final outcomes as the valuation method, it is the final outcome for the marginal consumer that should be used. According to economic theory, it is only at the margin (that is, for only one consumer) that the consumer’s valuation matches the producer’s costs of production: for all other consumers, the price that the consumer would have been willing to pay is either higher than the price set by the market (giving rise to consumer surplus) or it is lower (in which case the consumer would not have made the purchase).

Information on final outcomes for the equivalent marginal customer (the one for whom the benefit in terms of, for example, improved health or education status matches exactly the costs of production) is not available. Instead, only the average final outcome would be available. This valuation would therefore incorporate any consumer surplus and it may indeed incorporate any ‘producer surplus’, where the gain is less than suggested by the cost of production. Therefore, the average final outcome may not be a good estimate of the equivalent of the equilibrium market valuation. As already discussed in section 4.2, efficiency may not be the only criterion used when judging whether or not to provide the service. For example, providing equity of access to health care and education services for the whole of the population may mean that the efficiency of providing some services in rural settings is less than that in urban settings.

The international consensus on how to combine different types of health care and education output is that cost weights are appropriate. This is not for reasons of conceptual purity, but the fact that, typically, costs are systematically available for most, if not all, types of health care and education provided, whereas other types of weight are not. It is important to note that cost and other types of weight reflect different valuations from different perspectives and none is ‘wrong’ or ‘right’ in the context of measuring change in non-market sector output. Instead, they should be interpreted as they are; that is, reflecting the different perspectives of producer and consumer.

Costs of production are a reflection of the value placed on the good or service by the producer. It is possible to take the perspective of the consumer, and imagine how a set of relative weights might be formed. One way could be to collect information on consumer preferences, asking how much consumers might be willing to pay. Given the information asymmetries that typically exist in the market sector, this may not be a good solution.

A joint project by the University of York and the National Institute of Economic and Social Research in the UK, Developing new approaches to measuring NHS outputs and activity (York 2005), recommends that the ideal set of weights for combining different types of health care output is one which identifies the relative health benefits of treatment, measured in terms of QALYs (quality adjusted life years). Putting aside the fact that QALYs only take into account the health benefits dimension of what patients value (and ignore much of the patient experience dimension), the key problem with any other set of weights than cost weights, such as QALY weights, is that they are not systematically available.

The set of cost weights should reflect the total costs of producing the good or providing the service. The key point about weights is that they are relatives: they should demonstrate the relative importance of one type of good or service to all others. Note that a benefit of making the weights total cost weights (weights that sum to total expenditure rather than simply cost relatives or ratios that do not sum to total expenditure) is that total expenditure on all output will sum to total expenditure on all inputs: this relationship is a good check on the statistical quality (in terms of the comprehensiveness of coverage) of the output measure.
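A minimal sketch of total cost weights, and the cost relatives derived from them, is given below. The treatment names, unit costs and quantities are hypothetical and serve only to show the arithmetic.

```python
from typing import Dict


def total_cost_weights(unit_costs: Dict[str, float],
                       quantities: Dict[str, float]) -> Dict[str, float]:
    """Total cost of each type of output: average unit cost times quantity."""
    return {name: unit_costs[name] * quantities[name] for name in unit_costs}


def cost_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Express each total cost weight as a share of total expenditure."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


# Hypothetical average unit costs (dollars) and activity counts.
unit_costs = {"hip_replacement": 12000.0, "gp_appointment": 50.0}
quantities = {"hip_replacement": 100.0, "gp_appointment": 20000.0}

weights = total_cost_weights(unit_costs, quantities)
shares = cost_shares(weights)
```

Because the weights are total costs, they sum to total expenditure on output, which can then be compared with total expenditure on inputs; the shares sum to one and express relative importance.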

Recommendation G6

In order to weight together the growth rates of different types of health care and education in a composite measure of total output, the relative weights should be total cost weights. Examining the impact of other types of weight may be useful in understanding different perspectives, for example in cost / benefit analyses.

5.3.5 Comprehensiveness and representativeness

In measuring the productivity of the health care and education sectors, it is important to comprehensively include in the measure all health care provided to patients, and education provided to pupils, given whatever scope is decided (see section 5.2). Without comprehensive coverage of all relevant activities, assumptions would have to be made about the relative growth rates of those activities which are not included in the measure, and this will introduce bias if the actual growth rates differ from what is assumed. (Alternatively, the labelling of any partial measure would need careful crafting.)

Recommendation G7

Any measure of output should be as comprehensive as possible in terms of the coverage of the types of health care provided to patients or education provided to students.

If comprehensiveness is not possible, then the second best solution is to strive for representativeness: growth in some types of health care or education activity, for which figures are available, may be considered to be representative of the growth rates for other types of health care activity for which figures are not available (although there is still a requirement for some kind of evidence of growth of the latter).

There may well be types of health care and education for which there is neither quantitative nor qualitative information about change over time. The usual practice adopted in such cases is to assume that growth in unmeasured activity is the same as growth in measured activity.


Recommendation G8

Where quantitative information on change over time is not available for some types of services, there may be qualitative information about change which can be used to make informed decisions about the use of proxy measures (for example, growth in some types of activity for which figures are available may be considered to be representative of the growth rates for other types of activity for which figures are not available). For those types of services for which neither quantitative nor qualitative information on change over time is available, growth should be assumed to be the same as growth in measured activity, or labelling would need to be clear about how partial the measure is.


Hint G1

The extent of coverage of all health care and education activities (in terms of measured activity as a percentage of total activity) is one measure of the statistical quality of the output measure (see section 5.4.1).
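The coverage ratio, and the bias that arises when unmeasured activity grows at a different rate from measured activity, can be sketched as follows. The expenditure figures and growth ratios are hypothetical, and weighting by expenditure share is an assumption made for this illustration.

```python
def coverage_ratio(measured_expenditure: float, total_expenditure: float) -> float:
    """Measured activity as a proportion of total activity (by expenditure)."""
    return measured_expenditure / total_expenditure


def total_growth(coverage: float, measured_growth: float, unmeasured_growth: float) -> float:
    """Growth in total output as a coverage-weighted average of the two parts."""
    return coverage * measured_growth + (1.0 - coverage) * unmeasured_growth


coverage = coverage_ratio(800.0, 1000.0)

# Usual practice: assume unmeasured activity grows like measured activity.
assumed = total_growth(coverage, 1.03, 1.03)

# Hypothetical 'true' position: unmeasured activity actually grew faster.
actual = total_growth(coverage, 1.03, 1.08)

bias = assumed - actual
```

With 80 percent coverage, a five percentage point gap in the unmeasured growth rate translates into a one percentage point understatement of total growth, which is why coverage is a useful indicator of statistical quality.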

5.3.6 The 80:20 rule

International development experience has shown that there are some aspects of measurement that take little resource and have large impact (whether that be on the estimates or on perceptions of statistical quality), while there are others which consume large amounts of resource and lead to little improvement. A good example of the former is improving the level of disaggregation (see section 5.3.3) and of the latter researching how to measure quality change (see inter alia section 5.4.1). This is not to say that development effort should be concentrated only on the former, but that it is important to manage and review development activity, trading off quick wins against longer-term projects.

Recommendation G9

A staged approach to implementation is recommended, giving higher priority to those areas of measurement that take little resource and have large impact.

5.3.7 Different forms of the productivity equation

The basic specification of a productivity equation, as the ratio of the volume of output and the volume of inputs, does not specify how output or inputs are defined.

The main choice for the measure of the volume of output is whether to use gross output or value added. Taking bread making as an illustrative example, the output of the baker would be the bread, while the value added can be thought of as the value the baker adds to the flour and other materials purchased. In basic terms, the value added is defined as gross output less intermediate consumption.

The main choice for the measure of the volume of inputs is whether to take a measure of total inputs (in which case the productivity measure is referred to as ‘total factor productivity’); a measure of only a single factor of production (ie labour productivity or capital productivity); or a mix of labour and capital inputs (referred to as multifactor productivity).

Looking at a health care productivity measure, if interest lies in understanding the marginal extra value added by the health system (for example, the fact that medications are typically bought in and not produced by the government health sector, so are not part of its value added), then a value added single or multifactor productivity methodology should be constructed. If interest lies in understanding the total output of the health system, then a productivity measure based on gross output should be constructed.
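The different forms of the productivity equation can be sketched in a few lines. All figures are hypothetical, the value added calculation is in current prices only, and combining labour and capital with fixed cost shares is an assumption made to keep the illustration short.

```python
from typing import Dict


def value_added(gross_output: float, intermediate_consumption: float) -> float:
    """Value added is gross output less intermediate consumption."""
    return gross_output - intermediate_consumption


def combined_input_growth(growths: Dict[str, float], cost_shares: Dict[str, float]) -> float:
    """Share-weighted average of the growth ratios of the individual inputs."""
    return sum(cost_shares[name] * growths[name] for name in growths)


def productivity_growth(output_growth: float, input_growth: float) -> float:
    """Productivity change is the ratio of output change to input change."""
    return output_growth / input_growth


# The baker: bread sold for 100, flour and other materials bought for 60.
baker_value_added = value_added(100.0, 60.0)

# Hypothetical volume growth ratios and input cost shares.
output_growth = 1.04
input_growths = {"labour": 1.01, "capital": 1.03}
input_shares = {"labour": 0.7, "capital": 0.3}

labour_productivity = productivity_growth(output_growth, input_growths["labour"])
multifactor_productivity = productivity_growth(
    output_growth, combined_input_growth(input_growths, input_shares)
)
```

In this example multifactor productivity grows more slowly than labour productivity, because capital input is growing faster than labour input.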

Statistics NZ’s currently published estimates of productivity change for the measured sector use value added, rather than gross output, and are confined to labour and capital as the inputs. The reason for using value added and not gross output is that the published Statistics NZ National Accounts do not include volume estimates of gross output and intermediate consumption.

To complete its current suite of official productivity estimates, Statistics NZ requires a value-added multi-factor productivity approach. This feasibility study takes no stance on which is the appropriate measure for Statistics NZ to produce beyond that, but instead sets out how the various data sources and methods can be brought together to form any of these versions of the productivity methodology.

Recommendation G10

Statistics NZ should garner user views on the relative priorities of the productivity-specific questions and decide which one(s) should be answered over and above those required to expand its current industry-based suite of official productivity estimates.

5.3.8 Index number methodology

The current index number methodology used in Statistics NZ’s productivity estimates differs for the numerator and the denominator:

  • The numerator, a volume index of output change over time, is taken directly from the National Accounts and is used without modification, as are all measures of output change in the Statistics NZ productivity series. This avoids possible user confusion with having more than one (official) measure of output change. The New Zealand National Accounts measure of output volume change, as is the case in many countries, takes the form of a chained Laspeyres volume index.
  • The denominator, a volume index of inputs change, is constructed as a chained Tornqvist index, as recommended in, for example, the OECD’s Measuring productivity (OECD 2001a), the productivity expert’s bible.

There seems to be no literature available internationally that recommends using different index number methodology for the numerator and denominator of the productivity equation. The history within Statistics NZ suggests that the rationale for having different index number methodologies for the numerator and denominator of the productivity equation lies in wanting to balance the desire to have output data that are consistent with the National Accounts and to employ the best index number methodology. This could be reviewed, particularly in the light of Statistics NZ’s recent work investigating the index number methodology for volume change in output, which concluded in favour of no changeover from the Laspeyres to the Tornqvist methodologies.

Recommendation G11

Statistics NZ should review the desirability of using different index number methodologies for the numerator and denominator of the productivity equation.

The index number methodology adopted in this feasibility study for both the output and inputs indices will be chained Laspeyres volume indices, at least in principle (data availability may mean that complete chaining is not possible). This is for a number of reasons:

  1. Use of the same conceptual weights (based on previous years’ prices) for the numerator and denominator is desirable, due to the valuation of the different output and inputs being in the same time period;
  2. The chained Laspeyres volume index is as used in the National Accounts;
  3. The valuation basis for the different output and inputs is straightforward, and therefore simple to explain to users;
  4. The information required to construct the chained Laspeyres volume index is similar to that required for other indexes, including the Fisher and Tornqvist indexes;
  5. The main disadvantage of Laspeyres over, say, Fisher or Tornqvist is that, by fixing weights in the base period, it does not take into account changes in the mix of products over time (Fisher and Tornqvist have symmetric weights, and therefore fully take into account changes in product mix). This disadvantage is minimised by chaining: various empirical studies have shown that the difference in results between the use of chained Laspeyres and Fisher or Tornqvist is small.

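The index number methodology used in this feasibility study, whether for output or for inputs, is then as follows:

$$L_{t-1,t} = \frac{\sum_i p_{i,t-1}\, v_{i,t}}{\sum_i p_{i,t-1}\, v_{i,t-1}} \qquad \text{(i)}$$

where $L_{t-1,t}$ is the volume index for year t relative to year t-1, p and v are the price and volume components, t is a subscript denoting time, and i is a subscript denoting the categories used in the classification for disaggregating.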

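An alternative, sometimes more easily computable, form of equation (i) is:

$$L_{t-1,t} = \sum_i w_{i,t-1}\, \frac{v_{i,t}}{v_{i,t-1}} \qquad \text{(ii)}$$

where $w_{i,t-1} = p_{i,t-1} v_{i,t-1} / \sum_j p_{j,t-1} v_{j,t-1}$ is the share of total expenditure on product i in time period t-1 (previous year).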
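The chained Laspeyres calculation described above can be illustrated with a minimal Python sketch. The function names and the two-category data are hypothetical and chosen only for illustration; the sketch assumes each link values the volumes of both years at the previous year's prices, and that the expenditure-share form uses previous-year expenditure shares as weights.

```python
def laspeyres_link(prev_prices, prev_volumes, curr_volumes):
    """Volume change between two adjacent years, valued at previous year's prices."""
    numerator = sum(p * v for p, v in zip(prev_prices, curr_volumes))
    denominator = sum(p * v for p, v in zip(prev_prices, prev_volumes))
    return numerator / denominator


def laspeyres_link_from_shares(prev_prices, prev_volumes, curr_volumes):
    """Same link, written as a share-weighted mean of volume relatives.

    Requires every previous-year volume to be non-zero.
    """
    total = sum(p * v for p, v in zip(prev_prices, prev_volumes))
    return sum(
        (p * v0 / total) * (v1 / v0)
        for p, v0, v1 in zip(prev_prices, prev_volumes, curr_volumes)
    )


def chain(links, base=100.0):
    """Multiply successive year-on-year links into an index series."""
    series = [base]
    for link in links:
        series.append(series[-1] * link)
    return series


# Hypothetical data: two categories observed over three years.
prices = [[10.0, 20.0], [10.0, 25.0], [12.0, 26.0]]
volumes = [[100.0, 50.0], [110.0, 50.0], [121.0, 55.0]]

links = [
    laspeyres_link(prices[t - 1], volumes[t - 1], volumes[t])
    for t in range(1, len(volumes))
]
index = chain(links)
print([round(x, 2) for x in index])
```

The two link functions should agree up to floating-point rounding; the share form is convenient when only expenditure shares and volume relatives are available.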

Of course, while the prices and volumes for the individual components within the overall output and inputs indexes differ, in the non-market sector there is a (forced) relationship between total current price expenditure on inputs and total current price expenditure on output (more correctly for the latter, total current price costs). In the National Accounts, non-market output in current prices is defined to be the sum of input costs. There is, therefore, a quality check implicit in the productivity calculations: something is wrong if total output expenditure does not sum to total inputs expenditure.

Hint G2

As current price expenditure on non-market output is valued according to input costs, a quality check on the productivity measure calculation is to check that the sum of output expenditure values used as weights equals total expenditure on inputs.
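The quality check in Hint G2 could be automated along the following lines. This is only a sketch: the category names, the values, and the tolerance are hypothetical, and in practice the tolerance would depend on rounding in the source data.

```python
import math


def weights_match_input_costs(output_weights, input_costs, rel_tol=1e-6):
    """True if output expenditure weights sum to total input costs (within tolerance)."""
    return math.isclose(
        sum(output_weights.values()),
        sum(input_costs.values()),
        rel_tol=rel_tol,
    )


# Hypothetical current price values for one year.
output_weights = {"inpatient": 600.0, "outpatient": 250.0, "day_case": 150.0}
input_costs = {"labour": 650.0, "intermediate": 250.0, "capital_services": 100.0}

check_passed = weights_match_input_costs(output_weights, input_costs)
print(check_passed)
```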

5.3.9 An additional note on chaining

The main benefit of chaining (in a Laspeyres volume index, this means using previous years’ prices for every pair of years being compared, rather than prices that are fixed in some base year – for further detail, see paragraph 16.31 of SNA 1993) is that it allows changes in relative prices to feed correctly into the price / volume breakdown. A further benefit is that the index number series is constructed as a series of pairs of years. This means that the basket does not need to be the same as time progresses. Given the increasing availability of data over time for, say, health care output, this property means that whatever information is available can be used without disturbing the weighting structure.

While this is useful for dealing with the lack of availability of historical data, there comes a point in time where the coverage falls below what could be considered a crucial level. Care needs to be exercised in deciding on this cut-off point to ensure that relatively low coverage levels do not introduce too much bias into the productivity calculation for early years.
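A minimal sketch of how a single link might be built from whatever categories are available in both years, together with a coverage rate for that link, is given below. The data layout, the category names, and the definition of coverage (covered previous-year expenditure divided by total previous-year expenditure) are assumptions made for illustration.

```python
def link_with_coverage(prev_year, curr_volumes, total_prev_expenditure):
    """Laspeyres volume link over the categories present in both years.

    prev_year: {category: (price, volume)} for year t-1
    curr_volumes: {category: volume} for year t
    total_prev_expenditure: total current price expenditure in year t-1,
        including categories with no volume data (assumed known from accounts)
    Returns (link, coverage_rate).
    """
    common = prev_year.keys() & curr_volumes.keys()
    numerator = sum(prev_year[c][0] * curr_volumes[c] for c in common)
    denominator = sum(prev_year[c][0] * prev_year[c][1] for c in common)
    return numerator / denominator, denominator / total_prev_expenditure


# Hypothetical data: volume data for GP visits only start in the later year.
prev_year = {"inpatient": (1000.0, 200.0), "outpatient": (100.0, 500.0)}
curr_volumes = {"inpatient": 210.0, "outpatient": 520.0, "gp_visit": 900.0}

link, coverage = link_with_coverage(prev_year, curr_volumes, 400000.0)
print(round(link, 4), round(coverage, 3))
```

Reporting the coverage rate alongside each link gives a simple basis for judging where in the back series the cut-off point should lie.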

Cautionary note G2

Care needs to be taken in deciding how far back in time to calculate productivity measures, especially when coverage rates are relatively low.

5.4 Issues relating to government output, inputs, and productivity measurement

This section sets out and discusses the particular issues associated with the measurement of non-market output and productivity, and suggests possible solutions.

5.4.1 Measurement error and ‘fitness-for-purpose’ of resulting estimates

Development of health care and education output and productivity methods continues to be carried out in a number of countries and by international organisations including OECD and Eurostat. As is set out in the section on international practices (see section 10), different countries have reached different states of play, but no country has a perfect measure, or set of measures, of health care and education output and productivity. As for any statistical estimates, especially those that are under development, it is particularly important to ensure that users are given sufficient information on the statistical quality, and how any figures should and can be interpreted. Given the great interest in and sensitivity of these output and productivity estimates, particular care should be taken to ensure users are informed clearly and transparently, and that the quality of any estimates (relative to other official statistics) is clear. Use of ‘experimental’ labels, and the like, can help.

As part of the information users may need to help them understand the statistical quality of health care and education output and productivity estimates, as well as how to correctly interpret any changes over time, it would be useful to show, where possible and available, any indicators of statistical quality. This may be in the form of quantitative information, such as coverage rates for output estimates (see section 5.3.5), or it may be in the form of qualitative, or descriptive, information, such as that published in the UK’s Health care quality report (ONS 2008b).

The experience in the UK, as for many other countries, has been that it is difficult to summarise the statistical quality of these output and productivity estimates. This is mainly because the estimates are the result of a complex set of calculations involving many different data sources, methodologies and assumptions. The UK’s solution to this is to provide users with as much information on statistical quality as possible, including publication of a detailed Sources and methods report (ONS 2008a), a report on quality of the estimates according to the statistical quality framework Health care quality report (ONS 2008b), as well as summaries within the main articles Public service productivity: health care (ONS 2008c) and Public service productivity: education (ONS 2005).

The European Union has enacted legislation that requires the reporting of the statistical quality of the methods for compiling estimates of government output in the National Accounts. The reporting method involves ascribing one of three levels of quality to published methods, as follows:

A methods – these involve an output indicator approach where the indicators satisfy the following criteria:

a) cover all services provided;

b) weighted by the cost of each type of output in the base year;

c) as detailed as possible; and

d) quality adjusted.

B methods – these involve an output indicator approach where the criteria are not fully satisfied: for example, the level of detail could be improved or the measure does not take into account changes in quality.

C methods – if input, activity, or outcome is used (unless outcome can be interpreted as quality-adjusted output) or if coverage of output method is not representative.
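The three-level grading can be expressed as a simple decision rule. The sketch below is a simplified reading of the criteria listed above, not the wording of the legislation: the function name and its arguments are hypothetical, and an outcome measure that can be interpreted as quality-adjusted output is assumed to be passed in as an output indicator approach.

```python
def method_grade(
    approach,
    covers_all_services,
    cost_weighted,
    as_detailed_as_possible,
    quality_adjusted,
    coverage_representative=True,
):
    """Return 'A', 'B' or 'C' under a simplified reading of the criteria.

    approach: 'output', 'input', 'activity' or 'outcome'
    """
    if approach != "output" or not coverage_representative:
        return "C"
    criteria = (
        covers_all_services,
        cost_weighted,
        as_detailed_as_possible,
        quality_adjusted,
    )
    return "A" if all(criteria) else "B"


# An output indicator method that is not yet quality adjusted.
print(method_grade("output", True, True, True, False))
```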

This would also be a reasonable, and simple, way to inform users of the statistical quality of any estimate produced in New Zealand. Adopting such a system would mean some comparability in reporting with European Union countries.

Recommendation G12

Statistics NZ should consider how best to inform users of the statistical quality of any government productivity measures it publishes, bearing in mind both quantitative and qualitative means.

User confidence in the developing measures of health care and education output and productivity in the UK has also been bolstered by an ongoing discussion between the statistical office and users, involving occasional workshops, conferences, media presentations, and consultations.


Recommendation G13

Statistics NZ should consider the appropriate ways of ensuring an ongoing dialogue with users, so that the statistics provide (at least part of) an answer to specific user questions, and that any external expertise and experience can be drawn on to improve the development work.

5.4.2 Co-production

Section 5.2 discussed some of the ways that the scope of a government output or productivity measure can be defined. If the scope does not correspond with the usual scope for a production function, then extra care needs to be taken in matching inputs and output, and in distinguishing between gross output and value added.

For example, if the definition taken of the unit of output corresponds with the health care pathway (that is, the unit of output is ‘a patient treated’), this poses a problem if the scope of the output and productivity measures is public hospitals, and the health care pathway traverses primary and secondary care (co-production): how does one distinguish between the value added by the (private sector) general practitioner, the (public sector) outpatient appointment, and the (public sector) inpatient day care or inpatient stay? Of course, there are many other permutations of the pathway through the various public and private sector entities within the health care system, of which this is just one example.

In the market sector, value added is distinguished from gross output according to the prices paid when businesses purchase intermediate consumption items (raw or unfinished goods and services) from each other.

Generally speaking within the health care sector, there is little empirical data that helps distinguish which part of the health sector is responsible for what proportion of the care. The fall-back solution will be to use whatever information is available to approximate these proportions (in terms of both the number of treatments, as well as any change in the quality of those treatments): in a lot of cases, costing information will be available, but this will not always be the case. Where no costing data are available, the only solution may be qualitative, drawing on expert opinion to arrive at a set of reasonable assumptions.

Cautionary note G3

Care needs to be taken in determining value added when a service is delivered by a number of different providers. In some cases, costing data may be available which can be used to derive an approximation for value-added. Other solutions may include basing assumptions on expert opinion.

5.4.3 Co-financing

A further complexity arises from the fact that, for some parts of the health care and education systems, there is co-financing, or co-funding. For example, the cost of an appointment with a general practitioner is covered both by patients paying a fee-for-service payment out of their own pockets, as well as a contribution from government funds. Tertiary education is likewise paid for with a combination of student fees and government funding.

If the scope of the productivity measure is defined according to who is paying, a question arises about how to deal with these services: how much of the output should be associated with government financing, and how much with private financing? Of course, this problem is not confined to general practitioner appointments and tertiary education; there are other health care and education activities which have multiple sources of financing.

Production functions do not lend themselves to this type of analysis. In the UK, a practical solution to this problem has been adopted. The production function is effectively split in proportion to the sources of financing. If the government contribution to total costs is 40 per cent, then 40 per cent of each of the inputs and 40 per cent of output are attributed to government.

A major benefit of this approach is that it deals well with changes in the relative size of the government contribution to cost, compared with other contributions. For example, if nothing changes other than the government contribution to the cost of a general practitioner appointment increasing from 25 per cent to 50 per cent, output and productivity change should remain the same. By also forcing government inputs to change from 25 per cent to 50 per cent, the ratio of government inputs and output remains the same. Any other assumption would mean that the ratio of government output to inputs would change, and therefore would have an impact on overall government productivity.
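The arithmetic behind this example can be checked with a short sketch. The totals are hypothetical; the sketch assumes nothing changes between the two periods other than the government share of cost, and compares splitting both inputs and output with splitting output only.

```python
def government_part(total, government_share):
    """Portion of a total attributed to government by its share of financing."""
    return government_share * total


def productivity_change(output_prev, output_curr, inputs_prev, inputs_curr):
    """Growth in output divided by growth in inputs between two periods."""
    return (output_curr / output_prev) / (inputs_curr / inputs_prev)


# Hypothetical totals for a service, identical in both periods.
total_output = 1000.0
total_inputs = 800.0
share_prev, share_curr = 0.25, 0.50

# Both output and inputs split by the financing share: no spurious change.
split_both = productivity_change(
    government_part(total_output, share_prev),
    government_part(total_output, share_curr),
    government_part(total_inputs, share_prev),
    government_part(total_inputs, share_curr),
)

# Output split by the new share but inputs left at the old share.
split_output_only = productivity_change(
    government_part(total_output, share_prev),
    government_part(total_output, share_curr),
    government_part(total_inputs, share_prev),
    government_part(total_inputs, share_prev),
)

print(split_both, split_output_only)
```

With the consistent split the measured productivity change is neutral, whereas the inconsistent split reports a doubling of productivity that reflects only the change in financing.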

There are concerns with this assumption. For example, it presumes that the services provided by a single producer to households, whether they are funded by households only, government only, or households and government together, are the same and that there is no cross-subsidising. This may not be realistic.

Recommendation G14

Where the scope of the productivity measure is defined according to who is paying, the distribution by source of financing should be used to calculate how much of the inputs and output are government and how much are private. This deals with the complications associated with separating government output from private sector output.

5.4.4 Complementary statistics

The Atkinson Review highlights the benefits of comparing and contrasting productivity measures with independent evidence to improve quality, aid interpretation of results, and provide commentary on underlying data issues. The Atkinson Review introduces the term triangulation for this, although this feasibility study prefers the term complementary statistics. This process can also shed light on the performance questions that productivity estimates are not designed to answer, such as equity, effectiveness, and economy. Later sections of this report highlight some complementary indicators that might prove useful.

Hint G3

Complementary indicators help various users interpret government productivity estimates in the context of the outcomes that most interest them.

5.4.5 Matching output and inputs

The scope of inputs must match that of output in the productivity equation; the labour, intermediate, and capital services used in the production of output should feature as the inputs to production. In practice, apportioning and correctly weighting some of the inputs can be difficult. How to deal with policy and administration work is a case in point.

Education administration and policy takes a variety of forms. Some are clearly identifiable as associated with a particular sector, such as the Tertiary Education Commission, while others target overall long-term educational strategy or are designed to link up the work of other parts of government, such as the Department of Labour or the Ministry of Social Development. Even resources that are targeted at a particular sector, such as ECE, have flow-on effects for other sectors of the educational system.

Typically, there is scant data available that might be used to apportion these resources. Because of the difficulty inherent in apportionment of this work to various sectors, spreading the cost across the industry on a pro-rata basis may be a desirable alternative.
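A pro-rata spread of shared policy and administration costs might look like the following sketch. The choice of each sector's directly attributable cost as the pro-rata basis is an assumption, as are the sector names and figures.

```python
def apportion_pro_rata(shared_cost, sector_costs):
    """Spread a shared cost across sectors in proportion to their own costs."""
    total = sum(sector_costs.values())
    return {
        sector: shared_cost * cost / total
        for sector, cost in sector_costs.items()
    }


# Hypothetical directly attributable costs by education sector.
sector_costs = {"ECE": 1000.0, "schools": 6000.0, "tertiary": 3000.0}
allocated = apportion_pro_rata(500.0, sector_costs)
print(allocated)
```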

Recommendation G15

The scope of inputs must match that of output in the productivity equation. Where apportionment is not feasible, inputs should be spread across the industry on a pro-rata basis.

5.4.6 Non-market and market consistency

Conceptually, the measurement of health care and education services should be consistent however the services are provided and/or funded. In order to accomplish this, the same definition of output quantity and quality should be used, and ideally the same methods and similar types of sources.

Recommendation G16

Measurement of productivity for the government sector should follow as closely as possible that of the market sector where data sources and user needs allow.

In practice, output estimates for market services tend to be compiled by using price deflators in conjunction with information on current price expenditure, whereas for non-market services there are no actual prices, so methods tend to be direct volume measures. Given the ‘expenditure = price * quantity’ identity, it is clear that the principles for achieving high quality estimates of market and non-market output are the same, applying equally to the direct output volume indices and to the price deflators and expenditure information.

Where information exists, for example on the market sector side, it may be informative to compile alternative sets of estimates based on the deflated-expenditure and direct volume approaches, and compare and contrast differences. This should help to improve statistical quality.
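The comparison between the two approaches can be sketched for a single homogeneous service, where the ‘expenditure = price * quantity’ identity implies the two should agree. The figures are hypothetical; with real data, differences between the two results would point to issues in the deflator, the expenditure data, or the volume indicator.

```python
def deflated_volume_growth(exp_prev, exp_curr, price_prev, price_curr):
    """Volume growth implied by deflating expenditure growth by price growth."""
    return (exp_curr / exp_prev) / (price_curr / price_prev)


def direct_volume_growth(quantity_prev, quantity_curr):
    """Volume growth measured directly from quantities."""
    return quantity_curr / quantity_prev


# Hypothetical single service: price rises 4 per cent, quantity 10 per cent.
price_prev, price_curr = 50.0, 52.0
quantity_prev, quantity_curr = 1000.0, 1100.0
exp_prev = price_prev * quantity_prev
exp_curr = price_curr * quantity_curr

deflated = deflated_volume_growth(exp_prev, exp_curr, price_prev, price_curr)
direct = direct_volume_growth(quantity_prev, quantity_curr)
print(round(deflated, 4), round(direct, 4))
```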

Recommendation G17

To help improve statistical quality, where information exists to compile output estimates using both the deflated-expenditure and direct volume approaches, the sources, methods and results should be compared and contrasted, with the better quality aspects of both approaches being drawn on to form a single best method.

5.4.7 Rate of return used in calculating the user cost of capital

For the purposes of productivity measurement, capital services are estimated from a measure of the capital stock, on the assumption that the flow of capital services is directly proportional to the underlying stock of the capital being considered. The relative weight of capital services is given by the user cost of capital. The user cost of capital can be seen as an imputed rent: it is the rent that the owner of the capital might notionally charge themselves for use of the capital. In some cases, there may be fully functioning capital rental markets and the imputed rents may be inferred from equivalent actual rents. For many assets, though, there are no fully functioning capital rental markets, and the imputed rent has to be calculated indirectly. The user cost of capital can be seen as being made up of two basic terms: the cost of financing and the change in the value of the capital. The cost of financing is made up of two further parts: an estimate of the interest payment on a loan to purchase the capital, and the cost of depreciation.
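A simplified sketch of the user cost calculation, following the decomposition described above, is given below. The exact functional form is an assumption made for illustration (interest and depreciation are applied to the previous-period asset price, and a rise in the asset's price reduces the user cost); the full methodology is set out in the OECD manual. All figures, including the two rates of return compared, are hypothetical.

```python
def user_cost_of_capital(price_prev, price_curr, rate_of_return, depreciation_rate):
    """Imputed rent per unit of capital for one period (simplified form).

    financing   = interest plus depreciation on the previous-period asset price
    revaluation = change in the asset's price over the period
    """
    financing = price_prev * (rate_of_return + depreciation_rate)
    revaluation = price_curr - price_prev
    return financing - revaluation


# Hypothetical asset: price rises from 100 to 102, depreciation of 5 per cent.
cost_low_rate = user_cost_of_capital(100.0, 102.0, 0.04, 0.05)
cost_high_rate = user_cost_of_capital(100.0, 102.0, 0.08, 0.05)
print(round(cost_low_rate, 2), round(cost_high_rate, 2))
```

Comparing the two results shows how sensitive the user cost, and hence the weight given to capital services, is to the rate of return chosen.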

The OECD’s Measuring productivity (OECD 2001) sets out the methodology and concepts. Implementation for market sector industries in New Zealand is set out in Productivity statistics: sources and methods (Statistics NZ 2009).

A key principle adopted in this feasibility study is that, where appropriate, the measurement of productivity for the government sector should follow as closely as possible that for the market sector. While this principle holds for the most part in the measurement of government sector capital services, there is one aspect of calculating government sector capital services that may warrant a different treatment: what is the appropriate way to estimate the cost of financing?

For the government sector, it may be more appropriate to adopt a risk-free long-run real rate of interest, perhaps the average interest rate for New Zealand Government bonds, or the rates at which capital is financed, which are specific to the different parts of the government sector.

Recommendation G18

Statistics NZ should consider what the appropriate rate of return should be for calculating the user cost of capital used in the government sector.
