NUSAP net

Dr. Jerome R. Ravetz, The Research Methods Consultancy, 111 Victoria Road, Oxford OX2 7QG, jerome-ravetz@tiscali.co.uk www.jerryravetz.co.uk
Silvio O. Funtowicz, E.C. Joint Research Centre, Ispra, Italy

I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.

(Lord Kelvin)

First, we must insist on risk calculation being expressed as distributions of estimates, and not as magic numbers that can be manipulated without regard to what they really mean. We must try to display more realistic estimates of risk to show a range of probabilities. To help do this we need tools for quantifying and ordering sources of uncertainty and for putting them in perspective.

(W.D. Ruckelshaus)

Introduction

In this age of powerful information technology, decision-makers in all fields have instant access to masses of quantitative data to help them formulate their policies. These will usually be tested for their quality, by their success (or otherwise) in the achievement of their stated goals. But how much testing is there of the quality of the information on which the decisions are based? All over the industrial and administrative worlds, quality has become a keyword, and significant resources are devoted to quality-control of processes and procedures. Yet the quantitative information that is crucial to all such exercises has almost always escaped the same critical scrutiny.

Issues of uncertainty, and the closely related issues of quality of information, are involved whenever research related to policy is utilized in the policy process. As these issues are new, we do not yet possess practical skills for dealing with them effectively. There are a variety of possible responses among users, contradictory among themselves, but all of them to be seen in ordinary practice. The simplest, and still most common, is when decision-makers and the public demand at least the appearance of certainty. The scientific advisers are under severe pressure to provide a single number, regardless of their private reservations. The decision-makers’ attitude may be typified by the statement of a US Food and Drug Administration Commissioner, "I’m looking for a clean bill of health, not a wishy-washy, iffy answer on cyclamates" (Whittemore, 1983). Of course, when an issue is already polarized, such simplicity does not achieve its desired ends. With such contested policy numbers in play, debates may combine the hyper-sophistication of scholastic disputations with the ferocity of sectarian politics. The scientific inputs then have the paradoxical property of promising objectivity and certainty by their form, but producing only greater contention by their substance.

It is easy to imagine, in a general way, what we mean by "quality" of information, on the analogy of the quality in any other input. Will it perform reliably at a reasonable cost? And, equally important, do we have good and proper evidence that it will do so? The absence of good quality is betrayed by high uncertainty. If we simply do not know, or we are not sure about the testing and certification of the input, then its quality is low, whether it be a plastic moulding or a set of numbers. Up to now, tests for the quality of quantitative information have been very undeveloped, in comparison with those in all other spheres. There are the standard statistical tests on sets of numbers in relation to an hypothesis; and there are highly elaborated formal theories of decision-making in which "uncertainty" is manipulated as one of the variables. But none of these approaches helps with the simple question that a decision-maker wants to ask, when confronted with a screenful of information: is this reliable, can I use it safely?

There are two related reasons for the lack of methods in this important field. One is in our common assumptions about the way we know about the world. Science is based on numbers, therefore numbers are necessary for the effective study of the world; and we then assume that numbers, any numbers, are sufficient as well. In such an environment, it is not easy for anyone to imagine analyzing the uncertainties in numbers, and still less easy for him or her to get an audience. Although there is a rich literature of sceptical jokes about statistics, we still use them, usually quite uncritically, because there is nothing better to hand. One can even hear an argument for pseudo-precision in numerical expression (say, using three or four digits on very rough data), on the grounds that one has no idea of the uncertainties, therefore any "error bar" would be only a guess, and so therefore let all the digits stand! If this policy were adopted in manufacturing, the consequences would be severe; are we sure that they are not equally so in finance and administration?

We have devised a notational system for quantitative information by which these difficulties can, to some extent at least, be overcome. It is based in large part on the experience of research work in the matured natural sciences. Contrary to the impression conveyed by popularisers and philosophers, the success of these sciences is not so much due to their being "exact" or certain, but rather to their effective control of the inherent uncertainties (of which inexactness is one sort) in their data and theories. Academics who teach and research in the sciences involved in decision-making, including economics and systems analysis, have found it difficult to resist "physics-envy", and are then hampered by an anachronistic and fantasized vision of the methods and results of physical science.

Of course, the myth of unlimited accuracy in quantitative sciences has many uses, principally for maintaining the plausibility of projects that have lost most or all contact with reality. For example, only with the debate over "Star Wars" did the public learn that computer programming is a highly uncertain and inexact art. With the proliferation of "models" of all sorts in natural and social science, we must recognize a new sort of pseudo-science, which we may call GIGO (Garbage In, Garbage Out). This can be defined as one where the uncertainties in the inputs must be suppressed, lest the outputs become indeterminate. Anyone whose work has suffered through reliance on low-quality quantitative information will have some idea of how much harm it can cause, and some inkling of how widespread is this particular form of incompetence in our society.

Remedying this situation is a big job; it will involve practitioners and teachers, and perhaps philosophers as well. In some ways the problem is built into our system of numbers, which was designed for counting and for simple calculation, and not for performing and expressing estimates. A simple joke tells the story. A museum attendant was overheard telling a group of schoolchildren that a fossil bone in the collection is sixty-five million and twelve years old. Someone asked him how he could be so precise, and he explained that when he came on the job, twelve years previously, he was told that the bone is sixty-five million years old. So he then did the sum:

  65,000,000
+         12
------------
  65,000,012

The paradox here is that although in context the sum is ridiculous, our system of arithmetical notation constrains us to write it that way. We need a simple notation in which our calculations can reflect, and not violate, our common sense; and where we can express our estimates with clarity.
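The point can be made concrete with a small sketch in Python. It is only an illustration under stated assumptions: the function name `add_estimates` and the idea of attaching a "resolution" to each quantity are ours for this example, and the resolution of one million years for the fossil's age is an assumed value, not one given in the story. The sum is rounded to the coarser of the two resolutions, so that the twelve years disappear, as common sense requires.

```python
def add_estimates(a, a_resolution, b, b_resolution):
    """Add two non-negative integer estimates, keeping only the digits
    that the coarser of the two resolutions can support.

    The 'resolution' arguments are illustrative assumptions: the smallest
    step that is meaningful for each quantity (e.g. 1_000_000 years).
    """
    coarsest = max(a_resolution, b_resolution)
    total = a + b
    # Round half up to the nearest multiple of the coarsest resolution,
    # using integer arithmetic only.
    return ((total + coarsest // 2) // coarsest) * coarsest


# Fossil age known only to the nearest million years (assumed), plus
# twelve years known to the nearest year.
bone_age = add_estimates(65_000_000, 1_000_000, 12, 1)
print(f"{bone_age:,}")
```

With these assumed resolutions the attendant's twelve years leave the stated age unchanged at sixty-five million.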

The notational system "NUSAP" enables the different sorts of uncertainty in quantitative information to be displayed in a standardized and self-explanatory way. It enables providers and users of such information to be clear about its uncertainties. Since the management of uncertainty is at the core of the quality-control of quantitative information, the system "NUSAP" also fosters an enhanced appreciation of the issue of quality in information. It thereby enables a more effective criticism of quantitative information by clients and users of all sorts, expert and lay.

The NUSAP system is based on five categories, which generally reflect the standard practice of the matured experimental sciences. By providing a separate box, or "field", for each aspect of the information, it enables a great flexibility in their expression. By means of NUSAP, nuances of meaning about quantities can be conveyed concisely and clearly, to a degree that is quite impossible otherwise. The name "NUSAP" is an acronym for the categories. The first is Numeral; this will usually be an ordinary number; but when appropriate it can be a more general quantity, such as the expression "a million" (which is not the same as the number lying between 999,999 and 1,000,001). Second comes Unit, which may be of the conventional sort, but which may also contain extra information, such as the date at which the unit is evaluated (most commonly with money). The middle category is Spread, which generalizes from the "random error" of experiments or the "variance" of statistics. Although Spread is usually conveyed by a number (either ±, % or "factor of"), it is not an ordinary quantity, for its own inexactness is not of the same sort as that of measurements.

This brings us to the more qualitative side of the NUSAP expression. The next category is Assessment; this provides a place for a concise expression of the salient qualitative judgements about the information. In the case of statistical tests, this might be the significance-level; in the case of numerical estimates for policy purposes, it might be the qualifier "optimistic" or "pessimistic". In some experimental fields, information is given with two ± terms, of which the first is the spread, or random error, and the second is the "systematic error" which must be estimated on the basis of the history of the measurement, and which corresponds to our Assessment. It might be thought that the "systematic error" must always be less than the "experimental error", or else the stated "error bar" would be meaningless or misleading. But the "systematic error" can be well estimated only in retrospect, and then it can give surprises. Fig. 1 shows how the successive recommended values of a well-known fundamental physical constant bounced up and down, by rather more than the "experimental error", several times before settling down.

Fig. 1. Successive "recommended values" of the fine-structure constant.

Finally there is P for Pedigree. It might be surprising to imagine numbers as having pedigrees, as if they were show dogs or racehorses. But where quality is crucial, a pedigree is essential. In our case, the pedigree does not show ancestry, but is an evaluative description of the mode of production (and where relevant, of anticipated use) of the information. Each special sort of information has its own pedigree; and we have found that research workers can quickly learn to formulate the distinctions around which a special pedigree is constructed. In the process they also gain clarity about the characteristic uncertainties of their own field. Although we have not yet had the opportunity to test this with non-expert groups, we are equally sure that, with some preliminary training, any client or user could learn how to elicit the pedigree of information being provided by an expert.
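The five fields can be pictured as a simple record. The following Python sketch is one possible representation, not a prescribed implementation: the class name `Nusap`, the choice of plain strings for every field, and the rule of dropping empty trailing fields are all assumptions made for illustration. The colon separator follows the pump-count expressions used as examples later in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nusap:
    """One quantity expressed in the five NUSAP categories.

    Every field is kept as free text (an assumption of this sketch),
    because a Numeral may be "a million" or "<11½" rather than an
    ordinary number, and a Unit may carry extra information.
    """
    numeral: str
    unit: str = ""
    spread: str = ""
    assessment: str = ""
    pedigree: str = ""

    def notation(self) -> str:
        """Join the fields with colons, omitting empty trailing fields."""
        fields = [self.numeral, self.unit, self.spread,
                  self.assessment, self.pedigree]
        while fields and fields[-1] == "":
            fields.pop()
        return ":".join(fields)


precise = Nusap("11,300", "pumps", "±5%")
coarse = Nusap("<11½", "K-pumps")
print(precise.notation())
print(coarse.notation())
```

Keeping each category in its own field is what allows a user to see at a glance which aspects of the information have been stated and which have been left blank.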

The pedigree is expressed by means of a matrix; the columns represent the various phases of production or use of the information, and within each column there are modes, normatively ranked descriptions. These can be numerically graded, so that with a coarse arithmetic, a "quality index" can be calculated for use in Assessment if desired. For general statistical information, the pedigree is laid out as in the Table, where the top row has grade 4 and the bottom two, 0. For a numerical evaluation, average scores of 4 downwards are rated as High, Good, Medium, Low and Poor.

Grade	Definitions & Standards	Data-collection & Analysis	Institutional Culture	Review
4	Negotiation	Task-force	Dialogue	External
3	Science	Direct Survey	Accommodation	Independent
2	Convenience	Indirect Survey	Obedience	Regular
1	Symbolism	Educated Guess	Evasion	Occasional
0	Inertia	Fiat	No-contact	None
0	Unknown	Unknown	Unknown	Unknown

The Pedigree Matrix for Statistical Information
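The "coarse arithmetic" for a quality index can be sketched as follows. The grades are taken directly from the matrix above; the function names, the rounding of the average to the nearest whole grade (halves rounded up), and the sample set of modes are assumptions made for this illustration and are not taken from any real data set.

```python
# Grades copied from the Pedigree Matrix for Statistical Information.
PEDIGREE_MATRIX = {
    "Definitions & Standards": {
        "Negotiation": 4, "Science": 3, "Convenience": 2,
        "Symbolism": 1, "Inertia": 0, "Unknown": 0},
    "Data-collection & Analysis": {
        "Task-force": 4, "Direct Survey": 3, "Indirect Survey": 2,
        "Educated Guess": 1, "Fiat": 0, "Unknown": 0},
    "Institutional Culture": {
        "Dialogue": 4, "Accommodation": 3, "Obedience": 2,
        "Evasion": 1, "No-contact": 0, "Unknown": 0},
    "Review": {
        "External": 4, "Independent": 3, "Regular": 2,
        "Occasional": 1, "None": 0, "Unknown": 0},
}

# Average scores of 4 downwards, as named in the text.
RATINGS = {4: "High", 3: "Good", 2: "Medium", 1: "Low", 0: "Poor"}


def quality_index(modes):
    """Average the grades of the chosen mode in each column."""
    grades = [PEDIGREE_MATRIX[column][modes[column]]
              for column in PEDIGREE_MATRIX]
    return sum(grades) / len(grades)


def rating(index):
    """Map an average to a label. Rounding to the nearest whole grade,
    with halves rounded up, is an assumption of this sketch."""
    return RATINGS[int(index + 0.5)]


# A hypothetical pedigree, invented for illustration only.
example = {
    "Definitions & Standards": "Science",
    "Data-collection & Analysis": "Direct Survey",
    "Institutional Culture": "Obedience",
    "Review": "Occasional",
}
index = quality_index(example)
print(index, rating(index))
```

For this invented pedigree the grades are 3, 3, 2 and 1, giving an average of 2.25, which the assumed rounding rule rates as Medium.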

The first column describes how the job is defined, for any competent statistician knows that "just collecting numbers" leads to nonsense. The whole Pedigree matrix is conditioned by the principle that statistical work is (unlike some traditional lab research) a highly articulated social activity. So in "Definitions & Standards" we put "negotiation" as superior to "science", since those on the job will know of special features and problems which an expert with only a general training might miss. It is important to be able to describe low-quality work; and so "symbolism" in statistics is something which any comprehensive scheme must allow for. Similarly, a "Task-force" gets a higher rating than a "Direct Survey", for the latter (like a census) may produce information that is not tailored to the problem at hand. The other two columns relate to the more directly social aspects of the work. "Institutional Culture" describes the relations between the various levels in the hierarchy of command and control; and we allow for the phenomena variously described by "Clochemerle" or "Schweik". Since quality-assurance is an essential part of any productive process, including that for information, we have a column for "Review". This needs no explanation.

Thus the pedigree matrix, with its multiplicity of categories, enables a considerable variety of evaluative descriptions to be simply scored and coded. In our book Uncertainty and Quality in Science for Policy (Kluwer Academic, 1990), we illustrated the NUSAP system with a somewhat imaginary example of the history of the statistics on hand-pumps for drinking water in a Third-World country. The earlier efforts had a distinctly low Pedigree profile; inertia, symbolism and fiat were prominent, along with the absence of effective review; but by the end, with the lessons of experience, improvements could be recorded.

The use of NUSAP could also highlight crucial features of the process. For example, the problems of definition can be explored; is a "pump" one that is listed in an old census, one that is ordered from abroad, one that is registered as delivered, one that is installed, or one that is actually in full and satisfactory use? At the other end of the process, NUSAP alerts us to the meaningfulness or otherwise of numerical expressions. If we see a number like 11,287 coming out of an unsophisticated statistical exercise, we should be able to recognize hyper-precision. On the other hand, a correctly framed estimate like "11,300:pumps: ±5%" has certain uses, but the less precise statement, "<11½:K-pumps" may be more appropriate for policy purposes. In prose, this is "somewhat less than 11½ thousand pumps", where the aggregated unit of counting is a thousand-pumps, with a Spread of a quarter-thousand. Thus with NUSAP we are able to provide numerical statements with considerable nuance of expression.
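The step from the hyper-precise 11,287 to the estimate of 11,300 can be illustrated by a short calculation. This is a sketch under an assumed rule, not a procedure laid down by NUSAP: the function name `round_to_spread` is ours, and the rule is simply to keep no digits finer than the order of magnitude of the absolute spread. It expects a non-zero value and a positive relative spread.

```python
import math


def round_to_spread(value, relative_spread):
    """Round value so that no digit is finer than the order of magnitude
    of its spread (an assumed rule, for illustration only).

    relative_spread is a fraction, e.g. 0.05 for a spread of 5%.
    """
    absolute_spread = abs(value) * relative_spread
    # Largest power of ten not exceeding the absolute spread.
    step = 10 ** math.floor(math.log10(absolute_spread))
    return round(value / step) * step


# 5% of 11,287 is about 564, so digits below the hundreds carry no
# information and the count is rounded to the nearest hundred.
print(round_to_spread(11287, 0.05))
```

With a spread of 5% the absolute uncertainty is several hundred pumps, so the final "87" in 11,287 conveys nothing and is dropped.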

We know enough about the use of numbers to be aware that no single system will prevent incompetence and abuse of statistics. But with the improvement of competence all around, and especially with the arming of users and clients with an instrument of analysis, NUSAP will at least make it possible for the debate to be conducted at a higher level. We cannot claim that an improvement in this one area of practice will transform industry and administration for the better; but we do believe that everyone will be better off when they know what they are doing, in the management of the uncertainty and quality of their quantitative information.

(Note on references: all citations will be found in the book Uncertainty and Quality in Science for Policy, by S.O. Funtowicz and J.R. Ravetz)