3. Populations and samples

Populations

In statistics the term “population” has a slightly different meaning from the one given to it in ordinary speech. It need not refer only to people or to animate creatures – the population of Britain, for instance, or the dog population of London. Statisticians also speak of a population of objects, or events, or procedures, or observations, including such things as the quantity of lead in urine, visits to the doctor, or surgical operations. A population is thus an aggregate of creatures, things, cases and so on.

Although a statistician should clearly define the population he or she is dealing with, they may not be able to enumerate it exactly. For instance, in ordinary usage the population of England denotes the number of people within England’s boundaries, perhaps as enumerated at a census. But a physician might embark on a study to try to answer the question “What is the average systolic blood pressure of Englishmen aged 40-59?” Who are the “Englishmen” referred to here? Not all Englishmen live in England, and the social and genetic background of those that do may vary. A surgeon may study the effects of two alternative operations for gastric ulcer. But how old are the patients? What sex are they? How severe is their disease? Where do they live? And so on. The reader needs precise information on such matters to draw valid inferences from the sample that was studied to the population being considered.

Statistics such as averages and standard deviations, when taken from populations, are referred to as population parameters. They are often denoted by Greek letters: the population mean is denoted by μ (mu) and the population standard deviation by σ (lower case sigma).

Samples

A population commonly contains too many individuals to study conveniently, so an investigation is often restricted to one or more samples drawn from it. A well chosen sample will contain most of the information about a particular population parameter but the relation between the sample and the population must be such as to allow true inferences to be made about a population from that sample.

Consequently, the first important attribute of a sample is that every individual in the population from which it is drawn must have a known non-zero chance of being included in it; a natural suggestion is that these chances should be equal. We would like the choices to be made independently; in other words, the choice of one subject will not affect the chance of other subjects being chosen. To ensure this we make the choice by means of a process in which chance alone operates, such as spinning a coin or, more usually, the use of a table of random numbers. A limited table is given in Table F (Appendix), and more extensive ones have been published.(1-4) A sample so chosen is called a random sample. The word “random” does not describe the sample as such but the way in which it is selected.

To draw a satisfactory sample sometimes presents greater problems than to analyse statistically the observations made on it. A full discussion of the topic is beyond the scope of this book, but guidance is readily available(1)(2). In this book only an introduction is offered.

Before drawing a sample the investigator should define the population from which it is to come. Sometimes he or she can completely enumerate its members before beginning analysis – for example, all the livers studied at necropsy over the previous year, all the patients aged 20-44 admitted to hospital with perforated peptic ulcer in the previous 20 months. In retrospective studies of this kind numbers can be allotted serially from any point in the table to each patient or specimen. Suppose we have a population of size 150, and we wish to take a sample of size five. Table F (Appendix) contains a set of computer generated random digits arranged in groups of five. Choose any row and column, say the last column of five digits. Read only the first three digits, and go down the column starting with the first row. Thus we have 265, 881, 722, etc. If a number appears between 001 and 150 then we include it in our sample. Thus, in order, in the sample will be subjects numbered 24, 59, 107, 73, and 65. If necessary we can carry on down the next column to the left until the full sample is chosen.
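The same kind of selection can be made with a computer's pseudo-random number generator instead of a printed table. The following is a minimal Python sketch, assuming the subjects have been numbered serially from 1 to 150; the function name `draw_simple_random_sample` and the seed value are illustrative assumptions, not part of the original text.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return sorted subject numbers (1-based), chosen without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    subject_numbers = range(1, population_size + 1)
    # random.sample gives every subject the same chance of being included
    return sorted(rng.sample(subject_numbers, sample_size))


# Example: a sample of five from a population of 150 (hypothetical seed)
print(draw_simple_random_sample(150, 5, seed=2024))
```

Recording the seed alongside the study documents allows the selection to be reproduced and audited later.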

The use of random numbers in this way is generally preferable to taking every alternate patient or every fifth specimen, or acting on some other such regular plan. The regularity of the plan can occasionally coincide by chance with some unforeseen regularity in the presentation of the material for study – for example, by hospital appointments being made for patients from certain practices on certain days of the week, or specimens being prepared in batches in accordance with some schedule.

As susceptibility to disease generally varies in relation to age, sex, occupation, family history, exposure to risk, inoculation state, country lived in or visited, and many other genetic or environmental factors, it is advisable to examine samples when drawn to see whether they are, on average, comparable in these respects. The random process of selection is intended to make them so, but sometimes it can by chance lead to disparities. To guard against this possibility the sampling may be stratified. This means that a framework is laid down initially, and the patients or objects of the study in a random sample are then allotted to the compartments of the framework. For instance, the framework might have a primary division into males and females and then a secondary division of each of those categories into five age groups, the result being a framework with ten compartments. It is then important to bear in mind that the distributions of the categories in two samples made up on such a framework may be truly comparable, but they will not reflect the distribution of these categories in the population from which the sample is drawn unless the compartments in the framework have been designed with that in mind. For instance, equal numbers might be admitted to the male and female categories, but males and females are not equally numerous in the general population, and their relative proportions vary with age. This is known as stratified random sampling.

For taking a sample from a long list a compromise between strict theory and practicalities is known as a systematic random sample. In this case we choose subjects a fixed interval apart on the list, say every tenth subject, but we choose the starting point within the first interval at random.
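A systematic random sample of the kind just described can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the list of 100 subject numbers and the seed are assumptions introduced here.

```python
import random


def systematic_sample(items, interval, seed=None):
    """Take every `interval`-th item, starting at a random point in the first interval."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start position: 0 .. interval - 1
    return items[start::interval]    # then a fixed interval down the list


# Example: every tenth subject from a hypothetical list numbered 1 to 100
subjects = list(range(1, 101))
print(systematic_sample(subjects, 10, seed=7))
```

Only the starting point is random; once it is fixed the rest of the sample is determined, which is why the method is a compromise rather than a strictly random sample.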

Unbiasedness and precision

The terms unbiased and precise have acquired special meanings in statistics. When we say that a measurement is unbiased we mean that the average of a large set of such measurements will be close to the true value. When we say it is precise we mean that it is repeatable. Repeated measurements will be close to one another, but not necessarily close to the true value. We would like a measurement that is both accurate and precise. Some authors equate unbiasedness with accuracy, but this is not universal and others use the term accuracy to mean a measurement that is both unbiased and precise. Strike (5) gives a good discussion of the problem.

An estimate of a parameter taken from a random sample is known to be unbiased. As the sample size increases, it gets more precise.

Randomisation

Another use of random number tables is to randomise the allocation of treatments to patients in a clinical trial. This ensures that there is no bias in treatment allocation and, in the long run, the subjects in each treatment group are comparable in both known and unknown prognostic factors. A common method is to use blocked randomisation. This is to ensure that at regular intervals there are equal numbers in the two groups. Usual sizes for blocks are two, four, six, eight, and ten. Suppose we chose a block size of ten. A simple method using Table F (Appendix) is to choose the first five unique digits in any row. If we chose the first row, the first five unique digits are 3, 5, 6, 8, and 4. Thus we would allocate the third, fourth, fifth, sixth, and eighth subjects to one treatment and the first, second, seventh, ninth, and tenth to the other. If the block size was less than ten we would ignore digits bigger than the block size. To allocate further subjects to treatment, we carry on along the same row, choosing the next five unique digits for the first treatment. In randomised controlled trials it is advisable to change the block size from time to time to make it more difficult to guess what the next treatment is going to be.
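Blocked randomisation can also be generated by computer. The sketch below is a minimal illustration in Python, assuming two treatments labelled "A" and "B" (hypothetical labels); it shuffles each block rather than reading digits from Table F, so it is a substitute for the table method, not a reproduction of it.

```python
import random


def blocked_allocation(n_blocks, block_size=10, treatments=("A", "B"), seed=None):
    """Return a treatment list in which every block has equal numbers per treatment."""
    if block_size % len(treatments) != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_treatment = block_size // len(treatments)
    allocation = []
    for _ in range(n_blocks):
        block = list(treatments) * per_treatment  # e.g. five A's and five B's
        rng.shuffle(block)                        # random order within the block
        allocation.extend(block)
    return allocation


# Example: three blocks of ten subjects (hypothetical seed)
print(blocked_allocation(3, block_size=10, seed=11))
```

To vary the block size from time to time, as advised above, the function can simply be called several times with different values of `block_size` and the results joined together.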

It is important to realise that patients in a randomised trial are not a random sample from the population of people with the disease in question but rather a highly selected set of eligible and willing patients. However, randomisation ensures that in the long run any differences in outcome in the two treatment groups are due solely to differences in treatment.

Variation between samples

Even if we ensure that every member of a population has a known, and usually an equal, chance of being included in a sample, it does not follow that a series of samples drawn from one population and fulfilling this criterion will be identical. They will show chance variations from one to another, and the variation may be slight or considerable. For example, a series of samples of the body temperature of healthy people would show very little variation from one to another, but the variation between samples of the systolic blood pressure would be considerable. Thus the variation between samples depends partly on the amount of variation in the population from which they are drawn.

Furthermore, it is a matter of common observation that a small sample is a much less certain guide to the population from which it was drawn than a large sample. In other words, the more members of a population that are included in a sample the more chance will that sample have of accurately representing the population, provided a random process is used to construct the sample. A consequence of this is that, if two or more samples are drawn from a population, the larger they are the more likely they are to resemble each other – again provided that the random technique is followed. Thus the variation between samples depends partly also on the size of the sample. Usually, however, we are not in a position to take a random sample; our sample is simply those subjects available for study. This is a “convenience” sample. For valid generalisations to be made we would like to assert that our sample is in some way representative of the population as a whole and for this reason the first stage in a report is to describe the sample, say by age, sex, and disease status, so that other readers can decide if it is representative of the type of patients they encounter.

Standard error of the mean

If we draw a series of samples and calculate the mean of the observations in each, we have a series of means. These means generally conform to a Normal distribution, and they often do so even if the observations from which they were obtained do not (see Exercise 3.3). This can be proven mathematically and is known as the “Central Limit Theorem”. The series of means, like the series of observations in each sample, has a standard deviation. The standard error of the mean of one sample is an estimate of the standard deviation that would be obtained from the means of a large number of samples drawn from that population.

As noted above, if random samples are drawn from a population their means will vary from one to another. The variation depends on the variation of the population and the size of the sample. We do not know the variation in the population so we use the variation in the sample as an estimate of it. This is expressed in the standard deviation. If we now divide the standard deviation by the square root of the number of observations in the sample we have an estimate of the standard error of the mean, $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$. It is important to realise that we do not have to take repeated samples in order to estimate the standard error; there is sufficient information within a single sample. However, the conception is that if we were to take repeated random samples from the population, this is how we would expect the mean to vary, purely by chance.
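The idea that the standard error describes how the mean would vary over repeated samples can be checked by simulation. The following Python sketch assumes a Normally distributed population with mean 80 and standard deviation 10 and samples of size 25; these values, the function names and the seed are all illustrative assumptions rather than data from the text.

```python
import math
import random
import statistics


def standard_error_of_mean(sample):
    """Estimate the SEM from a single sample: SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def sd_of_repeated_sample_means(population_mean, population_sd, n, repeats, seed=0):
    """Draw `repeats` random samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(population_mean, population_sd) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# With population SD 10 and n = 25 the theoretical standard error is 10 / 5 = 2
print(sd_of_repeated_sample_means(80, 10, 25, repeats=2000))
```

The simulated standard deviation of the means should come out close to 2, the value that `standard_error_of_mean` estimates from a single sample.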

A general practitioner in Yorkshire has a practice which includes part of a town with a large printing works and some of the adjacent sheep farming country. With her patients’ informed consent she has been investigating whether the diastolic blood pressure of men aged 20-44 differs between the printers and the farm workers. For this purpose she has obtained a random sample of 72 printers and 48 farm workers and calculated the mean and standard deviations, as shown in Table 3.1.

To calculate the standard errors of the two mean blood pressures the standard deviation of each sample is divided by the square root of the number of the observations in the sample.

These standard errors may be used to study the significance of the difference between the two means, as described in successive chapters.

Table 3.1

Standard error of a proportion or a percentage

Just as we can calculate a standard error associated with a mean so we can also calculate a standard error associated with a percentage or a proportion. Here the size of the sample will affect the size of the standard error but the amount of variation is determined by the value of the percentage or proportion in the population itself, and so we do not need an estimate of the standard deviation. For example, a senior surgical registrar in a large hospital is investigating acute appendicitis in people aged 65 and over. As a preliminary study he examines the hospital case notes over the previous 10 years and finds that of 120 patients in this age group with a diagnosis confirmed at operation 73 (60.8%) were women and 47 (39.2%) were men.

If p represents one percentage, 100 − p represents the other. Then the standard error of each of these percentages is obtained by (1) multiplying them together, (2) dividing the product by the number in the sample, and (3) taking the square root:

$$\mathrm{SE\ percentage} = \sqrt{\frac{p\,(100 - p)}{n}}$$

which for the appendicitis data given above is as follows:

$$\mathrm{SE\ percentage} = \sqrt{\frac{60.8 \times 39.2}{120}} = 4.46$$
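The three steps described above translate directly into code. This is a minimal Python sketch using the appendicitis figures from the text (60.8% women in a sample of 120); the function name is an assumption introduced for illustration.

```python
import math


def standard_error_of_percentage(p, n):
    """Standard error of a percentage p (0 to 100) observed in a sample of n."""
    return math.sqrt(p * (100 - p) / n)


se = standard_error_of_percentage(60.8, 120)
print(round(se, 2))  # 4.46
```

Because p and 100 − p enter the formula symmetrically, the same standard error applies to the percentage of men (39.2%).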

Problems with non-random samples

In general we do not have the luxury of a random sample; we have to make do with what is available, a “convenience sample”. In order to be able to make generalisations we should investigate whether biases could have crept in, which mean that the patients available are not typical. Common biases are:

  • hospital patients are not the same as ones seen in the community;
  • volunteers are not typical of non-volunteers;
  • patients who return questionnaires are different from those who do not.

In order to persuade the reader that the patients included are typical it is important to give as much detail as possible at the beginning of a report of the selection process and some demographic data such as age, sex, social class and response rate.

Common questions

Given measurements on a sample, what is the difference between a standard deviation and a standard error?

A standard deviation is a sample estimate of the population parameter; that is, it is an estimate of the variability of the observations. Since the population is unique, it has a unique standard deviation, which may be large or small depending on how variable the observations are. We would not expect the sample standard deviation to get smaller because the sample gets larger. However, a large sample would provide a more precise estimate of the population standard deviation than a small sample.

A standard error, on the other hand, is a measure of precision of an estimate of a population parameter. A standard error is always attached to a parameter, and one can have standard errors of any estimate, such as mean, median, fifth centile, even the standard error of the standard deviation. Since one would expect the precision of the estimate to increase with the sample size, the standard error of an estimate will decrease as the sample size increases.

When should I use a standard deviation to describe data and when should I use a standard error?

It is a common mistake to try to use the standard error to describe data. Usually it is done because the standard error is smaller, and so the study appears more precise. If the purpose is to describe the data (for example so that one can see if the patients are typical) and if the data are plausibly Normal, then one should use the standard deviation (mnemonic D for Description and D for Deviation). If the purpose is to describe the outcome of a study, for example to estimate the prevalence of a disease, or the mean height of a group, then one should use a standard error (or, better, a confidence interval; see Chapter 4) (mnemonic E for Estimate and E for Error).

References

  1. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall, 1991.
  2. Armitage P, Berry G. Statistical Methods in Medical Research. Oxford: Blackwell Scientific Publications, 1994.
  3. Campbell MJ, Machin D. Medical Statistics: A Commonsense Approach. 2nd ed. Chichester: John Wiley, 1993.
  4. Fisher RA, Yates F. Statistical Tables for Biological, Agricultural and Medical Research. 6th ed. London: Longman, 1974.
  5. Strike PW. Measurement and control. Statistical Methods in Laboratory Medicine. Oxford: Butterworth-Heinemann, 1991:255.

Exercises

Exercise 3.1

The mean urinary lead concentration in 140 children was 2.18 µmol/24 h, with standard deviation 0.87. What is the standard error of the mean?

Exercise 3.2

In Table F (Appendix), what is the distribution of the digits, and what are the mean and standard deviation?

Exercise 3.3

For the first column of five digits in Table F take the mean value of the five digits and do this for all rows of five digits in the column.

What would you expect a histogram of the means to look like?

What would you expect the mean and standard deviation to be?

2. Mean and standard deviation

The median is known as a measure of location; that is, it tells us where
the data are. As stated earlier, we do not need to know all the exact values
to calculate the median; if we made the smallest value even smaller or the
largest value even larger, it would not change the value of the median. Thus
the median does not use all the information in the data and so it can be
shown to be less efficient than the mean or average, which does use all
values of the data. To calculate the mean we add up the observed values and
divide by the number of them. The total of the values obtained in Table 1.1
was 22.5, which was divided by their number, 15, to give a mean of 1.5. This
familiar process is conveniently expressed by the following symbols:

$$\bar{x} = \frac{\sum x}{n}$$

$\bar{x}$ (pronounced “x bar”) signifies the mean; x is each of the values
of urinary lead; n is the number of these values; and Σ, the Greek capital
sigma (our “S”), denotes “sum of”. A major disadvantage of the mean is that
it is sensitive to outlying points. For example, replacing 2.2 by 22 in
Table 1.1 increases the mean to 2.82, whereas the median will be unchanged.

As well as measures of location we need measures of how variable the data
are. We met two of these measures, the range and interquartile range, in
Chapter 1.

The range is an important measurement, for figures at the top and bottom of
it denote the findings furthest removed from the generality. However, they
do not give much indication of the spread of observations about the mean.
This is where the standard deviation (SD) comes in.

The theoretical basis of the standard deviation is complex and need not
trouble the ordinary user. We will discuss sampling and populations in
Chapter 3. A practical point to note here is that, when the population from
which the data arise has a distribution that is approximately “Normal” (or
Gaussian), then the standard deviation provides a useful basis for
interpreting the data in terms of probability.

The Normal distribution is represented by a family of curves defined
uniquely by two parameters, which are the mean and the standard deviation of
the population. The curves are always symmetrically bell shaped, but the
extent to which the bell is compressed or flattened out depends on the
standard deviation of the population. However, the mere fact that a curve is
bell shaped does not mean that it represents a Normal distribution, because
other distributions may have a similar sort of shape.

Many biological characteristics conform to a Normal distribution closely
enough for it to be commonly used – for example, heights of adult men and
women, blood pressures in a healthy population, random errors in many types
of laboratory measurements and biochemical data. Figure 2.1 shows a Normal
curve calculated from the diastolic blood pressures of 500 men, mean 82
mmHg, standard deviation 10 mmHg. The ranges representing ±1 SD, ±2 SD, and
±3 SD about the mean are marked. A more extensive set of values is given in
Table A of the print edition.

Figure 2.1


The reason why the standard deviation is such a useful measure of the
scatter of the observations is this: if the observations follow a Normal
distribution, a range covered by one standard deviation above the mean and
one standard deviation below it ($\bar{x} \pm 1\,\mathrm{SD}$) includes
about 68% of the observations; a range of two standard deviations above and
two below ($\bar{x} \pm 2\,\mathrm{SD}$) about 95% of the observations; and
of three standard deviations above and three below
($\bar{x} \pm 3\,\mathrm{SD}$) about 99.7% of the observations.
Consequently, if we know the mean and standard deviation of a set of
observations, we can obtain some useful information by simple arithmetic. By
putting one, two, or three standard deviations above and below the mean we
can estimate the ranges that would be expected to include about 68%, 95%,
and 99.7% of the observations.
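The simple arithmetic described above can be sketched in Python. The example
uses the mean (82 mmHg) and standard deviation (10 mmHg) quoted for Figure
2.1 and assumes the data are exactly Normal; the function name `sd_ranges`
is an illustrative assumption.

```python
from statistics import NormalDist


def sd_ranges(mean, sd):
    """Return {k: (lower, upper, proportion)} for k = 1, 2, 3 SDs about the mean."""
    dist = NormalDist(mean, sd)
    result = {}
    for k in (1, 2, 3):
        lower, upper = mean - k * sd, mean + k * sd
        # proportion of a Normal distribution lying between the two limits
        proportion = dist.cdf(upper) - dist.cdf(lower)
        result[k] = (lower, upper, proportion)
    return result


for k, (lower, upper, proportion) in sd_ranges(82, 10).items():
    print(f"+/-{k} SD: {lower} to {upper} mmHg, {proportion:.1%} of observations")
```

The proportions depend only on the number of standard deviations, not on the
particular mean and standard deviation chosen.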

Standard deviation from ungrouped data

The standard deviation is a summary measure of the differences of each
observation from the mean. If the differences themselves were added up, the
positive would exactly balance the negative and so their sum would be zero.
Consequently the squares of the differences are added. The sum of the
squares is then divided by the number of observations minus one to give the
mean of the squares, and the square root is taken to bring the measurements
back to the units we started with. (The division by the number of
observations minus one instead of the number of observations itself to obtain
the mean square is because “degrees of freedom” must be used. In these
circumstances they are one less than the total. The theoretical
justification for this need not trouble the user in practice.)

To gain an intuitive feel for degrees of freedom, consider choosing a
chocolate from a box of n chocolates. Every time we come to choose a
chocolate we have a choice, until we come to the last one (normally one with
a nut in it!), and then we have no choice. Thus we have n-1 choices, or
“degrees of freedom”.

The calculation of the variance is illustrated in Table 2.1 with the 15
readings in the preliminary study of urinary lead concentrations (Table
1.2). The readings are set out in column (1). In column (2) the difference
between each reading and the mean is recorded. The sum of the differences is
0. In column (3) the differences are squared, and the sum of those squares
is given at the bottom of the column.

Table 2.1

The sum of the squares of the differences (or deviations) from the mean,
9.96, is now divided by the total number of observations minus one, to give
the variance. Thus,

$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

In this case we find:

$$\text{variance} = \frac{9.96}{14} = 0.7114$$

Finally, the square root of the variance provides the standard deviation:

$$\mathrm{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

from which we get

$$\mathrm{SD} = \sqrt{0.7114} = 0.843$$

This procedure illustrates the structure of the standard deviation, in
particular that the two extreme values 0.1 and 3.2 contribute most to the
sum of the differences squared.
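The column-by-column procedure can be sketched in Python. Table 2.1 is not
reproduced here, so the 15 readings below are an assumption: illustrative
values chosen to agree with the totals quoted in the text (sum 22.5, sum of
squared deviations 9.96, extreme values 0.1 and 3.2). The function name
`sample_sd` is also hypothetical.

```python
import math

# Hypothetical readings, chosen to match the totals quoted in the text
readings = [0.1, 0.4, 0.6, 0.8, 1.1, 1.2, 1.3, 1.5,
            1.7, 1.9, 1.9, 2.0, 2.2, 2.6, 3.2]


def sample_sd(values):
    """Return (mean, sum of squared deviations, variance, SD) using divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]  # column (3)
    sum_of_squares = sum(squared_deviations)
    variance = sum_of_squares / (n - 1)                      # degrees of freedom
    return mean, sum_of_squares, variance, math.sqrt(variance)


mean, sum_of_squares, variance, sd = sample_sd(readings)
print(round(mean, 2), round(sum_of_squares, 2), round(variance, 4), round(sd, 3))
```

The standard library function `statistics.stdev` uses the same n − 1 divisor
and could be used in place of the hand-written calculation.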

Calculator procedure

Most inexpensive calculators have procedures that enable one to calculate
the mean and standard deviations directly, using the “SD” mode. For example,
on modern Casio calculators one presses SHIFT and ‘.’ and a little “SD”
symbol should appear on the display. On earlier Casios one presses INV and
MODE, whereas on a Sharp 2nd F and Stat should be used. The data are stored
via the M+ button. Thus, having set the calculator into the “SD” or “Stat”
mode, from Table 2.1 we enter 0.1 M+, 0.4 M+, etc. When all the data are
entered, we can check that the correct number of observations has been
included by Shift and n, and “15” should be displayed. The mean is displayed
by Shift and $\bar{x}$ and the standard deviation by Shift and
$\sigma_{n-1}$. Avoid pressing Shift and AC between these operations as this
clears the statistical memory. There is another button, $\sigma_n$, on many
calculators. This uses the divisor n rather than n – 1 in the calculation of
the standard deviation. On a Sharp calculator $\sigma_n$ is denoted
$\sigma$, whereas $\sigma_{n-1}$ is denoted s. The values calculated with
divisor n are the “population” values, and are derived assuming that an
entire population is available or that interest focuses solely on the data
in hand, and the results are not going to be generalised (see Chapter 3 for
details of samples and populations). As this situation very rarely arises,
$\sigma_{n-1}$ should be used and $\sigma_n$ ignored, although even for
moderate sample sizes the difference is going to be small. Remember to
return to normal mode before resuming calculations because many of the usual
functions are not available in “Stat” mode. On a modern Casio this is Shift
0. On earlier Casios and on Sharps one repeats the sequence that calls up
the “Stat” mode. Some calculators stay in “Stat” mode even when switched
off.

Mullee (1) provides advice on choosing and using a calculator. The
calculator formulas use the relationship

$$\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

Dividing the right hand expression by n gives a form that can be easily
memorised as “mean of the squares minus the mean square”. The sample
variance $s^2$ is obtained from

$$s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$$

The above equation can be seen to be true in Table 2.1, where the sum of
the squares of the observations, $\sum x^2$, is given as 43.71.
We thus obtain

$$43.71 - 15 \times 1.5^2 = 9.96$$

the same value given for the total in column (3). Care should be taken
because this formula involves subtracting two large numbers to get a small
one, and can lead to incorrect results if the numbers are very large. For
example, try finding the standard deviation of 100001, 100002, 100003 on a
calculator. The correct answer is 1, but many calculators will give 0
because of rounding error. The solution is to subtract a large number from
each of the observations (say 100000) and calculate the standard deviation
on the remainders, namely 1, 2 and 3.
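The rounding problem can be reproduced on a computer, although the numbers
have to be larger than in the calculator example because double precision
arithmetic carries about 16 significant digits. The Python sketch below is
an illustration only; the function names and the test values around
1000000000 are assumptions introduced here.

```python
import math


def sd_one_pass(values):
    """Calculator-style formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    sum_of_squares = sum_x2 - sum_x * sum_x / n   # two large numbers subtracted
    return math.sqrt(max(sum_of_squares, 0.0) / (n - 1))  # clamp rounding below 0


def sd_shifted(values, shift):
    """Subtract a constant first; the SD is unchanged but the numbers are small."""
    return sd_one_pass([x - shift for x in values])


small = [100001.0, 100002.0, 100003.0]              # the example from the text
large = [1000000001.0, 1000000002.0, 1000000003.0]  # hypothetical larger values

print(sd_one_pass(small))        # double precision copes with these
print(sd_one_pass(large))        # rounding error spoils the one-pass result
print(sd_shifted(large, 1e9))    # shifting first restores the correct answer
```

Subtracting a constant does not alter the differences between observations,
which is why the standard deviation of the remainders equals that of the
original values.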

Standard deviation from grouped data

We can also calculate a standard deviation for discrete quantitative
variables. For example, in addition to studying the lead concentration in
the urine of 140 children, the paediatrician asked how often each of them
had been examined by a doctor during the year. After collecting the
information he tabulated the data shown in Table 2.2 columns (1) and (2).
The mean is calculated by multiplying column (1) by column (2), adding the
products, and dividing by the total number of observations.

Table 2.2

As we did for continuous data, to calculate the standard deviation we
square each of the observations in turn. In this case the observation is the
number of visits, but because we have several children in each class, shown
in column (2), each squared number (column (4)) must be multiplied by the
number of children. The sum of squares is given at the foot of column (5),
namely 1697. We then use the calculator formula to find the variance:

$$\text{variance} = \frac{\sum f x^2 - (\sum f x)^2 / n}{n - 1}$$

where x is the number of visits, f is the number of children with that
number of visits, $\sum f x^2 = 1697$ and n = 140; the standard deviation is
the square root of the variance.

Note that although the number of visits is not Normally distributed, the
distribution is reasonably symmetrical about the mean. The approximate 95%
range is given by

$$\bar{x} \pm 2\,\mathrm{SD}$$

This excludes two children with no visits and six children with six or more
visits. Thus there are eight of 140 = 5.7% outside the theoretical 95%
range.

Note that it is common for discrete quantitative variables to have what is
known as skewed distributions, that is they are not symmetrical. One clue to
lack of symmetry from derived statistics is when the mean and the median
differ considerably. Another is when the standard deviation is of the same
order of magnitude as the mean, but the observations must be non-negative.
Sometimes a transformation will convert a skewed distribution into a
symmetrical one. When the data are counts, such as number of visits to a
doctor, often the square root transformation will help, and if there are no
zero or negative values a logarithmic transformation will render the
distribution more symmetrical.
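The grouped-data calculation can be sketched in Python. Table 2.2 is not
reproduced here, so the frequency table below is hypothetical: it was chosen
only so that it totals 140 children and gives the sum of squares of 1697
quoted in the text. The function name `grouped_mean_sd` is likewise an
assumption.

```python
import math

# Hypothetical frequency table: number of visits -> number of children
visits_to_children = {0: 2, 1: 8, 2: 27, 3: 45, 4: 38, 5: 15, 6: 4, 7: 1}


def grouped_mean_sd(freq):
    """Return (n, mean, sum of f*x^2, SD) for a table of value -> frequency."""
    n = sum(freq.values())
    sum_fx = sum(x * f for x, f in freq.items())        # each value times its count
    sum_fx2 = sum(x * x * f for x, f in freq.items())   # each squared value times its count
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)    # calculator formula
    return n, mean, sum_fx2, math.sqrt(variance)


n, mean, sum_fx2, sd = grouped_mean_sd(visits_to_children)
print(n, sum_fx2, round(mean, 2), round(sd, 2))
```

Weighting each value by its frequency gives the same result as listing every
child's number of visits individually and applying the ungrouped formula.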

Data transformation

An anaesthetist measures the pain of a procedure using a 100 mm visual
analogue scale on seven patients. The results are given in Table 2.3,
together with the $\log_e$ transformation (the ln button on a calculator).

Table 2.3

The data are plotted in Figure 2.2, which shows that the outlier does not
appear so extreme in the logged data. The mean and median are 10.29 and 3,
respectively, for the original data, with a standard deviation of 20.22.
Where the mean is bigger than the median, the distribution is positively
skewed. For the logged data the mean and median are 1.24 and 1.10
respectively, indicating that the logged data have a more symmetrical
distribution. Thus it would be better to analyse the log transformed data
in statistical tests than to use the original scale.

Figure 2.2

In reporting these results, the median of the raw data would be given, but
it should be explained that the statistical test was carried out on the
transformed data. Note that the median of the logged data is the same as the
log of the median of the raw data – however, this is not true for the mean.
The mean of the logged data is not necessarily equal to the log of the mean
of the raw data. The antilog (exp or $e^x$ on a calculator) of the mean of
the logged data is known as the geometric mean, and is often a better
summary statistic than the mean for data from positively skewed
distributions. For these data the geometric mean is 3.45 mm.
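The relationship between the logged data and the geometric mean can be
sketched in Python. Table 2.3 is not reproduced here, so the seven scores
below are an assumption: illustrative values chosen to agree with the mean
(10.29) and logged mean (1.24) quoted in the text. The function name is
also hypothetical.

```python
import math
import statistics

# Hypothetical pain scores in mm, chosen to match the summary figures in the text
pain_scores_mm = [1, 1, 2, 3, 3, 6, 56]


def geometric_mean(values):
    """Antilog of the mean of the natural logs; values must all be positive."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.fmean(logs))


logs = [math.log(v) for v in pain_scores_mm]
print(round(statistics.fmean(pain_scores_mm), 2))   # arithmetic mean of raw data
print(round(statistics.fmean(logs), 2))             # mean of the logged data
print(round(geometric_mean(pain_scores_mm), 2))     # geometric mean
```

For these illustrative values the unrounded geometric mean is about 3.47;
taking the antilog of the rounded log mean, 1.24, gives a slightly smaller
figure. The standard library also provides `statistics.geometric_mean`,
which performs the same calculation.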

Between subjects and within subjects standard deviation

If repeated measurements are made of, say, blood pressure on an individual,
these measurements are likely to vary. This is within subject, or
intrasubject, variability and we can calculate a standard deviation of these
observations. If the observations are close together in time, this standard
deviation is often described as the measurement error. Measurements made on
different subjects vary according to between subject, or intersubject,
variability. If many observations were made on each individual, and the
average taken, then we can assume that the intrasubject variability has been
averaged out and the variation in the average values is due solely to the
intersubject variability. Single observations on individuals clearly contain
a mixture of intersubject and intrasubject variation. The coefficient of
variation (CV%) is the intrasubject standard deviation divided by the mean,
expressed as a percentage. It is often quoted as a measure of repeatability
for biochemical assays, when an assay is carried out on several occasions on
the same sample. It has the advantage of being independent of the units of
measurement, but also numerous theoretical disadvantages. It is usually
nonsensical to use the coefficient of variation as a measure of between
subject variability.
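
As a minimal sketch of the calculation, the Python fragment below computes a CV% from five repeat results on one sample. The values are hypothetical. The second half shows that the CV% does not change when the results are re-expressed in other units.

```python
import statistics

# Hypothetical repeat assay results on the same sample (illustrative values).
repeats = [4.9, 5.1, 5.0, 5.2, 4.8]

mean = statistics.mean(repeats)
sd = statistics.stdev(repeats)      # sample standard deviation (divisor n - 1)
cv_percent = 100 * sd / mean

# Re-expressing the results in units 1000 times smaller leaves the CV% unchanged.
rescaled = [1000 * x for x in repeats]
cv_rescaled = 100 * statistics.stdev(rescaled) / statistics.mean(rescaled)

print(f"mean = {mean:.2f}, SD = {sd:.3f}, CV = {cv_percent:.2f}%")
print(f"CV after change of units = {cv_rescaled:.2f}%")
```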

Common questions

When should I use the mean and when should I use the median to describe my
data?
It is a commonly held misapprehension that for Normally distributed data one
uses the mean, and for non-Normally distributed data one uses the median.
Alas this is not so: if the data are Normally distributed the mean and the
median will be close; if the data are not Normally distributed then both the
mean and the median may give useful information. Consider a variable that
takes the value 1 for males and 0 for females. This is clearly not Normally
distributed. However, the mean gives the proportion of males in the group,
whereas the median merely tells us which group contained more than 50% of
the people. Similarly, the mean from ordered categorical variables can be
more useful than the median, if the ordered categories can be given
meaningful scores. For example, a lecture might be rated as 1 (poor) to 5
(excellent). The usual statistic for summarising the result would be the
mean. In the situation where there is a small group at one extreme of a
distribution (for example, annual income) then the median will be more
“representative” of the distribution.

My data must have values greater than zero and yet the mean and standard
deviation are about the same size. How does this happen?
If data have a very skewed distribution, then the standard deviation will be
grossly inflated, and is not a good measure of variability to use. As we
have shown, occasionally a transformation of the data, such as a log
transform, will render the distribution more symmetrical. Alternatively,
quote the interquartile range.
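
Both answers can be illustrated with a short Python sketch. The two data sets are invented for the purpose: a 0/1 variable for sex, and a set of incomes (in arbitrary units) with a small group at the upper extreme.

```python
import statistics

# Hypothetical group coded 1 = male, 0 = female.
sex = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
proportion_male = statistics.mean(sex)    # the mean is the proportion of males
median_sex = statistics.median(sex)       # the median only names the majority group

# Hypothetical annual incomes, all positive, with two very large values.
income = [18, 20, 21, 22, 23, 24, 25, 26, 150, 400]
mean_income = statistics.mean(income)
median_income = statistics.median(income)
sd_income = statistics.stdev(income)

print("proportion male:", proportion_male, " median:", median_sex)
print("income mean:", mean_income, " median:", median_income, " SD:", round(sd_income, 1))
```

In the income example the standard deviation exceeds the mean although no value is below zero, and the median lies much closer to the bulk of the observations.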

References

1. Mullee M A. How to choose and use a calculator. In: How to do it 2. BMJ
Publishing Group, 1995:58-62.

Exercises

Exercise 2.1

In the campaign against smallpox a doctor inquired into the number of times
150 people aged 16 and over in an Ethiopian village had been vaccinated. He
obtained the following figures: never, 12 people; once, 24; twice, 42; three
times, 38; four times, 30; five times, 4. What is the mean number of times
those people had been vaccinated and what is the standard deviation?

Exercise 2.2

Obtain the mean and standard deviation of the data in and an approximate
95% range.

Exercise 2.3

Which points are excluded from the range mean – 2SD to mean + 2SD? What
proportion of the data is excluded?


1. Data display and summary

Types of data

The first step, before any calculations or plotting of data, is to decide what type of data one is dealing with. There are a number of typologies, but one that has proven useful is given in Table 1.1. The basic distinction is between quantitative variables (for which one asks “how much?”) and categorical variables (for which one asks “what type?”).

Quantitative variables can be continuous or discrete. Continuous variables, such as height, can in theory take any value within a given range. Examples of discrete variables are: number of children in a family, number of attacks of asthma per week.

Categorical variables are either nominal (unordered) or ordinal (ordered). Examples of nominal variables are male/female, alive/dead, blood group O, A, B, AB. For nominal variables with more than two categories the order does not matter. For example, one cannot say that people in blood group B lie between those in A and those in AB. Sometimes, however, people can provide ordered responses, such as grade of breast cancer, or they can “agree”, “neither agree nor disagree”, or “disagree” with some statement. In this case the order does matter and it is usually important to account for it.

Table 1.1


Variables shown at the left of Table 1.1 can be converted to ones further to the right by using “cut off points”. For example, blood pressure can be turned into a nominal variable by defining “hypertension” as a diastolic blood pressure greater than 90 mmHg, and “normotension” as blood pressure less than or equal to 90 mmHg. Height (continuous) can be converted into “short”, “average” or “tall” (ordinal).
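
A minimal Python sketch of such conversions follows. The 90 mmHg cut off is the one given above; the two height cut offs are hypothetical values chosen only to make the example run.

```python
def classify_diastolic(dbp_mmhg):
    """Nominal category using the 90 mmHg cut off point given in the text."""
    return "hypertension" if dbp_mmhg > 90 else "normotension"


def classify_height(height_cm, short_below=160.0, tall_from=180.0):
    """Ordinal category; both cut off points are hypothetical."""
    if height_cm < short_below:
        return "short"
    if height_cm >= tall_from:
        return "tall"
    return "average"


for dbp in (78, 90, 91, 104):
    print(dbp, classify_diastolic(dbp))
for height in (152.0, 171.5, 186.0):
    print(height, classify_height(height))
```

Note that a pressure of exactly 90 mmHg is classed as normotension, so the treatment of values that fall on a cut off point has to be stated explicitly.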

In general it is easier to summarise categorical variables, and so quantitative variables are often converted to categorical ones for descriptive purposes. To make a clinical decision on someone, one does not need to know the exact serum potassium level (continuous) but whether it is within the normal range (nominal). It may be easier to think of the proportion of the population who are hypertensive than the distribution of blood pressure. However, categorising a continuous variable reduces the amount of information available and statistical tests will in general be more sensitive – that is they will have more power (see Chapter 5 for a definition of power) for a continuous variable than the corresponding nominal one, although more assumptions may have to be made about the data. Categorising data is therefore useful for summarising results, but not for statistical analysis. It is often not appreciated that the choice of appropriate cut off points can be difficult, and different choices can lead to different conclusions about a set of data.

These definitions of types of data are not unique, nor are they mutually exclusive, and are given as an aid to help an investigator decide how to display and analyse data. One should not debate overlong the typology of a particular variable!

Stem and leaf plots

Before any statistical calculation, even the simplest, is performed the data should be tabulated or plotted. If they are quantitative and relatively few, say up to about 30, they are conveniently written down in order of size.

For example, a paediatric registrar in a district general hospital is investigating the amount of lead in the urine of children from a nearby housing estate. In a particular street there are 15 children whose ages range from 1 year to under 16, and in a preliminary study the registrar has found the following amounts of urinary lead ( ), given in Table 1.2 in what is called an array:

Table 1.2


A simple way to order, and also to display, the data is to use a stem and leaf plot. To do this we need to abbreviate the observations to two significant digits. In the case of the urinary concentration data, the digit to the left of the decimal point is the “stem” and the digit to the right the “leaf”.

We first write the stems in order down the page. We then work along the data set, writing the leaves down “as they come”. Thus, for the first data point, we write a 6 opposite the 0 stem. These are as given in Figure 1.1.

Figure 1.1


We then order the leaves, as in Figure 1.2.

Figure 1.2


The advantage of first setting the figures out in order of size and not simply feeding them straight from notes into a calculator (for example, to find their mean) is that the relation of each to the next can be looked at. Is there a steady progression, a noteworthy hump, a considerable gap? Simple inspection can disclose irregularities. Furthermore, a glance at the figures gives information on their range. The smallest value is 0.1 and the largest is 3.2.
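
The construction of a stem and leaf plot can be sketched in Python as below. The 15 values are an assumption: they were chosen to agree with the summaries quoted in the text (first value 0.6, smallest 0.1, largest 3.2, median 1.5) and are not a transcription of Table 1.2. The function name and its `ordered` argument are likewise illustrative.

```python
from collections import defaultdict

# Illustrative values only, consistent with the summaries quoted in the text;
# they are an assumption, not a copy of Table 1.2.
values = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]


def stem_and_leaf(data, ordered=True):
    """Return {stem: [leaves]} for values recorded to one decimal place."""
    plot = defaultdict(list)
    for x in data:
        tenths = round(x * 10)            # 2.6 becomes 26
        stem, leaf = divmod(tenths, 10)   # 26 becomes stem 2, leaf 6
        plot[stem].append(leaf)           # leaves are written "as they come"
    if ordered:
        for leaves in plot.values():
            leaves.sort()                 # second step: order the leaves
    return dict(sorted(plot.items()))


for stem, leaves in stem_and_leaf(values).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Calling `stem_and_leaf(values, ordered=False)` gives the intermediate display with the leaves in their original order. A stem with no observations is simply omitted in this sketch, whereas on paper it would normally be written with no leaves.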

Median

To find the median (or mid point) we need to identify the point which has the property that half the data are greater than it, and half the data are less than it. For 15 points, the mid point is clearly the eighth largest, so that seven points are less than the median, and seven points are greater than it. This is easily obtained from Figure 1.2 by counting the eighth leaf, which is 1.5.

To find the median for an even number of points, the procedure is as follows. Suppose the paediatric registrar obtained a further set of 16 urinary lead concentrations from children living in the countryside in the same county as the hospital (Table 1.3).

Table 1.3


To obtain the median we average the eighth and ninth points (1.8 and 1.9) to get 1.85. In general, if n is even, we average the n/2th largest and the n/2 + 1th largest observations.

The main advantage of using the median as a measure of location is that it is “robust” to outliers. For example, if we had accidentally written 34 rather than 3.4 in Table 1.3, the median would still have been 1.85. One disadvantage is that it is tedious to order a large number of observations by hand (there is usually no “median” button on a calculator).
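
The rule for odd and even numbers of points, and the robustness of the median to a single wild value, can be sketched in Python. The 16 values are hypothetical; they were chosen so that the eighth and ninth ordered points are 1.8 and 1.9 and the largest is 3.4, as described in the text.

```python
def median(data):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical set of 16 values (an assumption, not a copy of the table).
rural = [0.2, 0.3, 0.6, 0.7, 0.8, 1.5, 1.7, 1.8,
         1.9, 1.9, 2.0, 2.0, 2.1, 2.8, 3.1, 3.4]

# The same data with the largest value mis-keyed as 34 instead of 3.4.
with_error = rural[:-1] + [34.0]

print("median:", median(rural))
print("median with the keying error:", median(with_error))
print("mean:", sum(rural) / len(rural))
print("mean with the keying error:", sum(with_error) / len(with_error))
```

The median is unchanged by the error, whereas the mean is roughly doubled.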

Measures of variation

It is informative to have some measure of the variation of observations about the median. The range is very susceptible to what are known as outliers, points well outside the main body of the data. For example, if we had made the mistake of writing 32 instead of 3.2 in Table 1.2, then the range would be written as 0.1 to 32, which is clearly misleading.

A more robust approach is to divide the distribution of the data into four, and find the points below which lie 25%, 50% and 75% of the distribution. These are known as quartiles, and the median is the second quartile. The variation of the data can be summarised in the interquartile range, the distance between the first and third quartile. With small data sets and if the sample size is not divisible by four, it may not be possible to divide the data set into exact quarters, and there are a variety of proposed methods to estimate the quartiles. A simple, consistent method is to find the points midway between each end of the range and the median. Thus, from Figure 1.2, there are eight points between and including the smallest, 0.1, and the median, 1.5. Thus the mid point lies between 0.8 and 1.1, or 0.95. This is the first quartile. Similarly the third quartile is mid way between 1.9 and 2.0, or 1.95. Thus, the interquartile range is 0.95 to 1.95.
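
The simple method just described can be sketched in Python. The data are illustrative values consistent with the figures quoted in the text, not a copy of Table 1.2. The handling of an even number of points (where the median is not itself an observation and so belongs to neither half) is an assumption about how the method extends.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Quartiles as the mid points between each end of the range and the median."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: (n + 1) // 2]   # smallest value up to and including the median
    upper = ordered[n // 2:]          # median up to and including the largest value
    return median(lower), median(ordered), median(upper)


# Illustrative values consistent with the summaries in the text (an assumption).
values = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]

q1, q2, q3 = quartiles(values)
print(f"first quartile {q1:.2f}, median {q2:.2f}, third quartile {q3:.2f}")
```

Statistical packages use a variety of other quartile definitions, so their results for small samples may differ slightly from this method.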

Data display

The simplest way to show data is a dot plot. Figure 1.3 shows the data from Tables 1.2 and 1.3, together with the median for each set.

Figure 1.3


Sometimes the points in separate plots may be linked in some way, for example the data in Table 1.2 and Table 1.3 may result from a matched case control study (see Chapter 13 for a description of this type of study) in which individuals from the countryside were matched by age and sex with individuals from the town. If possible the links should be maintained in the display, for example by joining matching individuals in Figure 1.3. This can lead to a more sensitive way of examining the data.

When the data sets are large, plotting individual points can be cumbersome. An alternative is a box-whisker plot. The box is marked by the first and third quartile, and the whiskers extend to the range. The median is also marked in the box, as shown in Figure 1.4.

Figure 1.4


It is easy to include more information in a box-whisker plot. One method, which is implemented in some computer programs, is to extend the whiskers only to points that are 1.5 times the interquartile range below the first quartile or above the third quartile, and to show remaining points as dots, so that the number of outlying points is shown.
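
One common reading of this rule can be sketched in Python: each whisker ends at the most extreme observation lying within 1.5 times the interquartile range of the box, and anything beyond is listed separately. The function name is hypothetical, the quartiles are supplied by the caller, and the data are illustrative values to which a mis-keyed 34 has been added.

```python
def box_plot_summary(data, q1, q3):
    """Return (lower whisker end, upper whisker end, outlying points)."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return min(inside), max(inside), outliers


# Illustrative values (an assumption) plus one mis-keyed observation, 34.
values = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5,
          3.2, 1.7, 1.9, 1.9, 2.2, 34.0]

low, high, outliers = box_plot_summary(values, q1=0.95, q3=1.95)
print("whiskers run from", low, "to", high)
print("points shown as dots:", outliers)
```

Individual programs differ in the details, so the documentation of the program in use should be checked before interpreting its plots.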

Histograms

Suppose the paediatric registrar referred to earlier extends the urban study to the entire estate in which the children live. He obtains figures for the urinary lead concentration in 140 children aged over 1 year and under 16. We can display these data as a grouped frequency table (Table 1.4).

Table 1.4


Figure 1.5


Bar charts

Suppose, of the 140 children, 20 lived in owner occupied houses, 70 lived in council houses and 50 lived in private rented accommodation. Figures from the census suggest that for this age group, throughout the county, 50% live in owner occupied houses, 30% in council houses, and 20% in private rented accommodation. Type of accommodation is a categorical variable, which can be displayed in a bar chart. We first express our data as percentages:

14% owner occupied, 50% council house, 36% private rented. We then display the data as a bar chart. The sample size should always be given (Figure 1.6).
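
The percentages can be reproduced with the short Python sketch below, which also prints a crude text bar chart with the sample size. The counts are those given above; the layout of the printed chart is only illustrative.

```python
# Counts of children by type of accommodation, as given in the text.
counts = {"owner occupied": 20, "council house": 70, "private rented": 50}

n = sum(counts.values())
percentages = {name: round(100 * count / n) for name, count in counts.items()}

# One '#' per two percentage points; gaps between bars are implicit in the rows.
for name, pct in percentages.items():
    print(f"{name:15s} {'#' * (pct // 2)} {pct}%")
print(f"n = {n}")
```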

Figure 1.6


Common questions

How many groups should I have for a histogram?

In general one should choose enough groups to show the shape of a distribution, but not so many that the shape is lost in the noise. It is partly aesthetic judgement but, in general, between 5 and 15, depending on the sample size, gives a reasonable picture. Try to keep the intervals (known also as “bin widths”) equal. With equal intervals the height of the bars and the area of the bars are both proportional to the number of subjects in the group. With unequal intervals this link is lost, and interpretation of the figure can be difficult.
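
Grouping into equal intervals can be sketched in Python as follows. The function and its arguments are hypothetical, the data are illustrative, and each interval is taken to include its lower limit but not its upper limit (an assumption that should be stated whenever a grouped table is presented).

```python
def grouped_frequencies(data, start, width, n_groups):
    """Count observations in n_groups equal intervals [lower, lower + width)."""
    counts = [0] * n_groups
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_groups:      # values outside the table are ignored
            counts[index] += 1
    return counts


# Illustrative values only (an assumption, not the 140 children of the text).
values = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3, 1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]

counts = grouped_frequencies(values, start=0.0, width=1.0, n_groups=4)
for i, count in enumerate(counts):
    lower = i * 1.0
    print(f"{lower:.1f} to under {lower + 1.0:.1f}: {'#' * count} ({count})")
```

Rerunning the sketch with a different `width` is a quick way to see how the choice of interval changes the apparent shape.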

What is the distinction between a histogram and a bar chart?

Alas, with modern graphics programs the distinction is often lost. A histogram shows the distribution of a continuous variable and, since the variable is continuous, there should be no gaps between the bars. A bar chart shows the distribution of a discrete variable or a categorical one, and so will have spaces between the bars. It is a mistake to use a bar chart to display a summary statistic such as a mean, particularly when it is accompanied by some measure of variation to produce a “dynamite plunger plot” (1). It is better to use a box-whisker plot.

What is the best way to display data?

The general principle should be, as far as possible, to show the original data and to try not to obscure the design of a study in the display. Within the constraints of legibility show as much information as possible. If data points are matched or from the same patients, link them with lines (2). When displaying the relationship between two quantitative variables, use a scatter plot (Chapter 11) in preference to categorising one or both of the variables.

References

1. Campbell M J. How to present numerical results. In: How to do it: 2. London: BMJ Publishing, 1995:77-83.

2. Matthews J N S, Altman D G, Campbell M J, Royston J P. Analysis of serial measurements in medical research. BMJ 1990; 300:230-5.

Exercises

Exercise 1.1

From the 140 children whose urinary concentrations of lead were investigated, 40 were chosen who were aged at least 1 year but under 5 years. The following concentrations of copper (in ) were found.

0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,

0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,

0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,

0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88

Find the median, range and quartiles.



Preface

It is with trepidation that one rewrites a best seller, and Dougal Swinscow’s Statistics at Square One was one of the best selling statistical text books in the UK. It is difficult to decide how much to alter without destroying the quality that proved so popular. I chose to retain the format and structure of the original book. Most of the original examples remain; they are realistic, if not real, and tracking down the original sources to provide references would be impossible. However, I have removed the chromatic pseudonyms of the investigators. All new examples utilise real data, the source of which is referenced.

Much has changed in medical statistics since the original edition was published in 1976. Desktop computers now provide statistical facilities unimaginable then, even for mainframe enthusiasts. I think the main change has been an emphasis now on looking at and plotting the data first, and on estimation rather than simple hypothesis testing. I have tried to reflect these changes in the new edition. I have found it a useful pedagogic device to pose questions to the students, and so have incorporated questions commonly asked by students or consultees at the end of each chapter. These questions cover issues often not explicitly addressed in elementary text books, such as how far one should test assumptions before proceeding with statistical tests.

I have included a number of new techniques, such as stem and leaf plots, box whisker plots, data transformation, the χ² test for trend and t test with unequal variance. I have also included a chapter on survival analysis, with the Kaplan-Meier survival curve and the log rank test, as these are now in common use. I have replaced the Kendall rank correlation coefficient by the Spearman; in spite of the theoretical advantages of the former, most statistical packages compute only the latter. The section on linear regression has been extended. I have added a final chapter on the design of studies, and would make a plea for it not to be ignored. Studies rarely fail for want of a significance test, but a flawed design may be fatal. To keep the book short I have removed some details of hand calculation.

I have assumed that the reader will not want to master a complicated statistical program, but has available a simple scientific calculator, which should cost about the same as this book. However, no serious statistical analysis should be undertaken these days without a computer. There are many good and inexpensive statistical programs. Epi-Info, for example, is produced by the Center for Disease Control (CDC) Atlanta and the World Health Organization (WHO) in Geneva. Another useful program is CIA (Confidence Interval Analysis) which is available from the BMJ.

I am most grateful to Tina Perry for secretarial help, to Simon Child for producing the figures, and to Simon Child and Tide Olayinka for help with word processing. I am particularly grateful to Paul Little, Steven Julious, Ruth Pickering and David Coggon who commented on all or part of the book, and who made many suggestions for improvement. Any errors remain my own. Finally, thanks to Deborah Reece of the BMJ who asked me to revise the book and was patient with the delays.

M J Campbell

August 1995


Statistics at Square One

Ninth Edition

T D V Swinscow

Revised by M J Campbell, University of Southampton

Copyright BMJ Publishing Group 1997

NB: Readers occasionally point out errors in this book and remind us that there have been several revised editions since this one, which we would refer our readers to. The text that is replicated here reflects exactly what was in the 1997 edition.

Contents

Preface

1 Data display and summary

2 Mean and standard deviation

3 Populations and samples

4 Statements of probability and confidence intervals

5 Differences between means: type I and type II errors and power

6 Differences between percentages and paired alternatives

7 The t tests

8 The chi-squared tests

9 Exact probability test

10 Rank score tests

11 Correlation and regression

12 Survival analysis

13 Study design and choosing a statistical test


The Croatian Medical Journal’s forum on academic medicine

Academic Medicine in Russia. Edward J. Burger, Lilia Ziganshina, Airat U. Ziganshin. CMJ 2004; 45: 674-676

Revitalization of Academic Medicine in Macedonia – An Urgent Need. Dončo M. Donev. CMJ 2004; 45: 677-683

“Complementary and Alternative” Medicine – A Measure of Crisis in Academic Medicine. Matko Marušić. CMJ 2004; 45: 684-688

Academic Medicine: Dream or Nightmare? Fred T. Bosman. CMJ 2004; 45: 371-374

Free the Dinosaurs into Butterfly Gardens: in a Search for Changing the Profile of the Academic Professional. Stella Fatović-Ferenčić. CMJ 2004; 45: 375-377

Temptation of Academic Medicine: Second Alma Mater and “Shared Employment” Concepts as Possible Way Out? Vladimir J Šimunović, Hans-Günther Sonntag, Axel Horsch, Jens Doerup, Jasminka Nikolić, Henri Verhaaren, Mladen Mimica, Benjamin Vojniković, Dejan Bokonjić, Lejla Begić, Richard Marz. CMJ 2004; 45: 378-384

Academic Cardiac Surgery in Croatia: Perspective through Eyes of an International Collaborator. William M. Novick. CMJ 2004; 45: 384-388

Are Problems of Academic Medicine a New Phenomenon? Lajos Kullmann, Tamás Kullmann. CMJ 2004; 45: 550-552

Dilemma of an Indigent Country: Is Academic Medicine a Good Investment? Przemyslaw Kardas. CMJ 2004; 45: 553-555

Academic Medicine in a Southern African Country of Malawi. Adamson S. Muula, Corey Lau. CMJ 2004; 45: 556-562

Family Medicine as a Model of Transition from Academic Medicine to Academic Health Care: Estonia’s Experience. Heidi-Ingrid Maaroos. CMJ 2004; 45: 563-566

The Campaign to Revitalize Academic Medicine Kicks Off: We Need a Deep and Broad International Debate to Begin. Peter Tugwell. CMJ 2004; 45: 241-242

Academic Medicine: One Job or Three? Berislav Marušić. CMJ 2004; 45: 243-244

Academic Approach to Academic Medicine. Stjepan Gamulin. CMJ 2004; 45: 245-247

Academic Medicine – Experiences from Finland and Suggestions for the Future. Vedran Stefanović. CMJ 2004; 45: 248-253

Academic Medicine: What Does an Outsider Have to Offer? Igor Švab, Mateja Bulc. CMJ 2004; 45: 254-255

Academic Medicine and Quality of Medical Care. Reuben Eldar. CMJ 2004; 45: 256-258

Balancing Traditional Values in Academic Medicine with Advances in Science and Technology. Bruce A. Fenderson, Douglas A. Fenderson. CMJ 2004; 45: 259-263

Caring for Academic Ophthalmology in Croatia. Zdravko Mandić, Zoran Vatavuk. CMJ 2004; 45: 264-267


Read more about our campaign

Mayor S. Report calls for action to improve careers in academic medicine. BMJ 2005;330:8. http://bmj.bmjjournals.com/cgi/content/full/330/7481/8

Wilkinson D, Ward RL. International Campaign to Revitalise Academic Medicine (ICRAM): what does it mean for Australia? Med J Aust. 2004 Dec 6;181(11/12):658-659. http://www.mja.com.au/public/issues/181_11_061204/wil10557_fm.html

Underwood TJ. Academic medicine: what’s in it for me? Student BMJ 2004;12:350. http://www.studentbmj.com/issues/1004/editorials/350.html

Kmietowicz Z. Campaign for academic medicine calls for radical thinking. Student BMJ 2004. http://www.studentbmj.com/issues/0704/news/270a.html

Tugwell P. Campaign to revitalise academic medicine kicks off. BMJ 2004 328:597. [Full text]

Jocalyn Clark. Polishing the tarnished image of academic medicine. BMJ 2004 328: 604. [Full text]

Peter Tugwell. The campaign to revitalise academic medicine kicks off. Lancet 2004 363: 836. [PDF]

Jocalyn Clark. Academic medicine: time for reinvention: Summary of responses. BMJ, 2004 328: 49. [Full Text]

Academic medicine: resuscitation in progress. Can. Med. Assoc. J. Feb 2004; 170: 309. [Full Text]

Zulfiqar Bhutta. Practising just medicine in an unjust world. BMJ 2003 327: 1000-1001. [Full text]

Paul M Stewart. Improving clinical research. BMJ 2003 327: 999-1000. [Full text]

Iain Chalmers, Cath Rounding, and Kate Lock. Descriptive survey of non-commercial randomised controlled trials in the United Kingdom, 1980-2002. BMJ 2003 327: 1017-0. [Full text]

John Bell. Resuscitating clinical research in the United Kingdom. BMJ 2003 327: 1041-1043. [Full text]

Jocalyn Clark and Richard Smith. BMJ Publishing Group to launch an international campaign to promote academic medicine. BMJ 2003 327: 1001-1002. [Full text]


Academic medicine

In 2003 The BMJ, the Lancet, and 40 other partners launched ICRAM, a global initiative that is committed to developing a new vision for academic medicine. This will focus on increasing the relevance to communities of institutions that educate doctors and other health professionals, conduct biomedical and health systems research, and care for patients.

Led by a core working party of medical academics representing 14 countries, ICRAM aims to redefine the core values of and establish an evidence base for academic medicine; develop strategy around reformed academic training; and stimulate a public debate on the future.

The campaign arose because of a persistent concern that academic medicine is in crisis around the world. At a time of increasing health burden, poverty, globalisation, and innovation, many have argued that academic medicine is nevertheless failing to realize its potential and global social responsibility.

Through a series of stakeholder and regional consultations, systematic review of the available evidence, and future scenario building, ICRAM intends to produce a series of recommendations for reform in global academic medicine, including:

  1. Developing a vision of how academic medicine should look in 2020;
  2. Recommending strategies for building capacity in academic medicine, including better career paths; and
  3. Proposing how academic medicine can improve its relationships with “customers,” including patients, policy makers, practitioners, and others.

At the centre of the campaign are:

  1. International working party of 20 medical academics from all over the world;
  2. Regional advisory groups that are conducting consultations in the Americas, Africa, Europe, Middle East, South Asia, and the Western Pacific;
  3. Stakeholder advisory groups representing the interests of academics, business groups, government representatives and policy makers, patients, professional associations, journal editors, and students and trainees;
  4. Leader of the campaign, Peter Tugwell, from the Centre for Global Health at the University of Ottawa, Canada.

We want your participation!

Read more about the campaign’s work:

  • Future scenario building – Feb 2005. In February 2005, ICRAM and the Nuffield Trust (UK) will host a workshop, facilitated by Philip Hadridge, aimed at describing a number of alternative future scenarios for academic medicine. Previous scenario work can be found below:

Imagining futures for the NHS

The Simpsons scenarios

In preparation for this February workshop, advisors from all stakeholder and regional areas have submitted 1000 word articles on their “vision for academic medicine.”


Chapter 13. Further reading

More chapters in Epidemiology for the uninitiated

Armitage P, Berry G. Statistical Methods in Medical Research. Oxford: Blackwell, 1994. A full and explicit reference work on statistics.

Barker D J P, Hall A J. Practical Epidemiology. Edinburgh: Churchill Livingstone, 1991. A short practical manual of epidemiology for use in developing countries.

Coggon D. Statistics in Clinical Practice. London: BMJ Publishing Group, 1995. A guide to the interpretation of medical statistics for non-mathematicians.

Gardner M J, Altman D G. Statistics with Confidence. London: British Medical Journal, 1989. A clearly written, short introduction to statistical methods.

Pocock S J. Clinical Trials: a Practical Approach. Chichester: Wiley, 1996. A detailed guide to clinical trials.

Rothman K J. Modern Epidemiology. Boston: Little, Brown, 1986. The most rigorous exposition of epidemiological concepts and principles.

Swinscow T D V. Statistics at Square One. Revised by Campbell M J. London: BMJ Publishing Group, 1996. Medical statistics made as simple as possible.


Chapter 12. Reading epidemiological reports


Epidemiological methods are widely applied in medical research, and even doctors who do not themselves carry out surveys will find that their clinical practice is influenced by epidemiological observations. Which oral contraceptive is the best option for a woman of 35? What prognosis should be given to parents whose daughter has developed spinal scoliosis? What advice should be given to the patient who is concerned about newspaper reports that living near electric power lines causes cancer? To answer questions such as these, the doctor must be able to understand and interpret epidemiological reports.

Interpretation is not always easy, and studies may produce apparently inconsistent results. One week a survey is published suggesting that low levels of alcohol intake reduce mortality. The next, a report concludes that any alcohol at all is harmful. How can such discrepancies be reconciled? This chapter sets out a framework for the assessment of epidemiological data, breaking the exercise down into three major components.

Bias

The first step in evaluating a study is to identify any major potential for bias. Almost all epidemiological studies are subject to bias of one sort or another. This does not mean that they are scientifically unacceptable and should be disregarded. However, it is important to assess the probable impact of biases and to allow for them when drawing conclusions. In what direction is each bias likely to have affected outcome, and by how much?

If the study has been reported well, the investigators themselves will have addressed this question. They may even have collected data to help quantify bias. In a survey of myopia and its relation to reading in childhood, information was gathered about the use of spectacles and the educational history of subjects who were unavailable for examination. This helped to establish the scope for bias from the incomplete response. Usually, however, evaluation of bias is a matter of judgement.

When looking for possible biases, three aspects of a study are particularly worth considering:

  1. How were subjects selected for investigation, and how representative were they of the target population with regard to the study question?
  2. What was the response rate, and might responders and nonresponders have differed in important ways? As with the choice of the study sample, it matters only if respondents are atypical in relation to the study question.
  3. How accurately were exposure and outcome variables measured? Here the scope for bias will depend on the study question and on the pattern of measurement error. Random errors in assessing intelligence quotient (IQ) will produce no bias at all if the aim is simply to estimate the mean score for a population. On the other hand, in a study of the association between low IQ and environmental exposure to lead, random measurement errors would tend to obscure any relation, that is, to bias estimates of relative risk towards one. If the errors in measurement were nonrandom, the bias would be different again. For example, if IQs were selectively under-recorded in subjects with high lead exposure, the effect would be to exaggerate risk estimates.

There is no simple formula for assessing biases. Each must be considered on its own merits in the context of the study question.
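
The direction of the two measurement biases described in point 3 can be illustrated with a simplified Python sketch. It treats “low IQ” as a yes/no outcome that is misclassified with a given sensitivity and specificity, which is a simplification of random error in a continuous score. All risks, sensitivities and specificities are hypothetical.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Proportion classified as low IQ when classification is imperfect."""
    return true_risk * sensitivity + (1 - true_risk) * (1 - specificity)


# Hypothetical true risks of low IQ in exposed and unexposed children.
risk_exposed, risk_unexposed = 0.20, 0.10
rr_true = risk_exposed / risk_unexposed

# Random errors: the same imperfect classification in both exposure groups.
rr_random = (observed_risk(risk_exposed, 0.9, 0.9)
             / observed_risk(risk_unexposed, 0.9, 0.9))

# Nonrandom errors: IQ under-recorded only in the exposed, so some exposed
# children with normal IQ are wrongly classified as low IQ.
rr_selective = (observed_risk(risk_exposed, 1.0, 0.9)
                / observed_risk(risk_unexposed, 1.0, 1.0))

print(f"true relative risk           {rr_true:.2f}")
print(f"with random errors           {rr_random:.2f}")
print(f"with selective under-record  {rr_selective:.2f}")
```

Under these assumptions the random errors pull the relative risk from 2 towards one, while the selective errors push it above 2.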

Chance

Even after biases have been taken into account, study samples may be unrepresentative just by chance. An indication of the potential for such chance effects is provided by statistical analysis.

Traditionally, statistical inference has been based on hypothesis testing. This can most easily be understood if the study sample is viewed in the context of the larger target population about which conclusions are to be drawn. A null hypothesis about the target population is formulated. Then, starting with this null hypothesis, and with the assumption that the study sample is an unbiased subset of the target population, a p value is calculated. This is the probability of obtaining, simply by chance, an outcome in the study sample at least as extreme relative to the null hypothesis as that observed. For example, in a case-control study of the relation between renal stones and dietary oxalate, the null hypothesis might be that in the target population from which the study sample was derived there is no association between renal stones and oxalate intake. A p value of 0.05 would imply that under this assumption of no overall association between renal stones and oxalate, the probability of selecting a random sample in which the association was at least as strong as that observed in the study would be one in 20. The lower the calculated p value, the more one is inclined to reject the null hypothesis and adopt a contrary view – for example, that there is an association between dietary oxalate and renal stones. Often a p value below a stated threshold (for example, 0.05) is deemed to be (statistically) significant, but this threshold is arbitrary. There is no reason to attach much greater importance to a p value of 0.049 than to a value of 0.051.
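
The definition of a p value can be made concrete with a much simpler situation than a case-control study. The Python sketch below is an assumption-laden stand-in: it takes a null hypothesis that each of n independent subjects has a probability of 0.5 of showing some characteristic, and adds up the null probabilities of every outcome that is no more likely than the one observed. The function name and the numbers are hypothetical.

```python
from math import comb


def binomial_p_value(successes, n, p_null=0.5):
    """Two-sided exact p value under a binomial null hypothesis."""
    def prob(k):
        return comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)

    observed = prob(successes)
    # Sum the probabilities of all outcomes at least as extreme as that observed
    # (the small tolerance guards against floating point rounding in ties).
    return sum(prob(k) for k in range(n + 1) if prob(k) <= observed * (1 + 1e-9))


p_15_of_20 = binomial_p_value(15, 20)
p_10_of_20 = binomial_p_value(10, 20)

print(f"15 of 20: p = {p_15_of_20:.4f}")
print(f"10 of 20: p = {p_10_of_20:.4f}")
```

An observation of 15 out of 20 gives a p value of about 0.04, whereas 10 out of 20, the outcome most compatible with the null hypothesis, gives a p value of 1.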

A p value depends not only on the magnitude of any deviation from the null hypothesis, but also on the size of the sample in which that deviation was observed. Failure to achieve a specified level of statistical significance will have different implications according to the size of the study. A common error is to weigh “positive” studies, which find an association to be significant, against “negative” studies, in which it is not. Two case-control studies could indicate similar odds ratios, but because they differed in size one might be significant and the other not. Clearly such findings would not be incompatible.
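The point about study size can be made with two hypothetical case-control studies that yield exactly the same odds ratio, one ten times larger than the other. The sketch below uses a normal approximation on the log odds ratio (often called Woolf's method), which is our choice for illustration rather than a method named in the text. All counts and names are assumptions.

```python
from math import log, sqrt
from statistics import NormalDist

def odds_ratio_and_p(a, b, c, d):
    """Odds ratio and approximate two-sided p value for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    Uses the large-sample standard error of the log odds ratio.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log(odds_ratio) / se_log_or
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return odds_ratio, p_value

# Two hypothetical studies with identical proportions but different sizes
small_or, small_p = odds_ratio_and_p(15, 10, 10, 15)
large_or, large_p = odds_ratio_and_p(150, 100, 100, 150)
print(small_or, small_p)   # odds ratio 2.25, p value above 0.05
print(large_or, large_p)   # odds ratio 2.25, p value far below 0.05
```

Both studies point to the same strength of association; only the larger one would be labelled "significant" at the conventional threshold.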

Because of the limitations of the p value as a summary statistic, epidemiologists today prefer to base statistical inference on confidence intervals. A statistic of the study sample, such as an odds ratio or a mean haemoglobin concentration, provides an estimate of the corresponding population parameter (the odds ratio or mean haemoglobin concentration in the target population from which the sample was derived). Because the study sample may by chance be atypical, there is uncertainty about the estimate. A confidence interval is a range within which, assuming there are no biases in the study method, the true value for the population parameter might be expected to lie. Most often, 95% confidence intervals are calculated. The formula for the 95% confidence interval is set in such a way that on average 19 out of 20 such intervals will include the population parameter. Large samples are less prone to chance error than small samples, and therefore give tighter confidence intervals.
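The statement that 19 out of 20 such intervals will, on average, include the population parameter can be checked by simulation. The sketch below is illustrative only: it assumes a normally distributed haemoglobin concentration with a population mean of 140 g/l and a standard deviation of 12 g/l (hypothetical figures), draws many random samples, and computes for each the usual large-sample interval of the mean plus or minus 1.96 standard errors. The function names are assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

def ci95_for_mean(sample):
    """Large-sample 95% confidence interval: mean +/- 1.96 standard errors."""
    centre = mean(sample)
    half_width = 1.96 * stdev(sample) / sqrt(len(sample))
    return centre - half_width, centre + half_width

def coverage_and_width(n, repeats=1000, mu=140.0, sigma=12.0, seed=0):
    """Fraction of intervals that contain mu, and their average width.

    mu and sigma are hypothetical population values chosen for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = ci95_for_mean(sample)
        hits += low <= mu <= high
        total_width += high - low
    return hits / repeats, total_width / repeats

print(coverage_and_width(100))   # coverage close to 0.95
print(coverage_and_width(400))   # similar coverage, roughly half the width
```

Quadrupling the sample size roughly halves the width of the interval, which is the sense in which large samples give tighter confidence intervals.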

Whether statistical inference is based on hypothesis testing or confidence intervals, the results must be viewed in context. Assessment of the contribution of chance to an observation should also take into account the findings of other studies. An epidemiological association might be highly significant statistically, but if it is completely at variance with the balance of evidence from elsewhere, then it could still legitimately be attributed to chance. For example, if a cohort study with no obvious biases suggested that smoking protected against lung cancer, and no special explanation could be found, we would probably conclude that this was a fluke result. Unlike p values or confidence intervals, the weight that is attached to evidence from other studies cannot be precisely quantified.

Confounding versus causality

If an association is real and not explained by bias or chance, the question remains as to how far it is causal and how far the result of confounding. The influence of some confounders may have been eliminated by matching or by appropriate statistical analysis. However, especially in observational studies, the possibility of unrecognised residual confounding remains. Assessment of whether an observed association is causal depends in part on what is known about the biology of the relation. In addition, certain characteristics of the association may encourage a causal interpretation. A dose-response relation in which risk increases progressively with higher exposure is generally held to favour causality, although in theory it might arise through confounding. In the case of hazards suspected of acting early in a disease process, such as genotoxic carcinogens, a latent interval between first exposure and the manifestation of increased risk would also support a causal association. Also important is the magnitude of the association as measured by the relative risk or odds ratio. If an association is to be completely explained by confounding then the confounder must carry an even higher relative risk for the disease and also be strongly associated with the exposure under study. A powerful risk factor with, say, a 10-fold relative risk for the disease would probably be recognised and identified as a potential confounder.
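The argument about the magnitude of an association can be made explicit for the simple case of a single binary confounder. Suppose, as an assumption for illustration, that the exposure has no effect at all and that the confounder multiplies disease risk by a fixed factor. The apparent relative risk produced by confounding alone then depends on how much more common the confounder is among exposed than unexposed subjects, and it can never exceed the confounder's own relative risk. The function name and the prevalences below are hypothetical.

```python
def confounded_rr(rr_confounder, prev_exposed, prev_unexposed):
    """Apparent relative risk for an exposure with no true effect.

    rr_confounder  : relative risk of disease carried by the confounder
    prev_exposed   : proportion of exposed subjects who have the confounder
    prev_unexposed : proportion of unexposed subjects who have the confounder

    Risk in each group is baseline * (1 + prevalence * (rr_confounder - 1)),
    and the baseline cancels in the ratio.
    """
    excess = rr_confounder - 1
    return (1 + prev_exposed * excess) / (1 + prev_unexposed * excess)

# A powerful confounder (10-fold risk) that is moderately associated with exposure
print(confounded_rr(10, 0.5, 0.2))   # about 1.96
# The same pattern of association with a weak confounder (2-fold risk)
print(confounded_rr(2, 0.5, 0.2))    # 1.25
# Extreme case: every exposed and no unexposed subject has the confounder
print(confounded_rr(10, 1.0, 0.0))   # 10.0, the ceiling set by the confounder itself
```

Under these assumptions, an observed relative risk of, say, 5 could be wholly explained by confounding only if the confounder carried a relative risk well above 5 and was very unevenly distributed between exposed and unexposed subjects.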

The evaluation of possible pathogenic mechanisms and the importance attached to dose-response relations and evidence of latency are also a matter of judgement. It is because there are so many subjective elements to the interpretation of epidemiological findings that experts do not always agree. However, if sufficient data are available then a reasonable consensus can usually be achieved.
