Chapter 8. Case-control and cross sectional studies

Case-control studies

As discussed in the previous chapter, one of the drawbacks of using a longitudinal approach to investigate the causes of disease with low incidence is that large and lengthy studies may be required to give adequate statistical power. An alternative which avoids this difficulty is the case-control or case-referent design. In a case-control study patients who have developed a disease are identified and their past exposure to suspected aetiological factors is compared with that of controls or referents who do not have the disease. This permits estimation of odds ratios (but not of attributable risks). Allowance is made for potential confounding factors by measuring them and making appropriate adjustments in the analysis. This statistical adjustment may be rendered more efficient by matching cases and controls for exposure to confounders, either on an individual basis (for example by pairing each case with a control of the same age and sex) or in groups (for example, choosing a control group with an overall age and sex distribution similar to that of the cases). Unlike in a cohort study, however, matching does not on its own eliminate confounding. Statistical adjustment is still required.
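As a minimal sketch of the odds ratio mentioned above, the following computes a crude (unadjusted) estimate from an unmatched two-by-two table, with an approximate 95% confidence interval by Woolf's logarithmic method. The counts and function names are hypothetical, chosen only for illustration, and the calculation ignores both confounding and matching, each of which would call for a more elaborate analysis. Attributable risk cannot be obtained in the same way because the ratio of cases to controls is fixed by the investigator, so the table carries no information about absolute risk.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude odds ratio: odds of exposure in cases divided by odds in controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


def woolf_interval(exposed_cases, unexposed_cases, exposed_controls,
                   unexposed_controls, z=1.96):
    """Approximate confidence interval calculated on the log scale (Woolf's method)."""
    estimate = odds_ratio(exposed_cases, unexposed_cases,
                          exposed_controls, unexposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return lower, upper


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
estimate = odds_ratio(40, 60, 20, 80)
lower, upper = woolf_interval(40, 60, 20, 80)
print(f"odds ratio {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The method assumes that no cell of the table is zero and that the counts are large enough for the normal approximation to be reasonable.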

Selection of cases

The starting point of most case-control studies is the identification of cases. This requires a suitable case definition (see Chapter 2). In addition, care is needed that bias does not arise from the way in which cases are selected. A study of benign prostatic hypertrophy might be misleading if cases were identified from hospital admissions and admission to hospital was influenced not only by the presence and severity of disease but also by other variables, such as social class. In general it is better to use incident rather than prevalent cases. As pointed out in Chapter 2, prevalence is influenced not only by the risk of developing disease but also by factors that determine the duration of illness. Furthermore, if disease has been present for a long time then premorbid exposure to risk factors may be harder to ascertain, especially if assessment depends on people’s memories.

Selection of controls

Usually it is not too difficult to obtain a suitable source of cases, but selecting controls tends to be more problematic. Ideally, controls would satisfy two requirements. Within the constraints of any matching criteria, their exposure to risk factors and confounders should be representative of that in the population “at risk” of becoming cases – that is, people who do not have the disease under investigation, but who would be included in the study as cases if they had. Also, the exposures of controls should be measurable with similar accuracy to those of the cases. Often it proves impossible to satisfy both of these aims.

Two sources of controls are commonly used. Controls selected from the general population (for example, from general practice age-sex registers) have the advantage that their exposures are likely to be representative of those at risk of becoming cases. However, assessment of their exposure may not be comparable with that of cases, especially if the assessment is achieved by personal recall. Cases are keen to find out what caused their illness and are therefore better motivated to remember details of their past than controls with no special interest in the study question.

Measurement of exposure can be made more comparable by using patients with other diseases as controls, especially if subjects are not told the exact focus of the investigation. However, their exposures may be unrepresentative. To give an extreme example, a case-control study of bladder cancer and smoking could give quite erroneous findings if controls were taken from the chest clinic. If other patients are to be used as referents, it is safer to adopt a range of control diagnoses rather than a single disease group. In that way, if one of the control diseases happens to be related to a risk factor under study, the resultant bias is not too large.
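The chest clinic example can be made concrete with some simple arithmetic. In the sketch below all of the counts are invented for illustration: the same set of bladder cancer cases is compared first with controls whose smoking habits are assumed to represent the population at risk, and then with chest clinic patients among whom smoking is assumed to be much more common.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude odds ratio using the cross-product of a two-by-two table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical cases: 100 patients with bladder cancer, of whom 60 smoke.
cases_smokers, cases_non_smokers = 60, 40

# Hypothetical controls representative of the population at risk: 30 of 100 smoke.
population_or = odds_ratio(cases_smokers, cases_non_smokers, 30, 70)

# Hypothetical chest clinic controls: 60 of 100 smoke, because smoking
# also causes many of the diseases that bring patients to the clinic.
chest_clinic_or = odds_ratio(cases_smokers, cases_non_smokers, 60, 40)

print(f"population controls:   odds ratio {population_or:.1f}")
print(f"chest clinic controls: odds ratio {chest_clinic_or:.1f}")
```

With these assumed figures the association seen against representative controls disappears entirely when chest clinic patients are used, which is the kind of erroneous finding the text warns against.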

Sometimes interpretation is helped by having two sets of controls with different possible sources of bias. For example, a link has been suggested between the phenoxy herbicides 2,4-D and 2,4,5-T and soft tissue sarcoma. Some case-control studies to test this have taken referents from the general population, whereas others have used patients with other types of cancer. Studies using controls from the general population will tend to overestimate risk because of differential recall, whereas studies using patients with other types of cancers as controls will underestimate risk if phenoxy herbicides cause cancers other than soft tissue sarcoma. The true risk might therefore be expected to lie somewhere between estimates obtained with the two different designs.

When cases and controls are both freely available then selecting equal numbers will make a study most efficient. However, the number of cases that can be studied is often limited by the rarity of the disease under investigation. In this circumstance statistical confidence can be increased by taking more than one control per case. There is, however, a law of diminishing returns, and it is usually not worth going beyond a ratio of four or five controls to one case.
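The law of diminishing returns can be illustrated with a commonly quoted approximation, which is not taken from this chapter: when exposure is unrelated to disease, a study with k controls per case achieves roughly k/(k+1) of the statistical efficiency that would be obtained with an unlimited supply of controls. The sketch below tabulates that ratio; the function name is hypothetical.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to unlimited controls: k / (k + 1)."""
    return controls_per_case / (controls_per_case + 1)


previous = 0.0
for k in range(1, 11):
    current = relative_efficiency(k)
    # The gain column shows how little each additional control contributes.
    print(f"{k:2d} controls per case: efficiency {current:.3f} "
          f"(gain {current - previous:.3f})")
    previous = current
```

Under this approximation four controls per case already give 80% of the maximum efficiency, and each further control adds only a few percentage points, which is consistent with the rule of thumb given above.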

Ascertainment of exposure

Many case-control studies ascertain exposure from personal recall, using either a self administered questionnaire or an interview. The validity of such information will depend in part on the subject matter. People may be able to remember quite well where they lived in the past or what jobs they did. On the other hand, long term recall of dietary habits is probably less reliable.

Sometimes exposure can be established from historical records. For example, in a study of the relation between sinusitis and subsequent risk of multiple sclerosis the medical histories of cases and controls were ascertained by searching their general practice notes. Provided that records are reasonably complete, this method will usually be more accurate than one that depends on memory.

Occasionally, long term biological markers of exposure can be exploited. In an African study to evaluate the efficacy of BCG immunisation in preventing tuberculosis, history of inoculation was established by looking for a residual scar on the upper arm. Biological markers are only useful, however, when they are not altered by the subsequent disease process. For example, serum cholesterol concentrations measured after a myocardial infarct may not accurately reflect levels before the onset of infarction.

Analysis

The statistical techniques for analysing case-control studies are too complex to cover in a book of this length. Readers who wish to know more should consult more advanced texts or seek advice from a medical statistician.

Cross sectional studies

A cross sectional study measures the prevalence of health outcomes or determinants of health, or both, in a population at a point in time or over a short period. Such information can be used to explore aetiology – for example, the relation between cataract and vitamin status has been examined in cross sectional surveys. However, associations must be interpreted with caution. Bias may arise because of selection into or out of the study population. A cross sectional survey of asthma in an occupational group of animal handlers would underestimate risk if the development of respiratory symptoms led people to seek alternative employment and therefore to be excluded from the study. A cross sectional design may also make it difficult to establish what is cause and what is effect. If milk drinking is associated with peptic ulcer, is that because milk causes the disease, or because ulcer sufferers drink milk to relieve their symptoms? Because of these difficulties, cross sectional studies of aetiology are best suited to diseases that produce little disability and to the presymptomatic phases of more serious disorders.
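The effect of selection out of the study population in the animal handlers example can be shown with invented figures. The sketch below assumes a workforce of 1000, of whom 100 develop asthma, and supposes that 60 of those affected change jobs before the survey takes place. None of these numbers comes from a real study.

```python
def prevalence(affected, total):
    """Proportion of a group with the condition at the time of counting."""
    return affected / total


# Hypothetical figures for everyone who has ever worked as an animal handler.
ever_employed = 1000
developed_asthma = 100
left_because_of_symptoms = 60

# Proportion affected among all who were ever employed.
proportion_ever_affected = prevalence(developed_asthma, ever_employed)

# A cross sectional survey sees only those still in the job.
still_employed = ever_employed - left_because_of_symptoms
still_employed_with_asthma = developed_asthma - left_because_of_symptoms
observed_prevalence = prevalence(still_employed_with_asthma, still_employed)

print(f"affected among all ever employed: {proportion_ever_affected:.1%}")
print(f"prevalence seen in the survey:    {observed_prevalence:.1%}")
```

On these assumptions the survey would find less than half the proportion affected among all who had ever done the job, simply because most of those with symptoms are no longer there to be counted.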

Other applications of cross sectional surveys lie in planning health care. For example, an occupational physician planning a coronary prevention programme might wish to know the prevalence of different risk factors in the workforce under their care so that the intervention could be tailored accordingly.