RANDOMISED CONTROLLED CLINICAL TRIALS

CHRISTOPHER J. BULPITT
Senior Lecturer in Epidemiology,
Department of Medical Statistics and Epidemiology,
London School of Hygiene and Tropical Medicine,
Keppel Street,
London WC1
Honorary Senior Lecturer in Clinical Pharmacology,
Royal Postgraduate Medical School, London
and Honorary Consultant Physician,
Hammersmith Hospital, London

MARTINUS NIJHOFF PUBLISHERS
THE HAGUE/BOSTON/LONDON

Copyright 1983 © by Martinus Nijhoff Publishers, The Hague

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted in any form or by any means,
mechanical, photocopying, recording, or otherwise, without written
permission of the publisher, Martinus Nijhoff Publishers, P.O. Box 566,
2501 CN The Hague, The Netherlands.

Distributors:

for the United States and Canada
Kluwer Boston, Inc.
190 Old Derby Street
Hingham, MA 02043

for all other countries
Kluwer Academic Publishers Group
Distribution Center
P.O. Box 322
3300 AH Dordrecht
The Netherlands

Bulpitt, Christopher J.
Randomised controlled trials.

(Developments in biostatistics and epidemiology; v. 1)
1. Medical research--Methodology. 2. Human experimentation in
medicine. I. Title. II. Series. [DNLM: 1. Clinical trials. 2. Research design.
3. Biometry. 4. Epidemiology. W1 DE997VL v. 1 / W 20.5 B939r]
R850.B83 1983
615.5'0724
82-14544
ISBN 90-247-2749-9

CONTENTS

Preface
1. Introduction
2. The history of controlled trials
3. Ethical considerations
4. The objectives of a randomised controlled trial
5. Validity of the results
6. Recruitment of subjects
7. How to ensure that the control and treated patients are similar in all
   important respects
8. How to ensure that the results are free of bias
9. The variability of results
10. How many subjects are required for a trial?
11. Different trial designs
12. Writing the protocol
13. Information to be collected during a trial
14. The conduct of the trial
15. Analysis of the trial results
16. The evaluation of subjective well-being
17. Early trials on new drugs
18. The detection of adverse drug reactions
19. Failure to accept the results of randomised controlled trials
20. The advantages and disadvantages of randomised controlled trials
21. References

PREFACE

Bradford Hill has defined a clinical trial as “A carefully and ethically designed
experiment with the aim of answering some precisely framed question” [1].
This definition specifies a careful design and requires the provision of adequate
controls. Random allocation of treatments to subjects is important to ensure
that the treated and control groups are similar. Therefore this book is entitled
Randomised Controlled Clinical Trials. We can define a randomised controlled
trial by rewriting Bradford Hill’s definition as follows: “A carefully and
ethically designed experiment which includes the provision of adequate and
appropriate controls by a process of randomisation, so that precisely framed
questions can be answered.”
I am a firm advocate of Randomised Controlled Clinical Trials but intend to
give a balanced view of the advantages and disadvantages of these ethical
experiments. This book is directed primarily at the medical research worker,
although certain chapters may find a wider application.
When discussing a randomised controlled trial, it is neither practicable nor
desirable to divorce theory from practice; however, the first ten chapters
concentrate mainly on theory, and the remainder focus on practice. The segment
on trial design is followed by sections on writing the protocol, designing the
forms, conducting the trial, and analysing the results. This book is meant to
serve both as a reference manual and a practical guide to the design and
performance of a trial.

1. INTRODUCTION

1.1 DEFINITION

A randomised controlled trial was defined in the preface as “A carefully and
ethically designed experiment which includes the provision of adequate and
appropriate controls by a process of randomisation, so that precisely framed
questions can be answered.” In medical research, treatment is allocated to
subjects or certain periods of time by a random (chance) procedure.

1.2 WHY PERFORM A RANDOMISED CONTROLLED TRIAL?

The primary objective in writing this book was to demonstrate the importance
of performing randomised controlled trials and the second was to help in the
design and performance of such trials. I do not expect to convince the reader of
the necessity for randomised controlled trials in a brief introduction as an
entire half of the book is intended for this; but a preliminary discussion may be
appropriate. Any reader already convinced of the necessity for randomised
clinical trials should proceed to section 1.3.
In the current medical literature, opinion is often held in high esteem and
randomised controlled trials constitute only a small proportion of research
reports. Articles consisting of observations without randomised comparison
groups can be valuable and often generate hypotheses, some of which are
subsequently tested by randomised controlled trials, but the preponderance of
observational studies over controlled experimentation is surprising. I leave the
reader to imagine the number of controlled trials to be found in literature on
the social sciences [my italics].
Cochrane [2] observed that not only were randomised controlled trials
neglected in other fields but that in medicine these trials are carried out in
developed, capitalist, predominantly Protestant countries. He understood why
underdevelopment militated against such trials and could only speculate why
Communist or Roman Catholic countries should inhibit the performance of
randomised controlled trials.
Every time a treatment is prescribed for a patient, whether pharmaceutical
agent, operation, diet, psychological counselling, physiotherapy, or other
health care strategy, the medical practitioner is conducting a trial of treatment
in that patient. Similarly, when an administrator organises health care for a
community, for example screening, immunisation, or better housing, an
experiment or trial is performed. However, we wish to know whether or not the
experiment works and it is not sufficient to observe that the health of the
patient or community improves as such improvement may have nothing to do
with the experiment. Patients may get better without treatment and the health
of a community may improve without screening for diseases. We therefore
need controls who do not receive the intervention. These controls provide the
baseline against which treatment or intervention can be assessed: hence the
term controlled trials. The control group should be the same as the intervention
group in all respects apart from the intervention procedure. Chapter 7 is
devoted to demonstrating that controls must be observed concurrently with
the intervention group and that those treated and those serving as controls
must be determined by chance alone. The allocation by chance is known as
randomisation, leading to the term randomised controlled trial or RCT. The
allocation can be performed by tossing a coin on each occasion or more usually
by the use of random number tables (chapter 7).

1.2.1 Trials of new therapy

Randomised controlled trials are necessary to prove the effectiveness of new
health care strategies or treatment and to prevent the introduction of new but
useless treatments.

1.2.2 Trials of accepted treatment

Many currently accepted treatments require proof of their effectiveness and
trials are still necessary. Such treatments fall into three groups: first, a therapy
may have been introduced prior to the advent of clinical trials. Cochrane [2]
suggests that psychotherapy, physiotherapy, and surgery for carcinoma of the
bronchus can be included in this group. Second, there may be experimental
proof of biochemical, psychological, or other effect, but no evidence that the
treatment does more good than harm as long-term therapy. Examples are
provided by anticoagulant therapy for the secondary prevention of myocardial
infarction (section 19.4) and oral hypoglycaemic drugs for maturity-onset
diabetes mellitus (section 19.6). Third, a treatment may have been subjected to
randomised controlled trials, but the results are equivocal. Cochrane suggests
tonsillectomy as an example of this group.

1.2.3 The place of randomised controlled trials

The place of randomised controlled trials must not be exaggerated. There was
no necessity for a trial of streptomycin in tuberculous meningitis; one survival
in an otherwise uniformly fatal condition was very conclusive! However,
randomised controlled trials of streptomycin were most useful in pulmonary
tuberculosis [3]. These trials could have delayed the introduction of active
treatment, but a shortage of streptomycin proved to be the limiting factor.
Also, any delay may be of value when the active treatment proves to be toxic.
The advantages and disadvantages of clinical trials are discussed further in
chapter 20. Trials may produce erroneous results but these occur far less
frequently than in uncontrolled observational studies. Moreover, the quality of
medical care given to patients in clinical trials is much higher than for the usual
processes of medical care [4].

1.3 DESCRIPTION OF THIS BOOK

This book is intended as a reference manual for research workers involved in
randomised controlled trials and is aimed at the field of medical research. It is
hoped that the contents of the book are also relevant to the needs of dental,
veterinary, and social science research workers. Also many aspects of trial
design have been employed initially in agricultural experiments.
After this introduction, the book considers historical aspects of the subject
and the ethical aspects of trial design. We have first to agree the ground rules
for trials in patients and in normal paid volunteers. What risks are allowable
for the former, if any? It has been suggested that it would be unethical to
perform a trial involving the transmission of infectious hepatitis in man, yet
such a trial was carried out.
Following the discussion on ethics, subsequent chapters will closely define
the trial objectives, validity, recruitment, randomisation, freeing observations
from bias, the variability of results, and the numbers required for the trial. The
remainder of the book is concerned with practical matters: specific trial de­
signs, writing the protocol, designing the documents, conducting the trial,
analysing the results, trials to measure the quality of life, trials on new drugs,
the detection of adverse reactions, why results are not accepted, and the
advantages and disadvantages of randomised controlled trials.
It is hoped that readers who are not involved in the performance of RCTs
will find themselves better able to assess the results of such trials. More
importantly, promotional studies, masquerading as important randomised trials,
may thus be given the scant attention they deserve. We will attempt to identify
the necessary qualities of a satisfactory trial so that we may more readily assess
any results. Initially, we will ask if the trial incorporates satisfactory controls.

A trial that is not controlled is often called a study and the results most
favourable to a particular treatment usually derive from an open evaluation.
An example is given by an article sent to general practitioners by a
pharmaceutical company entitled “---------- in hypertension. General Practice
Study. Preliminary report on 717 patients treated originally with methyldopa.”
The dose of methyldopa in these patients was reduced but not stopped and
---------- was started. The article stated “Whatever the reasons, there is good
evidence that in the majority of this group, control of the blood pressure
improved when the dose of methyldopa was reduced and ---------- was
substituted. After ---------- was introduced there was a reduction in unwanted
side effects, and four out of five patients reported subjective improvement.”
This was an open evaluation with a predictable result. Give a new drug with
enthusiasm and the patient will feel better. This study made nearly 600 patients
feel better, but some or all of the improvement may have been due to the
attention of the general practitioner rather than the new drug. An article in the
Sunday Times of January 29, 1978 attacked these marketing trials and put it
more clearly: “If the doctor believes the new drug may help, he will almost
certainly tell the patient, so the patient will be inclined to prefer it to his
previous drug, which may be just as good. Few of these trials make a proper
scientific comparison of the new drug with other drugs or with a dummy
tablet.” The newspaper article went on to consider the profits to be made by
pharmaceutical companies who introduce new drugs to general practitioners in
this manner if the drug continues to be prescribed after the end of the trial. Not
everyone agrees that making a profit is undesirable, especially in view of the
therapeutic advances made by the pharmaceutical industry. However, there
must be adequate proof of benefit and a randomised controlled trial is the
method of choice.

2. THE HISTORY OF CONTROLLED TRIALS

2.1 THE EARLIEST TRIALS

The first well documented randomised controlled trial of medical treatment
may have been that organised by the Medical Research Council and reported
in 1948 [3]. However, Rose and Armitage have described a possible RCT
dating from 1662 [5] and R.A. Fisher introduced randomised trials into
agricultural research in 1920.
Nonrandomised trials date back many years. L’Etang [6] considered that the
story of Daniel contained a report of a clinical trial. Nebuchadnezzar II
organised the trial by giving youths of royal blood, including Daniel, a rigid diet
of meat and wine for three years. The trial was supervised by a eunuch
[monitor]. Nebuchadnezzar’s trial was not controlled but Daniel “persuaded
the monitor to give him and three others a diet of pulse and water for 10 days.”
L’Etang reported that these four were “fairer in countenance and fatter in body
than the other subjects who were given meat and wine” and concluded that
“Daniel had ruined the trial . . . and the trial had become uncontrolled.”
Daniel had not ruined the trial but had performed one of the first controlled
trials: a within-subject cross-over study.
Bull has reviewed the history of clinical trials and I am indebted to him for
much material in this chapter [7]. Bull cited a second unintentional trial by
Ambroise Paré. In 1537 Paré was responsible for the treatment of numerous
wounded and ran out of boiling oil used for cauterising the wounds. He was
“constrained to apply in its place a digestive made of yolks of eggs, oil of roses
and turpentine.” The following day he was surprised to find those receiving
the new medicament “feeling but little pain, their wounds neither swollen nor
inflamed ...” Those who received boiling oil “were feverish with much pain
and swelling about their wounds.” Paré concluded that the digestive was
superior to burning oil but perhaps we would now suggest a longer period of
observation would be appropriate in view of the likelihood of subsequent
sepsis.

2.2 SCURVY

Bull also reported an unintentional trial from 1600 on an expedition to India by
the East India Company. Only one of four ships had lemon juice provided.
The ship in question was almost free from scurvy yet the condition was
rampant on the other three ships. The company provided lemon juice on all its
ships thereafter but presumably this preventive treatment was not fully
accepted until 150 years later, when James Lind performed a controlled trial.
Bradford Hill quotes James Lind in his book Statistical Methods in Clinical and
Preventive Medicine [8].

On the 20th May, 1747, I took twelve patients in the scurvy, on board the Salisbury at
sea. Their cases were as similar as I could have them. They all in general had putrid
gums, the spots and lassitude, with weakness of their knees. They lay together in one
place, being a proper apartment for the sick in the fore-hold; and had one diet common
to all, viz. water-gruel sweetened with sugar in the morning, fresh mutton-broth often
times for dinner; at other times puddings, boiled biscuit with sugar etc. and for supper,
barley and raisins, rice and currants, sago and wine, or the like. Two of these were
ordered each a quart of cyder a day. Two others took twenty-five gutts of elixir vitriol
three times a day, upon an empty stomach; using a gargle strongly acidulated with it for
their mouths. Two others took spoonfuls of vinegar three times a day, upon an empty
stomach; having their gruels and their other food well acidulated with it, as also the
gargle for their mouths. Two of the worst patients, with the tendons in the ham rigid (a
symptom none of the rest had) were put under a course of sea-water. Of this they drank
half a pint every day, and sometimes more or less as it operated by way of gentle
physic. Two others had each two oranges and one lemon given them every day. These
they ate with greediness, at different times, upon an empty stomach. They continued
but six days under this course, having consumed the quantity that could be spared. The
two remaining patients, took the bigness of a nutmeg three times a day of an electuary
recommended by a hospital-surgeon, made of garlic, mustard-seed, rad. raphan, balsam
of Peru, and gum myrrh; using for common drink barley-water well acidulated with
tamarinds; by a decoction of which, with the addition of cremor tartar, they were
greatly purged three or four times during the course.
The consequence was, that the most sudden and visible good effects were perceived
from the use of the oranges and lemons, one of those who had taken them, being at the
end of six days fit for duty. The spots were not indeed at that time quite off his body,
nor his gums sound, but without any other medicine, than a gargle of elixir vitriol, he
became quite healthy before we came into Plymouth, which was on the 16th June. The
other was the best recovered of any in his condition; and being now deemed pretty
well, was appointed nurse to the rest of the sick.

James Lind showed the superiority of citrus fruits in the treatment of scurvy.
Interestingly, he committed one fundamental error: he appears to have given
two of the worst patients a particular treatment (sea water). Perhaps sea water
was his favourite treatment. If this was the case, and we assume that more
severely affected patients are less easy to cure, the provision of a favourite
treatment for these patients will militate against demonstrating a benefit.
Random allocation to treatment groups prevents this difficulty; this advance
did not occur until 1948.

2.3 VACCINATION AGAINST SMALLPOX

In the early eighteenth century inoculation against smallpox with live virus
was introduced from Constantinople by Maitland and Lady Mary Wortley
Montagu. (They, of course, knew nothing about viruses and reported the
inoculation of “smallpox matter”.) They arranged for six convicts to be
inoculated; all survived and later one was exposed to smallpox and found to be
immune [9]. This trial did not reveal that the use of live virus could frequently
lead to death and that those inoculated could be infectious, thereby leading to
an increase of smallpox in the community.
A similar trial, but using cowpox matter, was next performed by a farmer,
Benjamin Jesty, in 1774 [10]. Country folk reported that if they had cowpox
they would not get smallpox and that cowpox was a mild disease. Farmer Jesty
vaccinated himself, his wife, and two children with cowpox material using a
stocking needle. Apparently the children were later inoculated with smallpox
and were unaffected. Mr. Jesty proceeded to vaccinate his milkmaids but his
neighbours considered that such a “bestial” manifestation of smallpox should
not be given to man! Benjamin Jesty countered that if we were prepared to eat
beef and drink milk, we could be vaccinated with cowpox.
Jesty had performed a remarkable trial and 20 years later Edward Jenner, not
knowing of farmer Jesty, was considering the problem and remembered the
words of his teacher, the famous surgeon, John Hunter: “Don’t think. Do an
experiment.” Jenner proceeded to do a similar trial which he wrote up in 1798
in An Inquiry into the Causes and Effects of the Variolae Vaccinae [11]. He
described 23 patients with cowpox who were resistant to smallpox inoculation
and assumed that all persons who had neither contracted cowpox nor smallpox
would react positively to inoculation with smallpox matter. The trial was not
controlled in the strict sense of the word, and controls were necessary owing to
the possibility of either natural immunity or previously acquired immunity.
Such persons could be vaccinated with cowpox and be immune to smallpox
without cause and effect.
Jenner was partly aware of this problem and stated, “To convince myself
that the variolous matter made use of was in a perfect state, I at the same time
inoculated a patient with some of it who had never gone through the cowpox,
and it produced the smallpox in the usual manner.” A further problem arose
from interpreting resistance to the disease. We now know that inoculating
with smallpox would be unethical as a control procedure. Pearson [12] was less
interested in the resistance of his subjects than the state of his variolous matter
and made further controlled observations, also in 1798. He observed three
patients who had had cowpox and two who had not. Only the three who had
had cowpox were immune to inoculation. Interestingly and commendably,
Pearson recommended “well-directed observation in a thousand cases of
inoculated cowpox.” Waterhouse performed a similar controlled trial published
two years later [13]. He employed the same number of controls (two) but had
12 in his treatment group.

2.4 THE USE OF PLACEBOS IN THE NINETEENTH CENTURY

When there is doubt about the effectiveness of a particular treatment and when
alternative effective treatment is either not available or not required in the
short term, the modern controlled trial may employ a period of placebo
medication. In 1801 Haygarth was one of the first to employ placebo treatment
[14]. He used dummy appliances to investigate the effects of Perkin’s tractors,
whose metal rods were supposed to cure by an electrical influence. Haygarth
studied five patients; four were helped by the use of wooden imitation tractors.
Not only was the trial possibly the first to use placebo medication, but Haygarth
quoted Lind thus: “an important lesson in physic is here to be learnt, viz. the
wonderful and powerful influence of the passions of the mind upon the state
and disorders of the body. This is too often overlooked in the cure of
disease ...”
In 1865 Sutton published a trial of mint water in 20 patients with rheumatic
fever [15]. He used mint water, not as an active but as a placebo treatment. On
observing a marked tendency to spontaneous cure he remarked, “the best
treatment for rheumatic fever has still to be determined.” Sutton had been
examining the reports of Dr. Gull and he also reported, “No selection was
made, but that Dr. Gull treated the cases which happened to be admitted into
his wards on the same plan; and we would further beg to say that these reports
were not kept for any special object, nor are they as complete as they might be;
yet the facts stated, may be fully relied upon, and so far answer our purpose.”
These interesting admissions make us suspect that the study was not so well
conceived and prospective in design as it first appeared. However, the honesty
of Dr. Sutton led to further qualifications that we now agree would be unlikely
to influence the course of the disease. “. . . these cases cannot be considered to
have been treated solely on the expectant plan, for an occasional dose of
Dover’s powder or half a grain of opium, right or wrong, and two or three
ounces of brandy a day, are remedies that might be fairly expected to exercise
some, although, perhaps, little influence over the course of the disease.” He
wisely stated, “Therefore, cases treated, as the following cases have been, by
such simple means that we might almost consider them to be unassisted by any
remedy, are invested with no little interest . . . the results . . . will probably
warrant us concluding that we ought not to be too hasty in considering the
apparent sudden and favourable change in the symptoms due to any medicine
administered.”

2.5 SCIENTIFIC BASIS FOR CONTROLLED TRIALS

Although the advent of the randomised controlled trial had to wait until the
twentieth century, much of the related scientific thinking was published in the
nineteenth century. Laplace thought that probability theory ought to be
extended to help explain the results observed in medical practice [16], and P.C.A.
Louis advocated numeracy in assessing results [17]. Louis stated, “As to
different methods of treatment, if it is possible for us to assure ourselves of the
superiority of one or other among them ... it is doubtless to be done by
enquiring if . . . a greater number of individuals have been cured by one means
than another. Here again it is necessary to count.” He went on to consider the
necessity for controls: “in order that the calculation may lead to useful or true
results ... we ought to know the natural progress of the disease.” He also
appeared concerned about noncompliance: “we ought to know . . . whether
the subjects have not committed errors of regimen.” Amusingly, he thought
his numerical method offered “real difficulties in its execution . . . this method
requires much more labour and time than the most distinguished members of
our profession can dedicate to it. But what signifies this reproach, except that
the research of truth requires much labour, and is beset with difficulty.”
Louis used his numerical method in investigating the effect of venesection in
78 cases of pneumonia [18]. Some patients were bled and others were not.
Louis not only examined mortality but also symptoms and signs and
concluded that bleeding made no difference in outcome. This result was not in
keeping with the medical practice at this time, and, not unexpectedly, caused
an uproar. However, his findings came to be accepted, a triumph for the
clinical trial.

2.6 THE PROVISION OF HISTORICAL CONTROL GROUPS

By the middle of the nineteenth century, rigorous methods of observation had
been defined, the necessity for controls realised, and even the statistical theory
of probability could have been used in the analysis of results. However, the
selection of controls still led to biased results. Elisha Bartlett [19] described the
essential requirements for control and treated patients: they should

1. have equal disturbing factors of location, social class, and the like
2. be susceptible of a clear and positive diagnosis
3. not be selected
4. be subjected to a clearly defined method of treatment.

The provision of a more appropriate control group came to be recognised as
important and trials started to employ carefully followed historical controls. In
1870 Lister [20] compared the mortality of 35 historical controls with 40
patients treated with antiseptics. Forty-three percent of the controls died but
only 15 percent of the treated group. Bull [7] first pointed out that Lister was
cautious about drawing conclusions and, second, that more appropriate controls
“might have prevented the bitter and profitless controversy which raged
for many years.” Lister stated, “These numbers are, no doubt, too small for a
satisfactory statistical comparison ...” Bull pointed out that “The chi-squared
test shows them to be highly significant”; perhaps the controversy
would have raged less if Lister were a more effective lecturer and a more
dogmatic writer. However, historical controls are not appropriate for
randomised controlled trials. Pocock [21] has listed 19 instances of the same
intervention being used twice in consecutive groups of patients in the same
institution. In four of these instances mortality was significantly different
between the groups (see also section 8.2.3).
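
Bull's remark about the chi-squared test can be checked with a short calculation. The sketch below is illustrative only: the text gives percentages, not counts, so the numbers of deaths (15 of 35 controls, 6 of 40 treated) are assumptions back-calculated from the quoted 43 percent and 15 percent, and the test is the ordinary Pearson chi-squared for a 2 x 2 table without continuity correction, which may not be the exact form Bull used.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table
    [[a, b], [c, d]], with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, back-calculated from the quoted percentages:
# 43% of 35 historical controls is about 15 deaths; 15% of 40 treated is 6.
controls_died, controls_survived = 15, 20
treated_died, treated_survived = 6, 34

chi2, p = chi_squared_2x2(controls_died, controls_survived,
                          treated_died, treated_survived)
print(f"chi-squared = {chi2:.2f} on 1 degree of freedom, p = {p:.4f}")
```

Under these assumed counts the statistic is a little over 7 and the p-value falls below 0.01. The calculation says nothing about whether the historical controls were comparable with the treated patients, which is the objection raised in the paragraph above.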
2.7 THE PROVISION OF CONCURRENT CONTROL GROUPS

The lack of acceptance for Lister’s trial can be contrasted with that of Pasteur’s
vaccine for the prophylaxis of anthrax in animals. Pasteur used 60 sheep in the
experiment; 25 were inoculated and then infected and 25 were not inoculated
but were infected. An additional ten sheep were neither inoculated nor in­
fected. Chance allocation appears to have been employed to some extent in this
trial as critical observers suggested the order in which pairs of inoculated and
control animals should be infected [22]. All the animals who had been
inoculated survived; the 25 controls died. The results of this trial were
immediately accepted.
Another early controlled trial was performed by Fibiger. In 1898 he reported
a trial of anti-diphtheria serum in alternate patients [23]. He studied 488
patients and showed a reduction in mortality in the patients treated with serum.
He also recorded the fact that the diphtheritic membrane disappeared quicker
in the treated cases.
In 1945 a trial of penicillin in the treatment of wounds was attempted in the
21 Army Group [24]. The control group was to be those who were given any
alternative treatment. Unfortunately, the surgeons were unwilling to withhold
penicillin in the presence of serious wounds and the group treated with
penicillin were more seriously affected. Despite this bias, the wounds healed
quicker in the penicillin-treated group.

2.8 THE DELIBERATE USE OF RANDOMISATION TO
PRODUCE SIMILAR TREATMENT AND CONTROL GROUPS

The first modern trial to deliberately employ randomisation may have been
the Medical Research Council trial of streptomycin reported in 1948 [3]. The
introduction to the trial pointed out that the natural history of pulmonary
tuberculosis was so variable that “evidence of improvement or cure following
the use of a new drug in a few cases cannot be accepted as proof of the effect of
that drug.” The introduction further pointed out that there had been only one
report of an adequately controlled trial in tuberculosis and that was a trial of
gold therapy [25]. This trial was negative and counteracted the exaggerated
claims for gold treatment that had been made for over 15 years. Patients
entering the trial of streptomycin were restricted to those who were both
unlikely to improve spontaneously and yet were likely to respond to an active
chemotherapeutic agent. It was therefore decided that patients chosen had to
have acute progressive bilateral tuberculosis; subjects were excluded if they
had long-standing disease. The control treatment was to be bed rest and pa­
tients were excluded if they required pulmonary-collapse therapy.
The new feature of the trial was the randomisation of patients into control
and treated groups. The report stated:

Determination of whether a patient would be treated by streptomycin and bed-rest (S
case) or by bed-rest alone (C case) was made by reference to a statistical series based on
random sampling numbers drawn up for each sex at each centre by Professor Bradford
Hill; the details of the series were unknown to any of the investigators or to the
co-ordinator and were contained in a set of sealed envelopes, each bearing on the outside
only the name of the hospital and a number. After acceptance of a patient by the panel,
and before admission to the streptomycin trial, the appropriate numbered envelope was
opened at the central office; the card inside indicated whether the patient was to be an S
or a C case, and this information was then given to the medical officer of the centre.
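The allocation procedure quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not a reconstruction of the original method: the report says only that the series were based on random sampling numbers, so simple randomisation with equal chances of S and C is assumed here, and the function names, centre names, and series length are hypothetical.

```python
import random


def make_allocation_series(centres, sexes=("male", "female"), n_per_series=20, seed=None):
    """Prepare one numbered series of S/C allocations for each sex at each centre.

    Assumption: simple (unrestricted) randomisation with equal probability.
    """
    rng = random.Random(seed)
    series = {}
    for centre in centres:
        for sex in sexes:
            allocations = [rng.choice("SC") for _ in range(n_per_series)]
            # Envelopes are numbered from 1; the outside shows only centre and number.
            series[(centre, sex)] = list(enumerate(allocations, start=1))
    return series


def open_envelope(series, centre, sex, envelope_number):
    """Reveal an allocation; to be called only after the patient has been accepted."""
    number, allocation = series[(centre, sex)][envelope_number - 1]
    assert number == envelope_number
    return allocation


if __name__ == "__main__":
    # Hypothetical centre names, for illustration only.
    prepared = make_allocation_series(["Centre A", "Centre B"], seed=1948)
    print(open_envelope(prepared, "Centre A", "female", 1))
```

The point of the design is that the series is drawn up in advance by someone outside the clinical team, and that an allocation is looked up only after the decision to admit the patient has been made.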
Subsequent analysis showed that random allocation had led to the two
groups being comparable at entry to the trial. After six months, 51 percent of
the treated group showed considerable radiological improvement (radiographs
were assessed without knowledge of the treatment group); only eight percent
of the control group showed such improvement. Seven percent of the treated
group were dead in six months as opposed to 27 percent of the control group.
The ethical considerations did not present a problem as bed rest was consid­
ered to be the only possible alternative treatment and only limited supplies of
streptomycin were available. As not all cases could be given the new drug, it
was reasonable and practicable to give it to a random half. The randomised
controlled trial was therefore born just over 30 years ago and has since gone
from strength to strength.

3. ETHICAL CONSIDERATIONS

3.1 DEFINITION

The Concise Oxford Dictionary defines ethics as the “science of morals.” Glaser
considered that a discussion of ethical problems should embrace both an
assumption of right and wrong and a definition of how things are and not just
how things should be [26]. Moreover, ethical problems concern the individual
rather than the community. The community may benefit from the results of a
trial but no individual should be asked to take an unreasonable risk to benefit
the community. Problems arise when we are forced to consider what is
reasonable.
Ethical considerations are not legal requirements, but the law may support
an ethical stance. Lawyers usually consider precedents and determine the truth
of matters by discussion. We can emulate this process for a definition of the
term reasonable. At one extreme, it is obviously not ethical to force (or even
request) a subject to take part in a dangerous study. Such trials were performed
on non-Aryan prisoners in Nazi Germany. Subjects were exposed to extremes
of temperature and trials of resuscitative techniques were employed. These
experiments were obviously detrimental to and often fatal for the subject.
Even if the experiments had revealed a resuscitative technique capable of
saving the lives of many in the community, these trials were obviously unethical
and the risks to the individual unreasonable. At the other extreme, every
patient who agrees to take a medication must accept some risk. There are risks
to taking penicillin or aspirin but most subjects would be willing to face these
risks in a controlled trial. This is an example of a reasonable or acceptable risk.
3.2 LEGAL CONSIDERATIONS

As Wade has pointed out [27], “Although the subject needs protection, the
community needs knowledge.” He considered how a subject should be
indemnified if matters go wrong. The institution where the trial takes place
must have a public liability insurance policy in case anything untoward
happens to a subject as a result of negligence. With a new drug not in ordinary use,
the policy may not cover such a contingency and, where applicable, the
pharmaceutical company should agree to carry the risk. I also support Wade’s idea
that institutions should have no-fault liability insurance so that subjects in trials
may claim compensation for injury even when negligence does not occur. For
example, a patient who experiences an adverse drug reaction while taking part
in a trial could be recompensed.
3.3 DECLARATIONS ON MEDICAL ETHICS

3.3.1 The Nuremberg Code

Following the trials of Nazi war criminals, ten standards were laid down in
1947 [28].

1. The subject must give his or her voluntary consent, knowing the nature,
direction, purpose, inconveniences, and hazards of the experiment.

2. The experiment should be necessary both in yielding fruitful results for
the good of society and in the sense that the information cannot be gained
without the experiment.
3. The anticipated results justify doing the experiment (see section 3.3.2,
Clinical Research Combined with Professional Care and Nontherapeutic
Clinical Research).
4. All unnecessary physical and mental suffering must be avoided (see The
Use of Sham Operations, section 3.8.3).
5. There should be no a priori reason to believe that death or injury will
occur.

6. The degree of risk shall not exceed the humanitarian importance of the
problem (see section 3.1 and the discussion on reasonable).
7. Preparations should be made and adequate facilities provided against the
remote possibility of adverse effects.
8. Those who conduct the experiment shall exercise the highest degree of
skill and care and be scientifically qualified.
9. The subject must always be free to bring the experiment to an end.
10. The investigator must terminate the experiment if its continuation may be
detrimental to the patient.

3.3.2 The Declaration of Helsinki
The World Medical Association produced the following declaration [29], pref­
aced by binding the doctor with the words, “the health of my patient will be
my first consideration.”

I. Basic Principles
1. Clinical research must conform to the moral and scientific principles
that justify research, and should be based on laboratory and animal
experiments or other scientifically established facts.
2. Clinical research should be conducted only by scientifically qualified
persons and under the supervision of a qualified medical man.
3. Clinical research cannot legitimately be carried out unless the impor­
tance of the objective is in proportion to the inherent risk to the
subject.
4. Every clinical research project should be preceded by careful assess­
ment of inherent risks in comparison to foreseeable benefits to the
subject or to others.
5. Special caution should be exercised by the doctor in performing clini­
cal research in which the personality of the patient is liable to be
altered by drugs or experimental procedure.
II. Clinical Research Combined with Professional Care
1. In the treatment of the sick person the doctor must be free to use a
new therapeutic measure if in his judgement it offers hope of saving
life, re-establishing health, or alleviating suffering.
If at all possible, consistent with patient psychology, the doctor
should obtain the patient’s freely given consent after the patient has
been given a full explanation. In case of legal incapacity consent
should also be procured from the legal guardian; in case of physical
incapacity the permission of the legal guardian replaces that of the
patient.
2. The doctor can combine clinical research with professional care, the
objective being the acquisition of new medical knowledge, only to the
extent that clinical research is justified by its therapeutic value for
the patient.
III. Nontherapeutic Clinical Research
1. In the purely scientific application of clinical research carried out on a
human being it is the duty of the doctor to remain the protector of the
life and health of that person on whom clinical research is being car­
ried out.
2. The nature, the purpose, and the risk of clinical research must be
explained to the subject by the doctor.
3a. Clinical research on a human being cannot be undertaken without his
free consent, after he has been fully informed; if he is legally incompetent
the consent of the legal guardian should be procured.
3b. The subject of clinical research should be in such a mental, physical,
and legal state as to be able to exercise fully his power of choice.
3c. Consent should as a rule be obtained in writing. However, the
responsibility for clinical research always remains with the research
worker; it never falls on the subject, even after consent is obtained.
4a. The investigator must respect the right of each individual to safeguard
his personal integrity, especially if the subject is in a dependent rela­
tionship to the investigator.
4b. At any time during the course of clinical research the subject or his
guardian should be free to withdraw permission for research to be
continued. The investigator or the investigating team should discon­
tinue the research if in his or their judgement it may, if continued, be
harmful to the individual.

The Helsinki declaration clearly differentiated between the situation when
the subject, usually a patient, can hope to benefit from the experiment and the
situation where no such benefit can be expected.
Section II.2 of the declaration stated that clinical research can be combined
with professional care “only to the extent that clinical research is justified by its
therapeutic value for the patient.” This must be the overriding ethical consid­
eration and the use of patients as volunteers for experiments not relevant to
treatment presents great difficulties and will be discussed in section 3.5.
Sir Austin Bradford Hill has taken issue with two recommendations of the
World Medical Association [30]. He found that there are experiments such as
“in industrial psychology—which are not the prerogative, or even within the
special competence, of the medically qualified,” and he therefore objected to
item I.2, which insisted on the supervision of a qualified medical man. Hill
also disagreed with the idea that the nature and purpose of the trial must be
explained to the subject and stated “. . . I have no doubt whatever that there
are circumstances in which the patient’s consent to taking part in a controlled
trial should be sought. I have equally no doubt that there are circumstances in
which it need not—and even should not—be sought.”

3.3.3 Bradford Hill’s specific questions

Bradford Hill was unhappy with codes that deal in generalities and take no
heed of “the enormously varying circumstances of clinical medicine” [30]. He
stressed the necessity for “The close and careful consideration in the specific
circumstances of each proposed trial” and formulated a series of questions to be
answered for each trial.

1. Is the proposed treatment safe or, in other words, is it unlikely to do harm
to the patient?
2. Can a new treatment ethically be withheld from any patients in the doctor’s
care?
Tuberculous meningitis was a universally rapidly fatal condition and when
the first case reports revealed that streptomycin treatment had resulted in
the patients’ recovering, this fact was conclusive evidence of the effec­
tiveness of the new treatment. It was then not ethical to perform a clinical
trial of streptomycin in tuberculous meningitis. However, respiratory
tuberculosis runs a more variable course and it was ethical to perform the
randomised controlled trial of streptomycin in this condition. Moreover,
only a limited amount of streptomycin was available at that time (1947) and
as not all cases could be treated, it can be argued that it would be unethical
not to have performed the trial.
3. What patients may be brought into a controlled trial and allocated randomly to different treatments?
4. Is it necessary to obtain the patient’s consent to his inclusion in a controlled
trial?
5. Is it ethical to use a placebo or dummy treatment?
6. Is it proper for the doctor not to know the treatment being administered to
his patient?

3.3.4 Medical Research Council

A statement by the Medical Research Council (MRC) [31] gave two examples
of when informed consent may not always be desirable: first, when
the patient has a possibly fatal illness for which no effective treatment is
available, and second, when a placebo is employed. The MRC considered in 1964
whether any supervision of the conduct of controlled trials (or other
experiments) was necessary and concluded “controlled clinical trials should always
be planned and supervised by a group of investigators and never by an individ­
ual alone.” The MRC report also suggested that no paper should be accepted
for publication if there are any doubts about the ethical conduct of the study
leading to the report.
3.4 RESEARCH ETHICAL COMMITTEES

In 1967 a committee appointed by the Royal College of Physicians of London
suggested the formation of ethical committees consisting of “a group of doctors
including those experienced in clinical investigation” [32]. By 1973 the
functions and constitution of these committees had been formalised.
The final report made the following recommendations:
1. A Research Ethical Committee shall be a small committee set up solely to
supervise the ethics of clinical research.

2. The medical members should be experienced clinicians with knowledge
and experience of clinical research.
3. The Research Ethical Committee should have a lay member.
4. To remove any uncertainty about which procedures should be submitted to
a Research Ethical Committee, all proposed research investigations in hu­
man beings should be submitted.
5. Whenever a research investigation was not expected or intended to benefit
the individual patient a full explanation should be given and the patient
should be free to decline to participate or to withdraw at any stage.
6. Whenever possible the consent of a patient should be obtained in the pres­
ence of a witness.
7. When there are circumstances in which it is genuinely inappropriate to
inform a patient fully, it is the duty of the Research Ethical Committee to
examine the situation with special care.
8. Particular care is needed if clinical investigation is proposed in children or
mentally handicapped adults who cannot give informed consent. The par­
ents or guardian should be consulted.
9. Particular care is needed if clinical investigation is proposed on a subject or
patient who has any sort of dependent relationship to the investigator, for
example, student, laboratory technician, or employee.
3.5 RANDOMISED CONTROLLED TRIALS
WITHOUT POSSIBLE BENEFIT TO THE PARTICIPANT

Examples of these trials are provided by early drug studies in normal men and
women and trials of drug interactions in patients on chronic treatment. Early
drug trials in normal men are usually dose-finding experiments to assess the
human counterpart of observations made in animals. They are not, initially,
randomised controlled trials but slightly later studies may constitute a ran­
domised trial of the new treatment (in the predetermined dose) versus an
established drug (chapter 17).
In trials on patients, those on chronic treatment with one drug may be asked
to take a second drug to assess the effect of the drugs in combination. This may
be suggested when the second drug cannot be expected to benefit the patient.
An example can be given of patients on long-term antihypertensive drugs who
are also asked to take an antidepressant or antiinflammatory drug to assess
whether or not the second drug worsens blood-pressure control.
For trials without possible therapeutic benefit for the individual, all subjects
and patients must be true volunteers, receive full information about the study,
give written consent (preferably in front of a witness), and not receive an excessive
reward. If the subjects are paid a considerable amount they may be tempted to
participate in a study, whereas without this remuneration they may refuse.
This restriction does not exclude an allowance for fares, meals, and compensa­
tion for lost earnings as volunteers should not be expected to experience a
financial loss.

3.6 WHO SHOULD PARTICIPATE IN TRIALS
WITHOUT POSSIBLE THERAPEUTIC BENEFIT?
3.6.1 Employees of the pharmaceutical industry
Glaser outlined the case very clearly for using employees of the pharmaceutical
industry [26].
Those who decide that a new substance can be safely tried in man should have enough
confidence to take it themselves. If they will not take it themselves, they should not
give it to others. Those who know the most about the substance and who are the most
experienced scientists can make the best personal decisions about it and they are also the
best able to observe their own subjective effects. Thus the first to take a new substance
might be the research director, the medical director, the senior toxicologist, or advisers
in pathology.

Glaser also considered that a volunteer’s family doctor should be informed
about the trial. This may ensure that trials that are unacceptable to general
practitioners are not performed, and if there is some medical reason why an
individual volunteer should not participate, then the investigator may be in­
formed of this fact. Lastly, if the volunteer should become ill during or after
the trial, then the general practitioner will be aware that the trial is in progress.
Volunteers must not be solicited from subordinates by their seniors. Only a
comparatively junior person should perform this task and the supervisors
should be told only who is suitable. Glaser reported “anyone unwilling is
unsuitable” and children under the age of 14 and mental patients cannot
volunteer.
3.6.2 Prisoners

Prisoners are used in medical experiments in the United States of America.
The problem with this procedure is that a reduction in prison sentence may
constitute an excessive reward and result in the subjects not being free
volunteers. The report of a Committee appointed by the Governor of Illinois stated [33]:
A reduction in sentence in prison, if excessive or drastic, can amount to undue in­
fluence. If the sole motive of the prisoner is to contribute to human welfare, any
reduction in sentence would be a reward. If the sole motive of the prisoner is to obtain a
reduction in sentence, an excessive reduction of sentence which would exercise undue
influence in obtaining the consent of prisoners to serve as subjects would be inconsistent
with the principle of voluntary participation.
The committee considered the function of imprisonment, for example,
whether this is to protect society or to reform the prisoner. The members
discussed whether a prisoner would volunteer from good social consciousness
or in a desire to reduce his sentence. In view of the latter incentive, the
committee concluded that a prisoner should not be allowed to volunteer if he is
a habitual criminal or if he has committed a notorious or heinous crime.
Presumably, the committee members were worried about having such a per­
son released early.
The committee also concluded that any proposed reduction in sentence must
not be excessive. Glaser [26] also worried that the incentives for prisoners may
be too high. He considered the possibility of prisoners getting privileges for
participation and even that the relief of boredom might prove a great incen­
tive, possibly a coercion inconsistent with voluntary participation.

3.6.3 Patients
Patients are the ultimate beneficiaries of advances in medical care and Claude
Bernard considered it their duty to assist with research. However, should the
individual patient in the trial be the possible beneficiary or should the benefit
go to other patients with different conditions? If we wish to assess the interac­
tion between an antihypertensive and an antiinflammatory drug, we may ask
any hypertensive patient or patients with both hypertension and arthritis to
cooperate. In the latter instance, the treatment is relevant to the patient’s
condition and the patient may benefit. However, when the patient has hyper­
tension alone, he must be considered as a normal volunteer and great care must
be taken that the doctor-patient relationship is not used to exert too much
pressure on the patient to participate. The patient must not volunteer from a
sense of gratitude or in the hope of better medical attention and it is a wise
precaution for the doctor treating the patients to ask them to discuss taking
part in a trial with another colleague. The doctor undertaking the usual treat­
ment should make it clear that the patient’s failure to participate in a trial will
not affect his usual medical care in any way.

3.7 INFORMED CONSENT
3.7.1 Information for the patient or subject

The patient or subject should be fully informed of the nature of the trial: that
is, the number of investigations and visits required and the duration of the
trial. The objectives of the trial should be stated, provided such statements are
compatible with the usual doctor-patient relationship. It may be unethical to
give full information to patients with, say, cancer, either when the diagnosis
cannot be revealed or when it is not in the patients’ best interest to describe the
inadequacies of available treatment. However, in most instances the patients
can be given all the relevant information and should be told when a placebo
(dummy) treatment is to be employed in the trial. In conclusion, the patients
should be informed of the following, in writing, and preferably in the presence
of a witness:
1. the nature of the treatments being compared
2. the objectives of the trial
3. the duration of the trial
4. what the trial involves for the patients (number of visits, investigations, et cetera)
5. possible benefits to be derived from the treatments
6. possible hazards of the treatments
7. what to do if the patients become unwell, run out of tablets, et cetera

3.7.2 Written consent

Written consent should be obtained; otherwise, there can be no proof that
consent was given and such evidence may be necessary in a court of law. The
patients should be asked to sign a document giving the full information dis­
cussed in section 3.7.1 and including a declaration similar to the following: “I
----------, have read the above description of the trial and agree to take part. I
understand that I may withdraw my co-operation at any stage should I so
wish.” The patients therefore sign to say they have been informed about the
trial and have agreed to take part. Some authorities may insist that the declara­
tion be signed in the presence of a witness. This is desirable in all volunteer
studies but many researchers would not insist on the presence of a witness
when the trial is of possible therapeutic benefit to the individual involved.
3.7.3 How to avoid asking consent of some of the patients

When a new experimental treatment is to be compared with an acceptable
routine treatment it can be argued that only the patients receiving the new
treatment need give consent. The usual trial design requires consent to be
obtained prior to randomisation but Zelen has suggested that randomisation
can precede informed consent so that only those allocated to the new treatment
are asked to consent [34]. There may well be a circumstance in which this
strategy is desirable; however, the approach is impossible in double-blind or
single-blind trials. Moreover, Zelen’s suggestion may be unsatisfactory if the
new treatment proves to represent an important new advance: the patients
who have benefited will have consented to take part but not those who have
fared badly. Most important, however, is the fact that those who do not wish
to take part will fail to receive the trial treatment in the consent group but will
receive it in the no consent group. Patients who do not receive the treatment
cannot be excluded; otherwise, the two groups may be dissimilar for impor­
tant characteristics and one major purpose of randomisation will be lost. Anal­
ysis has to be conducted on all randomised patients on the intention-to-treat
principle (section 15.7). However, the effect of the new treatment may be
diluted by the results in patients who do not receive this treatment.
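The dilution described above can be illustrated with a short calculation. The sketch below is a simplification under stated assumptions: patients who decline the new treatment are assumed to respond exactly as patients on the routine treatment do, and the response rates and refusal fractions are hypothetical figures chosen only for illustration.

```python
def itt_difference(p_new, p_routine, refusal_fraction):
    """Expected intention-to-treat difference in response rates.

    A fraction of the patients allocated to the new treatment decline it and
    receive the routine treatment, but they are still analysed in the new arm.
    """
    if not 0.0 <= refusal_fraction <= 1.0:
        raise ValueError("refusal_fraction must lie between 0 and 1")
    rate_in_new_arm = (1 - refusal_fraction) * p_new + refusal_fraction * p_routine
    return rate_in_new_arm - p_routine


if __name__ == "__main__":
    # Hypothetical response rates: 60 percent on the new treatment, 40 percent on routine.
    for refusal in (0.0, 0.1, 0.25, 0.5):
        print(f"refusal {refusal:.0%}: observed difference {itt_difference(0.60, 0.40, refusal):.3f}")
```

Under these assumptions the observed difference is the true difference multiplied by the proportion who accept the new treatment, so a larger trial is needed to detect the same effect as refusals increase.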
3.8 PLACEBO TREATMENT

The reasons for using placebo drugs and the methods for using such drugs are
discussed in section 8.10. There are two circumstances where it is ethical to
employ placebo medication: when no effective treatment is available for a
particular condition and when, if such treatment is available, it can safely be
withheld for a certain period.

3.8.1 No effective treatment has been identified
A placebo cannot be employed if there is definite evidence that withholding
standard treatment would be detrimental to the patient’s health. Beecher [35]
discussed a trial of treating streptococcal respiratory infections in which
placebo was given to 109 men while benzathine penicillin G was given to the
others. No patient treated with penicillin developed either rheumatic fever or
acute nephritis. However, three patients developed these complications when
given placebo. Beecher considered that at the time the trial was performed it
was known that penicillin prevented rheumatic fever, and therefore the use of
a placebo was unethical.

3.8.2 Use of a placebo for a short period
when active treatment is known to be required
We can agree that it is unethical to withhold necessary treatment. However, a
placebo may have a powerful pain-relieving effect and constitute acceptable
treatment under certain circumstances. Beecher [36] reviewed the effects of
placebo in severe postoperative wound pain. Four studies used an injection of
saline as a placebo and satisfactory pain relief was achieved in about a third
of patients. Similarly, placebo treatment produced relief from angina pectoris
in a similar proportion of patients (section 8.10).
In patients with severe postoperative pain, we can argue that placebos
should not be used, as active drugs such as morphine are available. But
arguments for using a placebo in this situation can be advanced. A proportion of
patients achieve pain relief from placebo and they do so without the adverse
effects associated with active pharmacological agents. Although it would be
unethical to withhold an active analgesic for a prolonged period, administration
may be delayed for a short interval, say, 15-30 minutes following a
placebo injection. If pain relief is not achieved at the end of this period, active
treatment can then be given. Many patients will agree to wait a short period to
help in evaluating a new drug. Care has to be taken in a double-blind study of a
new drug against placebo that there is no likelihood of an adverse effect if the
new drug is ineffective and is closely followed by an active drug such as
morphine.
With less severe degrees of pain the ethical problems are reduced and it is
even more necessary to employ a placebo to assess a possibly active mild
analgesic. Many patients will respond adequately to a placebo and not all
patients receiving active treatment will experience pain relief. For example, if
the improvement rate is 33 percent with placebo and 50 percent with an active
treatment, the use of a placebo is necessary to confirm a superior effect for the
active compound. Without knowledge of the placebo response rate, a 50 per­
cent result could be a nonspecific response to an inactive compound.
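The arithmetic behind this example can be set out as a small sketch. It assumes, for illustration only, that the nonspecific (placebo) response simply adds to the specific effect of the drug; the function name is hypothetical and the rates are those quoted in the text.

```python
def specific_effect(active_rate, placebo_rate):
    """Split the response on active treatment into specific and nonspecific parts.

    Assumption: the placebo response is the same in both groups and additive.
    """
    if active_rate <= 0:
        raise ValueError("active_rate must be positive")
    excess_over_placebo = active_rate - placebo_rate
    nonspecific_share = placebo_rate / active_rate
    return excess_over_placebo, nonspecific_share


if __name__ == "__main__":
    excess, share = specific_effect(0.50, 0.33)
    print(f"excess over placebo: {excess:.2f}")
    print(f"share of responders on active treatment expected on placebo alone: {share:.2f}")
```

Only the excess over placebo can be credited to the active compound, which is why the 50 percent figure on its own cannot be interpreted.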
Placebo treatment has also been employed for short periods in trials in
chronic diseases when active treatment is known to be required in the long
term. An example is provided in hypertension where placebo treatment in the
long term is justified for patients with mild hypertension as the benefits of
active treatment have not been established. However, active treatment is
known to be beneficial in preventing cerebrovascular events in young or
middle-aged patients with moderate or severe hypertension. Yet placebos are
prescribed when immediate treatment for heart failure, renal failure, or malig­
nant hypertension is not required. Placebo treatment is traditionally employed
in two broad circumstances.

1. The first is when the patient has not received antihypertensive treatment in the
past. Antihypertensive treatment is not usually started the first time the patient
sees a doctor. The physician may wish to confirm that the blood pressure is
elevated on a second or third occasion and may require certain investigations
to be completed before commencing treatment. It is therefore reasonable to
give placebo treatment during this period of observation and possibly to extend
the interval to, say, four to six weeks. It must be appreciated, however,
that the preventive effects of antihypertensive treatment are being denied the
patient during this period. Although the risk of a cerebrovascular event occurring
during a short interval is low, it still constitutes an ethical problem. The
theoretical risk can be calculated from the Veterans Administration trial of
antihypertensive treatment [37]. Male patients less than 60 years old with an
initial diastolic blood pressure of 105-114 mm Hg experienced, on average, a
0.058 chance of a mortal or morbid cardiovascular event per year of placebo
treatment. If they were given active treatment the chance of a cardiovascular
event was 0.023 per year. The excess risk of being on a placebo was therefore
0.035 events per year and the probability of a mishap with placebo treatment
may be 0.003 per month. If 220 such patients receive six weeks of placebo
treatment in a trial, the investigator may expect one adverse event. These
events include stroke, myocardial infarction, heart failure, and the retinal
changes of accelerated hypertension. Many trials have subjected patients to a
one in 220 risk of one of these events.
2. The second circumstance is when placebo treatment interrupts a period of active
treatment. The risks of taking placebo treatment may be the same when a
period of placebo treatment interrupts active treatment as when it precedes it,
and many patients on treatment have been entered into trials that incorporate a
period of placebo administration. Again, for a six-week period we must ask if
the patient will accept a one in 220 chance of an adverse cardiovascular event.
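The risk calculation quoted from the Veterans Administration trial can be retraced step by step. The sketch below uses the event rates given in the text; the rounding of the monthly figure to three decimal places and the treatment of six weeks as one and a half months are assumptions made here to follow the text's arithmetic, and the risk is assumed to be constant over time.

```python
def placebo_excess_risk(placebo_rate_per_year, active_rate_per_year, weeks, round_monthly=True):
    """Excess risk of a cardiovascular event during a short period on placebo."""
    excess_per_year = placebo_rate_per_year - active_rate_per_year
    excess_per_month = excess_per_year / 12
    if round_monthly:
        # Assumption: rounded to three decimal places, as the text quotes 0.003.
        excess_per_month = round(excess_per_month, 3)
    months = weeks / 4  # assumption: a month is taken as four weeks
    risk_over_period = excess_per_month * months
    return excess_per_year, excess_per_month, risk_over_period


if __name__ == "__main__":
    per_year, per_month, risk = placebo_excess_risk(0.058, 0.023, weeks=6)
    print(f"excess risk per year:  {per_year:.3f}")
    print(f"excess risk per month: {per_month:.3f}")
    print(f"patients per expected event over six weeks: {1 / risk:.0f}")
```

With these conventions the result is close to the one in 220 quoted above; leaving the monthly figure unrounded gives a slightly lower risk.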

3.8.3 The use of sham operations
The immediate reaction among physicians is to consider all sham operations in
man as unethical. Bradford Hill considered that it would not have been
reasonable to use placebo injections as a control in the Medical Research Council
(MRC) trial of streptomycin [3]. In this trial the control patients would have
suffered a considerable amount of discomfort from repeated injections of
placebo. Admittedly, if injections alone can have a life-prolonging effect inde­
pendent of the substance injected, then the MRC trial was biased in favour of
streptomycin. However, such a strong placebo effect was unlikely and could
not justify inflicting so much discomfort on the control patients.
Before dismissing sham operations, however, we must consider a trial discussed by Beecher [38]:
In 1939 it was suggested, in Italy, that the pain of angina pectoris could be greatly
lessened by ligation of the internal mammary arteries. Eventually this suggestion was
adopted in the United States and quite spectacularly favourable results were obtained.
Not only were the objective results impressive, the patients said they felt better and the
objective evidence supported this: there was great reduction in the number of nitroglycerin
tablets taken, and exercise tolerance was greatly increased. Several individuals [39-42]
began to wonder if this might not be a placebo effect. They therefore went to their
patients, explained the situation, and told them they would like to carry out a study in
which the patients would not know what had been done, nor would the observers
know until the study was completed. They told their patients that half of them would
have the internal mammary arteries exposed and ligated and the other half would
simply have them exposed, but not ligated. These studies were carried out . . . ligation
had no real effect beyond that of a placebo effect.

Beecher thus argued very persuasively that a sham operation can be ethical
even though the control patients suffered an anaesthetic and much discomfort.
Many patients would not agree to take part in such a trial.

3.9 SELECTION (EXCLUSION OR INCLUSION) CRITERIA

Exclusion and inclusion criteria are the two sides of the same coin. A trial
confined to young patients may be said to exclude patients above the age of 60
or include only patients below the age of 60.
Two objectives are met by using these criteria. First, only those patients
who are intended for study are entered into the trial. The results of the trial are
then only valid for a similar group of patients; this concept will be discussed
further in chapter 5. The second reason for having selection criteria is ethical.
Patients must be excluded from a trial if inclusion in the trial may produce
adverse consequences for them.
Table 3-1 gives the selection criteria for a placebo-controlled trial of antihy­
pertensive treatment in the elderly, being conducted by the European Working
Party on Hypertension in the Elderly (EWPHE) [43]. The selection criteria are
rearranged into criteria defining the group of patients to be studied, and crite­
ria excluding patients from the study who should not be included for ethical
reasons. The trial involves the random allocation of patients either to five years
of active treatment or five years of placebo treatment. The selection criteria
therefore exclude patients who should not receive a placebo for five years. At


Table 3-1. Selection criteria for the European
Working Party on Hypertension in the Elderly (EWPHE) trial.

Selection criteria defining the group of patients to be studied
 1. Aged more than [illegible].
 2. Systolic blood pressure (on placebo) above 160 mm Hg.
 3. Diastolic blood pressure (on placebo) above 90 mm Hg.
 4. Patients give their informed consent.
 5. Regular follow-up possible.
 6. Compliant with medication as assessed by pill count.
 7. No reason to suspect secondary hypertension.
 8. No severe life-threatening diseases unrelated to hypertension (e.g., carcinoma).

Selection criteria included for ethical reasons
 9. Systolic blood pressure (on placebo) not above 239 mm Hg.
10. Diastolic blood pressure (on placebo) not above 119 mm Hg.
11. No history of accelerated or malignant hypertension.
12. No congestive heart failure.
13. No severe renal failure (serum creatinine > 2.5 mg%).
14. No previous history of a haemorrhagic stroke or hypertensive encephalopathy.
15. No history of dissecting aneurysm.
16. No previous history of gout or serum uric acid > 10 mg%.
17. No acute hepatitis or active cirrhosis.

Table 3-2. Withdrawal Criteria for the European
Working Party on Hypertension in the Elderly (EWPHE) trial.

Withdrawal criteria which are end-points for the trial and not ethical considerations
 1. Completion of five years follow-up.
 2. No follow-up for more than six months.
 3. No trial treatment for more than three months.

Withdrawal criteria for ethical reasons
(In parentheses: the corresponding numbers for the selection criteria)
 4. Systolic blood pressure rising by 40 mm Hg or exceeding 250 mm Hg on three visits (9).
 5. Diastolic blood pressure rising by 20 mm Hg or exceeding 130 mm Hg on three visits (10).
 6. Development of accelerated or malignant hypertension (11).
 7. Development of congestive heart failure (12).
 8. Serum creatinine increasing by 100% or above 3.9 mg% on two occasions (13).
 9. Development of cerebral or subarachnoid haemorrhage or hypertensive encephalopathy (14).
10. Development of dissecting aneurysm (15).
11. [illegible]
12. [illegible] cardio-thoracic ratio as measured on a chest radiograph.
13. Any reason why continuation in the trial would be detrimental to the patient's interest.

the time of initiating the trial (and at the time of writing), there was little or no
evidence that the elderly hypertensive patient would benefit from antihypertensive
treatment. However, it was considered undesirable to include patients
with very high levels of blood pressure (criteria 9-10). Similarly, hypertensive
patients known to require treatment were excluded, for example, those with
accelerated or malignant hypertension (criterion 11), those with congestive
heart failure, and those with conditions that would possibly benefit from
treatment, such as patients with renal impairment (criterion 13) and those who
had previously suffered a haemorrhagic stroke (criterion 14).
Selection criteria must exclude not only those patients who would suffer
from receiving a placebo but also those who may be harmed by the active
treatments employed in the trial (hydrochlorothiazide with triamterene; and methyldopa).

3.10 WITHDRAWALS FROM THE TRIAL

If a patient with a certain condition cannot enter the trial for ethical reasons,
then he should be withdrawn from the trial if he develops the condition.
Withdrawal criteria should therefore be the same as exclusion criteria. Table
3-2 gives the withdrawal criteria for the EWPHE trial. Criteria 1-3 are end
points for the trial and not ethical considerations. Criteria 4-10 correlate with
the exclusion criteria given in table 3-1. The criteria in the two tables are
cross-referenced in table 3-2. Selection criteria 16-17 (table 3-1) do not have
their counterparts in the withdrawal criteria in table 3-2, as the development
of gout may lead to the discontinuation of diuretic treatment, but the patient
may continue in the trial taking methyldopa. Similarly, the development of
liver disease may lead to stopping methyldopa, with the patient continuing to
take a diuretic and remaining in the trial. Withdrawal criteria 11 and 12 have no
counterpart in the selection criteria but indicate that the patient is not
progressing satisfactorily.
However carefully a trial is designed, and even after the completion of a
pilot trial, there will still be patients whose continuation in the trial would be
against their future well-being. Criterion 13 allows for these unforeseen
contingencies and is a necessary statement in any trial protocol.

3.11 DECISION RULES FOR STOPPING THE WHOLE TRIAL

In a short-term trial, the patients are usually entered into the trial quickly and
the trial completed before the results are analysed. The exception to this rule
is the sequential trial where a decision is made whether or not to continue with
the trial as the individual results become available (section 11.7). When patients
are followed for several years or when recruitment persists for many years, the
opportunity exists for interim analyses to be made. For ethical reasons the trial
must be terminated if an interim analysis demonstrates a statistically significant
and important adverse effect of treatment or a significant benefit from
treatment. If interim analyses fail to reach these end points the trial will be
terminated when the intended number of patients has entered the trial or the trial
participants and organisers run out of time or money (section 14.6).

3.11.1 Problems with significance testing in interim analyses
Care has to be taken that the overall level of significance of a trial is not
reduced by the repeated analysis of results. These interim analyses are
sometimes termed repeated looks. If a statistical test is repeated on several occasions
on increasing data and five percent is taken as the level of significance to be
achieved, then after the first test the probability of a falsely positive result is
five percent. After two tests this probability rises to almost ten percent. After
13 tests, the chances of a falsely positive result is almost 50 percent.
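
The figures quoted above agree with the simple formula for the chance of at least one false positive among k tests, 1 - (1 - alpha)^k. The following is a minimal sketch of that arithmetic, assuming the tests are independent; the function name is hypothetical and not taken from the text. Repeated tests on accumulating data are correlated, so the true inflation is somewhat smaller than this approximation suggests.

```python
def chance_of_false_positive(alpha: float, looks: int) -> float:
    """Probability of at least one falsely positive result in `looks` tests,
    each carried out at significance level `alpha`.

    Assumption: the tests are treated as independent, which is only an
    approximation for repeated analyses of accumulating trial data.
    """
    return 1.0 - (1.0 - alpha) ** looks


if __name__ == "__main__":
    for looks in (1, 2, 13):
        probability = chance_of_false_positive(0.05, looks)
        print(f"{looks:2d} test(s) at the 5 percent level: {probability:.1%}")
```
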
McPherson [44] has calculated that ten interim analyses with a decision rule
to stop the trial if the level of significance exceeds one percent is equivalent to
an overall level of significance of five percent (see section 10.7.3). In other
words, if the trial must be stopped for an adverse effect significant at the five
percent level and ten interim analyses are planned, then the result of an interim
analysis must be significant at the one percent level to stop the trial.
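
The effect of a stricter stopping rule can be explored by simulation. The sketch below is illustrative only and is not the calculation reported in [44]: it assumes a normally distributed end point with known variance, a two-sided test, no true treatment effect, and ten equally spaced interim analyses. The function name and its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def overall_false_positive(looks: int, nominal_alpha: float,
                           trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the overall false-positive rate when a trial
    with no true treatment effect is analysed after each of `looks` equal
    blocks of data and stopped at the first nominally significant result."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - nominal_alpha / 2.0)  # two-sided
    stopped_early = 0
    for _ in range(trials):
        running_sum = 0.0
        for look in range(1, looks + 1):
            # Each block contributes a standard normal summary under the null.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(look)  # test statistic on all data so far
            if abs(z) > critical:
                stopped_early += 1
                break
    return stopped_early / trials


if __name__ == "__main__":
    for nominal in (0.05, 0.01):
        rate = overall_false_positive(looks=10, nominal_alpha=nominal)
        print(f"nominal level {nominal:.2f}, ten looks: overall rate about {rate:.3f}")
```

Under these assumptions, ten looks at a nominal one percent level give an overall rate close to five percent, whereas ten looks at a nominal five percent level give a much higher rate.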

3.11.2 Terminating the trial when an adverse effect of treatment is observed
Treatment with either conjugated oestrogens or dextrothyroxine in the Coronary
Drug Project Trial [45] had to be terminated owing to the adverse effects
of these drugs. Similarly the University Group Diabetes Program (UGDP) trial
was stopped owing to the adverse effects of phenformin and tolbutamide [46]
(see section 19.6). In both trials the groups treated with certain drugs fared
significantly worse than the placebo-treated groups and the trials of these
active treatments were terminated.

3.11.3 Terminating the trial when a statistically significant benefit is observed
The Veterans Administration trial of antihypertensive medication provides a
good example of a trial's being terminated when an interim analysis provides
evidence of a benefit from treatment. Patients were entered into the trial when
the diastolic blood pressure ranged from 90-129 mm Hg while taking a
placebo and they were randomly allocated to receive either active or placebo
treatment. After an average of 18 months' follow-up the trial was stopped for
patients with an initial diastolic blood pressure greater than 114 mm Hg [47],
as the patients receiving active treatment had fewer cardiovascular events than
those receiving placebo (P < 0.001). The trial was continued for patients with
an initial diastolic blood pressure of 90-114 mm Hg [37]. Another interesting
example comes from the Anturane Reinfarction Trial [48] (section 19.1). In
this trial a significant benefit from treatment was observed in an interim
analysis and recruitment to the trial was stopped. However, the patients already in
this double-blind trial were advised of the results of the interim analysis and
asked to continue in the trial. Nearly all agreed to continue and the final
analysis showed a similar benefit.

3.12 TRIALS TO DETECT TOXICITY

It is unethical to design a trial to detect toxicity. However, as discussed in section
18.2, large long-term trials have resulted in the detection of unexpected
treatment toxicity, although some large trials have failed to detect rare adverse
drug effects. Large trials are designed to estimate the efficacy of treatment but
careful attention must be paid to possible toxicity (chapter 18).

3.12.1 When may a trial to detect both efficacy and toxicity be desirable?
When a single trial reports a benefit from treatment, it is often desirable to
repeat the trial and ensure that the benefits can be demonstrated for different
patients and on another occasion. However, when a trial detects toxicity, it is
ethically impossible, although scientifically desirable, to conduct a trial to
confirm an adverse effect of treatment. If there are doubts about whether or
not there was any serious toxicity in the earlier trial, and the efficacy of the
treatment is thought to be high, then possibly a second trial of benefit can be
mounted. The question is of some practical importance. For example, in the
University Group Diabetes Program Trial [46] oral hypoglycaemic agents were
associated with an increase in cardiovascular mortality. However, when a
patient cannot adhere to a diet these drugs may relieve the symptoms of
hyperglycaemia. It may be reasonable to reassess efficacy in these patients and
in view of the criticism levelled at this particular trial (section 19.6), the trial
could be repeated. I would be reluctant to take part in a trial where toxicity
may be a disadvantage not counteracted by important gains from therapy.

3.13 CONCLUSIONS

This chapter reviews the ethical requirements in the design and conduct of
clinical trials. Declarations of ethical principles have been reviewed and the
place of research ethical committees considered. Emphasis was placed on the
importance of obtaining informed written consent and a distinction has been
drawn between trials of possible benefit to the participant and trials involving
volunteers who cannot expect an improvement in their health from
participating in the trial. The investigator must remain convinced that none of the
available treatments offers a clear advantage and this is especially important
when placebo treatment is to be employed. Provided the investigator is
genuinely in doubt as to the best treatment, he can explain the situation to
potential participants and ask them to enter the trial. He may even ask himself
the standard question, “Would I allow a member of my family to enter the
trial?” Even if he can answer yes to this question, the public must be protected
from a small proportion of eccentric enthusiasts; research ethical committees
should provide this safeguard.
Large trials should incorporate an ethical committee in the administration
that is independent of the investigators and rules on whether any observed
toxicity is acceptable, when the trial should be terminated, and whether to
make any changes in the criteria for entry or withdrawal from the trial.
If a trial shows one treatment to be superior, patients who received the
inferior treatment may have suffered as a consequence. Trial designs that limit
this problem are discussed in sections 11.7 and 11.8 and the ethical
disadvantages of randomised controlled trials are summarised in chapter 20.

4. THE OBJECTIVES OF A RANDOMISED CONTROLLED TRIAL

A trial may be conceived to test more than one hypothesis but it is good
practice and usually essential when calculating the numbers required for a trial
to determine one major objective. For example, an investigator may be
interested in a trial of a new antihypertensive drug in elderly patients. The major
objective could be either to demonstrate the efficacy of the drug in lowering
blood pressure or in preventing cardiovascular deaths. The first objective
could be answered in a few patients studied for six months, but the second
objective would require the study of hundreds of patients over many years
(chapter 10). In order to calculate the numbers required for a trial, the major
objective has to be identified and the smallest effect of treatment to be detected
must be defined.

4.1 IDENTIFICATION OF THE MAJOR OBJECTIVE

The major objective will involve the detection of a change in a particular end
point. In the first example given earlier, where the effect on blood pressure has
to be determined, the end point of interest could be diastolic, systolic, or mean
pressure. If the effect on mortality or morbidity has to be determined, total
mortality, total cardiovascular mortality, stroke mortality, total cardiovascular
events (either fatal or nonfatal), or stroke events could constitute the major
end point of primary interest. The investigators must determine this end point
at the outset of the trial and also decide the amount by which it should change.
In the definition of the major objective the investigators must take into account
its importance, the likelihood that it will be achieved by treatment, and the
ease of measurement of the chosen end point.

4.1.1 The importance of the objective
Systolic blood pressure may be easier to measure than diastolic pressure but
the investigators may decide that diastolic pressure is more important in
determining the future health of the patient. Similarly, a death from myocardial
infarction may be more important than the occurrence of an infarct from
which the patient recovers. Mortality is usually a more important end point
than morbidity and total mortality is a clearer measure of outcome than
mortality from one specific cause. It must be noted that a treatment may reduce
one cause of death and increase another.

4.1.2 The likelihood that the objective will be achieved by treatment
An antihypertensive drug may reduce both stroke mortality and total
mortality. However, the proportional reduction in deaths may be expected to be
greater with stroke mortality. A trial with stroke mortality as its end point is
therefore likely to reach a conclusion more quickly than a trial to detect a
reduction in total mortality. The end point of stroke mortality is to be
preferred in this example for ethical reasons and efficiency. However, total
mortality has also to be considered and this is discussed under the decision rules for
stopping a trial (section 14.6).

4.1.3 Ease of measurement of the end point
It may be easier to measure systolic blood pressure than diastolic pressure and
this measurement may have greater repeatability. Similarly, the fact of death is
easier to determine than a particular cause of death and a cardiovascular event
may be particularly difficult to ascertain. If the patient dies it has to be decided
whether or not sudden death should be regarded as a cardiovascular death,
and, if so, how quickly must the patient die to be considered as a sudden death.
Also, if the patient survives a myocardial infarction, the diagnostic
electrocardiographic or enzyme changes have to be agreed in advance.

4.2 WHAT CHANGE IN THE END POINT MUST BE DETECTED?

We must distinguish between biological and statistical significance, consider
whether the end point is a continuously distributed or qualitative variable,
and, if qualitative, whether or not the end point occurs frequently or rarely.

4.2.1 The distinction between biological and statistical significance
The distinction has to be made between what is statistically significant and
what is biologically important. It may be observed that an antianxiety drug
lowers systolic blood pressure by, say, 2 mm Hg and this result, given a large
number of patients, could be highly statistically significant. However, the
biological importance of the result would be small and the drug would not be
used as an antihypertensive agent. In hypertension, a drug is useful if it lowers
systolic blood pressure by more than 10 mm Hg. The objective of the trial
would therefore be to test the hypothesis that the drug lowers systolic blood
pressure by 10 mm Hg (or conversely, the null hypothesis that the drug does
not do so).

4.2.2 Changes in continuously distributed variables
What sort of changes in continuously distributed variables are of biological
importance? Table 4-1 gives some biochemical and other results from a
screening of London civil servants [49]. The table also gives some very
arbitrary suggestions for changes that could possibly be produced by treatment
and be considered biologically important. The suggested changes are of the
order of ten to 20 percent or about one standard deviation. For example, a
reduction in systolic blood pressure of 10 mm Hg; an increase in haemoglobin
of 2 gm/100 ml and a reduction in blood glucose of 1 mmol/liter could all be
considered biologically important.

Table 4-1. Average results and standard deviations for the results of a
survey on 634 London Civil Servants aged 35-64, 34% of whom were female.

                                             Standard    Biologically
                                    Mean     Deviation   Important Change
Systolic blood pressure (mm Hg)     133      20          -10     (-8%)
Diastolic blood pressure (mm Hg)     82      13           -8     (-10%)
Blood haemoglobin (gm/100 ml)        14.5     1.1         +2     (+14%)
Blood glucose (mmol/l)                5.5     1.4         -1     (-18%)
Serum cholesterol (mmol/l)            6.3     1.0         -0.6   (-10%)
Serum urate (mmol/l)                  0.32    0.07        -0.05  (-16%)
Serum creatinine (µmol/l)            93      15          -15     (-16%)

4.2.3 Changes in qualitative end points
When defining a change in proportion, we may have more difficulty in
identifying a biologically important effect. For example, if we are considering a
reduction in mortality it could be argued that any reduction, however small, is
important. On the other hand the cost and adverse side effects of treatment
may negate small benefits. In addition, with treatment to prevent an
uncommon event, small benefits become less acceptable. It has been suggested that
ten individuals with mild hypertension would have to take antihypertensive
medication for 20 years to reach an even chance of avoiding one
cardiovascular event. If the probability of an event is low, a treatment used for
prevention must be highly effective, whereas a patient with an incurable illness will
be interested in a trial treatment that offers a cure only in a small percentage of
patients. We shall consider the infrequent and frequent end point in more
detail.

4.2.3.1 The end point occurs infrequently
Cardiovascular events, though not rare in the general population and common
in patients with a previous history of cardiovascular disease, may occur very
infrequently during a controlled trial. A trial of secondary prevention is in­
tended to prevent a recurrence of a condition and is to be contrasted with a
primary prevention trial intended to prevent the condition initially. Even with
a secondary prevention trial of myocardial infarction, fewer than 15 percent of
patients who leave hospital will die over a one-year period. For the purpose of
this discussion such events will be considered infrequent. Table 4-2 illustrates
some trials that have suggested a benefit in preventing cardiovascular disease;
indicates whether or not the authors of the trials have suggested the treatment
be adopted; and considers whether the benefits have been accepted and the
findings implemented by the medical community at large. The trials included
in the table include the Veterans Administration Cooperative Study Group on
Antihypertensive agents [37, 47] and trials designed to test sulphinpyrazone
(Anturane), aspirin, clofibrate, and anticoagulants in the primary and
secondary prevention of ischaemic heart disease. The 93 percent and 77 percent
reductions in morbid or mortal events in the Veterans Administration antihy­
pertensive agents trial have been accepted and acted upon by the medical
profession. However, when treatment in this same trial produced only a 33
percent reduction in cardiovascular events, the results were not widely ac­
cepted and many physicians do not treat a diastolic pressure of 90-104 mm
Hg. Also, although it can be argued strongly that a 20 percent reduction in
deaths is worthwhile, the medical profession has not considered such benefits
warrant the expense and difficulty of long-term treatment. Anticoagulant
treatment has therefore fallen from favour and the use of sulphinpyrazone
(Anturane), aspirin, and clofibrate treatment has not been widely accepted. It
must be admitted that inconsistencies in the data have not helped. For ex­
ample, anticoagulants have little effect on the death rate in women and treat­
ment can only be provided for men. Similarly, total deaths were increased by
clofibrate and it would not be acceptable to employ this treatment.
Clinicians tend to ignore small benefits, and when the event rate for a disease
is low the patient may also be unwilling to take treatment to reduce a small risk
by only, say, 20 percent. The patient will be more interested in therapy that
almost guarantees freedom from the disease.
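
The argument about small benefits turns on the difference between a proportional reduction in risk and the absolute benefit to the individual. The sketch below works through that arithmetic. The 15 percent one-year mortality and the 20 percent reduction are taken from the discussion above; the one percent event rate is a purely hypothetical low-risk example, and the "number needed to treat" summary is a label introduced here for illustration, not a term used in the text.

```python
def absolute_risk_reduction(control_risk: float, relative_reduction: float) -> float:
    """Absolute fall in risk when a baseline risk is cut by a given proportion."""
    return control_risk * relative_reduction


def number_needed_to_treat(control_risk: float, relative_reduction: float) -> float:
    """Average number of patients treated to prevent one event."""
    return 1.0 / absolute_risk_reduction(control_risk, relative_reduction)


if __name__ == "__main__":
    # (description, baseline risk over the period, proportional reduction)
    examples = [
        ("one-year mortality of 15 percent", 0.15, 0.20),
        ("hypothetical event rate of 1 percent", 0.01, 0.20),
    ]
    for description, risk, reduction in examples:
        arr = absolute_risk_reduction(risk, reduction)
        nnt = number_needed_to_treat(risk, reduction)
        print(f"{description}: absolute reduction {arr:.3f}, "
              f"about {nnt:.0f} treated per event prevented")
```

The same 20 percent proportional reduction prevents one death for roughly every 33 patients treated in the first case but only one event per 500 treated in the second, which is one way of seeing why a patient at low risk may decline treatment.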

4.2.3.2 When the end point for the trial is very frequent
If a condition gives rise to a high mortality, as with certain cancers, both the
doctor and patient may be interested in a drug that reduces mortality by less
than 20 percent. A patient faced with no hope of recovery may be pleased to
accept a one in five chance or less of survival.


Table 4-2. The results of several large trials to detect a reduction in cardiovascular disease.
The benefits are listed, together with the authors' recommendations and subsequent use of the treatments.

                                                                     Did authors   Has the treatment
                                                                     recommend     been subsequently
Trial                        End point               Reduction       treatment?    widely used?

Primary prevention
Veterans Administration      Morbid or mortal
Co-operative study group     events
on Antihypertensive
agents [37, 47]
  Diastolic 115-129 mm Hg                            93%             Yes           Yes
  Diastolic 105-114 mm Hg                            77%             Yes           Yes
  Diastolic 90-104 mm Hg                             33%             Yes           No
Clofibrate [50]              IHD incidence           20%             No            No
                             Nonfatal MI             25%             No
                             Fatal IHD               (8% increase)   No
                             All deaths              (22% increase)  No

Secondary prevention of myocardial infarction
Anturane [48]                Sudden deaths (males)   43%             Yes           No
Aspirin [51]                 Total mortality         25%             ?             No
Anticoagulants [52]          All deaths (males)      20%             Not clear     No
                             All deaths (females)    8%              No            No

IHD = Ischaemic heart disease; MI = Myocardial infarction.


significance to be achieved and, in the event of the effect not being demon­
strated, the confidence with which it is excluded (power).
The essential minor objectives in this type of trial will include an assessment
of both total mortality and any serious but nonfatal adverse effects of treat­
ment. The objectives can be illustrated by considering again the European
Working Party trial of Hypertension in the Elderly (EWPHE) [43]. The major
objective was defined as a reduction in stroke events (mortal plus morbid
events) of 50 percent, with a level of significance of 5 percent and a power of 90
percent. Certain minor objectives listed in table 4-3 have been studied and
resulted in a description of the natural history of untreated hypertension in the
elderly and an examination of the changes in cardiac and renal function with
increasing age in these subjects [53]. In addition, the biochemical changes with
active treatment were estimated and a reduced glucose tolerance in the group
treated with a diuretic reported [54]. Other minor objectives listed in table 4-3
include the interrelationship between the condition under treatment and other
diseases, the side effects of treatment, and items of general interest such as the
factors influencing compliance with treatment or default from follow-up.
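
The three ingredients of a major objective named above (size of effect, level of significance, and power) are what a calculation of the numbers required needs. The sketch below uses the common normal-approximation formula for comparing two proportions; it is not necessarily the method used for the EWPHE trial or in chapter 10. The 10 percent control event rate is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
import math
from statistics import NormalDist


def subjects_per_group(control_rate: float, relative_reduction: float,
                       alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate number of subjects needed in each group to detect a given
    proportional reduction in an event rate (two-sided test, equal groups)."""
    treated_rate = control_rate * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # significance level
    z_beta = NormalDist().inv_cdf(power)               # power
    variance_sum = (control_rate * (1.0 - control_rate)
                    + treated_rate * (1.0 - treated_rate))
    n = (z_alpha + z_beta) ** 2 * variance_sum / (control_rate - treated_rate) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical: 10 percent of control patients suffer the event during the trial.
    n = subjects_per_group(control_rate=0.10, relative_reduction=0.50)
    print(f"About {n} subjects per group for a 50 percent reduction "
          f"(5 percent significance, 90 percent power)")
```

Halving the reduction to be detected, with everything else unchanged, increases the required numbers several-fold, which is why the smallest effect worth detecting has to be fixed before the trial begins.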

4.3.2 Factors limiting the investigation of minor objectives
The greater the number of objectives, the greater will be the complexity of
conducting the trial. A trial with many minor objectives may require repeated
biochemical and other investigations and impose a great burden on the
investigators and the subjects, both of whom may be required to make more time
available for the trial. Additional expenses will also be incurred and clerical
duties increased. However, without answering certain important questions, it
may be difficult to assess the results of a trial.

4.4 CONCLUSIONS

It is very important to specify a major objective for a trial. This must include
the most important end point to be measured, the size of any treatment effect
that has to be determined, the significance with which the effect must be
observed, and the power of the trial in the event of a nonsignificant result
being reported. Armed with this information and an estimate of the variance of
the end point under consideration, the numbers required for the trial can be
calculated (chapter 10).
Minor objectives may have to be specified in order to fully appreciate the
outcome of the trial. Some will be necessary for this purpose and others
incidental to the main objective of the trial. How many minor objectives are
defined will depend on the resources available, but the collection of a large
amount of information during the course of a trial may hinder its successful
completion.

5. VALIDITY OF THE RESULTS

Having defined the objectives for a trial, we must consider the validity of any
results we may obtain. One dictionary definition of validity is “so executed
etc. as to have binding force” [55]. A trial result may not have binding force
when executed incorrectly or when the trial provides a certain result but the
interpretation of that result is incorrect. An incorrect interpretation can arise
when the trial reveals a particular result and the investigator jumps to a further
conclusion or when he decides that the same result will be true in different
subjects.
5.1 THE TRIAL SHOWS THAT THEREFORE THIS MUST BE TRUE

We shall consider a hypothetical trial that provides an example of a non
sequitur. The trial is designed to determine whether or not a particular phar­
maceutical agent can stop cigarette smokers from indulging in their habit. The
trial shows that the drug stops smoking in a significant proportion of smokers.
However, the authors believe that smoking causes heart disease and conclude
that the drug will reduce total cardiovascular mortality. This does not follow
because when a causative factor is removed the previous adverse effects may
persist. A different trial is required to show that stopping smoking prevents
cardiovascular death and, in fact, one trial of antismoking advice showed that
stopping smoking was associated with a reduction in respiratory symptoms
but not total mortality [56].
Similar examples in the cardiovascular field are provided by trials showing

that antihypertensive drugs lower blood pressure. Such trials do not prove that
the drugs reduce stroke mortality. The results of a trial must not be ex­
trapolated beyond the observations. It is not valid to conclude that a treatment
has wider effects than those observed.
5.2 THE VALIDITY OF THE RESULTS FOR
SUBJECTS OF A DIFFERENT AGE, SEX, OR RACE

The response to a treatment may vary between men and women, alter with
age and sometimes differ among the races. We cannot assume that the results
of a trial including, for example, only young men are applicable to elderly
women as well. This mistake commonly arises when trials are conducted on
laboratory staff (these persons tending to be young and male); an example of
this problem is often provided when the clinical pharmacology of a new drug

is determined on volunteers.
5.3 THE VALIDITY OF THE RESULTS FOR OTHER
PERSONS OF THE SAME AGE, SEX, AND RACE

Even after allowing for age, sex, and race the selection of subjects into a trial
makes it difficult to apply the results to even a superficially similar group. With
patients, selection may favour those with either a particularly severe or mild
form of the disease. Also, patients taking part in clinical trials are often more
compliant with therapeutic advice than patients subsequently offered the treat­
ment. Lastly, a trial result may occasionally depend heavily on a subgroup of

subjects.
5.3.1 The severity of the disease
The Veterans Administration Co-operative Study Group on Antihypertensive
Agents trial [47] admitted only men who were patients attending Veterans
Administration hospitals; these men appear to have had moderate to severe
hypertension. First, the level of diastolic blood pressure leading to entry to the
trial was from 90-129 mm Hg after four to six days resting in hospital. This level
of hypertension had also to be observed in an outpatient department after a
two-to-four-month period of placebo medication, at which time the pressure
was designated the untreated pressure. In addition the diastolic blood pressures
were measured at the point of disappearance of the Korotkoff sounds, not the
point of muffling of these sounds.
A patient in a doctor’s office with a similar level of untreated pressure may
not be comparable to those subjects in the Veterans Administration trial. He
may prove not to have hypertension after hospital admission or he may
respond to prolonged placebo therapy. Also, the point of muffling of sound may
be used to determine a (higher) diastolic pressure in the doctor’s office. A
patient with a casual diastolic pressure of 100 mm Hg may therefore be
equivalent to a patient with diastolic blood pressure of only 90 mm Hg in the
Veterans Administration trial. If the doctor remembers to allow for the possible
effects of hospitalisation, placebo, and different measurement techniques
he must also recall that 58 percent of the Veterans Administration patients had
been categorised as having a preceding cardiac, central nervous system, or
renal abnormality [57]. The doctor must ask himself how valid are the results
of the trial for the patient confronting him in his office.

Table 5-1. Criteria for withdrawal from the Veterans Administration Co-operative
Study Group on Antihypertensive Agents trial during the placebo run-in period.

1. Failure to appear for a regularly scheduled clinic appointment
2. Failure of the urine to contain the prescribed placebo
3. Failure on a tablet count
   a. Over 10% too many tablets left
   b. Five percent or more too few tablets left

5.3.2 Subject cooperation and compliance
Those entering a trial tend to be cooperative and willing to make several visits,
undergo investigations, and sign consent forms. They are probably more
likely to adhere to therapeutic advice than the average patient. The adherence
to advice has been termed compliance and a trial result may not be valid for a
more representative group of patients including a high proportion of
noncompliant individuals. In addition, a trial may be specifically designed to exclude
noncompliant patients. Table 5-1 gives the withdrawal criteria for the Veterans
Administration trial discussed above. Nearly half of the eligible patients
were excluded for not attending a clinic appointment, or for not having any
placebo marker in their urine, or for failing to produce an approximately
correct number of remaining placebo tablets. The trial was a test of active
treatment in those who were prepared to take antihypertensive medication
regularly and the results are valid for those who do so. This is not a criticism of
the trial but there is a limit to how far we can generalise from a particular
group of patients to the general population. The results for male veterans are
not necessarily valid for women, nor are the results applicable to patients who
usually forget to take their tablets. From the results of such a trial, it will be
difficult to estimate the response in a group of patients including noncompliant
persons. This problem is discussed more fully in section 14.4.
5.3.3 Results are only applicable to certain subgroups of patients in the trial

A difficult problem is provided by a trial with a significant overall result that
relies heavily, after subgroup analysis, on some subgroups and not others. For
example, in the Hypertension Detection and Follow-up Program trial [58], an
overall beneficial result was observed but not for the subgroup of white
women. It may be dangerous to generalise from the overall result when such
subgroup differences are apparent.

5.4 CONCLUSIONS
The results of a trial may be invalid if the trial is not performed correctly. The
provision of adequate controls, avoidance of bias in the results, and reduction
in variability are discussed in chapters 7, 8, and 9. In addition, the results of the
trial must not be extrapolated beyond the observations made; some examples
have been presented in this chapter. The investigator must refrain from
jumping to conclusions that he cannot support and must not assume that the
results of his trial are valid for subjects of a different age, sex, or race, or
noncompliant persons. However, the results of a particular trial can be
expected to be repeatable for a demographically similar group of subjects selected
and treated in an identical fashion. The results should be valid for such a group
of subjects.

6. RECRUITMENT OF SUBJECTS


The recruitment of subjects must be considered very carefully to ensure that
sufficient persons with the characteristics required by the investigators are
enrolled within an appropriate period of time. The number of subjects is of
great importance and will be discussed in chapter 10.

6.1 SUBJECTS WITH THE REQUIRED CHARACTERISTICS

6.1.1 General selection criteria

Selection criteria are discussed in section 3.9. These criteria are required to
ensure that the subject has the condition being investigated in the trial and that
he cannot be expected to experience an adverse event as a consequence of
entering the trial. The general selection criteria will usually include the sex,
age, and race of the subjects.

6.1.2 Criteria not usually defined in the protocol
The type of patients recruited will greatly affect the generality of the results
(chapter 5). The protocol may not state the social class of those to be recruited
and there has been considerable anxiety in the United States that clinical trials
are conducted on “people who are least likely to complain or who are least
likely to have the power to make their objections felt” [59]. On the other hand,
Jeremiah Stamler reported that in the Hypertension Detection and Follow-up
Program (HDFP), “we elected to have a sizable group of (black) patients from
the slums of Baltimore, Md., Birmingham, Ala., and Washington, D.C.”
[60]. When recruiting patients it may be important to consider social class and
other features relevant to generality; the recruitment policy for the HDFP
trial allowed the outcome to be compared for whites and blacks.

6.2 THE PERIOD OF RECRUITMENT
The period of recruitment is often not stated in advance and this may have
severe disadvantages as the interval tends to become prolonged. Fast
recruitment reduces the total length of the trial, the costs involved, and the difficulty
of keeping the investigators’ enthusiasm at a high level. Even when
recruitment times are fixed in advance for a given number of patients, these times can
easily be doubled. For example, recruitment to the LRC Coronary Prevention
Trial increased from one to two and one-half years [61]. In the Aspirin
Myocardial Infarction Study (AMIS) recruitment was limited to one year but was
slow at first with a sharp rise prior to the deadline. “Doctors proved to be
crisis orientated and most of the recruitment came in the last quarter of the
recruitment phase” [62].

6.3 METHODS OF RECRUITMENT

Patients have been recruited for randomised controlled trials from the
investigator’s own medical practice and from medical colleagues. Suitable patients
have also been identified from medical records, from screening programs, and
even from volunteers responding to advertisements.

6.3.1 From the investigator’s own practice

Patients are most frequently selected from the investigator’s own practice.
This method may be very effective when only small numbers of patients are
required but is rarely applicable to large-scale trials.
The investigator will already be known to the patient and hold a position of
trust. This relationship may render the patient anxious to take part in a trial in
order to please the investigator and great care has to be taken not to coerce the
slightly unwilling patient into taking part.

6.3.2 Referrals from other medical colleagues
The details of the trial must be explained to colleagues who are not taking part
in the trial and despite a considerable amount of effort, very few referrals may
be forthcoming [61, 62]. Independent clinicians may not be motivated to refer
their patients and may forget about the existence of the trial. When
practitioners receive fees for items of service, referral of patients to the trial investigators
may deprive the referring clinician of financial income for the duration of the
trial because many trials provide patients with free attention and medication.

6.3.3 Identification of possible patients from medical records

Physicians’ notes and laboratory records have been examined in order to
identify suitable patients.

6.3.3.1 Physicians' notes

Physicians’ records were examined in the AMIS trial of secondary prevention
of myocardial infarction. The notes usually gave the entry criteria of age, sex,
and electrocardiographic and enzyme changes so that suitable patients could be
selected. Schoenberger [62] reported that this was a better method of
recruitment than physician referral and the medical profession was not opposed to
this method of selection. However, only 10-20 percent of those patients
invited to take part actually entered the trial.

6.3.3.2 Laboratory records

In the LRC Coronary Prevention Trial, patients with type II hyperlipidaemia
were entered into the trial. Having failed to achieve many physician referrals,
the investigators asked commercial laboratories for suitable patients [61]. The
laboratories appeared to divide into three groups.
1. One laboratory wanted payment for the names (this laboratory was not
used).
2. There were laboratories who felt a breach of confidence could occur but
who agreed to write to the physicians caring for the identified patient to
suggest referral for the trial.
3. One laboratory allowed the trial staff to review the results and write
directly to the physician.
In this trial both physician and laboratory referrals led to the entry of very
few patients.
6.3.4 Screening to detect suitable subjects

Having recruited only 11 patients in Baltimore the LRC Coronary Prevention
Trial employed screening methods to search for suitable male patients with
elevated serum cholesterol concentrations. These methods are given in table
6-1. Screening of men attending public gatherings, living in Columbia, Md.,
being considered for another trial or donating blood for the Red Cross led to
the recruitment of a further 246 subjects. The screening required a recruitment
coordinator, a blood-drawing team, and contact with the news media and film
makers for television in order to advertise the activity.
6.3.5 Volunteers with the disease under study (self-referral)
Many patients entered into the AMIS trial referred themselves for inclusion in
the trial. Patients were eligible for this trial if they had sustained a myocardial
infarct in the preceding five years and volunteers were sought by the
advertising methods in table 6-2 [62]. The patients’ doctors supported the trial and
over a third of those randomised were recruited in this fashion. However, a
high proportion of volunteers were not suitable; some of these were excluded
when they were first interviewed by telephone.

Table 6-1. Methods of screening men for the LRC Coronary Prevention Trial in Baltimore [61]

A. Screening at:
1. Health fairs
2. Before-church groups
3. YMCA meetings
4. Shopping centres
5. Baseball games
6. 57 Baltimore industries
7. American Heart Association local programs

B. Screening of:
1. The entire city of Columbia, Maryland
2. Blood from American Red Cross donors
3. Potential candidates from the Multiple Risk Factor Intervention Trial (MRFIT)
4. Volunteers responding to mass newsletter mailings

Table 6-2. Methods of getting self-referrals for the AMIS trial [62]

1. National media coverage
2. Locally
   Mass mailings in utility bills
   Public rallies
   Radio and TV announcements
   Newspaper articles
   Paid advertisements

6.4 FACTORS RELATED TO RECRUITMENT

Recruitment can be increased by reducing the exclusion criteria, offering
financial support to the investigators, or threatening to withdraw support. The
principal investigators also have a great responsibility in generating
enthusiasm by visiting, lecturing, and publishing preliminary information.

6.4.1 Changes in protocol

Making the entry criteria less stringent can increase recruitment. For example,
in the National Co-operative Gallstone Study recruitment was increased by
raising the upper age limit from 69 to 79 [63].
6.4.2 Financial

In the Gallstone Study recruitment increased after it was threatened that
research contracts would not be renewed. Similarly, in the AMIS trial contract
support was reduced for centres that did not recruit as many patients as they
should, and centres that recruited an excess were given increased contract
funding.
6.4.3 Threats to withdraw a centre from the trial

In the AMIS trial all centres had to achieve a critical number of patients to
remain in the study. It is not known whether increased recruitment in clinics
that were stimulated to achieve the critical number compensated for the
number of patients lost in clinics that were removed from the study.

6.5 FACTORS NOT RELATED TO RECRUITMENT
Croke has reported some factors that did not influence recruitment to the
various centres in the National Co-operative Gallstone Study [63]: population
density, frequency of performing cholecystectomies, expertise of the clinical
directors, and incentives provided by ancillary studies were not related to
recruitment rates.
6.6 CONCLUSIONS

Recruitment is always a major difficulty in clinical trials. Muench’s third law
states, “the number of patients promised for a clinical trial must be divided by
a factor of at least 10” [64].
In large trials recruitment may occur from clinical practice, laboratory
records, population screening, and volunteers with the appropriate disease.
Recruitment may be increased by widening the entry criteria and by financial
incentives.
Richard Peto, speaking at a meeting in Lyons in November 1981,
emphasized the importance of recruiting sufficient patients for trials in cancer
patients. He suggested that 1,000 patients are usually the minimum number
required and that recruitment can be increased by collaboration between
centres, simplification of entry and follow-up procedures, and possibly, by
payments to compensate for secretarial expenses.

7. HOW TO ENSURE THAT THE CONTROL AND TREATED
PATIENTS ARE SIMILAR IN ALL IMPORTANT RESPECTS

It is essential that the treated and control patients are similar in order that any
differences in outcome can be attributed to the treatment and not to other
factors. Thus far everyone agrees, but how to obtain similar groups is open to
some discussion. In this chapter we discuss the two classical methods: (1) using
the patient as his own control (cross-over studies) and (2) random (chance)
allocation of patients to distinct and concurrently treated groups.
Randomisation is expected to result in the groups being similar. The futility of using
historical controls is discussed in section 8.2.3; this chapter deals mainly with
the advantages, disadvantages, and methods of randomisation.

7.1 THE PATIENT AS HIS OWN CONTROL

When a patient is given one treatment and then a second the patient acts as his
own control. The trial is known as a cross-over trial and the design is discussed
in section 11.2. The order in which the treatment is given may be important if
the baseline is changing with time (section 11.2.5), and randomisation can be
employed to ensure that the patients who receive a certain treatment first are
similar to those who initially receive a different treatment. A cross-over design
is only feasible for a chronic condition that reverts to its original state with the
cessation of treatment (for example, high blood pressure or diabetes mellitus).

7.2 RANDOMISATION

The technique of randomisation in clinical trials refers to the “assignment of
treatments to patients using a chance procedure” [65]. Randomisation
eliminates bias in patient allocation, renders the treatment groups equal in most
important respects, ensures statistical tests are valid, and improves the
acceptance of results. It also has disadvantages.

7.2.1 Treatment allocation is free of bias
If an investigator allocates patients to either of two treatments and not at
random, he may do so on certain criteria. For example, he may allocate on the
basis of the severity of disease. In section 2.2 we discussed the fact that James
Lind allocated patients with the most severe scurvy to a particular treatment.
The results in such a severely affected group may be biased against Lind’s
selected and possibly favourite treatment. A more recent example was given
by a trial of anticoagulant therapy in myocardial infarction [66] conducted by
the Committee on Anticoagulants of the American Heart Association. The
investigators considered that there was sufficient doubt as to the efficacy of
anticoagulant therapy to warrant a clinical trial of this therapy against the usual
basic medical care. Unfortunately, they did not randomise the patients but
allocated those admitted on even days to the control group and those admitted
on odd days to the anticoagulant group. The results were published and the
patients given anticoagulants fared much better than the controls. However, it
was noticed that many more patients were allocated to anticoagulant therapy
than to control treatment (a ratio of 1.4:1, table 7-1).
It appeared that some referring doctors favoured anticoagulants, and if a
patient had a myocardial infarction on an even day they preferred to admit
them on an odd day. Seriously ill patients could not wait and had to be
admitted on an even day giving a smaller control group with severe disease
who did badly.

Table 7-1. The number of episodes of myocardial infarction involved
in the American Heart Association trial of anticoagulants [66].

                                            Excluded   Control group   Treated group
Admitted on even days (total 490)
1. Not given anticoagulants                                 395
2. Given a/c’s for complications                             35
3. Given a/c’s to prevent complications                                      31
4. Excluded                                    29

Admitted on odd days (total 604)
1. Given anticoagulants                                                     546
2. Not given a/c’s (miscellaneous reasons)                   12
3. Not given a/c’s (contraindications)                                       12
4. Excluded                                    34

Total                                          63           442             589

Note: Twelve patients had two admissions and one had three. The number of patients involved in the trial was
therefore only 1,080.

Problems were not limited to recruitment in this trial and difficulties arose
both in performance and analysis. Patients in the control group could be given
anticoagulants at the request of a private physician and 31 such patients were
transferred to the treated group. Moreover, 35 patients who developed
thromboembolic disease had to receive anticoagulants but remained in the control
group. If the protocol cannot be followed after entry to the trial and the patient
is withdrawn or transferred to another treatment group, then the patients
should remain in their original randomisation group in order to avoid bias (see
the intention-to-treat principle, section 15.7). This rule was followed for 35
patients given anticoagulants in the control group and 12 not given
anticoagulants in the treated group owing to contraindications. However, the 31
episodes of active treatment in the control group discussed earlier and 12 episodes
of control treatment in the treated group were not analysed on the
intention-to-treat principle. Lastly, 12 patients had contraindications to anticoagulant
therapy and should not have entered the trial (section 3.9). We should refer to
episodes rather than patients when describing the results of this trial as 13
patients were allowed to enter the trial more than once. This duplication
should not be allowed (section 15.1.1).
The benefit from treatment reported from this trial was a 27 percent
reduction of mortality in men and a 36 percent reduction in women. The greater
benefit in women does not agree with other studies and the ratio of treated to
controls entered in this trial was 1.5:1 for women and 1.3:1 for men,
suggesting that the failure to randomise may have distorted the results more for
women than for men.

7.2.2 The treatment groups are similar with
respect to important confounding variables

Confounding variables are those factors other than treatment that are known
to be related to the outcome. They are otherwise known as interfering or
nuisance variables and the effect of such variables is to confound (alter) the
effect of treatment. If, for example, survival from cardiovascular disease is
being examined according to whether or not the patients are given a
lipid-lowering drug or placebo, it is important that the active and placebo groups
are similar for the known risk factors for cardiovascular disease such as high
blood pressure or cigarette smoking. Randomisation can be expected to lead to
two groups similar with respect to these characteristics.
If the groups are similar, it does not imply that they are identical. However,
it can be anticipated that the two groups, for a given confounding variable,
will not differ at the 5 percent level of significance with respect to this item.
When considering a large number of confounding variables it is unlikely that
randomisation will ensure that the groups are similar in every respect; in fact,
with 14 variables there is a greater than even chance of one item differing
significantly at the 5 percent level (for independent variables the chance is
1 - 0.95^14, or about 0.51). But most confounding variables will not
differ with statistical significance between the two groups. On the other hand,
nonsignificant differences may be of biological importance. This is discussed
in section 7.2.5.

7.2.3 Randomisation ensures that many statistical tests are valid
Many statistical tests assume that the populations to be compared are
randomly drawn from a larger single population. This is manifestly true for
randomised controlled clinical trials where the treatment groups are randomly
drawn from the total group of patients.

7.2.4 Randomisation improves the chance that the results will be accepted

Weinstein [67] has pointed out “one randomised trial with 100 patients can
dramatically change physician behaviour, whereas the experience of 100,000
patients might be neglected.” Aware that many accept new treatment
uncritically, the physician will naturally be influenced more by an objective unbiased
study such as the randomised controlled trial. When randomisation has not
been carried out but the data adjusted afterwards to correct for differences
between the groups, Weinstein went on to say, “Although matching or
covariance analysis may satisfy the investigator himself that all important
nuisance variables have been washed out of the analysis, there will always be
someone somewhere who will complain that the analysis did not control for
colour of eyes or sunspots . . . that he believes . . . to be confounding the
results.” Randomisation usually provides groups similar with respect to the
colour of their eyes, and sunspots!

7.2.5 Disadvantages of randomisation

7.2.5.1 Randomisation may lead to unequal numbers in each group
When large numbers of patients have to be randomised, the numbers in each
group are likely to be very similar. With small numbers (as in a small trial or in
one centre of a multicentre trial), the number receiving one treatment may
differ considerably from those receiving a second treatment. Zelen [65] gives
an example for 24 patients where random allocation led to nine on one
treatment and 15 on another. This problem may be overcome by restricted
randomisation (section 7.3.4).
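How often simple randomisation gives a split as uneven as the nine against 15 reported for 24 patients can be checked with the binomial distribution. The sketch below assumes two treatments allocated independently with equal probability; the function name is illustrative and does not come from the text.

```python
from math import comb


def prob_split_at_least_as_unequal(n_patients, smaller_group):
    """Probability that simple 1:1 randomisation of n_patients leaves the
    smaller treatment group with smaller_group patients or fewer."""
    if 2 * smaller_group > n_patients:
        raise ValueError("smaller_group cannot exceed half of n_patients")
    if 2 * smaller_group == n_patients:
        # Every possible split is at least as unequal as an exactly even one.
        return 1.0
    # One tail: treatment A receives smaller_group patients or fewer.
    one_tail = sum(comb(n_patients, k) for k in range(smaller_group + 1))
    # Either treatment may be the under-represented one, so double the tail.
    return 2 * one_tail / 2 ** n_patients


p = prob_split_at_least_as_unequal(24, 9)
print(f"P(split of 9:15 or worse among 24 patients) = {p:.2f}")
```

Under these assumptions the probability is roughly 0.31, so an imbalance of this size is not unusual in a trial, or a centre, with only 24 patients.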

7.2.5.2 Randomisation may still provide groups that differ in some important respect
After randomisation, bad luck may lead to the groups being different for an
important variable and such a difference may be both statistically and
biologically important. With small numbers the differences may not be statistically
significant, but they may still be of biological importance. Considering the
previous example of 24 patients, it is possible that even if 12 patients received one
and 12 the other drug, eight of one group but only four of the other group
could be male. This difference would not reach the five percent level of
statistical significance but the two groups could hardly be said to be similar with
respect to the proportion of men in each group.
Important confounding variables can be allowed for prospectively by
stratification (that is, randomisation within certain strata such as age groups)
(section 7.3.6) or matching one patient with another. Retrospectively an
imbalance can be allowed for in the analysis [5].
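The statement that eight men in one group of 12 against four in the other does not reach the five percent level can be checked with an exact test. The sketch below uses a two-sided Fisher exact test, which is our assumption since the text does not name the test; the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the treatment groups and columns are male/female counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)  # number of ways to place the col1 men

    def weight(k):
        # Number of arrangements with k men in the first group.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / total


# Group 1: 8 men, 4 women. Group 2: 4 men, 8 women.
p_value = fisher_exact_two_sided(8, 4, 4, 8)
print(f"two-sided p = {p_value:.2f}")
```

The p-value is about 0.22, well above 0.05, although the proportion of men is twice as high in one group as in the other.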
7.2.5.3 Randomisation alone may be less efficient than matching

Weinstein [67] stated that “If a confounding factor is known to be present, a
smaller sample size would be sufficient to achieve comparable statistical
information content by matching or adjustment procedures than by randomisation
alone.” But attempts to get matched patient pairs may have two undesirable
consequences. First, as the trial proceeds it may become apparent that another
important confounding variable should be considered and the pairs will not be
matched for this variable; and second, partners may not be found for certain
patients and they may have to be omitted from the trial. Randomisation within
a limited number of strata may provide similar groups when more than a few
subjects are to be entered.

7.2.5.4 Is it desirable for some patients to be allocated to an inferior treatment by chance?
The result of the trial may show that one treatment is inferior and it has been
suggested it is immoral to decide this outcome by chance. Randomisation can
be adapted to limit the number of patients who receive an inferior treatment
(section 11.8). But if two treatments are not equal some patients must suffer. A
superficially attractive way around this problem is to ask the patients to select
their treatment. This is usually not appropriate for drug treatment but has been
suggested for operative treatment [67]. For example, one operation may be
thought to have a high initial mortality but an increased likelihood of
long-term survival and a second operation may have a lower initial mortality but
worse long-term survival. A young patient with a family may well opt for the
second operation. As in this example, it is very unlikely that the patients who
opt for one operation will be similar to those who choose the other operation,
leading to the problem that the effect of treatment will be confounded
(confused) with the different characteristics of the patients. Randomisation remains
the only safe method of selecting the two groups and when the investigator is
in genuine doubt concerning the preferred treatment, only chance allocation
can be ethical.
7.3 METHOD OF RANDOM ALLOCATION

We must discuss at what stage of the trial we should randomise, how to use
random number tables for randomisation, whether to restrict randomisation
to give an equal allocation of subjects between treatment groups, whether to
choose an unequal allocation, and lastly whether or not to randomise within
certain groupings or strata.
7.3.1 The stage of the trial at which a patient should be randomised

In clinical trials the investigator should first decide that a patient is eligible for
the trial, then randomly allocate the patient to a treatment group. He must not
be aware of the next treatment to be allocated as with this knowledge, bias in
allocation may occur. For example, if the investigator is aware that the next
patient to be entered will receive placebo treatment, he may be unwilling to
enter a patient with severe disease. The placebo group will then contain an
excess of less severely affected patients. The investigator must be certain that
the patient is suitable for any of the trial treatments and then determine the
result of randomisation.
7.3.2 Method of determining the result of randomisation
To ensure that the investigator is not aware of the next treatment to be
allocated he may be required to contact a central coordinating office to
determine the result of randomisation. Alternatively, if the result of randomisation
is held in the investigator’s office, the treatment to be allocated should be held
in sealed numbered envelopes, opaque when held up to the light, to be opened
sequentially as required and only after the patient’s name and other details have
been written on the appropriate envelope.

7.3.3 Method of simple randomisation
When there are only two treatments, randomisation can be achieved by simply
tossing a coin. However, it is usually administratively more simple, and less
open to manipulation, to decide the randomisation in advance. This can be
done conveniently using a table of random numbers as illustrated by table
7-2, the four-digit random numbers having been generated by a computer
program. The table may be read as four-digit numbers or single figures or any
number of digits either horizontally or vertically. Table 7-3 gives the results of
using single numbers horizontally and allocating treatment A when the
random number is even and treatment B when the number is odd. The starting
number was selected at random in the first row, the sixth number from the
left. After 20 patients the ratio of patients on A:B was 1:1.5 and after 100
patients 1:1. With a small trial grossly unequal numbers of patients may be
allocated to two treatments. This will not be a problem with large trials unless
they are multicentre, where the small numbers allocated to individual centres
may result in some centres having a markedly unequal distribution of
treatments. The problem of unequal treatment groups is overcome by restricted
randomisation [68], otherwise known as block randomisation [65].
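The even/odd rule described above can be sketched in a few lines. The digits are the first six rows of table 7-2 read horizontally, starting at the sixth single digit of the first row as the text describes; the function and variable names are illustrative only.

```python
# First six rows of table 7-2, read horizontally (enough for 100 patients).
TABLE_7_2_ROWS = [
    "0011 0858 4371 7901 2691",
    "1893 2018 1425 6942 0755",
    "6658 9411 5940 5568 7667",
    "5147 6128 9689 6802 0238",
    "7886 1512 3488 1699 7461",
    "9675 5715 8699 2543 0769",
]


def digit_stream(rows, start_index):
    """Flatten the rows into single digits, beginning at start_index."""
    digits = [int(ch) for row in rows for ch in row if ch.isdigit()]
    return digits[start_index:]


def allocate(digits, n_patients):
    """Treatment A for an even digit, treatment B for an odd digit."""
    if len(digits) < n_patients:
        raise ValueError("not enough random digits for this many patients")
    return ["A" if d % 2 == 0 else "B" for d in digits[:n_patients]]


# The sixth digit of the first row has index 5.
allocation = allocate(digit_stream(TABLE_7_2_ROWS, start_index=5), 100)

for n in (20, 40, 60, 80, 100):
    first_n = allocation[:n]
    print(n, first_n.count("A"), first_n.count("B"))
```

The running totals printed for 20 and 100 patients correspond to the A:B ratios of 1:1.5 and 1:1 quoted in the text.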

Table 7-3. Allocation of two treatments, A and B, to 100 patients according to
whether the random numbers in table 7-2 arc even (A allocated) or odd (B allocated).

Table 7-2. Four-digit, computer-generated random numbers
0011
1893
6658
5147
7886
9675
7756
6931
7134
3921
4893
5848
1849
9462
9843
2322
1705
0987
7453

0858
2018
9411
6128
1512
5715
1808
9953
5116
4119
7074
8690
8756
9739
8360
4221
7965
4087
9080

4371
1425
5940
9689
3488
8699
5655
6577
3817
9007
0631
1833
6811
0522
7114
6663
7440
6593
1084

7901
6942
5568
6802
1699
2543
4214
3369
6314
5648
7365
7555
(1913
9183
3640
9846
8084
4548
1354

2691
0755
7667
0238
7461
0769
6091
2761
3651
3664
2477
6690
8525
8797
3097
8555
6412
1165
1610

i

i

Patient    Random number    Treatment    Cumulative number on
                            (A or B)        A         B

    1            8              A           1         0
    2            5              B           1         1
    3            8              A           2         1
    4            4              A           3         1
    5            3              B           3         2

    6            7              B           3         3
    7            1              B           3         4
    8            7              B           3         5
    9            9              B           3         6
   10            0              A           4         6

   11            1              B           4         7
   12            2              A           5         7
   13            6              A           6         7
   14            9              B           6         8
   15            1              B           6         9

   16            1              B           6        10
   17            8              A           7        10
   18            9              B           7        11
   19            3              B           7        12
   20            2              A           8        12

   40            9              B          19        21
   60            6              A          28        32
   80            1              B          42        38
  100            5              B          50        50

Note: Single numbers have been selected working horizontally and starting at
the first line (see text).

7.3.4 Restricted or block randomisation
Restricted randomisation ensures that within a block of patients equal numbers
are allocated to each treatment. For example, when it is decided to restrict the
randomisation so that every group of four patients includes two on treatment
A and two on B, then Zelen [65] suggested allocating a number to each of the
six ways that the treatments can be allocated in groups of four. As before, the
random number table is consulted, not to allocate a single patient to a single
treatment, but to allocate a sequence of four treatments to the next four
patients, as in table 7-4.
The disadvantage of restricted randomisation is that the investigator may be
able to predict the next treatment to be allocated. For example, an investigator
who is aware that randomisation is restricted for blocks of four subjects would
be able to predict the allocation for the fourth, eighth, or twelfth patient, and
so on. With a multicentre study, restricted randomisation at the coordinating
centre may lead to balanced numbers in the trial as a whole but not for an
individual participating centre, and the local investigator would be unable to
predict the treatment. However, it is desirable to have roughly equal numbers
at each centre as patients may do particularly well, or badly, at one centre. If a
certain centre has a predominance of one treatment the result may appear to be
due to that particular treatment. It is therefore necessary to randomise within
each centre to ensure that a roughly balanced allocation occurs at the centres.
Zelen has described a method, called balanced block randomisation [65],
which gives a balanced allocation for each centre while making it difficult to
predict the treatment sequence. The method requires the use of an auxiliary
random number table whereas variable block allocation can achieve the same
ends and be more easily understood. Variable block allocation is therefore
described here.
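
The restricted scheme with blocks of four can be sketched in a few lines of
Python. This is an illustration only, not the author's procedure: the function
name restricted_allocation and the seed are invented for the example, and a
pseudo-random generator stands in for reading a digit from 1 to 6 out of a
printed random number table. The six sequences are those of table 7-4.

```python
import random

# The six balanced orderings of two A's and two B's, numbered as in table 7-4.
BLOCKS_OF_FOUR = {
    1: "ABAB",
    2: "AABB",
    3: "BBAA",
    4: "BAAB",
    5: "ABBA",
    6: "BABA",
}


def restricted_allocation(n_blocks, rng):
    """Return a list of treatments built from n_blocks random blocks of four."""
    allocation = []
    for _ in range(n_blocks):
        block_number = rng.randint(1, 6)  # stands in for a table digit 1-6
        allocation.extend(BLOCKS_OF_FOUR[block_number])
    return allocation


rng = random.Random(1983)  # arbitrary seed, used only to make the sketch repeatable
schedule = restricted_allocation(5, rng)
print("".join(schedule))
```

Whatever sequence is drawn, the numbers on A and B are equal after every
fourth patient, which is also why the fourth allocation in each block can be
predicted by anyone who knows the block length.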

7.3.5 Variable block randomisation
In variable block randomisation the investigator does not know the number of
patients to be recruited before balance is achieved. Equal numbers may be
reached after, say, four or six patients and this makes it difficult for the
investigator to predict the next treatment to be allocated. Balance after four to
six subjects can be achieved by extending table 7-4 so that random numbers seven
to 26 cover the 20 different sequences for blocks of six (table 7-5). The random
numbers in table 7-2 must be examined in pairs and blocks of four or six
allocated according to the numbers 01, 02, ... 26. A further alternative
to variable block randomisation is adaptive randomisation. This is a complex
procedure whereby the probability of selecting a treatment can be reduced if an
excess of that treatment has been allocated [65].

Table 7-4. The use of a random number table
to achieve restricted randomisation

Random number    Sequence

      1          A B A B
      2          A A B B
      3          B B A A
      4          B A A B
      5          A B B A
      6          B A B A

Note: Randomisation is restricted to give two patients
on drug A and two on drug B for every four patients
entered into the trial. The random number is selected
from a table of random numbers and gives the treatment
allocation for the next four patients to be entered into
the trial.

Table 7-5. The 20 different possible sequences
for two treatments and blocks of six patients.

Allocation number    Treatment sequence

       07            A A A B B B
       08            A A B A B B
       09            A A B B A B
       10            A A B B B A
       11            A B A A B B
       12            A B A B A B
       13            A B A B B A
       14            A B B A A B
       15            A B B B A A
       16            A B B A B A
       17            B B B A A A
       18            B B A B A A
       19            B B A A B A
       20            B B A A A B
       21            B A B B A A
       22            B A B A B A
       23            B A B A A B
       24            B A A B B A
       25            B A A A B B
       26            B A A B A B

Note: The random number can be searched for in random number tables to decide
on the treatment allocation. Random numbers 1-6 can be used to allocate
treatment to blocks of four subjects (table 7-4).

7.3.6 Randomisation within strata
In the same way as restricted randomisation prevents a disproportionate number
of patients being allocated to one treatment, stratification ensures that one
treatment group does not include an excess or deficit of subjects with a certain
confounding characteristic (for example, male). Inequality between groups is
usually only a problem when the numbers in a trial are small. However, most
investigators would stratify: (1) by centre (in a multicentre trial); and (2) by
sex.
Randomisation is carried out within each centre and for both sexes separately.
Weinstein [67] has argued that “matching, blocking or adjusting may
be far more efficient devices than purely randomizing. Why let chance do what
one can do for oneself?” Matching can be considered to be the most extreme
form of stratification with the subjects in matched pairs being randomly
allocated, one to receive one treatment and one the other. Retrospective
matching or adjusting the data is less desirable than starting with similar groups.
Peto and associates [69, 70] considered stratified allocation to be an
unnecessary complication for large trials. In such trials chance will usually
ensure that the treatment groups are comparable and retrospective stratification
can be used to compare the treatment effects in one stratum (for example, males)
with the effect in a second stratum (in this example, females). Stratification or
matching cannot ensure that the treatment groups are equal in all important
respects. Often the variables that are known to be of importance are too
numerous and stratification for these features would lead to several strata
containing too few patients. Also, if previously unrecognised but important
features are discovered during the course of the trial or during analysis,
stratification or matching will not have coped with these. In conclusion, the
number of strata should be kept to a minimum and restricted randomisation
may be carried out within each stratum if desired.
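
Variable blocks (section 7.3.5) and randomisation within strata can be
combined, and the following Python sketch shows one way this might look. It is
an illustration under stated assumptions rather than a description of any
published procedure: the class name StratifiedAllocator and the seed are
invented, a pseudo-random generator replaces the printed random number table,
and the 26 sequences are generated in alphabetical order, which differs from
the numbering used in tables 7-4 and 7-5.

```python
import random
from itertools import permutations


def balanced_blocks(size):
    """All distinct orderings of size/2 A's and size/2 B's, in sorted order."""
    half = size // 2
    return sorted(set(permutations("A" * half + "B" * half)))


# 6 blocks of four followed by 20 blocks of six: 26 sequences in all.
SEQUENCES = balanced_blocks(4) + balanced_blocks(6)


class StratifiedAllocator:
    """Keeps a separate variable-block schedule for every stratum."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._pending = {}  # stratum -> treatments not yet handed out

    def allocate(self, centre, sex):
        stratum = (centre, sex)
        queue = self._pending.setdefault(stratum, [])
        if not queue:
            # Draw one of the 26 sequences, so the block length (4 or 6) is random.
            queue.extend(self._rng.choice(SEQUENCES))
        return queue.pop(0)


allocator = StratifiedAllocator(seed=7)  # arbitrary seed for repeatability
for centre, sex in [("London", "male"), ("London", "female"), ("London", "male")]:
    print(centre, sex, allocator.allocate(centre, sex))
```

Within any one stratum the difference between the numbers on A and B never
exceeds three, yet the investigator cannot tell whether the current block will
close after four or after six patients.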

7.3.7 Randomisation schedules that do not allocate
two treatments to an equal number of patients
When comparing a new treatment with established therapy, Peto [69] has
suggested that the numbers allocated to a new treatment may be increased by
giving two patients the new treatment for every one patient given the control
treatment. He states, “there will be an unbiased randomised comparison
which is very nearly as sensitive as an ordinary equal-groups randomised trial.”
In long-term trials of survival, when the new treatment is expected to
reduce mortality by 50 percent, then 2:1 randomisation may be expected to
yield an equal number of deaths in the two groups. Two-to-one randomisation
can be achieved by using twice as many random numbers for allocation to the
new treatment as are used to allocate to the control treatment. Unequal
randomisation is very attractive when little is known about the new treatment
whereas a good deal is known about the old or control treatment.
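
A minimal Python sketch of two-to-one randomisation follows. The choice of
digits (1 to 6 for the new treatment, 7 to 9 for control, 0 ignored) is only
one possible way of using twice as many random numbers for the new treatment
and is an assumption of this example, as are the function names, the group
sizes and the 30 percent control mortality used to illustrate the point about
equal numbers of deaths.

```python
import random


def two_to_one(digit):
    """Map a random digit to a treatment, or None if the digit is skipped."""
    if 1 <= digit <= 6:
        return "new"      # six digits lead to the new treatment
    if 7 <= digit <= 9:
        return "control"  # three digits lead to the control treatment
    return None           # 0 is ignored and the next digit is read


def allocate(n_patients, rng):
    """Allocate n_patients in an expected 2:1 ratio of new to control."""
    treatments = []
    while len(treatments) < n_patients:
        choice = two_to_one(rng.randint(0, 9))
        if choice is not None:
            treatments.append(choice)
    return treatments


# Expected deaths when the new treatment halves mortality (hypothetical figures).
n_new, n_control = 200, 100
control_death_rate = 0.30
expected_control_deaths = n_control * control_death_rate
expected_new_deaths = n_new * control_death_rate * 0.5
print(expected_control_deaths, expected_new_deaths)
```

With twice as many patients at half the risk, the expected number of deaths is
the same in both groups, which is the situation described in the text.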
7.4 ALTERNATIVES TO RANDOMISATION

There is usually no good alternative to randomisation. The following section
gives some alternatives that have been suggested and the problems that may be
encountered with them.

7.4.1 Allocation according to date of entry to the trial
This was used in the trial discussed in section 7.2.1 [66]. The allocation was
known in advance, odd dates for one treatment, even dates for another, and
the trial entry was manipulated to give unequal groups. Prior random
allocation, using random number tables, is simple and not open to such
manipulation.

7.4.2 Allocation according to the hospital number
The hospital number has been employed, even numbers being allocated to one
treatment, odd numbers to the other. This can be open to manipulation as
with the date of entry to the trial.

7.4.3 Allocation according to the initial letter of the subject’s name
If the initial letter of the subject’s name is A to M, he can be allocated to one
treatment, and if it is N to Z he will go to the other. This is a very
unsatisfactory method. Not only can allocation be manipulated but the treatment
groups may be very different. For example, in the United Kingdom there are more
patients with surnames starting A to M than N to Z. Moreover, A to M will
include an excess of Scotsmen with names starting with Mc; N to Z will
include a greater proportion of Irishmen with names starting with O’ and also
Asians with the surname Singh or Patel.

7.4.4 Allocation according to the wishes of the patient
This procedure has been proposed in the context of trials of surgical
procedures [67]. Although it is desirable to conform with the patient’s wishes,
it is both possible and probable that patients choosing one form of treatment
will differ from those choosing another. Obvious differences can be adjusted for
retrospectively in the analysis, but differences may remain that are not
recognised and can be confused with treatment effects. Random allocation is by
far the safest procedure and cannot be recommended too strongly.

7.4.5 Allocation according to the preceding results
An example of such a strategy is given by the play-the-winner rule: if a
treatment is followed by success the next patient also receives this treatment; if
a treatment results in failure the next patient receives the alternative treatment.
Play-the-winner rules are discussed in section 11.8. Not only must the results
be known quickly but the allocation may be open to manipulation.

7.5 CONCLUSIONS

It cannot be stressed too strongly that randomisation is necessary to ensure that
the different treatment groups are similar in most respects. Occasionally bad
luck will still lead to unequal groups and retrospective adjustment of the data
becomes necessary. However, randomisation is the best safeguard that the
groups are equal and that the investigator does not consciously or
unconsciously manipulate entry to the trial, thereby producing groups of unequal
numbers or differing confounding factors.
A subject must be eligible for the trial and then randomised to a treatment
group. Randomisation may be restricted to give equal numbers in each
treatment group and restricted randomisation may be employed within strata to
ensure equality of confounding variables. Randomisation can be employed
when there are more than two treatment groups and when more patients are to
be allocated to one treatment group than another. In a within-patient
cross-over trial, randomisation is also required to determine the order of
treatment and to make sure that whether treatment A precedes treatment B, or
vice versa, is a matter of chance and not open to manipulation by the
investigator.

8. HOW TO ENSURE THAT THE RESULTS ARE FREE OF BIAS

Biased results are distorted and prejudiced in favour of one treatment or
another. When discussing the assessment of a patient’s progress in a trial,
Bradford Hill [8] wrote, “the judgements must be made without any possibility of
bias, without any overcompensation for a possible bias, and without any
possibility of accusation of bias.”
The results of a trial may be biased if patient allocation favours one group
rather than another; if an adequate control group is not provided and therefore
the results cannot be interpreted; when noncompliance influences the results;
when patient or investigator bias affects the assessments; and when analytical
bias distorts the presentation of the data.

8.1 BIAS DUE TO AN UNEVEN ALLOCATION TO THE TREATMENT GROUPS

This bias may arise due to the failure or absence of randomisation and is dealt
with in chapter 7. To summarise, the patient must first be considered eligible
for the trial and then randomised to his particular treatment group. The
investigator must not have prior knowledge of the treatment that will result from
randomisation.

8.2 AN ADEQUATE CONTROL GROUP MUST BE PROVIDED

Controls are necessary in clinical trials as otherwise any improvement or
deterioration associated with giving the experimental treatment cannot be
confidently attributed to the treatment. Randomised controlled trials rarely
show as much benefit from treatment as most investigators assume will be the
case. There are several reasons why an experimental treatment appears so
much more effective in observational studies than in a controlled trial and these
include the placebo effect, regression to the mean, and time trends in the
condition being studied.

8.2.1 The placebo effect
In section 2.4 two early examples of placebo responses were reported: Haygarth’s
use of dummy wooden appliances to investigate the use of Perkin’s
metal tractors which were supposed to cure by electricity [14] and Sutton’s
report of the effect of mint water in rheumatic fever [15].
In the 1920s the Western Electric Company carried out some experiments in
its Hawthorne plant in Chicago. Illumination was either increased, decreased,
or held constant. The workers were interviewed and the increased attention
paid to them led to a rise in production, independent of changes in lighting
intensity [71]. There was therefore an unplanned effect in the control group
and this became known as the Hawthorne effect. Ederer [68] considered that
the placebo effect may arise either from a change in the social situation or from
suggestion. Table 8-1 gives measurements of placebo effects mostly reported
in a review by Beecher [72]. Even severe postoperative pain and angina can be
satisfactorily relieved by placebo in up to 40 percent of patients, headache
alleviated in 52 percent, and cough in 37-40 percent.

Table 8-1. Placebo response in painful conditions and
cough (largely derived from a table by Beecher [72]).

Complaint               Placebo                 Percentage of patients    Reference
                                                responding to placebo

Postoperative pain      Intravenous saline,     21, 26, 31, 39, 33        73, 74, 75, 76, 77
                        subcutaneous saline,
                        lactose by mouth
Angina pain             Tablets by mouth        26, 38, 38                78, 79, 80
Myocardial infarction   Intravenous saline      54 (10 min),              81
                                                31 (10-30 min)
Headache                Lactose by mouth        52                        82
Cough                   Lactose by mouth        40                        83
                        Subcutaneous saline     37                        84

Shapiro [85] has concluded, “we are led to the conclusion that the history of

... co-workers considered that ... randomised ... very persuasive ...
historical control ... has been conclusively demonstrated ... and when
naloxone ... pain relieving effect of this ... drug ...

8.2.2 Regression to the mean
... practice, patients will usually ... selected because ... measurement is
high, ... will be lower. Similarly ... the reading is low, the subsequent mean
reading ... be higher. Regression to the mean is the movement ... nearer to a
more moderate value as a ...

8.2.3 Time trends
This problem has been ... reduction in the severity of disease ... effected by
changing ... observation may ... be corrected.

8.3 NONCOMPLIANCE MAY BIAS THE ...
Noncompliance may ... interpretation of a trial ... illustrates two possible
... assessments of the effect of ... from an observational ... mortality ...
stopped smoking and those who ... therapeutic advice. Not all ... percent had
stopped three ... control group did stop smoking ... after three years ...
relationship between stopping ...

OBSERVATIONAL STUDY
  Never smoked                                  Mortality 1.32%/year
  Smokers who spontaneously stop smoking        Mortality 1.65%/year
  Smokers who continue to smoke                 Mortality 1.80%/year
  'Effect' of stopping smoking  -0.15%/year
  'Effect' of never smoking     -0.48%/year

RANDOMISED CONTROLLED TRIAL (HIGH RISK SMOKERS)
  Control group (14% stopped after 3 years)             Mortality 1.63%/year
  Advised to stop smoking (36% stopped after 3 years)   Mortality 1.74%/year
  'Effect' of anti-smoking advice  +0.11%/year

Figure 8-1. The effects of stopping smoking as determined from an observational study and a
randomised controlled trial (see text).
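
The three 'effects' quoted in figure 8-1 are simple differences between annual
mortality rates, and the short Python sketch below reproduces them. The
assignment of each mortality figure to its group is a reading of the figure
labels (it is the only assignment that reproduces all three quoted
differences), and the variable names are invented for the example.

```python
# Annual mortality (percent per year) as read from figure 8-1.
observational = {
    "continued to smoke": 1.80,
    "stopped spontaneously": 1.65,
    "never smoked": 1.32,
}
trial = {"control group": 1.63, "advised to stop": 1.74}

effect_of_stopping = (
    observational["stopped spontaneously"] - observational["continued to smoke"]
)
effect_of_never_smoking = (
    observational["never smoked"] - observational["continued to smoke"]
)
effect_of_advice = trial["advised to stop"] - trial["control group"]

# prints -0.15 -0.48 +0.11
print(f"{effect_of_stopping:+.2f} {effect_of_never_smoking:+.2f} {effect_of_advice:+.2f}")
```

The observational comparison is between self-selected groups, whereas the
trial comparison is between whole randomised groups in which only a minority
changed their habit, so the two sets of differences do not measure the same
thing.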


[Flow chart. Panel A: observer aware that the patient is taking active
treatment or placebo treatment. Panel B: double-blind trial. In each panel the
height of the measured blood pressure is low or high; a proportion of readings
may be repeated (perhaps after further rest); and the reading that is accepted
and recorded gives the result.]

Figure 8-2. Hypothetical blood-pressure readings in an open and a double-blind trial.

room. The existence of these rays was disproved by a double-blind study by
Pozdena [91] and also by a trick in which the American physicist, Wood,
substituted a wooden ruler for a metal file and asked Blondlot whether the file
was producing n-rays. Blondlot assured Wood that the file enabled him to see
much better [92]. Pozdena performed a straightforward but elegant double-blind
trial in which an assistant opened and closed a shutter, releasing and
interrupting the flow of hypothetical n-rays. Pozdena recorded increased
luminosity as often when the shutter was open as when it was closed [91].
Blondlot’s error arose from the difficulty of making subjective assessments.
Observer bias can present great problems when determining subjective
impressions such as the presence of symptom side effects; this is discussed in
chapter 16. More objective measures may also be influenced by observer bias;
section 9.1 discusses the repeatability of measurements, and section 9.2 the
quality control of data. The measurement of blood pressure in a trial of an
antihypertensive drug may be taken to illustrate the difficulties. Let us assume
that the protocol specifies that the blood pressure has to be taken after 5
minutes’ rest in the lying position. Figure 8-2 illustrates the possible sequences
of events, first when the observer is aware of the treatment being prescribed
and second when the trial is double-blind and the observer is not aware of the
treatment being given. For each instance the hypothetical result of detecting a
high or low reading is charted. It is possible that all first readings are accepted
and recorded except for high readings when the patient is known to be
receiving active treatment. In this case the observer may assume, possibly rightly,
that the patient has not relaxed sufficiently or that the blood pressure has not
been taken correctly. The cuff may then be applied more carefully (for
example, with the inflatable section more accurately over the brachial artery),
the patient may be asked to relax for a further two minutes, and the reading
repeated. We assume that the second reading is accepted and recorded. The
observer may be correct when he substitutes the second lower reading for the
initial high reading. However, high measurements will not necessarily be
repeated when the patient is known to be taking a placebo and the blood
pressure recordings, on average, will be biased towards lower readings on the
active treatment. Similarly, it is possible that low readings on placebo will be
repeated, making matters even worse. No such bias is to be expected in the
double-blind trial.
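
The size of the bias produced by selectively repeating readings can be
explored with a small simulation. The Python sketch below is hypothetical
throughout: the true mean of 150 mm Hg, the standard deviation of 10, the
threshold for a 'high' reading, the seed, and the function name
recorded_pressure are all assumptions chosen for illustration. Both groups are
given the same underlying pressure, so any difference in the recorded means is
produced entirely by the observer's behaviour.

```python
import random
import statistics


def recorded_pressure(rng, true_mean, observer_knows_active, threshold=150.0):
    """Return one recorded reading.

    A high first reading is repeated, and the second reading recorded, only
    when the observer knows that the patient is on active treatment.
    """
    first = rng.gauss(true_mean, 10.0)
    if observer_knows_active and first > threshold:
        return rng.gauss(true_mean, 10.0)  # second reading replaces the first
    return first


rng = random.Random(8)  # arbitrary seed for repeatability
true_mean = 150.0       # identical in both groups: no real drug effect
active = [recorded_pressure(rng, true_mean, True) for _ in range(5000)]
placebo = [recorded_pressure(rng, true_mean, False) for _ in range(5000)]
print(round(statistics.mean(active), 1), round(statistics.mean(placebo), 1))
```

Under these assumptions the recorded mean on active treatment falls several
mm Hg below that on placebo even though the drug does nothing. In a
double-blind trial the observer cannot apply the repeat rule to one group
only, so the difference disappears.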
Fletcher [93] stated, “Both in initial assessment of the patients and the
subsequent assessment of their progress the tests should be applied by observers
who remain unaware of which patient is undergoing treatment and which is a
control. If this is not done, the subjective judgements which are inseparable
from nearly all tests in clinical medicine may prejudice the results ...”
Another example where objective measurements are subject to observer bias
was given by Kahn and his colleagues [94] when reporting on measurements
of serum cholesterol. They found that when technicians were given blind
duplicate bloods to measure, the standard deviation of the duplicates was 2.5
times as large as when they were given labelled duplicates. Wilson concluded,
“No human being is ever approximately free from these subjective influences;
the honest and enlightened investigator devises the experiment so that his own
prejudices cannot influence the result. Only the naive or dishonest claim that
their own objectivity is a sufficient safeguard ...” [95]. Observer bias can
favour or prejudice a positive result.

8.5.1 Observer bias favouring a positive result

Muench’s Second Law states: “Results can always be improved by omitting
controls” [96]. A control group can demonstrate the occurrence of spontaneous
improvement and thereby reduce the magnitude of any overestimated
treatment effect. Even with controls, the observer may tend to report more
favourably on a new treatment under investigation. When the trial is
double-blind, observer bias may be prevented, giving a more correct estimate of
the effect of treatment.
Foulds reviewed studies of antidepressant drugs conducted between the
years 1951 and 1956 [97]. He identified 36 studies in the American literature
and 36 in British journals. Only four trials in the American literature included
controls. In the British literature 16 trials included controls and, on average,
these papers reported a 19 percent success from treatment. In contrast, the 20
uncontrolled studies reported an 85 percent success rate. Foulds also made the
interesting comment that patients with anxiety states tended to improve
spontaneously, thus when subgroups were analysed these patients appeared to have
responded best to treatment. Without adequate controls, it might be
erroneously concluded that the antidepressant drugs were most effective in
patients with anxiety states.

8.5.2 Observer bias prejudicing a positive result

This theoretical consideration has been discussed by Ederer [68], who
suggested that a bias may occur against the experimental treatment “when the
investigator’s bias is against the treatment or when he overcompensates for his
known bias in favour of treatment. Thurber’s moral, ‘you might as well fall
flat on your face as lean over far backward’ [is apt].”

8.6 BIAS ARISING FROM THE ANALYSIS

There are many ways in which the analysis of trial results could bias the
conclusions. An example was discussed in section 8.3. In the trial of
antismoking advice [56] the results in the intervention group could be improved
by omitting those who did not stop smoking and made worse in the control
group by leaving out those who did stop smoking. The intervention, however,
was the giving of antismoking advice and the total intervention group
has to be compared with the total control group as the trial was not intended to
be an explanatory study. Similarly, the Anturane Reinfarction Trial [48] has
been criticised for leaving out patients in the intervention group simply
because they died while the serum concentration of Anturane (sulfinpyrazone)
was thought to be negligible (see section 19.1). It has therefore been suggested
that trials should be analysed without the statistician’s being aware of the
treatment given to the various groups, the treble blindness defined in section
8.4.
The greatest problem in analysis arises from the difficulty in deciding whom
to include in the analysis and whom to omit. Similarly, during the course of the
trial, withdrawal of patients may bias the results.

8.7 BIAS DUE TO THE WITHDRAWAL OF PATIENTS FROM A TRIAL

Bias may result from withdrawal from the group as a whole (for example,
during a placebo run-in period) or by selective withdrawal from the different
treatment groups.

8.7.1 Withdrawal of patients during a placebo run-in period
The withdrawal of noncompliant patients prior to randomisation results in
compliant subjects being entered into the trial and the conclusions from the
trial results are only valid for such patients. The problem is discussed in
chapter 5.

8.7.2 Selective withdrawal of patients from the placebo group

Placebo treatment may be less effective than active treatment and patients may
be withdrawn owing to treatment failure. In long-term trials of
antihypertensive agents to determine their effect on mortality, more patients are
withdrawn from the placebo group than from the actively treated group owing to
marked rises in blood pressure. If a withdrawal criterion states that a patient
must be withdrawn with a certain elevation in blood pressure then the assumption
is that, left in the trial, this patient would be likely to suffer an end point
such as a stroke. If the withdrawn patients are omitted from the analyses, the
placebo and actively treated groups will no longer be comparable for important
characteristics, as fewer patients with increasing blood pressure will be removed
from the actively treated group [98]. If withdrawn patients are included in the
analysis, it must be assumed that they suffered an adverse event with a defined
probability. It would appear unreasonable to define the probability as 1 since
no adverse event was in fact observed. An estimate may be available of the
probability from observational studies or from earlier trials when patients with
the given level of blood pressure were not withdrawn but continued in the
study. It may then be possible to allocate some estimate, say, three
cardiovascular end points for every ten such patients withdrawn from the trial.
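
The adjustment just described amounts to adding an expected number of end
points for the withdrawn patients to the end points actually observed. The
Python sketch below illustrates the arithmetic. Only the probability of three
in ten comes from the text; the numbers of observed events and withdrawals,
and the function name end_points_with_imputation, are hypothetical.

```python
def end_points_with_imputation(observed_events, withdrawn, event_probability):
    """Observed end points plus the expected number among withdrawn patients."""
    return observed_events + withdrawn * event_probability


# Hypothetical trial figures; only the 3-in-10 probability comes from the text.
placebo_total = end_points_with_imputation(
    observed_events=20, withdrawn=30, event_probability=0.3
)
active_total = end_points_with_imputation(
    observed_events=15, withdrawn=10, event_probability=0.3
)
print(placebo_total, active_total)
```

Because more patients are withdrawn from the placebo group, the imputation
adds more end points to that group, partly restoring the comparison that
would be lost if withdrawn patients were simply omitted.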

8.7.3 Selective withdrawal of patients from the actively treated group
In a double-blind trial, an excess of withdrawals from the actively
treated group may be expected to represent the frequency with which the
active treatment produces adverse effects. However, in a trial that is not
double-blind, patients with possible adverse reactions will only be withdrawn
from the actively treated group.
In a single-blind trial a patient may suffer an episode of illness that he
attributes to the drug he is receiving. Figure 8-3 illustrates the sequence of
events that may occur. The patient may have a gastrointestinal upset, rash,
influenza, or other condition unrelated to treatment. If he has just started
therapy, he may attribute the symptom to the drug or other treatment and not
to an intercurrent illness. When the patient is on active treatment, the doctor
may agree with the diagnosis and withdraw the patient from the trial with a
possible adverse event. When on placebo treatment, the patient will not be
withdrawn and the losses from active treatment will be greatly in excess of the
withdrawals from placebo treatment. This situation is avoided in double-blind
trials.
It must be remembered that placebos do occasionally produce an adverse
effect. If a placebo tablet contains lactose it is possible that diarrhoea will be
produced in susceptible individuals and in this event any excess incidence of
diarrhoea with the active treatment may be underestimated rather than
overestimated. Usually, however, a placebo group and double-blinding give an
unbiased estimate of the frequency of adverse drug reactions (see chapter 18).

SINGLE-BLIND TRIAL

Gastro-intestinal upset shortly after starting treatment and unrelated to
treatment:

  Patient on placebo            ->  Patient advised to continue with placebo
  Patient on active treatment   ->  Patient withdrawn with a possible adverse
                                    reaction

Figure 8-3. Mechanism whereby the adverse effects of treatment may be overestimated in a
single-blind trial.

8.8 BIAS DUE TO FAULTY METHODOLOGY

When the trial protocol is not identical for both the treatment under
investigation and the control treatment, then biased results may be obtained. Two
examples may be provided: (1) when the treatment has an additional but
nonspecific effect this may be difficult to reproduce in the control group, and
(2) when the protocols for the two groups differ in order to save time or
money.

8.8.1 The treatment may have a nonspecific effect
that is difficult to reproduce in the control group

The nonspecific improvement resulting from giving drug treatment can be
estimated from a placebo group in a double-blind trial of drug treatment. The
nonspecific effect cannot be estimated when surgery is the intervention
strategy and may also be difficult to assess when therapeutic advice is given. In a
trial of antismoking or dietary advice, the patients in the intervention group
may be seen repeatedly and it will be difficult to provide placebo counselling
on any neutral topic. First, it would not be ethical to advise the control group
against an activity that does not harm them. For example, in a controlled trial
of antisalt dietary advice in hypertensive patients poorly controlled on
antihypertensive drugs, we decided that only the patients randomised to the diet
would get dietary counselling. However, interviews were arranged for both
the intervention and control groups to assess compliance with drug medication
and the patients’ quality of life. The prohibition of a neutral dietary constituent
as control advice might have been acceptable in the short term (for example,
ice cream could have been prohibited). However, such advice would not have
carried much conviction and the day-to-day dietary habits of the patients
would have been largely unaltered. To give the control group major and
therefore comparable advice (such as a low-cholesterol diet) would not have
provided an untreated control group. There may be no satisfactory method of
performing a double-blind trial of dietary advice but the effect of a low salt
intake can be determined by advising all patients to take the diet and add either
salt tablets or placebo tablets according to whether they were in the control or
intervention group.

8.8.2 The protocol differs between the intervention and
control groups in order to save time or money
A theoretical example may be employed to illustrate a faulty protocol that
requires the intervention and control groups to be treated separately. A new
drug has to be tested against placebo treatment and blood tests and blood
pressure measurements are required for the active treatment group but only
blood pressure measurements in the placebo group, as a considerable amount
of data has shown that placebo treatment does not affect the blood tests.
However, taking blood in the actively treated group may raise the blood
pressure in that group either because the blood test precedes taking the blood
pressure or because the patient anticipates the blood test. No such rise will be
observed in the group given placebo and whenever possible the protocol for
the actively treated and control groups must be identical.

8.9 THE USE OF THE DOUBLE-BLINDING TECHNIQUE
The double-blinding or double-masking technique is very important and
should be used, whenever possible, in all randomised controlled trials.
Knowelden [99] has commented,
When everyone is in the dark, subjective measures can be used with confidence as there
can be no bias introduced by patient or observer. If an observer had to diagnose a slight
paroxysmal cough in a child known to have been vaccinated against whooping cough
he might, because of bias in favour of the vaccine, decide it could not be pertussis. What
is probably more likely than such cheating is that the clinician would attempt to
compensate for his known bias and label the case one of whooping cough against his
better judgement.

Double-blindness, however desirable, is not always possible and many trials
have been performed without any blinding or with only masking of the pa­
tients (single-blinding). We must discuss how double-blinding can be
achieved, when it cannot be utilised, why it is often not attempted, and what
may happen if it fails.

8.9.1 How to achieve double-blinding

The following steps will lead to double-blinding:

1. The patient must understand that the trial is double-blind and give written
informed consent.

2. The control treatment and experimental treatment must be identical. If
tablets are used, they must be the same size, shape, colour, taste, and smell.
3. The exact nature of the treatment being given must be held in a secure
position and be immediately accessible in an emergency. The treatment is
often written on a card and enclosed in an opaque, sealed envelope. When
the treatment is a drug these envelopes may be conveniently held in the
pharmacy responsible for dispensing the treatment.

8.9.2 When masking is not desirable

Blindness is not desirable when it would result in unacceptable suffering or
discomfort for the patient. The taking of dummy tablets will not be expected
to produce any discomfort but sham operative treatment or dummy treatment
involving repeated injections would do so. In the early Medical Research
Council trial of antituberculosis therapy [3], it would not have been desirable
to inflict dummy injections on the control group for several months. Simi­
larly, although placebo operations have been performed (section 3.8) it is only
possible to perform sham operations in humans in very exceptional circum­
stances. These reservations do not apply to animal experiments where sham
operations are the rule.
8.9.3 When masking is difficult

Double-blinding techniques may be difficult when the treatment has obvious
effects or when the treatment involves a change in life-style.

8.9.3.1 The treatment has an obvious effect

Certain drugs have specific actions which, if detected, would enable the patient
or observer to break the treatment code. Examples are given by the
beta-adrenoceptor blocking drugs that lower the pulse rate, and diuretics that
usually result in an increase in serum uric acid. An observer could use this
information to break the code and steps should be taken to ensure that he does not
become aware of these results (section 8.5).

8.9.3.2 The treatment involves a change in life-style
Dietary change is an example of an alteration in life-style where masking is
difficult. Unlike trials of drug treatment the advantages of double-blind trials
of dietary advice are balanced by a number of disadvantages. In a double-blind
trial, items of diet have to be provided by the investigator, some of which
contain the dietary constituent under investigation and are given to one
intervention group and some that appear identical but do not contain the
constituent and are given to a second group. An example is given by the National
Diet Heart Study [100] in which saturated fats were the dietary constituent
under investigation.
Table 8-2 lists the advantages of double-blind and nonblind trials of dietary
advice, assuming that the trial end points are not affected by masking, being
either easily ascertained (for example, death) or determined in a blind manner
(for example, measurements of serum cholesterol).

Table 8-2. Advantages of double-blind and nonblind trials of dietary advice. With a
double-blind trial the subjects are given certain items of food whose composition is
unknown.

Double-blind dietary study

1. Allows effect of diet to be assessed over and above nonspecific effect of
   dietary advice.
2. Changes in diet in control group less likely to occur (less contamination of
   control group).
3. Easier to secure unbiased ascertainment of all end points.
4. Easier to secure motivation of participants in control group (no difference
   in dropout rates).

Nonblind dietary study

1. Assesses combined effect of dietary advice and taking the diet.
2. Control group has an unrestricted diet.
3. Easier to administer the trial and less expensive.
4. Fewer ethical problems (patients and physicians know the treatment).
5. Monitoring of blood or urine may be used to improve compliance in
   intervention group.

A nonblind trial of dietary
advice tests the combined effect of diet and increased medical attention. Those
who receive dietary advice must have greater contact with the research work­
ers and this may affect the subjects’ well-being in a nonspecific manner. The
subject is aware that he has altered his diet and an increased attention to his
food may either improve or reduce his sense of well-being. In a double-blind
trial the intervention and control group will be equally affected by nonspecific
dietary changes whereas in a nonblind trial the control group may have an
unrestricted diet. However, in a nonblind trial some of the control group may
alter their diet in a similar manner to that of the intervention group. If the
control group hears about the intervention and alters its diet, the group is said
to be contaminated.
As in randomised trials of other treatments double-blindness should ensure
that the trial end points are determined similarly in the different groups and
that the dropout rates are not affected by knowledge of the treatment. These
advantages are balanced by a great increase in complexity and expense, possi­
ble ethical problems, and the fact that adherence to the diet cannot be closely
monitored by blood or urine tests as these results, although useful in identify­
ing noncompliant patients, will allow the treatment to be determined by the
investigator if employed to improve compliance.
8.9.4 Strategies to be employed when double-blindness is not possible

When a treatment cannot be provided blind or when it produces obvious
effects it may not be possible for one investigator to conduct a masked trial on
his own. If the effects of treatment are determined from laboratory
investigations the trial can be designed so that laboratory results are withheld
from the investigator. When a clinical measurement would detect the treatment, one
investigator can determine the result and a second can be responsible for the
care of the patients. Measurements can also be limited to those that are not
open to observer bias (hard end points).

8.9.4.1 Blind the investigator to certain laboratory results
An example of this procedure may be provided by considering a trial of the
blood-pressure-lowering effect of tienilic acid, a drug that markedly reduces
the serum uric acid. During a double-blind trial the investigator measured
blood pressure and interviewed the patients but the results of serum uric acid
measurements were withheld from him [101]. It was planned that if a
serum-uric-acid result was grossly abnormal and action required, the investigator
should be told and the treatment code broken.

8.9.4.2 Two investigators: one to assess the patient and the other to provide treatment
Two investigators may have to be involved when therapy cannot be provided
blind or when a treatment affects a measurement in a constant manner. One
investigator provides the treatment or takes the measurement and the second
assesses the end points of the trial. An example is provided by a trial of dietary
advice where one investigator gives the therapeutic advice and a second
assesses the outcome in terms of weight or blood pressure without knowing the
treatment allocation. Similarly, in a trial of medical versus operative treatment
of peptic ulceration (for example, by cutting the vagus nerve), it may be
possible for a second observer to assess the results of the treatment from
radiological plates or photographs of the ulcer. Care would have to be taken
that such materials did not reveal whether or not surgery had been performed.
It is possible that a moving image of barium past the ulcer may enable a
radiologist to determine whether an operation had been performed or not.
In certain circumstances, the person providing the treatment may be masked
and the person taking the measurements may not. Such an arrangement would
be appropriate in a trial of the antihypertensive effect of increasing doses of a
beta-adrenoceptor blocking drug compared with a different antihypertensive
drug. Beta-blocking drugs lower both pulse rate and blood pressure and if the
investigator knows the pulse rate, blindness would be lost. One investigator
should measure blood pressure and pulse rate. The second investigator should
remain blind and be given measurements of blood pressure but not pulse rate.
He would then be responsible for prescribing increased doses of the drugs and
assessing any side effects of treatment.

8.9.4.3 Hard end points

When masking is not possible, the analysis may be confined to so-called hard
measurements. A hard end point is one that is not open to biased
ascertainment; the best example is the fact of death. When survival is to be
determined blindness is not necessary. However, although the fact of death may be
incontrovertible the cause of death may be open to doubt. For example, sudden
death may or may not be due to myocardial infarction and it may be difficult
to distinguish cardiac from respiratory death. The difficulty is compounded
when the investigator only knows the cause of death as written on the death
certificate. These certificated causes are often not supported by postmortem
examinations and are subject to error. Moreover, if an observer believes that a
particular treatment prevents, or causes, a specific cause of death, this
preconception may lead to a biased assessment of the cause of death.

8.9.5 The disadvantages of blinding

It is possible that masking will adversely affect patient recruitment and
investigator participation. Identical control treatment will have to be provided at
extra expense and it may be difficult to ascertain the exact treatment in an
emergency. With drug treatment, it will be necessary for the patients to take
extra tablets or capsules and labelling errors may occur. Lastly, masking may
require additional personnel.

8.9.5.1 Fall in patient recruitment
When patients are informed that they and the investigator are unaware of the
treatment they are to receive, some may be dissuaded from taking part in the
trial. Similarly, they may dislike the idea of taking placebo tablets. In my
experience this has only rarely been a problem.
8.9.5.2 Failure of investigators to take part in the trial
Potential investigators in multicentre trials may be unwilling to take part if a
protocol is double-blind, although there are no data to substantiate this claim.

8.9.5.3 Provision of identical control treatment

Making tablets that are identical in appearance, touch, and taste may be
difficult. The problem may be simplified by making identical capsules to con­
tain the active drug or the placebo. However, if an active pharmaceutical
compound is dispensed in a capsule, this formulation will differ from that of
any tablets normally available and may affect the bioavailability of the drug.
If a control tablet is manufactured of an identical size, colour, and shape it
may still smell and taste differently or differ in some other way from the
experimental treatment. There is a story of a patient who reported that he
knew his tablets had been changed at the last visit to the clinic as the new
tablets were difficult to flush down the toilet! It is to be hoped that most tablets
will not be disposed of in this or any similar manner.
It may prove possible to make an acceptable placebo for an active treatment
but not to make two active treatments identical. In this instance, the patients
can be asked to take two sets of tablets throughout the trial: one representing
treatment A (active or placebo) and one B (active or placebo). This procedure
has been called the double-dummy technique and may be useful in a cross-over
trial. In the first phase of the trial a patient takes one active drug and the
placebo corresponding to the second drug, and in the second phase the active
second drug is taken together with a placebo copy of the first drug.
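
The double-dummy arrangement in a two-period cross-over trial can be written out explicitly. The Python sketch below is illustrative only: the names Phase and double_dummy_schedule are hypothetical, and the treatments are simply labelled A and B. It shows that in every phase the patient takes one tablet of each appearance, exactly one of which is active.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Phase:
    """What the patient takes in one phase (hypothetical structure)."""
    tablet_a: str  # "active" or "placebo" copy of drug A
    tablet_b: str  # "active" or "placebo" copy of drug B


def double_dummy_schedule(first_active: str) -> List[Phase]:
    """Return the two phases of a double-dummy cross-over.

    first_active is "A" or "B": the drug given in active form in phase 1.
    In each phase the other drug is represented by its matching placebo.
    """
    if first_active not in ("A", "B"):
        raise ValueError("first_active must be 'A' or 'B'")
    a_active = Phase(tablet_a="active", tablet_b="placebo")
    b_active = Phase(tablet_a="placebo", tablet_b="active")
    return [a_active, b_active] if first_active == "A" else [b_active, a_active]


if __name__ == "__main__":
    for number, phase in enumerate(double_dummy_schedule("A"), start=1):
        print(f"Phase {number}: A tablet = {phase.tablet_a}, B tablet = {phase.tablet_b}")
```

Because both sets of tablets are taken throughout, neither the patient nor the observer can tell from the appearance of the tablets which drug is active in a given phase.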
8.9.5.4 The patients have to take more tablets or capsules

In a drug trial, taking a placebo always increases the number of tablets or
capsules to be consumed. This is acceptable in order to make an unbiased
comparison of an active treatment with an inactive treatment, rather than a test
of an active treatment against no treatment at all. However, extra tablets may
be required to preserve blindness when a placebo is not required, as with the
double-dummy technique discussed previously.

8.9.5.5 Expense
When identical tablets or capsules are provided, it is expensive to manufacture,
label, and distribute them. Dispensing may also be more complex in a
double-blind drug trial.

8.9.5.6 Errors in labelling
In the labelling and distribution of identical tablets, great care must be
exercised. The containers are only identified by code numbers or letters and the
labels have to be checked and all stages carefully documented. I know of two
trials where errors of labelling have occurred so that active treatment has been
given instead of placebo and vice versa. Fortunately the errors were limited to
only one or two patients. In one trial the errors were discovered when patients
were checked for compliance: a patient on placebo had active drug metabolites
in his serum when he had had no access to any active drug. In the second trial a
patient on large doses of placebo suddenly received large doses of an active
antihypertensive drug. The resulting side effects revealed the error.

8.9.5.7 Difficulties in breaking the treatment code when necessary
As discussed earlier, the code must be available for consultation in an
emergency; it should be held in sealed envelopes, one envelope for each patient, and
accessible 24 hours a day. The codes can only be held centrally if a coordinat­
ing office is manned day and night and otherwise must be held at a convenient
place (for example, in the hospital pharmacy). The subjects in the trial should
be given details of whom to contact in an emergency and these persons must
know the location of the codes.
8.9.5.8 The provision of extra personnel
The need for two investigators to ensure double-blindness (see section 8.9.4)
may be both difficult and expensive.

8.9.6 The consequences of a breakdown in double-blindness
The breakdown of double-blindness may affect the observations made in a
trial and have serious consequences. In one trial of vitamin C against placebo
the active compound reduced the frequency, duration, and severity of the
common cold [102]. However, this result was contrary to the results in other
trials and the subjects were asked which tablets they thought they were taking.
The subjects guessed correctly more frequently than they should have done by
chance alone. This was a worrying finding, as those who knew they were on
placebo may have reported more colds and those who realised they were on
active treatment, fewer colds. Alternatively, a positive answer could have been
masked if those on vitamin C, as a consequence of this knowledge, had not
eaten as much fresh fruit and vegetables as those who thought they were on
placebo. As the trial was to determine whether or not the vitamin had a
nonplacebo effect on the incidence of colds, the wrong answer may have been
obtained.
It is good practice to ask each patient about the treatment they guess they are
taking. In the National Diet-Heart Study [100], participants were asked to
purchase special dietary foods with different fat contents and the trial was
conducted in a double-blind manner. The participants were asked about the
amount of fats in their diets and 43 percent in each dietary group considered
that their diet included a large reduction in total fat. There had been no loss of
double-blindness and dropout rates were independent of diet as were the pro­
portions volunteering for further study. Similarly, the doctors and nutri­
tionists involved in the trial were asked to specify the diet that had been
assigned to the individual patients. The correspondence between the actual
diets and their guesses was no better than would be expected by chance.
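
Whether guesses agree with the actual allocation more often than chance would allow can be checked with an exact binomial calculation. The following Python sketch is a minimal illustration, not the analysis used in the trials cited: the function name p_at_least is hypothetical, the figures in the usage lines are invented, and it assumes two equally likely treatments (so that a pure guess is correct with probability 0.5) and that every subject offers a guess.

```python
from math import comb


def p_at_least(n_correct: int, n_total: int, p_chance: float = 0.5) -> float:
    """One-sided exact binomial probability of observing n_correct or more
    correct guesses out of n_total if every guess were correct only with
    probability p_chance (0.5 assumes two equally likely treatments)."""
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    return sum(
        comb(n_total, k) * p_chance**k * (1 - p_chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


if __name__ == "__main__":
    # Invented figures for illustration only: 70 correct guesses among 100 subjects.
    p = p_at_least(70, 100)
    print(f"Probability of 70 or more correct guesses by chance: {p:.6f}")
```

A small probability suggests that masking has broken down; a value near 0.5 is what would be expected if blindness had been preserved.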
A consequence of the breakdown of blindness has been examined in a trial
where the investigators were apparently able to identify the active treatment.
Heaton-Ward [103] performed a double-blind trial to assess the effect of a
monoamine oxidase inhibitor (Niamid) on the activity and behaviour of patients
with Down’s syndrome. The observers were told that the trial was of a cross-over design
but at the time of cross-over the same treatment was continued. The observers
reported an initial improvement in the actively treated group but not in the
placebo-treated group. However, after the supposed cross-over, they reported
a deterioration in those who had initially improved and an improvement in
those they first imagined had not improved. The objectivity of the observa­
tions appeared to be in some doubt. Abraham [104] termed this failure of
blinding and its result the Heaton-Ward effect, after the author of this trial. The
trial was concerned with activity and behaviour and in this field subjective
impressions bedevil the interpretation of results (see section 16.6). A Medical
Research Council Trial into the effects of antidepressant drugs [105] showed
that a breakdown of blindness due to side effects led to differences between
treatments being observed in subjective assessments. The effects of treatment
were not confirmed by a more objective reduction in the length of stay in
hospital.
Double-blinding is very important, and Beecher commented, “. . . there is
evidence in surgery as in other fields, that the enthusiast actually gets results
which are better than those of the sceptic” [38]. In a double-blind trial both the
subjects and investigators should be interviewed to determine whether or not
masking has been preserved.

8.10 THE USE OF PLACEBOS

When effective treatment is available this is usually employed in the control
group. However, in the short term, or when no effective treatment has been
discovered, placebos should be employed as control treatment whenever
possible (section 3.8).

8.10.1 Why should placebos be used?

Bradford Hill stated, when discussing the clinical trial, “To some patients a
specific drug is given, to others it is not. The progress and prognosis of these
patients are then compared. But in making this comparison in relation to the
treatment the fundamental assumption is made—and must be made that the
two groups are equivalent in all respects relevant to their progress, except for
the difference in treatment” [1]. As discussed in sections 8.4 and 8.5, the use of
placebo treatment in the control group ensures that any difference between the
actively treated and the control group is due to the active constituent employed
in the trial and is not a nonspecific effect of giving any treatment.

8.10.2 When should placebos be used?
Placebos should not be used as control treatment when there is definite
evidence that withholding available treatment may be detrimental to the patient’s
health. Other reasons have been advanced for not employing placebo
treatment: the increase in cost of the trial due to placebo materials, distribution,
coding, documentation, and the increase in the work load of the investigator
(section 8.9.5). However, when there is no treatment of proven worth, a
placebo-controlled trial is to be preferred to one using an untreated control
group. In some trials the use of a placebo is obligatory; in other trials it is
advisable but optional.

8.10.2.1 Essential uses of placebo
Placebo treatment is essential in trials of anxiolytic, hypnotic, and
antiinflammatory drugs. The use of a placebo regime allows the day-to-day variation of
subjective sensations such as pain to be measured together with any
spontaneous improvement with time or as a nonspecific response to tablets.

8.10.2.2 Important uses of placebo
Placebos are often employed in chronic conditions such as hypertension and
diabetes mellitus when available treatment is known to correct the
pathophysiological abnormality but has not been shown to reduce mortality or
morbidity. Three examples where active treatment has not been proven to be of
benefit to the patient’s health are mild hypertension (diastolic pressure 90-99
mm Hg), benign hypertension in the elderly, and mature-onset diabetes. The
[...] known treatment confers definite benefit in terms of reduction in mortal or
morbid events. In hypertension, the use of a placebo is particularly important
as the control group may exhibit a reduction in blood pressure owing to the
reassurance offered by supposedly active treatment. However, a specific
placebo effect in reducing blood pressure has not been proven and the
reduction in pressure may reflect the process of familiarisation with medical
attendants and clinical surroundings.

8.10.2.3 Other uses of placebos

In the chronic diseases discussed previously, placebo treatment is often
employed in the short term to determine a baseline level of blood pressure or
blood sugar even when the patient is known to require active treatment in the
long term. A frequent example of the use of a placebo is in young patients with
moderate or severe hypertension. The investigators know that active
treatment must be given eventually, but the treatment under trial is to be compared
with a baseline untreated period. Taking a placebo ensures that the control
period is identical with the period of active treatment. Two strategies partially
resolve the ethical problems: the length of the placebo treatment is limited to a
short period of 3-8 weeks, and the placebo treatment is stopped if blood
measurements exceed an arbitrarily defined level (see section 3.8).

8.10.3 How placebos should be employed
8.10.3.1 The agreement of the subject must be obtained

The patient must agree to take either the placebo or the experimental
treatment. Fortunately, many patients are prepared to serve as experimental
subjects and contribute to the common good. Patients presenting to a doctor are
usually willing to adopt the advice they are given and if the doctor suggests
taking part in a placebo-controlled clinical trial, the patients tend to accede to
the request. They may be unwilling to make an independent choice between
entering or not entering the trial. Such indecision might be analogous to the
situation where an airline pilot, with a faulty aeroplane, asks his passengers
whether he should go on or turn back. The doctor’s advice will be taken and
the trust of the subjects must not be abused.
All relevant information about the trial must be provided for the subject,
including the following:

1. The fact that a placebo is being used in the trial and the patient may receive
it.
2. Whether or not the trial tests a concept that could lead to a benefit for the
patient.
3. That any new treatment has been adequately tested in the laboratory and
may be an advance over existing therapy.
4. That the probable risks involved from taking the new drug or placebo are [...]

The patient should be given a written copy of this information to facilitate
full comprehension and be asked to provide written consent to taking part in
the trial. Such consent does not reduce the investigator’s responsibility but
does provide proof that the patient read about the trial and agreed to take part
on the basis of the information. It also proves that the investigator discussed
the trial with the patient (section 3.7).
Bradford Hill questioned whether the patients should be told that they may
receive a placebo and he wrote, “Having made up your mind that you are not
in any way subjecting either patient to a recognized and unjustifiable danger, pain
or discomfort, can anything be gained ethically by endeavouring to explain to
them your own state of ignorance and to describe the attempts you are making
to remove it? . . . Once you have decided that either treatment for all you know
may be equally well exhibited to the patient’s benefit, and without detriment,
is there any real basis for seeking consent or refusal?” [30]. Many would
support this view for a trial of acute treatment in a stressful situation (for
example, on admission to a coronary care unit with acute myocardial infarc­
tion). Under less stressful conditions and with long-term treatment informed
consent should be obtained.
8.10.3.2 Single or double blind?

The trial employing a placebo must be single or double-blind (section 8.9), and
the placebo treatment must be identical to the active treatment. If a tablet, it
should look, smell, feel, and taste the same.

8.10.3.3 Randomise the order of treatment in a cross-over trial

A placebo run-in period is often employed in randomised trials to ensure that
only patients with certain characteristics enter the trial. An initial period on
placebo treatment allows patients to be excluded if they do not have, for
example, a persistent elevation in blood pressure, blood sugar, or serum
cholesterol. Subjects who do not comply with the trial protocol may also be
identified during this period. However, when a period of placebo treatment is
to be compared with an interval on active treatment in the same patient, the
order of treatment must be varied, using random allocation. Often the placebo
treatment is given first and this error may not be immediately obvious when
the investigators are comparing two treatments with placebo. It is not accept­
able to report a placebo-controlled randomised double-blind cross-over trial of
two treatments when the placebo period always precedes the randomised part
of the trial. In a trial of two antihypertensive drugs the reader would assume
that the baseline blood pressure was accurately determined by a randomised
period of placebo treatment during the trial. As the average blood pressure
tends to fall throughout treatment, the effect of the drugs in lowering blood
pressure may be exaggerated. Baseline measurements should be obtained
double-blind during the trial with the two active treatments and the placebo
treatment given first, second, and third with equal frequency.
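
One way of arranging for each treatment to be given first, second, and third with equal frequency is to use all six possible orders of the three treatments equally often and to allocate these orders to patients at random. The Python sketch below is illustrative only and assumes that the number of patients is a multiple of six; the function names and treatment labels are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def balanced_orders(treatments, n_patients, seed=None):
    """Return one treatment order per patient, in random sequence.

    Every possible order is used equally often, so each treatment appears
    in each period with equal frequency. n_patients must be a multiple of
    the number of possible orders (six for three treatments).
    """
    orders = list(permutations(treatments))
    if n_patients % len(orders) != 0:
        raise ValueError("n_patients must be a multiple of %d" % len(orders))
    allocation = orders * (n_patients // len(orders))
    random.Random(seed).shuffle(allocation)  # random allocation of orders to patients
    return allocation


def period_counts(allocation):
    """Count how often each treatment is given in each period (0 = first)."""
    return Counter(
        (treatment, period)
        for order in allocation
        for period, treatment in enumerate(order)
    )


if __name__ == "__main__":
    plan = balanced_orders(["placebo", "drug A", "drug B"], 12, seed=1)
    for patient, order in enumerate(plan, start=1):
        print(patient, " -> ".join(order))
    print(period_counts(plan))
```

With twelve patients each treatment is taken first by four patients, second by four, and third by four, so any tendency for blood pressure to fall during the trial affects the placebo and active periods equally.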

8.10.4 Disadvantages of the use of placebos
The disadvantages arising from the use of placebos overlap those given for
double-masking (section 8.9.5). In addition, the use of a placebo may disrupt
the doctor-patient relationship and produce medico-legal problems.

8.10.4.1 Disruption of the doctor-patient relationship

Bradford Hill stated, “The doctor will also wish to consider the doctor/patient
relationship. Harm may be done if the public comes to believe that doctors are
constantly using them as guinea-pigs. In exhibiting new treatments they are, it
is my belief, doing that willy-nilly, but the public does not realize it. But they
need not go out of their way to make it obvious by an unnecessary use of
dummy pills” [30]. But what is an unnecessary use? The provision of an exact
control group by the use of placebos is often essential.

8.10.4.2 Medico-legal problems

It is possible to argue that if a drug has been shown to be beneficial in one
clinical trial against placebo, an investigator repeating the trial is knowingly
placing at risk any patient treated with placebo. If patients in the placebo group
suffer harm, the investigator may be sued for damages on the grounds of
negligence. Dollery [86] stated, “Lawyers appear to have little time for the
contention that there is uncertainty about the efficacy when only one or
perhaps two trials have completed. They do not understand the concept of
statistical uncertainty and are accustomed to resolving doubts about factual
uncertainty in the courts.”
An investigator may also run into problems when he employs an experi­
mental treatment and there is no definite evidence that it will be successful. If
the experimental treatment subsequently proves of benefit in later trials, the
courts may examine the original trial and find in favour of subjects in the
placebo group who have suffered harm. Dollery provides the example of trials
of active antihypertensive treatment in patients with an untreated diastolic
blood pressure of 90-105 mm Hg. This group has only mild hypertension and
the original trial, the Veterans Administration Co-operative Study on Antihy­
pertensive agents [37], found a 35 percent reduction in complications in this
group with active treatment; but this did not achieve statistical significance.
Moreover, the trial included a high proportion of patients who had already
suffered a complication of hypertension and patients were only included if the
high blood pressure was maintained during a hospital admission. For these and
other reasons further trials of antihypertensive therapy have been started for
mild diastolic hypertension in young and middle-aged patients [106]. One trial
has been completed and shown a benefit in patients with a diastolic pressure
over 100 mm Hg but not 90-99 mm Hg [107]. If all trials subsequently find a
benefit from active treatment when the initial diastolic pressures are between
100 and 104 mm Hg, can such patients take legal action when they have
sustained a disabling stroke while on placebo in the original trial?

If the courts find in favour of the patient, will they realise that they do this
with the assistance of hindsight? When the first trial was started, for all the
investigators knew the patients treated with placebo may have been the fortu­
nate ones as the result of such trials has often been in favour of the placebo
group. However, many would claim that patients who suffer from taking part
in randomised trials should be compensated (section 3.2). In this event the
compensation payable should not be excessive as pharmaceutical companies
and other organisations may be unwilling to support placebo-controlled trials.
Imagine the problem if trials of new treatments are conducted only against a
supposedly active treatment, the active treatment being of no use. There could
be a proliferation of useless drugs all being as good as the active (useless)
treatment.

8.10.5 How many positive trials are required against placebo?

How many trials of active versus placebo treatment have to show a positive
result before the results can be accepted and further use of placebo prohibited?
Only two trials of antihypertensive treatment were performed before it was
accepted that middle-aged men with a sustained diastolic pressure over 105
mm Hg should be treated and not exposed to long periods of placebo treatment.
However, when trial results are contradictory, as with the use of anticoagulants
following myocardial infarction, a large number of trials may have to
be performed. The International Anticoagulant Review Group examined nine
trials where proper control groups were employed and concluded that the
benefits from anticoagulants were of the order of a 20 percent reduction in
mortality in men and zero in women [52]. The medical profession has not
considered this to be a worthwhile gain in men and the treatment has not been
employed in most countries (see section 19.4). On the other hand, a more
recent trial showed a significant reduction in myocardial reinfarction following
prophylactic treatment with sulphinpyrazone [48]. A positive result may occur
by chance in up to 5 percent of trials (according to the level of significance
achieved in the trial) and it would appear reasonable to require two positive
trials before accepting any results as conclusive. A second trial, including
placebo treatment, may therefore be started in order to confirm the initial
finding.

8.11 CONCLUSIONS

The results of a trial may be expected to be free of bias when randomisation
has produced an equivalent control group. The control group must provide an
estimate of any placebo effect, an estimate of any time trend in the condition
being investigated and of the effect of regression to the mean.
Noncompliance with therapeutic advice may alter the response to treatment
but patient and observer bias may be removed by single- and double-blinding
respectively.
Great care must be taken in the analysis of the results of the trial in order to
avoid introducing bias, especially when considering subjects who withdraw
from the trial.
The method of conducting the trial is very important and double-masking
and placebo control should be introduced where possible. Both these techniques
can present difficulties in the form of complexity, expense, and even
litigation. If double-blinding, placebo control, and randomisation are not
employed the consequences are likely to be more serious than any adverse results
from using them.

9. THE VARIABILITY OF RESULTS

Even when the results of a trial are free of bias, the measurements on which
they are based are still subject to random variability. Often the variability can
be both estimated from its repeatability and reduced by quality control. The
measurements may vary between subjects and this variability can be reduced
by employing the within-subject cross-over trial design. It must also be
remembered that a measurement may be very repeatable but either not provide
the correct result (low accuracy) or measure something other than that
intended (low validity).

9.1 REPEATABILITY
Repeatability is the level of agreement between replicate measurements. The
level of agreement must be determined for measurements made in the same
subject in order to exclude differences between persons. The variability may
result from observer error, machine error and, when repeatability is estimated
at widely spaced intervals, from alterations in the true measurement with the
time of estimation.

9.1.1 Repeatability of a continuously distributed variable
The repeatability of a continuously distributed variable is estimated by the
standard deviation of repeated measurements. The difference between the
average result of these measurements and the true result should also be calculated
when the latter is known. The standard deviation of repeated measurements
has been termed the precision of the measurement [108], although when the
deviation is high it will be less precise. The difference of a result from an
expected value has been termed the accuracy of the measurement. The standard
deviation is often divided by the mean and expressed as a percentage known as
the coefficient of variation.
Obviously, it is important to make very repeatable measurements during
the course of a trial and great difficulties have arisen when this has not been the
case. The World Health Organisation and the Center for Disease Control,
Atlanta, have been involved in assessing the repeatability of serum cholesterol
measurements around the world [108]. Cooper has reported these results and
found that, although the standard deviation of the various methods was
satisfactory, some methods gave high readings of cholesterol and poor accuracy.
The two methods causing the most problems were the direct method of
cholesterol estimation and the chloride method. It is worth emphasising that if a
laboratory changes its method of analysing cholesterol or any other substance
during the course of a trial, the measurements may be altered with serious
consequences. The new method may lack precision or differ in accuracy when
compared with the old method.
Figure 9-1 illustrates how a biochemical measurement on the blood can be
checked three times during a clinical trial. This assessment will be most
important in a long-term trial and, when a placebo group is employed, repeated
measurements in this group will not be affected by treatment. In the example
each subject has blood taken on three occasions and the blood is split into three
samples. Biochemical estimations on the samples give the results designated
X. The results for the first subject have a suffix 1 (X_1); his results for the first
occasion are designated X_11 and results for the first sample on that occasion
X_111. The subjects are numbered 1, 2, 3, ... n in the first suffix which we
shall call i; occasions 1, 2, and 3 are given the second suffix; and samples 1, 2,
and 3 the third suffix. The average for a suffix in the figure is given by a point.
The average result for all measurements is given by X_..., the average for
subject 1, X_1.., and for occasion 1, X_.1.

9.1.1.1 The repeatability on one occasion
The within-specimen variance for subject 1, occasion 1, is given by

$$\frac{\sum_{s=1,3} (X_{11s} - X_{11.})^2}{r - 1}$$

where $\sum_{s=1,3}$ is an instruction summing the squared differences for r samples
numbered in this example 1 to 3 (s = sample).
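The within-specimen calculation and the coefficient of variation described in this section can be sketched in a few lines of Python. This is only an illustration: the three split-sample results and the reference value are invented for the example, and the function names are hypothetical rather than part of any standard laboratory package.

```python
from statistics import mean, stdev


def within_specimen_variance(samples):
    """Sum of squared deviations from the specimen mean, divided by r - 1."""
    specimen_mean = mean(samples)
    return sum((x - specimen_mean) ** 2 for x in samples) / (len(samples) - 1)


def coefficient_of_variation(samples):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * stdev(samples) / mean(samples)


def accuracy_difference(samples, true_value):
    """Difference between the average result and a known true result."""
    return mean(samples) - true_value


# Hypothetical results for one blood specimen split into r = 3 samples.
split_samples = [5.2, 5.4, 5.3]
reference_value = 5.0  # assumed known value for the same specimen

print("variance:", within_specimen_variance(split_samples))
print("CV (%):", coefficient_of_variation(split_samples))
print("difference from reference:", accuracy_difference(split_samples, reference_value))
```

A small variance with a large difference from the reference value would correspond to the situation described in the text: a repeatable but inaccurate method.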

              OCCASION SPECIMEN TAKEN
              1                  2                  3
Split sample  1     2     3      1     2     3      1     2     3      Average

Subject 1     X111  X112  X113   X121  X122  X123   X131  X132  X133   X1..
Subject 2     X211  X212  X213   X221  ...                              X2..
Subject 3     X311  X312  X313   ...                                    X3..
...
Subject n     Xn11  Xn12  ...                                           Xn..

Average       X.1.               X.2.               X.3.               X... (overall mean)

Figure 9-1. The assessment of repeatability during a clinical trial (see text). The data
provide information on the precision and accuracy of measurements on each occasion.

                              FIRST OBSERVER
                              Diagnosis +    Diagnosis -
SECOND      Diagnosis +           a              b
OBSERVER    Diagnosis -           c              d

Figure 9-2. How the data may be presented to demonstrate the agreement between two
radiologists examining the same films; a, b, c, and d are the number of pairs where they agree or
disagree.

For all subjects and occasion 1 the precision of the measurement is given by

$$\sqrt{\frac{\sum_{i=1,n} \sum_{s=1,3} (X_{i1s} - X_{i1.})^2}{n(r - 1)}}$$

where the squared differences are summed over n subjects and three samples.
In the same manner the repeatability of the measurement should be
estimated for occasions 2 and 3 and compared with the initial measurement of
repeatability. A worked example is given in Appendix 9.5 to compare the
blood pressure measurements on two prototype samples of a new kind of
sphygmomanometer with one mercury sphygmomanometer. The prototypes
did not prove to be acceptable.
The accuracy of the measurement can be gauged by comparing the mean for
occasion 1 (X_.1.) with the means for the other two occasions or, similarly, the
average result for a standard machine with the average for the machine under
investigation.
Another method can be employed in a long-term trial to examine for a drift
in the results. For each measurement during the trial the difference is calculated
between this figure and the initial average results. The positive differences
should balance the negative ones, giving a cumulative sum of differences near
to zero. The cumulative sum of differences is calculated during the course of
the trial and plotted against time. A deviation from zero increasing with time
will indicate a change in accuracy. The graph has been termed a CUSUM plot
and has been employed in monitoring laboratory performance.

9.1.2 Repeatability of qualitative data
Qualitative data are not continuously distributed and include diagnoses,
symptoms, and clinical states given a discrete value (for example, zero when the
condition is absent and one when present). Repeatability of qualitative data is
most easily measured by comparing two estimates (for example, when
radiologists report a diagnosis). To estimate between-observer repeatability two
radiologists may be asked to examine the same radiographs and report their
opinions. If one observer is given the same films to examine on two occasions
then within-observer repeatability can be estimated.
Figure 9-2 illustrates how the data may be presented, the columns giving
the opinions of the first observer and the rows the results from the second
observer. The letter "a" represents the number of positive responses in both
assessments; "d" represents the number of negative responses on both
occasions; "b" assessments were negative with the first observer but positive with
the second observer; and "c" assessments were positive with the first observer
and negative with the second. The agreement has been measured by the
expression:

$$R = \frac{a + d}{a + b + c + d}$$

However, this index varies with the prevalence of positive findings. Figure 9-3
shows the result of calculating R in a condition with a 20 percent prevalence
in both assessments and an equal division between the positive and negative
cases, and similarly the result when the prevalence was only two percent. The
index (R) was 80 percent with a 20 percent prevalence and 98 percent with a
two percent prevalence.

a) 20% prevalence
                        FIRST ASSESSMENT
                        +       -       TOTAL
SECOND        +         10      10      20
ASSESSMENT    -         10      70      80
              TOTAL     20      80      100

R = 80/100 = 80%

b) 2% prevalence
                        FIRST ASSESSMENT
                        +       -       TOTAL
SECOND        +         1       1       2
ASSESSMENT    -         1       97      98
              TOTAL     2       98      100

R = 98/100 = 98%

Figure 9-3. The measure of agreement, R = (a + d)/(a + b + c + d), is not independent of
prevalence, being 80% in the first example and 98% in the second.

Two indices of repeatability have been published that are independent of
prevalence:

$$\text{Index 1} = \frac{1}{4}\left[\frac{a}{a + c} + \frac{a}{a + b} + \frac{d}{c + d} + \frac{d}{b + d}\right]$$

This index can be calculated as long as the sum of a row or a column is not zero
[109].

$$\text{Index 2} = 1 - (bc \div ad)$$

This index is also independent of prevalence but cannot be calculated if any cell
is zero [110].
Repeatability indices can be used to compare diagnoses made by different
observers, the same observers on different occasions, or by different means
(for example, electrocardiographic evidence of myocardial infarction compared
with enzymatic data). The indices have been used to compare the responses
of patients to self-administered questionnaires on two different occasions.
Table 9-1 provides some examples of the repeatability of symptoms in
5S normal subjects over a ten-month period as measured by index 2. Index 2
should be zero if there was no agreement between the responses to the
self-administered questionnaire and 1 if there was full agreement. The repeatability
measurements were all .95 or greater for single questions but were lower when
more than one question led to a conclusion. For example, the subjects were
asked if they had headaches and whether these headaches occurred on waking.
The repeatability index for the pair of questions eliciting the response, waking
headaches, was 0.87. However, it appeared that the single questions had an
[...]. Eliciting data [...] symptoms is discussed further in chapter 16.

Table 9-1. Repeatability of questions on symptoms as determined
by two self-administered questionnaires, ten months apart.

Symptom                     Repeatability index

Single question
  Faintness                 0.95
  Headaches                 0.98
  Blurred vision            0.99
  Depression                0.95
  Nocturia                  1.00

Double questions
  Faintness on standing     0.94
  Waking headaches          0.87

Index = 1 - (bc ÷ ad). (See text.)
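The three measures of agreement discussed in section 9.1.2 can be computed directly from the four cell counts of a table laid out as in figure 9-2. The following Python sketch is illustrative only: the function name is hypothetical, Index 1 is taken to be the average of the four conditional proportions a/(a + c), a/(a + b), d/(c + d) and d/(b + d), and Index 2 is taken to be 1 - (bc ÷ ad). The counts used are those of the two examples in figure 9-3.

```python
def agreement_indices(a, b, c, d):
    """Return (R, index 1, index 2) for a 2 x 2 agreement table.

    a = positive on both assessments, d = negative on both,
    b and c = the two kinds of disagreement.
    Raises ZeroDivisionError when an index cannot be calculated
    (a zero row or column total for index 1, a zero a or d for index 2).
    """
    total = a + b + c + d
    r = (a + d) / total
    index_1 = 0.25 * (a / (a + c) + a / (a + b) + d / (c + d) + d / (b + d))
    index_2 = 1 - (b * c) / (a * d)
    return r, index_1, index_2


# Cell counts taken from figure 9-3.
examples = {
    "20% prevalence": (10, 10, 10, 70),
    "2% prevalence": (1, 1, 1, 97),
}

for label, cells in examples.items():
    r, index_1, index_2 = agreement_indices(*cells)
    print(f"{label}: R = {r:.2f}, index 1 = {index_1:.3f}, index 2 = {index_2:.3f}")
```

The values of R reproduce the 80 and 98 percent quoted in the figure. The sketch makes no attempt to attach a standard error to any of the indices.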

9.1.3 The relationship between bias and repeatability
[...] replicating measurements [...]

9.1.4 The [...]
[...] starting lower blood pressures than the mean [...] lower pressures on the
second occasion. Therefore, if you select a group of hypertensive patients on
the basis of their high blood pressures, their measurements on the second
occasion [...] fall in pressure [...] phenomenon. This [...] the effect [...] the
control group [...] being ascribed to active treatment. Regression to the mean
can also be limited to some extent by only including patients with a repeatedly
high pressure, such patients presumably having a smaller tendency to drop to
lower pressures.

9.1.5 The effect of repeated assessments on the prevalence of a condition
Higgins' Law states, "The prevalence of any condition is inversely
proportional to the number of experts whose agreement is required to establish its
presence." Let us assume that [...] make a certain diagnosis for [...] out of
1,000 subjects [...] it to be present. If one observer [...] that 100 have
angina, a second observer may also [...] that 100 have angina but the latter
100 may only include 75 of the initial [...], reducing the confirmed prevalence
of [...] second diagnoses were not in agreement [...] may eventually give a
[...] may be reduced by repeated examinations.

9.1.6 Good repeatability does not imply validity
The validity of a measurement [...] what it is supposed to measure. For
example, a measurement of blood sugar may be examined, a blood sample split
[...] coefficient of variation for a single sample computed with satisfactory
results. However, if random blood sugars are being used to diagnose diabetes
mellitus, within-sample repeatability may be satisfactory [...]
between-occasion repeatability may reveal considerable variation. Moreover, [...]
random blood sugar may not be a good indicator of diabetes mellitus, being
high after a carbohydrate meal [...] according to the recent diet. [...] a
measurement may have a high repeatability [...] as a diagnostic [...] with a
high validity and a high [...]

9.2 QUALITY CONTROL DURING A TRIAL

In order for a trial to be [...] instructions in the [...] trained and the progress
of the trial [...] deviations from the trial protocol and other errors.

[...] protocol and also in the final report.
Attention to detail is so important [...] trial should usually be conducted
for a pilot period when the protocol can be amended in the light of experience
(section 14.1). Ferris and Ederer [112] have [...] violations in
ophthalmological studies, and table 9-3 reports some of these violations
observed during a pilot trial where visual acuity was to be measured. Certain
violations led to the clinic changing its practice, some to the protocol being
changed and others to the development of special equipment to overcome the
difficulties. Of particular interest was the fourth protocol violation, where the
patients had to read a line of letters and were judged to be able to do so if they
only got one letter incorrect. For the line 20/200, the charts included only
one letter and in theory, all patients should have been judged capable of
reading that line. New charts were constructed with at least four letters per line.
The violation 5.c was also of great interest. A standardised protocol for
refracting patients and obtaining the visual acuity had been developed but the
techniques were rarely followed. The greatest variability occurred when
ophthalmologists were doing the examinations: "A few . . . clearly did not know
the study protocol at all." In many clinics technicians were trained to take over
the measurements and they adhered strictly to the protocol. The authors
suggested that the protocol way was the only way the technicians could make the
measurements and therefore they used this method and got reproducible
results. The training of observers is of great importance.

Table 9-2. The detail that may be required in a
protocol for a trial where blood pressure is measured

Machine                             Standard sphygmomanometer/random zero, etc.
Cuff size                           State size of inflatable portion
Cuff-deflation rate                 2 mm/second?
Circumstances                       Measurement to be made in the doctor's office,
                                    laboratory, at home?
Position of patient                 Lying, standing, sitting?
Relaxation of patient               Resting for 10, 5, 2 minutes?
Time of day                         Morning, afternoon, evening?
                                    Certain number of hours after treatment?
Replication of measurements         If replicated, which reading(s) are recorded?
Measurement of diastolic pressure   Point of muffling or disappearance of sound?
Accuracy                            Reading to the nearest 2 mm Hg?

Table 9-3. Protocol violations observed during a trial involving the measurement of visual
acuity before and after refraction, and how the violations were dealt with [112]

1. Protocol requirement: Visual-acuity lane 20 feet long.
   Violation: Distance reduced: difficult for clinic to provide distance.
   Result: Protocol changed: 10-foot lane.
2. Protocol requirement: Front illumination.
   Violation: One clinic provided rear illumination.
   Result: Clinic changed practice.
3. Protocol requirement: Illumination by tungsten spotlights.
   Violation: a. These lamps not used (e.g., giving only 10% of specified lighting).
              b. Bulbs aged and illumination reduced.
              c. Uneven illumination.
   Result: Clinics changed practice. Specially developed visual-acuity boxes issued.
4. Protocol requirement: Visual acuity defined as that line on which patient gets
   not more than one letter wrong.
   Violation: Charts issued had only one 20/200 letter. (Patients could not get
   this line wrong!)
   Result: New charts issued with four 20/200 letters.
5. Protocol requirement: Refraction.
   Violation: a. Not done by trained personnel.
              b. Not done by blind observer.
              c. Measurement rushed and protocol ignored (or unknown).
   Result: Examiners had to be trained and receive certification of training.
   Technicians trained to take over this measurement from ophthalmologists.
   Replications done by site visitor.

9.2.2 Training of observers
Observers must be trained to achieve both repeatable and accurate results. An
observer who makes repeatable measurements can still make the measurements
consistently too high or too low. An example of the necessity for
training is given by the measurement of blood pressure.

9.2.2.1 Learning the technique
The observer has to be taught which machine to use, how to apply the cuff, as
well as the requirements listed in table 9-2. Assuming that the observer is not
deaf there are still difficulties in detecting the diastolic sounds. The fourth
Korotkoff sound is the point of muffling of sounds (when the sounds stop
having a tapping character) and the fifth Korotkoff sound is when the sound
disappears. Observers can be trained to recognise these sounds by listening to
them while observing a film of a mercury column falling. The observers thus
gain experience and are then tested until they make the simulated pressure
measurements consistently close to a standard measurement. For real
measurements of pressure a stethoscope can be adapted so that the trainee and the
trainer can listen to the sounds at the same time and the training session can
continue until consistent agreement has been reached.

9.2.2.2 Adherence to the protocol

The observer must be tested for adherence to the protocol. If the protocol
requires blood pressure measurements to the nearest 2 mm Hg then the end
digits 0, 2, 4, 6, and 8 should be recorded with equal frequency. When moni­
toring the quality of blood pressure measurement, a preference for an end zero
is often observed and indicates failure to adhere to the protocol. However,
digit preference is much less important than an overreading or underreading
due to observer bias. Two machines, used correctly, should prevent this bias. A
random zero sphygmomanometer is available where pressure is recorded and
then the zero point determined and later subtracted [113]. Digit preference is [...]
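The check on end digits described above lends itself to a simple tabulation. The sketch below counts the terminal digits of a set of recorded pressures and, as one possible way of judging departure from equal frequency, computes a chi-square statistic against the expectation that 0, 2, 4, 6, and 8 are equally common. The chi-square test is an assumption of this example and is not prescribed in the text; the readings are invented, and the critical value of 9.488 is the standard 5 percent point for four degrees of freedom.

```python
from collections import Counter

EVEN_DIGITS = (0, 2, 4, 6, 8)
CHI2_CRITICAL_DF4_5PCT = 9.488  # 5 percent point of chi-square, 4 degrees of freedom


def end_digit_check(readings):
    """Return (chi-square statistic, number of odd end digits).

    The statistic compares the observed counts of end digits 0, 2, 4, 6, 8
    with equal expected counts. Odd end digits are reported separately,
    since they violate a "nearest 2 mm Hg" protocol outright.
    """
    counts = Counter(reading % 10 for reading in readings)
    n_even = sum(counts[digit] for digit in EVEN_DIGITS)
    n_odd = sum(counts[digit] for digit in counts if digit not in EVEN_DIGITS)
    expected = n_even / len(EVEN_DIGITS)
    chi_square = sum((counts[digit] - expected) ** 2 / expected for digit in EVEN_DIGITS)
    return chi_square, n_odd


# Hypothetical diastolic readings from an observer who favours an end zero.
readings = [90] * 30 + [92] * 5 + [94] * 5 + [96] * 5 + [98] * 5

chi_square, n_odd = end_digit_check(readings)
print("chi-square:", chi_square, "odd end digits:", n_odd)
print("preference suspected:", chi_square > CHI2_CRITICAL_DF4_5PCT)
```

A large statistic points only to digit preference; as the text notes, it says nothing about systematic over- or underreading.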

9.2.3 Should the same or [...] observers [...] measurements [...]
[...] necessary.

9.2.4 Checking of trial documents
[...] design of forms to minimise errors in completion [...] the forms should be
designed [...] difficult to complete.

9.2.4.1 Quality control of entry criteria
[...] confirm that the patients are truly [...] of the initial records [...] trials by
experienced investigators [...]

9.2.4.2 Quality control during the course [...]
The documents should be designed [...] that the protocol is adhered to; for
example, if a [...] percent of the expected number is necessary [...] compliance
then the investigator should record the number of tablets [...] whether or not
noncompliance was present. Patients often forget to [...] them but [...] the
investigator that they [...] and he will be tempted [...] objective evidence
required in the protocol. Similarly [...] the last to be recorded [...] all three
measurements.
The documents should be [...] consistency and errors. Certain information
must be repeated to ensure that they relate to the [...]

9.2.4.3 Errors in diagnosis
It cannot be assumed that all [...] correct. Kahn and his colleagues [115] [...]
examples of diagnostic difficulty in glaucoma and [...] concluded that more
training, more tightly written [...] recorded and not simply the diagnosis.

9.2.4.4 Error in measurement
Errors of measurement may [...] systematic. Both are important, [...] and the
latter (systematic error) [...] false result if it occurs in one treatment group
more than another.
Before commencing a trial, other than a small pilot trial, the investigator [...]
knowledge of the variability of the measurements [...] information may be
necessary for the calculation of the numbers required for a particular trial and
will be [...]
of the data. Figure 9-4 provides a theoretical example in which the validity of a
question, "Have you been admitted to hospital with a heart attack during the
last six months?" is examined. The question was asked of a group who had
been admitted with myocardial infarction and a group that had been
admitted with a variety of other medical conditions. Their answers are entered in
the columns of the figure and cross-tabulated according to the known truth. If
the total number of patients was n then the proportion of correct responses was
(a + d)/n. Similarly the false negative rate can be calculated as b/(a + b) and a false
positive rate as c/(c + d). Such a tabulation could be used to examine the validity
of other data. For example, the vertical columns could be a normal or high
random blood sugar and the horizontal rows the presence or absence of
diabetes mellitus as determined by a glucose tolerance test. Similarly,
electrocardiographic abnormalities, or serum enzyme changes, may be examined
separately as tests for acute myocardial infarction provided a definitive diagnosis or
gold standard can be provided from, say, coronary angiography.

                                        Patients' answers
                                        Yes (have been         No (have not been
                                        admitted with          admitted with
                                        heart attack)          heart attack)
Known to have been admitted
with a heart attack                           a                      b
Known to have been admitted
with a different diagnosis                    c                      d

Figure 9-4. The validity of a self-administered questionnaire to determine whether or not a
patient's recent hospital admission had been due to a heart attack. The letters a, b, c, and d are
numbers of patients.

9.2.4.5 Failure to withdraw patients according to the protocol
The criteria for withdrawing a patient must be stated in detail in the protocol
and the documents completed on withdrawal must include the exact reasons to
ensure that the protocol is adhered to. If a patient has to be withdrawn from a
trial when the serum creatinine exceeds a certain concentration, then the
documents must request the results of serum creatinine determinations. When
patients are withdrawn from a trial for reasons outside the protocol this
occurrence must be detected during the quality control procedures and the
investigator interviewed to ensure that the protocol has been understood.

9.2.4.6 Clerical errors
Clerical errors are mainly those of transferring information and four have been
defined [116]: person to document; instrument to document; document to
punch cards; and computer output to report. We can add another transfer
error, typing mistakes in any of the draft or final reports.
The solution to these errors is repeated checking. Two persons should be
responsible for the completion of all documents and punching of cards must be
verified by repunching. In the latter process the card is rejected by a verifier
when the original holes in the computer card do not coincide with the second
keyboard input. Despite all care, errors still occur and all data have to receive
two examinations: range and consistency checks. Range checks ensure that the
results lie within the range of possible answers and consistency checks examine
the data for internal consistency. Examples of consistency checks are given by
the following: a male patient who is pregnant; a 50 kg person losing 20 kg in
weight; a woman complaining of impotence; and a patient's aging by more
than one year over a one-year period.

9.2.5 The duties of a trial monitor
It is essential that all multicentre trials appoint a monitor to visit the various
centres and ensure that protocol adherence is satisfactory. It has been suggested
that the patient can also be used as a monitor [86]. It is true that a patient who is
familiar with a trial protocol and expecting a prescription, investigation,
questionnaire, or appointment on a particular occasion, may well remind the
investigator when these procedures are not arranged.
A monitor's report on a trial may be necessary for the acceptance of the trial
results by an administrative body. Table 9-4 gives the duties of a monitor.
The monitor should interview the investigators, examine the equipment,
make unheralded visits during the progress of the trial, and examine the trial
documents. The monitor should ask to see the signed consent forms and the
detailed protocol. He must check the trial documents for completeness and
evidence of adherence to the protocol.

Table 9-4. Duties of a trial monitor

1. To ascertain that the investigators understand and are familiar with the protocol.
2. To determine that the clinical laboratory or other location for the trial activity has the
   necessary equipment (for example, materials for resuscitation when specified in the protocol).
3. To make unheralded visits to observe the trial in progress and check adherence to the
   protocol.
4. To examine the trial documents to ensure that the protocol is adhered to.

9.3 THE REDUCTION IN VARIABILITY ACHIEVED IN A CROSS-OVER TRIAL

In a cross-over trial a patient takes one treatment, then a further treatment or
treatments. The patient acts as his own control and the variability of the
response to the treatment is correspondingly reduced. In certain instances the
reduction in variability may allow a result to be achieved with a fraction of
the number of patients (section 10.8.4). The advantages and disadvantages of
cross-over trials are discussed in chapter 11.

9.4 CONCLUSIONS

This chapter has described how the precision of repeated measurements can be
estimated together with their accuracy in comparison with a known standard.
The relationship between bias, repeatability, and validity has also been
discussed. The concept of quality control was introduced and the methods to
achieve fewer errors considered, a clear protocol and careful training of the
observers being the most important. The detection of errors and the duties of a
trial monitor were reviewed.

APPENDIX 9.5
9.5.1 Calculation of the precision and accuracy of measurements

Measurements of systolic blood pressure taken with a mercury
sphygmomanometer were compared with two prototype instruments. The first Latin
square illustrated in figure 11-3 (section 11.5) was utilised to determine the
order. The standard machine was A and the prototypes B and C. Three
patients were randomised as patients 1, 2, and 3.

Machine                  A         B         C

Patient 1               168       183       164
                        153       174       168
                        151       168       165
  Mean                 157.3     175       165.7
  SD                     9.3†      7.5       2.1

Patient 2               149       172       169
                        149       164       178
                        150       164       174
  Mean                 149.3     166.7     173.7
  SD                     0.6       4.6       4.5

Patient 3               123       110       088
                        118       109       110
                        118       125       110
  Mean                 119.7     114.7     102.7
  SD                     2.9       9.0      12.7

Overall
  Mean                 142.1     152.1     147.4
  Precision‡             5.6       7.3       7.9

† Standard Deviation (SD) =

$$\sqrt{\frac{(168 - 157.3)^2 + (157.3 - 153)^2 + (157.3 - 151)^2}{3 - 1}} = 9.3$$

‡ Precision =

$$\sqrt{\frac{9.3^2 + 0.6^2 + 2.9^2}{3}} = 5.6$$

9.5.2 Conclusion
The results for three patients cannot be conclusive but the prototype machines
tended to have higher precision (standard deviation) values than the
standard machine. Moreover, the accuracy of the new machines was not
acceptable as they read systolic pressure too high in two patients and too low in one.
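The arithmetic of Appendix 9.5 can be checked with a short Python sketch. The readings are those tabulated for machines A, B, and C; the function names are chosen for this illustration only. The precision is taken, as in the footnote to the table, to be the square root of the average of the three within-patient variances.

```python
from math import sqrt
from statistics import mean, stdev

# Systolic pressures (mm Hg): three readings per patient for each machine.
readings = {
    "A": [[168, 153, 151], [149, 149, 150], [123, 118, 118]],
    "B": [[183, 174, 168], [172, 164, 164], [110, 109, 125]],
    "C": [[164, 168, 165], [169, 178, 174], [88, 110, 110]],
}


def precision(per_patient):
    """Square root of the mean within-patient variance."""
    variances = [stdev(patient) ** 2 for patient in per_patient]
    return sqrt(mean(variances))


def overall_mean(per_patient):
    """Mean of the patient means for one machine."""
    return mean(mean(patient) for patient in per_patient)


for machine, per_patient in readings.items():
    print(
        machine,
        "overall mean:", round(overall_mean(per_patient), 1),
        "precision:", round(precision(per_patient), 1),
    )
```

The calculation works from unrounded patient means and standard deviations, so a result may differ from the tabulated figure in the last decimal place where the table was built up from rounded intermediate values.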

10. HOW MANY SUBJECTS ARE REQUIRED FOR A TRIAL?

The number of subjects required for a controlled trial depends on the major
objective defined for the trial; the level of significance that must be achieved if
the objective is reached; the confidence with which a negative result should be
reported if the objective is not reached; and the variability of the end-point
measurements.

10.1 THE MAJOR OBJECTIVE

The major objective will be the difference to be detected between the
intervention and control group. This objective is discussed in section 4.2 and may be in
absolute units (for example, a reduction in diastolic blood pressure of 8 mm
Hg) or a proportional reduction in an event (for instance, a 50 percent
reduction in stroke mortality). The difference should be both biologically important
and capable of achievement. If the average antihypertensive drug can lower
diastolic blood pressure by 15 mm Hg when compared with placebo and
preliminary studies on a new drug reveal a similar effect, it would be sensible
to set a reduction of 15 mm Hg as the objective in a trial of this new drug and
unreasonable to expect a greater reduction. Even a reduction of 8 mm Hg
would be biologically important and would constitute an acceptable objective.
Similarly a reduction in stroke mortality of 50 percent appears high but has
been exceeded in a trial of antihypertensive treatment [37] and therefore may
be a reasonable objective.
Usually the investigator will be interested not only in whether the new
treatment is better than the control treatment by a given amount, but also in
whether the new and control treatments have a similar effect, or indeed if the
new treatment is worse than the control treatment by the defined amount.

10.1.1 The traditional objective: to determine if the new
treatment is better, the same, or worse than control treatment
The traditional objective determines: (1) whether the effect (A) of a new
treatment is better than the effect (B) of the control treatment by a given amount;
(2) whether the new effect A is comparable with B within certain limits; and
(3) whether the unexpected occurs and B is better than A by the given amount.
For example: Let the effect of a new drug in lowering diastolic blood pressure
be a fall in pressure of A mm Hg. Let the effect of control treatment be a fall in
pressure of B mm Hg. The investigator is interested in whether:

1. A - B > 8 mm Hg
2. A - B ≤ 8 mm Hg and ≥ -8 mm Hg
3. A - B < -8 mm Hg

This is the traditional aim of a trial but Schwartz, Flamant, and Lellouch
[117] have pointed out that there are circumstances where the investigator is
only interested in (1) (he only wishes to determine when the new treatment is
to be preferred and is not interested if the new treatment is the same as or
worse than the old). In certain circumstances the investigator may be
interested in (1) and (2) but not (3) (that is, he is not interested when the effect B is
greater than A).

10.1.2 The investigator is only interested in
whether the new treatment is to be preferred
The example given earlier can be expanded if we assume the investigator is
using an established drug, with effect B, in the treatment of hypertension and
he is familiar with the effects of the drug and experienced in its use. However,
he is willing to change to a new drug with effect A if A - B > 8 mm Hg. If
A - B is not greater than 8 mm Hg the investigator wishes to continue with
the established drug. The trial result must decide between two strategies:
change to the new drug, and do not change. The investigator must be satisfied
with this decision as the only outcome of the trial. The advantage will be a
substantial reduction in the number of patients required for the trial. If the
established treatment is better than the new treatment this fact may not be
apparent as the numbers in the trial will not be sufficient to differentiate
between the results A = B and B > A.

10.1.3 The investigator is certain that the control
treatment cannot be better than the new treatment
When the investigator is only deciding whether to change treatment or not, he
should not be interested in whether the new treatment is the same as the
control treatment or worse. However, when the investigator is interested both
in whether A is greater than B and in whether A is similar to B, he must also
consider the possibility that the effect B is greater than A. If it is not possible
for B to be greater than A, then this possibility can be ignored with a reduction
in the numbers required for the trial. In biological research, however, it is
usually possible for A to be greater than B or B to be greater than A. An
investigator using the new blood-pressure-lowering drug would certainly wish to
report the fact that the new drug was worse than the old. He should not
calculate the numbers required for the trial without taking account of this
possibility.
Figure 10-1 provides a flow chart that may help an investigator decide how
to calculate the numbers for a trial. The investigator must decide whether he
wishes to make a yes/no decision or to quantify the differences between the
treatment effects, A and B. If A is the treatment under trial and B the control
treatment, the trial organiser must decide whether he is interested in the
possibility that B > A and, if he is, whether he is equally interested in
B > A as he is in the possibility that A > B. If he is not equally interested he
may calculate the numbers using a two-sided asymmetric test (see section 10.7).
The figure finally indicates the equations that may be employed to calculate
the numbers required for the trial (section 10.5). The central lines of the flow
chart indicate the pathway that is applicable to most trials.

Figure 10-1. Flow chart to determine how to calculate number of subjects for a
clinical trial. A is the effect of a new treatment and B the effect of control
treatment.

  Do you wish to know if A = B?
    NO (only interested if A > B): perform a decision-making pragmatic
       trial. See section 10.5.3.
    YES: Are you interested in both A > B and B > A?
      NO: only interested if A = B or A > B.
      YES: Is it equally important if A > B and B > A?
        YES: use standard equations (10.4) and (10.6). See section 10.5.2.
        NO: asymmetric test for α. See section 10.7.2.

10.2 THE STATISTICAL SIGNIFICANCE OF
THE DIFFERENCE TO BE DETECTED

The probability that an observed difference is due to the vagaries of chance is
measured by a significance test. If the test gives a result less than five percent
or one percent, the observed difference would be expected to have happened by
chance with a frequency of less than one in 20 or one in 100 times respectively.
The five percent and one percent levels are arbitrary cut-off levels at which we
decide that the results are unlikely to have happened by chance and the result
is statistically significant. These levels must be chosen during the design of
the trial (so that the numbers required for the trial can be calculated) and
constitute the first type of error, type I or α error. More exactly, the type I
error is the probability of rejecting the null hypothesis when it is true (that
is, when the treatment effects A and B are the same). When calculating the
numbers required for a trial, we assume that the desired effect is obtained and
A − B = δ at a level of significance p, where δ and p are defined in the major
objective.
Figure 10-2 provides a plot representing the frequency of all possible results
when the null hypothesis is true and A = B. The height of the graph represents
the probability of a result, A − B = 0 being the most likely, and the area under
the graph is set equal to 100 percent. The right-hand and left-hand shaded
areas represent the probability that a result will be obtained equal to or
greater than A − B = δ and B − A = δ respectively. The numbers for the trial are
calculated so that, if p must equal five percent and B cannot be greater than A,
the right-hand shaded area should represent five percent of the total area.
However, in biological experimentation, B can be greater than A and to
achieve a two-tailed probability of five percent, each shaded area must equal
two and one-half percent as in the figure.
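
The relation between the tail areas and the position of the cut-off can be checked numerically. The following is a minimal sketch in Python, assuming that trial results follow a normal distribution; the function name `critical_deviate` is illustrative and does not come from the text.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

def critical_deviate(alpha: float, two_tailed: bool = True) -> float:
    """Return the standardised normal deviate that cuts off the chosen tail area.

    With a two-tailed test the significance level alpha is split equally
    between the right-hand and left-hand tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

# Five percent shared between both tails (two and one-half percent each).
print(round(critical_deviate(0.05, two_tailed=True), 2))   # 1.96
# Five percent placed entirely in the right-hand tail.
print(round(critical_deviate(0.05, two_tailed=False), 2))  # 1.64
```

The two-tailed cut-off lies further from zero, which is why allowing for B > A increases the numbers required.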
10.3 THE POWER OF THE TRIAL

A negative result for a trial may be difficult to interpret. We must ask whether
the number of subjects in the trial is sufficient to demonstrate an important
difference between treatments should that difference truly be present. In many
instances the numbers of subjects in a trial are so few that a negative conclusion
is almost inevitable. The confidence with which we report a negative result is
known as the power of the significance test employed. Let us assume that 20
diabetic patients are treated for three months with an active antiplatelet drug
and another 20 with placebo for the same period. It is likely that we will not
observe a coronary event in either group and if we conclude that treatment is
not required for the prevention of ischaemic heart disease it would be obvious
that our conclusion is lacking in power.
The error in concluding that a given difference is not present when in reality
it is, is known as the type II or β error. The power of a trial is the probability
of avoiding β error, or 1 − β. In figure 10-3a two frequency distributions are
drawn: one for the null hypothesis A − B = 0 on the left and one on the right
for A − B = δ. The numbers for the trial must be calculated so that the
probability of a given result appearing compatible with the null hypothesis is
small when a true difference of the specified size exists. In the figure the result
R lies just within the results expected with the null hypothesis. In this example
the numbers for the trial are sufficient so that the distribution for the main
objective A − B = δ does not overlap the distribution of results for the null
hypothesis to a great extent. The lined area equals 10 percent and represents
the type II error or the probability of a false negative result. The power of the
trial is said to be 90 percent. In figure 10-3b only small numbers of patients are
entered in the trial and the two distributions overlap each other to a large
extent. Again the lined area represents the type II error and is 80 percent, the
power of the trial being only 20 percent. For those not familiar with frequency
distributions, figures 10-2, 10-3a, and 10-3b are represented in figure 10-4 as
10-4a, 10-4b, and 10-4c respectively. The area enclosed by each circle equals 100
percent but equal distances around the circles do not correspond to equal
intervals of A − B. The first circle (10-4a) represents the results when the null
hypothesis is true and the dotted areas represent the two-tailed type I error of
five percent. Figures 10-4b and 10-4c represent the results when in truth
A − B = δ. Figure 10-4b represents the situation when the numbers for a
trial are adequate and power is equal to 90 percent and figure 10-4c gives the
corresponding representation when the power of a trial is inadequate and
equals only 20 percent.

Figure 10-2. The graph represents the expected frequency of results when the effect of one
treatment (A) equals the effect of a second treatment (B); that is, the null hypothesis is true. The
shaded areas each equal two and one-half percent of the total area under the curve and a result
greater than A − B = δ or less than A − B = −δ will differ from the expected zero result at
the five percent level of significance.

Figure 10-3. Two frequency distributions are represented when A − B is truly equal to zero
and when A − B actually equals some difference to be detected, δ. Figure a represents the
situation in which there are sufficient numbers of subjects in the trial to ensure that a result R
(just compatible with the null hypothesis) has a type II error of 10 percent. Figure b represents
the situation in which there are so few patients in the trial that a result R (just compatible
with the null hypothesis) has a type II error of 80 percent. The type I and type II error areas
are marked in each panel.
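
The overlap between the two distributions in figure 10-3 can be sketched numerically. The example below is only an illustration under stated assumptions: results are taken to be normally distributed with a known standard deviation, the two groups are of equal size, and the values used (a difference of 8 mm Hg, a standard deviation of 13 mm Hg, and group sizes of 56 and 8) are hypothetical and are not taken from any trial discussed here.

```python
from statistics import NormalDist

def power_two_groups(delta: float, sd: float, n_per_group: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed test comparing two group means.

    delta        true difference between treatments (A - B)
    sd           standard deviation of the measurement (assumed known)
    n_per_group  number of subjects in each of the two groups
    """
    z = NormalDist()
    # Standard error of the difference between two independent group means.
    standard_error = sd * (2 / n_per_group) ** 0.5
    # Position of the just-significant result R, in standard-error units.
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Area of the right-hand distribution (centred on delta) lying beyond R.
    # The tiny chance of a significant result in the wrong direction is ignored.
    return 1 - z.cdf(z_alpha - delta / standard_error)

adequate = power_two_groups(delta=8, sd=13, n_per_group=56)
inadequate = power_two_groups(delta=8, sd=13, n_per_group=8)
print(f"56 per group: power about {adequate:.0%}")
print(f" 8 per group: power about {inadequate:.0%}")
```

With the larger groups the two curves barely overlap, as in figure 10-3a; with the smaller groups most results fall in the region compatible with the null hypothesis, as in figure 10-3b.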

Figure 10-4. Schematic representation of type I error and type II error for a trial to detect
a difference between two treatments A and B.
Figure 10-4a. The null hypothesis is true, A − B = 0, and the probability of two-tailed type I
error is given by the dotted area (five percent).
Figure 10-4b. The null hypothesis is not true and A − B = δ. The trial numbers are sufficient
to detect this difference with a power of 90 percent.
Figure 10-4c. The null hypothesis is not true but the trial numbers are too small and power is
only equal to 20 percent.


10.3.1 What power do we require in a trial?
In a trial designed to make the decision between two treatments and not to
determine whether the treatment effects are similar, a negative result will not
be reported and therefore the power of the trial need not be considered, as the
trial result will only be "use one treatment" or "use the other treatment."
However, with the design and analysis of the standard trial we shall have to
decide on the power we require.
Table 10-1 gives the results of antihypertensive treatment in four
placebo-controlled trials. The first four rows give the effect of treatment on
stroke incidence. In three trials in male patients treatment reduced stroke
incidence and this was statistically significant in two trials. There was no
effect of treatment in women. One trial [98] did not observe a stroke event in
either the actively treated or the control group; this trial was excluded from
this section.

[Table 10-1. Results of antihypertensive treatment on stroke incidence and
myocardial infarction in placebo-controlled trials.]

The final six rows of the table consider the effects of treatment on the
incidence of myocardial infarction. There was not a significant reduction in
myocardial infarction in any single trial, but in males the reductions were 100
percent, 10 percent, 100 percent, and 100 percent respectively. The fourth
column of the table gives the difference between the treatment and control
groups which would have been statistically significant at the five percent level.
The fifth column gives the power of the trial, given that the actually observed
difference was significant at the five percent level (calculated as in section
10.6). The sixth column gives the numbers in the trial that would be needed to
prove with 90 percent power that the observed difference was significant at the
five percent level and the last column gives the actual numbers in the trials.
The following points may be made:

1. In the three trials showing a large effect of treatment on the incidence of
   myocardial infarction [47, 118, 98], the numbers entered into the trial were
   only 45 percent, eight percent, and nine percent of the numbers required.
2. The power of these three trials was 42 percent, 17 percent, and 17 percent
   respectively.
3. The power of the trials that successfully demonstrated a reduction in stroke
   incidence [37, 118] was 75 percent and 55 percent respectively.

These calculations help us to get a feel for the power term that we might
include in the calculation of the numbers required for a trial. Unlike the level
of significance (type I error), there is no time-honoured tradition stating what
is acceptable for type II or β error. A low power term of 50 percent has been
employed [119] as has a high level of 95 percent [120, 121].

10.3.2 Confidence limits rather than power
should be reported for negative results

The confidence limits for the result of a trial give the range of figures that are
compatible with the observed result for a given probability. Section 15.6
shows how to calculate these limits, which are more easily understood than
power terms.

10.3.3 A clear demonstration of the power of trials

Rose insisted that confidence limits should be reported for the results of any
negative trial [122] and had in mind the use of 95 percent confidence limits,
corresponding to the usual five percent test of significance. Baber and Lewis
took up his point and, calculating less stringent 90 percent confidence limits,
they showed that in 18 trials of the use of beta-blockers following myocardial
infarction, the confidence limits encompassed a 50 percent reduction in
mortality in 14 and an increased mortality (of any degree) in 16 [123].

10.4 VARIABILITY OF THE RESULTS

The number of subjects required for a trial will depend on the variability of the
end points being measured. Either quantitative or qualitative measurements
may have to be considered and their variability calculated.

10.4.1 Quantitative data

When the trial is designed to detect changes in quantitative information, such
as weight, blood sugar, or blood pressure, the variance of the measurement
must be calculated. In a between-patient trial (two separate groups), the
variance of the observation itself has usually been determined in preliminary
studies and is calculated from the following formula:

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n_a - 1} \qquad (10.1)$$

where x is one of a number (nₐ) of available and normally distributed
observations (one per subject), x̄ is the mean observation and Σ(x − x̄)²
denotes the sum of the squared differences between individual observations
and the mean. Occasionally the end point of a trial may be an average of more
than one measurement on an individual. For example, if three measurements
of weight are averaged as an end point for a trial, the averages will be entered
as the x's and the number, nₐ, will equal the number of subjects.
For the within-patient cross-over trial or the change from baseline in a
between-subject study, the variance of the change in the measurement must be
calculated:

$$\text{Variance of change} = \frac{\sum (d - \bar{d})^2}{n_a - 1} \qquad (10.2)$$

where the notation is as for equation (10.1) except that d and d̄ refer to the
difference between two observations and not the observations themselves. The
variance of a change may be considerably less than the variance of the original
observations, thus reducing the numbers required for a cross-over trial
(section 10.8.4).
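
Equations (10.1) and (10.2) can be illustrated with a short calculation. The sketch below uses invented diastolic blood pressure readings for six subjects; the data and the helper name `sample_variance` are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance

def sample_variance(values: list[float]) -> float:
    """Sum of squared deviations from the mean, divided by (n_a - 1)."""
    n_a = len(values)
    mean = sum(values) / n_a
    return sum((x - mean) ** 2 for x in values) / (n_a - 1)

# Hypothetical diastolic pressures (mm Hg), one pair of readings per subject.
baseline = [102, 98, 110, 95, 105, 99]
on_treatment = [94, 91, 99, 90, 93, 92]

# Equation (10.1): variance of the observations themselves.
var_observation = sample_variance(on_treatment)

# Equation (10.2): variance of the within-subject change d.
changes = [b - t for b, t in zip(baseline, on_treatment)]
var_change = sample_variance(changes)

# The standard library function uses the same (n - 1) denominator.
assert abs(var_observation - variance(on_treatment)) < 1e-9

print(f"variance of observations: {var_observation:.2f}")
print(f"variance of change:       {var_change:.2f}")
```

Whether the variance of the change is smaller than the variance of the observations depends on the data; it has to be estimated from preliminary studies rather than assumed.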

10.4.2 Qualitative or discrete data

Qualitative data include measurements such as the proportion of patients who
die, improve, or relapse. The variance of such data depends on the proportion
expected in the control and treatment groups. With quantitative data the
variance of data in the control and treated groups is expected to be the same,
but in qualitative data the variance will depend on the expected values.

$$\text{Variance in control group} = \frac{P_1 (1 - P_1)}{n} \qquad (10.3)$$

where P₁ is the proportion with the end point in the control group and n the
number of subjects that will be required in the control group.
Similarly:

$$\text{Variance in the intervention group} = \frac{P_2 (1 - P_2)}{n}$$

where P₂ is the proportion with the end point in the intervention group and
the same number of subjects (n) is assumed. To calculate the numbers for a
trial with a qualitative end point the expected proportion with the end point in
the control group (P₁) must be known from existing data; P₂ can then be
calculated from the trial objective. For example, if the major objective is to
reduce mortality by 50 percent, P₂ = 0.5P₁.
With qualitative data, P₁ and P₂ are needed to calculate the number required
for the trial, but the number of available subjects (nₐ) from whom P₁ was
originally observed is not needed.

10.5 CALCULATING THE NUMBERS

Some useful formulae will be presented but their derivation is given in the
statistical texts that are referenced. We must first consider whether the
investigator wishes to estimate the effect of the new treatment or to make a
decision whether or not to use it (or investigate it).

10.5.1 The usual or explanatory trial

Schwartz and his colleagues considered the usual trial to be an attempt to
examine the magnitude of treatment effects and to explain the observations
[117, 124]. In figure 10-3a the classical situation is illustrated when the control
and intervention group results are truly identical, giving a difference in results
of zero, and when they are truly different by a difference δ. The results of the
trial will be distributed around mean zero in the first instance and around mean
δ in the second instance, giving the two frequency distributions. The number
of patients required for the trial must be arranged so that the difference
designated as the objective of the trial (δ) is associated with a given α and β.

10.5.1.1 Type I error, α (two-tailed test)

In figure 10-2, δ is arranged to define two-and-one-half percent of the area
under the right-hand curve, and the distance 0 to δ is said to be 1.96
standardised normal deviates.
To calculate the numbers for a trial the number of standardised normal
deviates has to be provided for a given α. Statistical textbooks provide tables
of these normal deviates.

10.5.1.2 Type II error, β (power = 1 − β)

In figure 10-3a if a just-significant result R or less is arranged to occupy ten
percent of the area under the right-hand curve (power 90 percent), then the
corresponding standardised normal deviate (distance δR) will be 1.28.

10.5.1.3 Type III error, γ
A type III error occurs when we conclude that the truly better treatment is
actually the worse treatment. In the explanatory trial type III error is
vanishingly small.

10.5.2 Calculating the numbers for the classical explanatory trial

It will be assumed that either treatment may be superior, giving a two-tailed
test for α.

10.5.2.1 Quantitative data
Let n be the minimum number required in each of two groups.
Let d be the difference to be detected.
Let K be a constant equal to the square of the sum of the standardised normal
deviates for α and β, (NDα + NDβ)².
Let s be the standard deviation of the measurement, then

$$n = \frac{2 K s^2}{d^2} \qquad (10.4)$$

When α = 0.05 and β = 0.10, K = (1.96 + 1.28)² = 10.5, where α is two-tailed
and β one-tailed.

10.5.2.2 Qualitative data
Let P₁ be the proportion having the event in the control group.
Let P₂ be the proportion in the intervention group.
Let n and K be as in section 10.5.2.1.

$$n = \frac{K \left[ P_1 (1 - P_1) + P_2 (1 - P_2) \right]}{(P_1 - P_2)^2} \qquad (10.5)$$

This is an approximate formula and a more exact formula is [125]:

$$n = \frac{\left( ND_\alpha \sqrt{2 \bar{P} (1 - \bar{P})} + ND_\beta \sqrt{P_1 (1 - P_1) + P_2 (1 - P_2)} \right)^2}{(P_1 - P_2)^2} \qquad (10.6)$$

where P̄ = ½ (P₁ + P₂).

The proportion of events in the experimental group can be adjusted for the
number of dropouts to be expected from default and from withdrawal due to
death from causes other than trial end points. Also, the numbers can be
adjusted if the treatment takes time to achieve full benefit [121]. Similarly the
formulae may be adapted for more than two drug groups and for unequal
allocation of patients between the groups (section 10.11).
George and Desu [126] have also discussed the situation where survival
times rather than events are to be compared.

10.5.3 The pragmatic decision-making trial

Schwartz and Lellouch named a decision-making trial a pragmatic trial
[124]. Figure 10-5 illustrates the situation when we wish to make the decision
between treatments A and B but do not care if they are truly similar. δ is the
mean result for a distribution when a true difference is present and 0 the mean
when there is no difference between the treatments. We must note the
following:

1. The type I error (α), or the likelihood of saying one treatment is to be
   preferred when the treatments are equal, is 100 percent.
2. The type II error (β), or likelihood of saying that the treatments are equal
   when they are not, is zero.
3. The type III error (γ), the probability of preferring the worse treatment, can
   be large and the trial numbers must be arranged to limit this error. If the
   error is limited to five percent the distance, 0δ, will be 1.64 standardised
   normal deviates.

Figure 10-5. Two frequency distributions are represented where A − B = 0 (the left curve)
and A − B = δ (the right-hand curve). The investigator is not interested in whether A and B are
equal but only wishes to decide between the use of treatment A and treatment B. Type I error is
100 percent, type II error is zero, and the type III (γ) error is given by the dotted area. This
area equals five percent and gives the probability of choosing the worse treatment, B, when A is
the better.

Table 10-2 illustrates the three kinds of error according to the observed
results and the true results.

Table 10-2. The three kinds of errors. Treatment effects A and B. In reality
the treatment effects may be equal (A − B = 0) or unequal (B > A or A > B).

                        Trial conclusion
Truth           B > A       A − B = 0       A > B
B > A           -           β               γ
A − B = 0       ½ α         -               ½ α
A > B           γ           β               -

α is the type I error; β the type II; and γ the type III error [117].

The calculation of the numbers required for a pragmatic trial is performed
using formulae (10.4) and (10.5) in section 10.5.2 but K now equals the square
of the standardised normal deviate for γ or type III error (one-tailed test) [117].
If γ is set at five percent, K = 1.64² = 2.7.
We can also decide on the use of a new drug when a new drug A is better
than an established drug B by a certain number of units (D). The decision to
use A could be made for results greater than D and a decision in favour of B
made when the result is less than D. The employment of a difference, D, in
making a decision does not affect the numbers required for the trial [117].

10.6 CALCULATION OF THE POWER OF A REPORTED TRIAL

The concept of the power of a significance test was discussed in section 10.3.
The power of a reported trial, when the observed difference is assumed
significant at the five percent level, is obtained from NDβ, which is calculated
by substituting n and NDα = 1.96 into K in equations (10.4) and (10.5). This
calculation of power led to the results reported in table 10-1 (section 10.3.1).

10.7 HOW THE SPECIFIED LEVELS OF ERROR INFLUENCE
THE NUMBER OF PATIENTS REQUIRED FOR A TRIAL


The level or presence of any error term employed in calculating the number of
subjects will be influenced by whether the trial is required for estimation of
effect (explanatory trial, section 10.5.1) or as a mechanism for decision-making
(section 10.5.3); by whether the α error is symmetrically two-tailed or
not; and by the effect on α error of taking repeated looks at the data. In
addition we must examine the effect of factors influencing the level of variance
of the data (section 10.8); the treatment effect that is to be detected (section
10.9); the effect of dropout and withdrawal (section 10.10); and whether or not
we require equal numbers in the control and treatment groups (section 10.11).

10.7.1 The level of type I (α) and type II (β) error
Table 10-3 gives the increase in trial numbers that would be expected if α
equals one percent and not five percent, and if 1 − β (power) equals 75
percent, 90 percent, 95 percent, and 99 percent rather than 50 percent. When α
equals one percent and power 90 percent the trial numbers are increased
fourfold and with a power of 99 percent the numbers are increased over sixfold.

Table 10-3. Increase in numbers required for a trial with either
α = 1% or 5% and for increasing levels of power. (α, two-tailed.)

                Increase in Numbers Required for a Trial
1 − β           50%     75%     90%     95%     99%
α = 5%          1       1.8     2.8     3.4     4.8
α = 1%          1.7     2.8     3.9     4.6     6.3

10.7.2 Asymmetric instead of a two-tailed test for α
Schwartz and his colleagues have argued the case for the one-tailed test of
significance [117] where the possibility that the intervention treatment is worse
than the control treatment is ignored. However, we are usually equally
interested in whether an intervention treatment is better or worse than a control
treatment and a one-tailed test is not applicable. Schwartz and his co-authors
pursued the argument and suggested that we may be interested in an
unexpected finding but only if it is highly significant. They provided an example
where the research worker was interested in a right-sided 2½ percent
probability as in figure 10-2 but only in a 0.1 percent probability in the other
direction. In their example the power of detecting a significant (P < 0.025)
difference in one direction was 95 percent whereas the power of detecting a
difference in the other direction was 70 percent. They stated, "Thus we accept a
rather large probability of failing to detect a difference on the left. This is
quite reasonable; the test is primarily intended to detect a difference on the
right; a significant result on the left has to be considered as a byproduct."
A by-product it may be, but although a trial of a treatment is usually
mounted to detect a benefit, and an adverse effect of treatment is not
anticipated (it would be unethical to perform the trial if it were), we must agree
that the failure to detect an adverse effect may have the most serious
consequences. If a trial lacks power in detecting a benefit, then patients who
could be helped may not receive the treatment and this will be to their
detriment. However, most trials are performed when a clinical impression
suggests that the treatment is beneficial and one trial result showing no
difference from, say, placebo will often result in the continued use of the drug.
But a single trial showing a significant adverse effect can result in the
treatment's being withdrawn from use, used less frequently, or limited to certain
patients. It can be argued that the detection of an adverse effect has more
influence on patient care and is more important than the demonstration of a
beneficial effect. It is only rarely that the investigator can justify an
asymmetric test for type I error.
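
The way the chosen error levels feed into the constant K, and hence into the numbers required, can be sketched as follows. This is an illustrative calculation only, assuming the normal-deviate definition of K given in sections 10.5.2 and 10.5.3; the function names are hypothetical. The ratios it prints are computed from unrounded normal deviates, so they should be close to, but need not agree in the last digit with, the rounded entries of table 10-3.

```python
from statistics import NormalDist

_z = NormalDist()

def k_explanatory(alpha: float, power: float) -> float:
    """K for an explanatory trial: two-tailed alpha, one-tailed beta."""
    nd_alpha = _z.inv_cdf(1 - alpha / 2)
    nd_beta = _z.inv_cdf(power)  # power = 1 - beta
    return (nd_alpha + nd_beta) ** 2

def k_pragmatic(gamma: float) -> float:
    """K for a pragmatic trial: only the one-tailed type III error is limited."""
    return _z.inv_cdf(1 - gamma) ** 2

# Reference design: alpha = 5 percent, power = 50 percent.
reference = k_explanatory(0.05, 0.50)

powers = (0.50, 0.75, 0.90, 0.95, 0.99)
for alpha in (0.05, 0.01):
    ratios = [k_explanatory(alpha, p) / reference for p in powers]
    print(f"alpha = {alpha:.0%}:", "  ".join(f"{r:.1f}" for r in ratios))

print(f"explanatory K (alpha 5%, power 90%): {k_explanatory(0.05, 0.90):.1f}")
print(f"pragmatic K (gamma 5%):              {k_pragmatic(0.05):.1f}")
```

Because the numbers required are proportional to K in equations (10.4) and (10.5), the much smaller K of the pragmatic trial shows directly why a decision-making design needs far fewer patients.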

10.7.3 The effect of repeated looks at the data on the type I error
A statistical test, significant at the five percent level, indicates that the observed
result has less than a five in 100 likelihood of having occurred by chance.
However, the assumption is made that the statistical test is only performed
once during the course of the trial. If the investigator makes the test after 20
patients have entered the trial, then after 40, 60, 80, and so on, he will greatly
increase the odds on reaching a five in 100 chance. Moreover he will
presumably stop recruitment and end the trial when a significant result is
observed. McPherson has pointed out that ten repeated tests on accumulating
data at the one percent level of significance during a trial will be the same as
an overall test for the trial at the five percent level of significance [44].
Similarly ten tests at the five percent level of significance will lead to an
overall significance level of 19 percent, almost a one in five probability of the
finding being due to chance. In a long-term trial, repeated testing may be
necessary for ethical reasons. Table 10-4 gives the significance level for
individual repeated tests (nominal levels of significance), one of which would
have to be exceeded in order to achieve an overall level of significance of one
percent or five percent. The maximum number of repeated assessments has to be
decided at the planning stage of the trial and the nominal level of significance
selected at that stage to give an overall level of significance of one percent or
five percent.
The nominal level of significance must be strictly adhered to. For example,
with a maximum of ten tests, a nominal level of significance of one percent,
and an overall level of five percent, the first test may be significant at the three
percent level, but the trial cannot be stopped at this stage. The first statistical
test is one of a series of ten and is as likely to be falsely significant as the last.
Even though only one test has been performed and that was significant at the
three percent level, an overall level of significance of five percent cannot be
claimed. The decision rule was to stop the trial if the one percent level is
achieved in any one of ten tests and only if this level is reached in the first test
may the trial be stopped at that point.

Table 10-4. Nominal levels of significance (%) that must be exceeded if an overall level of
significance is to be achieved of 1% or 5% when the test is repeated 5, 10, 15, or 20 times [44].

                                    Number of Repeated Tests
True level of significance (%)      5       10      15      20
1                                   0.28    0.19    0.15    0.13
5                                   1.59    1.07    0.86    0.75

If the investigator wishes to review the data constantly, perhaps with a view
to an early completion of the trial, he may adopt a sequential trial design. This
design is discussed in section 11.7 and takes into account the use of repeated
significance tests.
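
The inflation of the type I error by repeated testing can be demonstrated by simulation. The sketch below is not the method used to derive table 10-4; it is a simple Monte Carlo illustration under stated assumptions: the null hypothesis is true, the end point is normally distributed with known variance, the looks are equally spaced, and a two-sided test is applied at each look with the trial stopping at the first significant result. The function name and the number of simulated trials are arbitrary choices.

```python
import random
from statistics import NormalDist

def overall_type_one_error(looks: int, nominal_alpha: float,
                           n_trials: int = 20000, seed: int = 1) -> float:
    """Estimate the chance of at least one falsely significant look."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - nominal_alpha / 2)
    false_positives = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for look in range(1, looks + 1):
            # Each look adds one equal-sized batch of patients; under the null
            # hypothesis its standardised result is a standard normal deviate.
            cumulative += rng.gauss(0.0, 1.0)
            # Standardised test statistic for all the data accumulated so far.
            if abs(cumulative) / look ** 0.5 > critical:
                false_positives += 1
                break  # the investigator stops at the first significant test
    return false_positives / n_trials

# Ten looks, each tested at the conventional five percent level.
print(overall_type_one_error(looks=10, nominal_alpha=0.05))
# Ten looks, each tested at the nominal 1.07 percent level of table 10-4.
print(overall_type_one_error(looks=10, nominal_alpha=0.0107))
```

Under these assumptions the first estimate should fall near the 19 percent quoted above and the second near five percent, although the exact figures vary with the random seed and the number of simulated trials.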

10.8 HOW THE VARIANCE OF THE DATA
INFLUENCES THE NUMBERS REQUIRED

10.8.1 Quantitative data
With quantitative data the variance of a result may be reduced by replication of
the measurement. For example, it is reasonable to assume that the variance of
fasting blood glucose will be reduced if it is measured every day for three days
and the average taken as the end point of the trial, rather than if a single result
is employed. The variance of an average result for each patient will be lower
than that of a single measurement.

In certain circumstances, measuring change from baseline can drastically
reduce the numbers required compared with a trial without a baseline. Table
10-5 gives the numbers required for a trial of a new antihypertensive drug. The
standard deviation of between-subject blood pressure measurements (13 mm
Hg) was derived from a trial of such a drug and is given in column one.
Similarly the standard deviation of within-patient changes in blood pressure is
given in columns two and three (8 mm Hg). The difference to be detected is 10
mm Hg, α = five percent (two-tailed), β = ten percent. In a between-patient
trial comparing two treatments without a baseline measurement equation

Table 10-5. Numbers required for a trial to show that one antihypertensive
drug reduces diastolic blood pressure by 10 mm Hg more than a second drug.

                                  Between-Patient Study         Within-Patient
                                                                (Cross-over) Study
                                  Absolute      Change in       Change in
                                  values        pressure        pressure
Difference to be
detected (mm Hg)                  10            10              10
Standard deviation
(s and s_w)                       13            8               8
Significance level                5%            5%              5%
Power                             90%           90%             90%
K                                 10.5          10.5            10.5
Number of patients to be
recruited to the trial            70            27              7
Number of observations
required                          70            54              14

Note: A between-patient design with and without baseline measurements is compared with a within-patient
(cross-over) trial.

10.8.2 Qualitative data


With proportional data the variance is maximum at 0.50 and minimum at very
high and very low values. More importantly, however, the higher the rate in
the control group the greater the effect of treatment for a given proportional
reduction. For example, a 50 percent reduction in a 90 percent event rate gives
a fall in events to 45 percent (a difference of 45 percent) whereas a 50 percent
reduction in a 20 percent event rate gives a fall to only 10 percent (a difference
of 10 percent). The higher the proportion in the control group getting an
event, the lower will be the numbers required for the trial. Intuitively, the
more patients that experience an event in the control group, the fewer will be
the numbers required and this is one of the reasons for doing trials in selected
high risk patients. However, Sondik and colleagues [127] considered a trial in
which subjects with a high serum cholesterol were entered. The higher the
serum cholesterol, the higher the risk. But if the subjects arc to be detected by
screening, the greater the serum cholesterol required to enter the trial, the
larger the number of subjects that have to be screened. In such circumstances it
may be less expensive overall to enter medium-risk patients to a trial.

10.8.3 Reduction in variance by measuring a change from baseline

Within-Paticnt
Study

(10.4) estimates n = 35 so that 70 patients with 70 observations arc required
for the trial. With the same criteria but estimating change from baseline only
27 patients and 54 observations are required.
10.8.4 Reduction in numbers using the patient
as his own control (a cross-over trial)
With the same parameters but a cross-over (within-patient) trial the formula is:

$$n_w = \frac{K s_w^2}{d^2} \tag{10.7}$$

where $n_w$ is the number of subjects, $2n_w$ the number of observations, and $s_w$
the standard deviation calculated from within-patient changes in blood pressure.
In this example only seven patients giving 14 observations would be
required for the trial but we have assumed that the difference between the
treatments is the same in the first as the second treatment period. Cross-over
(within-patient) trials are discussed further in section 11.2.
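
The arithmetic behind table 10-5 can be sketched as follows. Equation (10.4) is not reproduced in this section, so the code assumes it has the usual form n = 2Ks²/d² per group, with K = (z_α + z_β)²; that assumed form gives a value close to the n = 35 quoted above. The function names are illustrative only.

```python
from statistics import NormalDist


def k_factor(alpha, power):
    """K = (z_alpha + z_beta)^2 for a two-tailed test at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


def n_per_group_parallel(sd, difference, k):
    """Assumed form of equation (10.4): subjects needed in EACH of two groups."""
    return 2 * k * sd ** 2 / difference ** 2


def n_cross_over(sd_within, difference, k):
    """Equation (10.7): total subjects when each patient is his own control."""
    return k * sd_within ** 2 / difference ** 2


k = k_factor(alpha=0.05, power=0.90)
absolute = n_per_group_parallel(sd=13, difference=10, k=k)   # no baseline
change = n_per_group_parallel(sd=8, difference=10, k=k)      # change from baseline
cross_over = n_cross_over(sd_within=8, difference=10, k=k)   # within-patient

print(f"K = {k:.2f}")
print(f"absolute values: {absolute:.1f} per group")
print(f"change in pressure: {change:.1f} per group")
print(f"cross-over: {cross_over:.1f} subjects in total")
```

The unrounded results agree with the table to within about one patient; the small discrepancies arise from rounding K to 10.5 and from the point at which the numbers are rounded to whole patients.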
10.9 THE EFFECT OF ALTERING THE DIFFERENCES TO BE
DETECTED BETWEEN CONTROL AND TREATED GROUPS

The smaller the difference to be detected the larger the numbers required for
the trial. Table 10-6 gives the number of patients who might be required in a
trial of a treatment to reduce the frequency of reinfarction in patients who have
already suffered one myocardial infarction. It is assumed that the type I error is
five percent and the power term 90 percent. The numbers are provided according
to the rate in the placebo group over the duration of the trial and the
expected reduction in events in the treated group. In this example 30 times as
many patients are required to detect a ten percent reduction compared with a
50 percent reduction [121].

Table 10-6. Number of patients required for a trial of secondary prevention in myocardial
infarction, according to the event rate in the placebo group over the duration of the trial
and the % reduction to be determined in the intervention group.

Event rate in placebo    % reduction in events in    Total number required
group/100                the intervention group      for the trial

10                       10%                         29,460
10                       20%                          7,010
10                       30%                          2,960
10                       40%                          1,570
10                       50%                            950

20                       10%                         13,180
20                       20%                          3,160
20                       30%                          1,340
20                       40%                            720
20                       50%                            430

Type I error = 5%; power = 90%.
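
The pattern in table 10-6 can be approximated with a normal-approximation formula for two proportions. The method used for the published tables [121] is not given here, so the following is only a sketch under assumptions made for this illustration: a one-sided five percent test, 90 percent power, two equal groups, and a pooled variance. These choices happen to come close to the tabulated totals but should not be read as the original calculation.

```python
from statistics import NormalDist


def total_patients(control_rate, reduction, alpha=0.05, power=0.90):
    """Approximate total for two equal groups.

    Assumptions (not taken from the text): one-sided test, pooled variance.
    """
    treated_rate = control_rate * (1 - reduction)
    mean_rate = (control_rate + treated_rate) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided
    z_beta = NormalDist().inv_cdf(power)
    k = (z_alpha + z_beta) ** 2
    difference = control_rate - treated_rate
    per_group = 2 * k * mean_rate * (1 - mean_rate) / difference ** 2
    return 2 * per_group


for rate in (0.10, 0.20):
    for reduction in (0.10, 0.20, 0.30, 0.40, 0.50):
        total = total_patients(rate, reduction)
        print(f"placebo rate {rate:.0%}, reduction {reduction:.0%}: about {total:,.0f}")

# Ratio of the numbers needed for a 10% and a 50% reduction (placebo rate 10%).
ratio = total_patients(0.10, 0.10) / total_patients(0.10, 0.50)
```

The ratio computed in the last line is close to the factor of 30 mentioned in the text.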

10.10 THE EFFECTS OF DROPOUT ON THE NUMBERS REQUIRED

Dropout is discussed in section 14.3 and is important if withdrawn subjects are
to be excluded from the analysis. If they are retained in the analysis they may
dilute any effect of treatment. Dropouts consist of those who default or otherwise
do not follow the trial protocol; those who are withdrawn for criteria
unrelated to the end point of the trial; those who die from causes unrelated to
the trial end points; and those who are withdrawn from the trial for criteria
possibly related to trial end points.
The Biometrics Research Branch, National Heart Institute [121] have published
tables giving the numbers required for trials according to the expected
dropout rate; these tables are appropriate when such subjects are to be excluded
from the analyses.

10.11 DO WE NEED EQUAL CONTROL AND TREATMENT GROUPS?

The concept of unequal randomisation has been introduced in section 7.3.
Traditionally, treatment allocation has been arranged so that each group contains
an equal number of subjects. However, two alternatives have been suggested,
the use of 2:1 allocation of new:old treatments in trials comparing two
treatments [70] and a relative increase in the number in the control group [120]
when several treatments have to be compared with this group.

10.11.1 A 2:1 allocation ratio with two treatment groups
A 2:1 allocation has been suggested for a comparison of a new treatment with
an old or placebo treatment [70]. Less will be known about the effects of the
new treatment and this is one reason for increasing the numbers receiving that
treatment. A new treatment may also have to be compared in two different
modes of administration or dose schedules, and one dose may be given to one-third
of subjects, the second dose to another third, and the old control treatment
to the remainder, thus resulting in a 2:1 allocation. Peto and his colleagues
[70] suggest that this strategy may allow different groups to participate
in a trial of a new treatment even when they have divergent views on minor
variants of treatment. Unequal allocation gives some loss of efficiency compared
to a 1:1 allocation, but a 2:1 allocation is equivalent to performing a 1:1
allocation and eliminating about 10 percent of the patients from the trial.
However, more unequal comparisons cannot be supported and 3:1 randomisation
is equivalent to eliminating a quarter of the patients from the trial.
Unequal randomisation may also be employed when the costs of treatment
vary. Cochran [128] and Nam [129] have discussed the square-root rule which
states, “If it costs r times as much to study a subject on treatment A than B then
one should allocate √r times as many patients to B than A.” This procedure
minimises the cost of a trial while preserving power. Gail and colleagues [130]
considered a similar situation where one treatment was more hazardous than
the other and developed a case-saving rule.

10.11.2 Unequal randomisation when comparing more than two groups

When more than one treatment group is to be compared with a standard
control treatment, it may be desirable to increase the relative number receiving
the control treatment. For example, the Coronary Drug Project trial compared
five treatment groups with a placebo group. As each of the five drug groups
was to be compared with the same placebo group, it was necessary to determine
the final mortality in the placebo group with greater precision than in the
actively treated groups (it was expected that the five-year mortality in the
placebo group would be 30 percent and that active treatment would reduce this
rate by a quarter). The Coronary Drug Project allocated 2.5 times as many
patients to the placebo group (2,793 patients) as to any individual actively
treated group (1,117 patients in each group) [120]. The ratio 2.5:1 was calculated
by minimising the variance of the difference between the results in the
drug groups and in the placebo group [131].
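
The figures quoted in section 10.11 for unequal allocation can be checked with a few lines of arithmetic. The sketch below assumes equal variance in the two groups when computing efficiency. For the comparison of several treatments with one control it assumes that the aim is to minimise the variance of a single drug-versus-placebo difference in proportions for a fixed total number of patients; this is an assumed criterion for illustration, and the calculation actually used by the Coronary Drug Project [131] may have differed. All function names are hypothetical.

```python
def relative_efficiency(ratio):
    """Efficiency of a ratio:1 allocation relative to 1:1 with the same total
    number of patients (equal variance assumed in both groups)."""
    return 4 * ratio / (1 + ratio) ** 2


def square_root_rule(cost_ratio):
    """Patients on the cheaper treatment per patient on the dearer one."""
    return cost_ratio ** 0.5


def control_to_treatment_ratio(n_treatments, control_rate, treated_rate):
    """Control:treated group size ratio minimising the variance of one
    treated-minus-control difference (assumed criterion, binomial variances)."""
    var_control = control_rate * (1 - control_rate)
    var_treated = treated_rate * (1 - treated_rate)
    return (n_treatments * var_control / var_treated) ** 0.5


loss_2_to_1 = 1 - relative_efficiency(2)   # about one ninth
loss_3_to_1 = 1 - relative_efficiency(3)   # one quarter
# Five drug groups, 30% placebo mortality, reduced by a quarter on treatment.
cdp_ratio = control_to_treatment_ratio(5, 0.30, 0.30 * 0.75)

print(f"2:1 allocation loses {loss_2_to_1:.1%} of the patients' information")
print(f"3:1 allocation loses {loss_3_to_1:.1%}")
print(f"square-root rule for a fourfold cost: {square_root_rule(4):.1f} to 1")
print(f"control:treated ratio under the assumed criterion: {cdp_ratio:.2f}")
```

Under these assumptions the last ratio comes out slightly below the 2.5:1 adopted by the Coronary Drug Project.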

10.12 AIDS TO CALCULATING THE NUMBERS REQUIRED FOR A TRIAL

Tables have been published of the numbers required for a trial given the
dropout rate, the difference to be detected, the event rate in the control group,
and the duration of the trial [121]. As an example, figure 10-6 gives a graphical
representation of the numbers required in each group according to the percentage
of patients expected to respond to two treatments, α = five percent, 1 − β
= 50 percent [119]. If, for example, 60 percent respond to one treatment and
40 percent to the other, between 40 and 50 patients will be required in each
group.
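
The reading taken from figure 10-6 can be checked roughly by calculation. This is a sketch assuming a two-tailed test and the simple unpooled normal approximation for two proportions; the graph itself may have been constructed differently.

```python
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.50):
    """Approximate number per group to compare two response rates."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # zero when the power is 50 percent
    k = (z_alpha + z_beta) ** 2
    return k * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


n = n_per_group(0.60, 0.40)
print(f"about {n:.0f} patients per group")
```

With only 50 percent power the term for the type II error vanishes, which is why the numbers read from this graph are so much smaller than those in tables calculated for 90 percent power.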

Figure 10-6. Number of patients required for a randomised controlled trial: 50 percent power,
five percent type I error. The graph shows the number of patients required per treatment group
according to the percentage of patients expected to respond to each of the two treatments.
Reproduced with permission from Clark, C.J., and Downie, C.C. Lancet 2: 1357, 1966.

10.13 FAILURE TO PREDICT THE VARIANCE OF MEASUREMENTS IN THE
TRIAL OR THE FREQUENCY OF EVENTS IN THE CONTROL GROUP

The calculation of the numbers required for a trial requires an accurate estimate either
of the variance of the measurements used as the trial end point or of the frequency
of events when an event is an end point.

10.13.1 An inaccurate estimate of the variance of measurements

If the variance of measurements during the trial is lower than estimated, then
fewer subjects are required than calculated; if the variance is larger, more
are required. Either situation may occur: the initial estimate of
variance may have been low if determined under perfect standardised conditions
that are not reproduced in the trial; on the other hand, the variance may
have been high if calculated from observations in normal clinical practice. In
the trial, standardisation and close attention to detail may reduce the variability
of the results.

10.13.2 An inaccurate estimate of the number of events in the control group

In a clinical trial, careful attention to the patients’ welfare may markedly
reduce the number of events that were expected in the control group. This
effect will be increased if patients are withdrawn from the trial prior to an
event owing to an observed deterioration in their condition. In the Hypertension
Detection and Follow-up Program trial [132] the patients who were
closely followed for hypertension showed a reduction in mortality from several
conditions unrelated to hypertension, possibly the effect of an early detection
of other disease processes.
It is possible, in a long-term trial, to observe more events in the placebo
group than expected. This can result from a failure to allow for the ageing of
the population. In a ten-year trial of patients aged, say, 60, the correct number
of events should be based on the frequency of events at age 65. If the number
of expected events is based on the number at age 60, more events may be
observed during the trial than expected owing to this error.

10.14 CONCLUSIONS


In order to calculate the numbers required for a randomised controlled trial,
the major objective must be defined exactly. The treatment effect to be detected
has to be designated and the distinction made between a trial designed to
arrive at such an estimate (an explanatory trial) and one intended only to reach
a decision (pragmatic trial). The investigator must define the level of type I and
type II errors he will allow and also calculate the variance he expects in the trial
end point. The methods of calculating the numbers required for a trial were
given in this chapter together with the effect on these numbers of changes in
the objective, level of type I and type II error, and change in variance of the
trial end point. More specifically, the advantages of the within-patient cross-over
trial were discussed and the advantages and disadvantages of using unequal
allocation procedures were given. The problems of a negative result
were considered in some detail and the power of a trial defined and calculated.
The concept of confidence limits was introduced and the problem that repeated
looks at the data raise in determining the type I error was reviewed.
Aids were considered for calculating the numbers required for a trial. Lastly, it
was admitted that the assumptions made in the calculations may prove to be in
error. However, it is very important to attempt to calculate the numbers
required for a trial. Otherwise an inappropriate design and protocol may be
adopted, leading to a predictably inconclusive result.

11. DIFFERENT TRIAL DESIGNS

In the standard trial design a subject or patient is considered suitable for entry
to the trial and is then randomized to a treatment group; each group receives a
single treatment. A fixed number of persons enter the trial and are followed for
a predetermined interval of time; treatment is stopped at the end of this period.
Other trial designs have to be considered. The subject may be asked to take
more than one treatment consecutively (cross-over trial), more than one treatment
simultaneously (trial to detect an interaction between treatments), or
more than one treatment consecutively and concurrently (cross-over trial to
detect an interaction). The remaining design variations do not involve a fixed
number of persons being entered. The standard trial, cross-over trial, trial to
detect an interaction between treatments, and cross-over trial to detect an
interaction are illustrated in figure 11-1 for four treatments A, B, C, and D.
The standard design (number 1) allows four treatments to be compared
simultaneously and has been called a parallel-groups trial. If one treatment is a
placebo then the effect of the other three treatments can be estimated in the full
knowledge of any placebo effect or change in baseline measurements. In the
cross-over trial (number 2) each subject receives the four treatments and the
order of treatment is randomized. In this example four different orders are
specified. Trial design number 3 allows the effect of one treatment to be
assessed in the presence of a second treatment (that is, the presence of an
interaction between treatments may be determined). The individual drug effects
are also determined as a fourth group receives neither drug (but usually
receives a placebo). In design 4 an interaction between treatments can be
determined within-subject in a cross-over trial.

Figure 11-1. Two designs to assess the effects of four treatments A, B, C, and D and two designs to test for an interaction between two treatments A and B. R = point of randomisation.
1. Standard trial design. Four groups studied in parallel to assess four treatments.
2. Cross-over design to assess the effects of four treatments within patients.
3. Parallel groups study to detect an interaction between two treatments.
4. Within-patient cross-over trial to detect an interaction.

11.1 THE USE OF THE STANDARD TRIAL DESIGN

The standard design is the one most commonly used and has the virtue of
simplicity in that a single treatment is given to each group and a fixed number
of patients is involved. The other frequently employed design, the cross-over
trial, is inappropriate when a treatment is curative, when the duration of
treatment has to be long, when the effects of treatment persist for some time
after stopping treatment (a carry-over effect), or when a large number of
treatments have to be compared. In any of these circumstances the standard
design has to be employed. This design is also more appropriate when a large
number of subjects are available for the trial. The standard design should be
employed in the following circumstances.
11.1.1 When treatment is curative
If the trial is to test a curative treatment for an illness, the cross-over design
cannot be employed. There is no point in a cured patient continuing with
further treatments.

11.1.2 When the duration of treatment has to be long
If the effect of a drug has to be determined after, say, five years, a cross-over
trial may take too long since the duration of the trial is ten years with two
treatments and 20 years with four.

11.1.3 When the effect of one treatment is
different when it follows another treatment
If the effect of one drug persists for a long time, a carry-over effect of this
treatment into the next treatment period may interfere with the effect of any
further treatment and the standard trial design is to be preferred.
When comparing two treatments in a cross-over trial the difference between
the treatment effects must be independent of the order of administration. Hills
and Armitage have concluded that if previous experience with the treatments
has not proven that this is true, then a parallel group study should be carried
out [133]. A different result in one period (in statistical terms, an interaction
between treatment and period) may not only occur with a carry-over effect.
For example, placebo treatment may possibly be more effective when given
first to lower blood pressure or when given last to relieve a painful condition
that is improving with time.

11.1.4 When a large number of treatments are to be compared
With a large number of treatments the trial would be too long and complex if
each subject had to take every treatment. The standard trial allows several
treatments to be compared.

11.1.5 When the number of subjects available for the trial is unlimited

When large numbers of subjects are available for the trial we can assume that
enough subjects will be recruited in a standard design to detect important
differences between the treatments, even if the variation of between-subject
measurements is much greater than the variation of within-subject measurements.
Assuming costs are not the limiting factor, we can opt for the less
efficient design, which is quicker and simpler to execute, avoids the
difficulty of carry-over effects, and, because it is shorter, loses fewer patients
to dropout.

11.2 THE USE OF THE CROSS-OVER TRIAL DESIGN

The cross-over design can be considered when the condition being investigated
is constant and only temporarily affected by treatment. For example, a
patient with a high blood pressure or blood sugar may receive drug treatment
that has a short-term effect on his condition and may then take a succession of
treatments that do not affect the result of later treatments.
Cross-over trials may be recommended when any carry-over effect is short,
when the prolongation of the trial neither greatly increases dropout rates nor
alters the relative effects of the treatments being compared, when the within-subject
variation is less than between-subject variation, and when any order
effect can be balanced out.
Warning. It is difficult to prove that the difference between treatment effects
is independent of the period of treatment and therefore the Food and Drug
Administration in the United States has concluded that the cross-over trial is
not the design of choice where unequivocal evidence of treatment effect
is required. Hills and Armitage also concluded, “If the number of patients is
limited and a cross-over design is chosen, then the internal evidence that the
basic assumptions of the cross-over are fulfilled must be presented and if
necessary the conclusions should be based on the first period only” [133].
The investigator may, however, be certain from previous studies that the
difference between treatments is independent of period and he can then proceed
with a cross-over design in certain circumstances. It must be remembered
that a difference between treatment effects may be due to a carry-over effect of
one treatment into the next period or to an influence of the time of assessment
from the beginning of the trial, a so-called order effect.

11.2.1 When any carry-over effect is of short duration
In the treatment of hypertension with antihypertensive drugs the carry-over
effect in lowering blood pressure is usually short, and a brief interval between
treatments will ensure that one treatment does not influence the result of the
next. The interval may last from two to four weeks and has been called a wash-out
period. However, certain antihypertensive drugs may have longer effects
on measurements other than blood pressure (for example, diuretics may reduce
serum potassium for three months). With such long effects the cross-over
trial design may not be appropriate, but under certain stringent conditions the
trial may still be analysed with the carry-over effect balanced out and even
estimated (section 11.5).

11.2.2 When extending the treatment period does not
diminish the difference between the treatment effects

As discussed earlier it is possible that the difference between the two treatments
may differ at the start of a trial from later periods in the trial. Meier and
Free [134] have therefore argued that “each patient as his own control” is not
entitled to the status of dogma. They reviewed the results of cross-over studies
on the use of analgesics in postoperative pain. In this situation the pain is
lessening with time and the standard design has the advantage of simplicity.
However, the between-patient differences are considerable, supporting the use
of a cross-over trial, and it is possible to allow or adjust for the order effect in
the design or analysis. Although the treatment effects may diminish through
time the differences between various treatments may be more consistent.
11.2.3 When a baseline measurement cannot be made

In a cross-over trial baseline measurements may be desirable but they are not
essential as all treatment effects are measured within subject. In a parallel-groups
trial, precision may be increased when the within-subject variance is
lower than that of the between-subject variance and baseline measurements are
employed (section 10.8.3). However, if baseline measurements cannot be
made, the parallel-group trial will require many more patients than a cross-over
trial. One example would be provided by patients with severe diabetes
mellitus who require insulin treatment every day. If two new insulins are to be
compared for their effect on blood sugar, the baseline blood sugar would be
unsatisfactory as the current treatment cannot be stopped and the starting
[...] original treatment. However, if [...]
starting sugar would be a satisfactory baseline measurement.

11.2.4 When a cross-over trial will not result in a large increase in dropouts
The longer an individual takes to complete a trial the more likely that the
person will default (drop out). This problem is discussed in section 14.3. The
dropout rate will increase with time owing to subjects' moving address, having
an intercurrent illness, changing their occupation, and taking holidays. All
these possibilities increase with the duration of the trial and, in addition,
dropout may occur owing to the subject's becoming intolerant of the number
of visits, repeated investigations, or one of the treatments employed. If an
adverse effect of a treatment is experienced in the first or second treatment
period the subject may be unable or unwilling to take further treatments.
When the treatment period is three months, the dropout rate per treatment
will be lower if four times a given number of patients each receive one of
four treatments for three months than if the given number of patients receive all four
treatments over a one-year period. However, it may be difficult to recruit four
times as many patients for the standard trial design and the costs of recruitment
and initial investigation will be increased.
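
The comparison can be made concrete with a small calculation. The dropout rate of five percent per three-month period used below is invented for the illustration, and the sketch assumes that the rate is constant and that dropout in one period is independent of the others, which the preceding paragraphs suggest is optimistic for a cross-over trial.

```python
def completion_probability(dropout_per_period, periods):
    """Chance that a patient completes every period (constant, independent dropout)."""
    return (1 - dropout_per_period) ** periods


# Hypothetical dropout rate of 5% in each three-month treatment period.
one_period = completion_probability(0.05, periods=1)    # parallel-groups trial
four_periods = completion_probability(0.05, periods=4)  # four-period cross-over

print(f"complete one three-month period: {one_period:.1%}")
print(f"complete all four periods:       {four_periods:.1%}")
```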

11.2.5 When an order effect is absent or can be balanced out
The order effect is the change in a measurement according to the period of
estimation after allowance has been made for the effect of treatment. In a trial
of antihypertensive drugs blood pressure can become progressively lower as
the trial proceeds. The exact mechanisms producing this fall in pressure have
not been determined. Initially, pressure falls due to familiarisation with the
technique of measurement, the observer, and the surroundings. This effect can
be reduced by a prolonged run-in period prior to randomisation. Additional
reasons have been suggested for the fall in pressure: an effect of any placebo
tablets that are given; a phenomenon whereby an initial lowering of pressure
makes blood pressure control easier thereafter; and the removal from the trial
of persons whose blood pressure rises, leaving a higher proportion of those in
whom pressure falls. Whatever the cause of a trend with time the subjects must
be randomised to receive the treatments with equal frequency at different
times. In figure 11-1 and trial design 2, if equal numbers of patients are
randomised to the four sequences, then drug A will be as often given first as
second, third, or fourth. Similarly with drugs B, C, and D, the order effect for
these treatments is said to be balanced out. The order effect can be estimated
by comparing the average results for each period, every interval including an
equal number of measurements on the four treatments.
The order effect may be important in trials other than those of antihypertensive
drugs or analgesics. In the treatment of diseases with a fluctuating course
the trial may be commenced when the condition is at its worst and a subsequent
improvement is expected as part of the natural course of events.
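
The balancing of an order effect can be illustrated with invented numbers. The four sequences below form a simple cyclic arrangement chosen for the example and are not necessarily those of figure 11-1; the treatment effects and the downward drift in blood pressure are hypothetical, and the effects are assumed to be additive with no carry-over.

```python
from statistics import mean

# Hypothetical sequences: each treatment appears once in every period.
SEQUENCES = ["ABCD", "BCDA", "CDAB", "DABC"]
# Invented treatment effects on diastolic pressure (mm Hg).
TREATMENT_EFFECT = {"A": 0, "B": -5, "C": -10, "D": -15}
# Invented order effect: pressure drifts down as the trial proceeds.
PERIOD_EFFECT = [0, -4, -6, -7]


def simulate(patients_per_sequence=5, baseline=100):
    """Return mean pressure by treatment and by period for a balanced trial."""
    by_treatment = {name: [] for name in TREATMENT_EFFECT}
    by_period = [[] for _ in PERIOD_EFFECT]
    for sequence in SEQUENCES:
        for _ in range(patients_per_sequence):
            for period, treatment in enumerate(sequence):
                value = baseline + TREATMENT_EFFECT[treatment] + PERIOD_EFFECT[period]
                by_treatment[treatment].append(value)
                by_period[period].append(value)
    treatment_means = {name: mean(values) for name, values in by_treatment.items()}
    period_means = [mean(values) for values in by_period]
    return treatment_means, period_means


treatment_means, period_means = simulate()
# Differences between treatments are free of the order effect.
print("B minus A:", treatment_means["B"] - treatment_means["A"])
# Differences between period averages estimate the order effect.
print("period 2 minus period 1:", period_means[1] - period_means[0])
```

Because every treatment is given equally often in every period, the drift is added equally to each treatment mean and cancels from the comparisons between treatments.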
11.3 THE STUDY OF TWO OR MORE TREATMENTS
SIMULTANEOUSLY: FACTORIAL DESIGNS OR TRIALS
TO DETECT AN INTERACTION BETWEEN TREATMENTS

Traditionally, in an investigation the experimenter isolates a number of factors
and studies the result of altering one factor while holding the others constant.
Fisher considered this doctrine to “be more nearly related to expositions of
elementary physical theory than to laboratory practice in any branch of research”
[135] and we shall consider the advantages of more complex experiments
where two factors (treatments) are given together. The simultaneous
examination of more than one treatment allows any interaction between the
treatments to be determined. If an interaction is not present the experiment
allows an extra estimate of the two treatment effects.

11.3.1 The detection of an interaction between the treatments
An interaction is said to be present when the effect of one factor is different in
the presence of another factor. Let us consider a trial where the subjects receive
cither drug A. drug B, A plus B, or placebo (Figure 11-1, number 3). This
design is known as a factorial experiment and yields two estimates of the effect
of two factors, drugs A and B, one estimate of the drug’s effect when given
alone and one of its effect when given in combination. Let us suppose that the
drugs lower scrum .uric acid, the mean uric acid after drug A being UA, after
drug B, Ub, after both drugs combined, Ua + b< and when on placebo, Uq.
Figure 11-2 provides fictional data for a factorial design. The upper lines

A not token (A-)

U° *\/

uB
UA *■

* UA*D

/

A token (A*)

B*

B-

present the drug effects arc said to be additive. In the presence of an interaction
the different estimates of the drug effect arc not equal.
The middle graph in figure 11-2 illustrates the result when the two drugs in
combination have less than the expected additive effect. This is known as a
negative interaction and has been loosely referred to as antagonism between
the drug effects. The lower graph represents a positive interaction when the
effect of the drugs in combination is greater than expected. The term syne^ism
has unfortunately been used both for an additive effect (with no interaction)
and a multiplicative effect (with a positive interaction). Few trials to detect
interactions have been performed; early examples arc given by Wilson and his
colleagues and Acnishanslin and co-workers [136, 137].

u0.'.UB
o

V

/

"• UA*B

A+

5
cz>

•UB

11.3.2 More than one estimate of the treatment effects
The standard design would consist of a group on drug A, a group on drug B,
and a group on no treatment. This design would give one estimate of the effect
of drug A and one estimate for drug B. Overall, this design yields two esti­
mates for three treatment groups, whereas the factorial design gives two esti­
mates of each drug effect, four estimates for four treatment groups. 1 he
factorial design is more efficient and docs not lose precision [135], However, if
the effect of drug A is not the same in the presence of drug B (and vice versa)
then the factorial design gives only one estimate of the effect of each drug but it
docs detect the interaction. As discussed previously the estimate of the effect ol
drug A is given by the two comparisons UA - U„ and Ua + b - Ub alld
similarly the effect of drug B is given by UB - U„ and UA. u - UA. The

overall estimate of the effect of A is given by
• UA*B
B NOT
TAKEN
(B-)

3

B TAKEN

(UA —Uo) + (Ua + b

Ub)

2

(B*»

Figure 11-2. The effect of two drugs A and B on serum uric acid. The left-hand results are
when B was not taken and the right-hand results when B was taken. The upper lines represent
the situation when A was not taken and the lower lines when A was consumed. The upper graph
illustrates the situation when no interaction between the treatments was present, the middle
when a negative interaction was observed, and the lower graph when a positive interaction was
present (see text).

connect the results when treatment A was not taken and the lower lines when
it was. The left-hand results were obtained when drug B was not given and the
right-hand results when B was taken. The figure illustrates three sets of results:
the upper panel when no interaction is present; the middle panel when a
negative interaction occurs, and the lower panel when a positive interaction is
demonstrated. When there is no interaction the effect of A is the same irrespec­
tive of the presence of B and vice versa and the distance between the two lines
gives the effect of drug A, equal to UA — Uo or Ua + b “
The effect of
drug B is UB “
which equals Ua + b “ UA. When an interaction is not

and the effect of B by
(Ub-Uq) + (UA4b-Ua)

2

This trial design should be more widely used, especially when two treatments
are thought to have moderate, but additive, effects (for example, a
reduction in mortality of 20 percent for each drug). A trial including both A
and B simultaneously may be able to detect a 40 percent reduction in mortality
while a standard design of treatment with A alone, B alone, and placebo may
fail to detect a reduction of 20 percent.
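The two estimates of each drug effect described in this section can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the group means are hypothetical, and the interaction is taken here as the difference between the effect of A with B and the effect of A without B, which is one common convention rather than a definition given in the text.

```python
def factorial_effects(u0, ua, ub, uab):
    """Summarise a 2 x 2 factorial trial from its four group means.

    u0  : mean with neither drug
    ua  : mean with drug A alone
    ub  : mean with drug B alone
    uab : mean with both A and B
    """
    a_without_b = ua - u0    # first estimate of the effect of A
    a_with_b = uab - ub      # second estimate of the effect of A
    b_without_a = ub - u0    # first estimate of the effect of B
    b_with_a = uab - ua      # second estimate of the effect of B
    return {
        "effect_A": (a_without_b + a_with_b) / 2,
        "effect_B": (b_without_a + b_with_a) / 2,
        # Assumed convention: zero when the effects are purely additive.
        "interaction": a_with_b - a_without_b,
    }


# Hypothetical serum uric acid means (micromol/l), invented for illustration.
additive = factorial_effects(u0=400, ua=300, ub=350, uab=250)
print(additive)
```

When the interaction term is far from zero the two estimates of each effect disagree, and averaging them is no longer a sensible summary of the effect of either drug.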
11.4 CROSS-OVER TRIALS TO DETECT AN INTERACTION

Figure 11-1, design number 4, gives an example of this trial design which is a
combination of a within-patient cross-over design and a design to detect an

order effect. The design also ensures that each patient receives all three treat­
ments. The second square could be useful in a trial of three intraarticular
injections in patients with severe generalized rheumatoid arthritis. The most
severely affected wrist, knee, and metacarpophalangeal joint could be selected,
and in the first patient, injection A made into the wrist, B into the knee,
and C into the metacarpophalangeal joint. For every set of three patients each
type of joint will receive all three treatments. The design could be said to be

a)                 ORDER
PATIENT      I     II    III
   1         A     C     B
   2         B     A     C
   3         C     B     A

balanced for the joint treated.

11.5.1 Randomisation of patients in a Latin square design
Let us suppose that 18 male patients are to be randomised to a Latin square
design with three treatments and three orders of treatment. Three squares
identical to a) in figure 11-3 could be taken and three equal to b). These squares
can be pooled to an 18x3 table, and the rows numbered one to 18. Eighteen
sealed envelopes would be prepared to be opened consecutively as the patients
are entered to the trial. In the envelopes would be the row number allocated to
a particular patient. For example, the first patient may be randomised to row
three, the second to row 17, and so on. The randomisation can be easily read
from a random number table (section 7.3) by noting a sequence of numbers
less than 19. Randomisation will ensure that the investigator cannot predict the
order of treatment for the patients entering the trial. However, he could (but
only if he wished) predict the eighteenth order.
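The envelope procedure above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the two squares are transcribed from figure 11-3, and a seeded pseudo-random shuffle stands in for reading numbers below 19 from a random number table.

```python
import random

# Squares a) and b) of figure 11-3; each string is one patient's order of treatments.
SQUARE_A = ["ACB", "BAC", "CBA"]
SQUARE_B = ["ABC", "BCA", "CAB"]


def latin_square_allocation(seed=None):
    """Return (patient number, row number, treatment order) for 18 patients."""
    # Three copies of each square pooled into an 18 x 3 table, rows numbered 1 to 18.
    rows = SQUARE_A * 3 + SQUARE_B * 3
    rng = random.Random(seed)
    envelope_rows = list(range(1, 19))
    rng.shuffle(envelope_rows)  # the row number sealed in each consecutive envelope
    return [
        (patient, row, rows[row - 1])
        for patient, row in enumerate(envelope_rows, start=1)
    ]


for patient, row, sequence in latin_square_allocation(seed=1983):
    print(patient, row, sequence)
```

Because every row is used exactly once, each treatment is still given six times in each period, whatever order the envelopes are opened in.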

b)                 SITE
PATIENT      WRIST    KNEE    METACARPOPHALANGEAL JT.
   1           A        B        C
   2           B        C        A
   3           C        A        B

Figure 11-3. Two 3x3 Latin squares to allocate each treatment to every patient and to ensure
that each treatment is used once at each order or site of administration.

11.5.2 The treatment and order effects

interaction. In such a trial the individual patient has to receive, for example,
four different treatments in a certain order. The order can be randomised so
that for each set of n patients the order effect and carry-over effects are cancelled out

The differences between treatments are calculated by comparing the results for
the patients when they are taking the particular treatments. Similarly the average
result for a particular order (or site) gives the effect of the order or site,
provided that each average result for the orders is derived from an equal

(section 11.5.3).
Such a design may consist of n treatments arranged in one or two n X n
squares using Latin letters to designate the treatment. Such a design is known as
a Latin square and is appropriate for the cross-over design to detect an interac­

tion between treatments and for other trial designs.
The design has been utilized successfully for antihypertensive drugs [101,
138, 139] and antianginal agents [140]. Owing to the lower within-subject
variance of measurements of blood pressure and frequency of angina, these
trials gave estimates of drug effects with very few patients. However, the trials
did not detect interactions between treatments either because they were absent
or because the trials were too small to detect these effects.
11.5 THE LATIN SQUARE OR RANDOMISATION SUBJECT
TO DOUBLE RESTRICTION (ROW AND COLUMNS)

Figure 11-3 gives two 3x3 Latin squares. The first square gives the order of
administration of three treatments A, B, and C to three patients designated 1 to
3, and the second square, for a different trial, gives sites for the application of
three treatments. With the first square, for every three patients each treatment
is given first, second, and third once only; the design is said to balance out any

number of the different treatments.
11.5.3 The carry-over effects


The effects of a treatment may continue into the next period; this is known as a
carry-over effect (section 11.1.3). Latin square designs can be employed that
balance out residual or carry-over effects [141, 142]. If the number of treatments
is even, one square can be designed to achieve this effect, and if the number of
treatments is odd two squares are required. Only certain Latin squares have
these characteristics. Figure 11-3 gives the two squares required for three
treatments. Each treatment follows every other treatment twice when the two
squares are employed. Figure 11-4 gives the one square required for four
treatments and the two squares necessary with five treatments.
When single squares balanced for carry-over effects are duplicated or a
combination of Latin squares is employed that is balanced the residual effects
can then be estimated. The methods for calculating the residual and other
effects have been clearly described and examples of balanced squares provided

for more than five treatments [142].
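Whether a set of squares is balanced for carry-over effects can be checked by counting how often each treatment immediately follows every other treatment. The sketch below is illustrative only: the function names are hypothetical and the squares are transcribed from figures 11-3 and 11-4.

```python
from collections import Counter
from itertools import permutations

# Each string is one patient's sequence of treatments.
FIG_11_3_A = ["ACB", "BAC", "CBA"]
FIG_11_3_B = ["ABC", "BCA", "CAB"]
FIG_11_4_FOUR = ["ABDC", "BCAD", "CDBA", "DACB"]


def carry_over_counts(rows):
    """Count how often each treatment is immediately preceded by each other one."""
    counts = Counter()
    for row in rows:
        for previous, current in zip(row, row[1:]):
            counts[(previous, current)] += 1
    return counts


def is_balanced(rows):
    """True when every ordered pair of distinct treatments occurs equally often."""
    treatments = sorted(set("".join(rows)))
    counts = carry_over_counts(rows)
    values = {counts[pair] for pair in permutations(treatments, 2)}
    return len(values) == 1 and 0 not in values


print(is_balanced(FIG_11_3_A))               # one 3 x 3 square on its own
print(is_balanced(FIG_11_3_A + FIG_11_3_B))  # the two 3 x 3 squares together
print(is_balanced(FIG_11_4_FOUR))            # the single 4 x 4 square
```

A single 3 x 3 square fails the check because some treatments never follow others, which is why two squares are needed when the number of treatments is odd.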

                   ORDER
PATIENT      I     II    III   IV
   1         A     B     D     C
   2         B     C     A     D
   3         C     D     B     A
   4         D     A     C     B

                   ORDER
PATIENT      I     II    III   IV    V
   1         A     B     D     E     C
   2         B     C     E     A     D
   3         C     D     A     B     E
   4         D     E     B     C     A
   5         E     A     C     D     B

                   ORDER
PATIENT      I     II    III   IV    V
   6         A     C     B     E     D
   7         B     D     C     A     E
   8         C     E     D     B     A
   9         D     A     E     C     B
  10         E     B     A     D     C

Figure 11-4. Latin square designs to balance out order effects and carry-over (residual) effects.
In order to achieve balance, one square is required for four patients and four treatments, and two
squares for five patients and five treatments.

                   ORDER
PATIENT      1     2     3
   1         Aα    Bβ    Cγ
   2         Bγ    Cα    Aβ
   3         Cβ    Aγ    Bα

Figure 11-5. A Graeco-Latin square for three patients; three drug treatments A, B, and C; and
three methods of administration, α (orally), β (intramuscularly), and γ (intravenously).

11.6 THE GRAECO-LATIN SQUARE

The Graeco-Latin square employs both Latin and Greek letters and allows
three different sources of variation to be equalised. For example, a trial may be
designed for three drug treatments, three patients, three orders, and three
methods of administration (oral, intramuscular, and intravenous). Figure 11-5
gives an example of such a trial. In this Graeco-Latin square each drug treatment
A, B, and C is given once orally (α), once intramuscularly (β), and once
intravenously (γ). More complex Graeco-Latin squares together with their
methods of analysis have been described by Cochran and Cox [142].
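The defining property of a Graeco-Latin square, that every Latin letter is paired with every Greek letter exactly once, can be verified mechanically. The following sketch uses the square of figure 11-5; the function names are hypothetical and the check is offered as an illustration, not as the method of analysis described by Cochran and Cox.

```python
# Rows are patients, columns are orders; the Latin and Greek parts of
# figure 11-5 are held as two separate squares.
LATIN = ["ABC", "BCA", "CAB"]   # drug treatment
GREEK = ["αβγ", "γαβ", "βγα"]   # method of administration


def is_latin(square):
    """True when each symbol appears once in every row and every column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def is_graeco_latin(latin, greek):
    """True when both squares are Latin and every (drug, method) pair occurs once."""
    n = len(latin)
    pairs = {(latin[i][j], greek[i][j]) for i in range(n) for j in range(n)}
    return is_latin(latin) and is_latin(greek) and len(pairs) == n * n


print(is_graeco_latin(LATIN, GREEK))
```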


11.7 THE SEQUENTIAL TRIAL

Armitage defined a sequential trial as a trial where “Its conduct at any stage
depends on the results so far obtained” [143]. Usually a sequential trial compares
two treatments, and the results during the course of the trial determine
the number of observations made. Frequently subjects are entered to the trial
in pairs; one patient is given one treatment and one the other. The results are
analysed according to the outcome within these pairs. In most trials of any
design patients are started on treatment serially and not simultaneously; therefore
it is a simple matter to assess the response to treatment as it becomes
available in sequential order.
A section of statistical theory termed sequential analysis derives largely from
the work of Wald [144] who allowed for repeated significance testing and
derived boundaries describing three possible outcomes. Figure 11-6 gives an
example of such boundaries where a comparison is made between drug T and
drug A. The upper boundary is a boundary that must be reached to demon­
strate a statistical preference for T and the lower boundary must be reached to
demonstrate a preference for A. If the two boundaries forming a V shape at the
right centre of the figure are reached, then the investigator knows that a
preference for one drug over the other is not likely to be demonstrated within
the predetermined conditions of the trial. A design with this central limiting
boundary is known as a closed design. Figure 11-6 illustrates the result of a
trial by Robertson and Armitage [145] where two hypotensive drugs used
during operations were compared, T being phenactropinum chloride
(Trophenium) and A being trimetaphan (Arfonad). The comparison was made
between subjects, and for each pair the time taken for the systolic blood
pressure to rise to 100 mm Hg after the use of the drug was measured. The
results are plotted and do not reveal a preference for either drug.
Figure 11-7 illustrates the results when two cough suppressants, heroin and
pholcodine, were compared with each other and placebo. Comparisons were
made within subject [146] and after six days the patients had tried all three
treatments and ranked them in order of preference. The trial was designed to



Figure 11-6. The result of a sequential trial to compare two hypotensive drugs employed in
anaesthesia, phenactropinum (Trophenium or T), and trimetaphan (Arfonad or A). Reproduced
with permission from Robertson and Armitage, Anaesthesia 14: 53, 1959.

Figure 11-7. A trial to compare pholcodine (Lipect), heroin, and placebo as cough suppressants.
The horizontal axis gives the number of preferences. The three plotted paths are the comparison
of Lipect with placebo (Lipect better than placebo), the comparison of heroin with placebo
(heroin better than placebo), and the comparison of Lipect with heroin (no significant difference).
Reproduced with permission from Snell and Armitage, Lancet 1:860-862, 1957.

detect a significant difference between pairs of the treatments at the five per­
cent level of significance with a power of 95 percent when 85 percent of
preferences are in favour of one drug. Both heroin and pholcodine were pre­
ferred to placebo but no distinction could be made between the two active
drugs.


11.7.2 Advantages of sequential trials
Sequential trials have the ethical advantage of terminating quickly when one
drug is an important new advance. These trials may also prove economical and
useful as a pilot study to determine the variance of the measurements.

11.7.2.1 Ethical advantages
11.7.1 The decisions that have to be made to employ a sequential trial design

As in a standard trial the levels of type I (α) and type II (β) error have to be
decided. In addition the trial may be of an open or closed plan. In other words,
the investigator has to decide whether he is prepared to allow the sample size
to increase indefinitely, or whether he will restrict the trial so that if a specified
difference between treatments is not apparent by a certain stage, then the trial
is stopped. Figures 11-6 and 11-7 illustrate closed plans. The reader is referred
to Armitage [143] for further details. Armitage recommends a closed plan for
medical trials as an unexpectedly long series of observations may be a consid­
erable disadvantage. The closed plan reduces the maximum possible sample
size.
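How the chosen error levels translate into stopping boundaries can be illustrated with Wald's sequential probability ratio test applied to within-pair preferences. The sketch below is an assumption-laden illustration, not the closed plans of figures 11-6 and 11-7: it is a one-sided open plan, the function name and return labels are hypothetical, and the default values (85 percent of preferences under the alternative, α and β of 0.05) are borrowed from the description of the cough suppressant trial.

```python
import math


def sprt_preferences(prefs, p0=0.5, p1=0.85, alpha=0.05, beta=0.05):
    """Wald's open sequential test on a stream of within-pair preferences.

    prefs : iterable of booleans, True when a pair favours the new treatment.
    p0    : proportion of preferences expected if the treatments are equal.
    p1    : proportion of preferences the trial is designed to detect.
    Returns (decision, number of pairs used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing upward favours the new treatment
    lower = math.log(beta / (1 - alpha))   # crossing downward stops without a preference
    log_ratio = 0.0
    n = 0
    for n, favours_new in enumerate(prefs, start=1):
        if favours_new:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "prefer new treatment", n
        if log_ratio <= lower:
            return "no preference shown", n
    return "continue sampling", n


print(sprt_preferences([True] * 10))
```

An open plan of this kind places no upper limit on the number of pairs, which is the reason a closed plan is recommended for medical trials.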


The investigator may wish to follow the results of the trial closely and con­
tinuously and bring the trial to an end immediately when any statistically
significant difference is observed: for example, a new treatment for cancer
where the treatment is widely available. The investigator will wish to reach a
conclusion in the shortest period of time in order to ensure that the treatment,
if successful, is generally applied.
11.7.2.2 Economy
If the trial is brought to a speedy conclusion, then the financial cost of the
experiment will be reduced. This will be true only when one treatment is
much worse than another.

11.7.2.3 Use in pilot studies

11.7.3.5 Concealment of the results during the course of the trial

Anscombe [147] has suggested that if the numbers required for a nonsequential
trial cannot be calculated because the variance of the measurements is un­
known, then a pilot study may be conducted in a sequential manner until an
estimate of variance has been determined with a given precision. The second
stage can then be a standard trial or a further sequential trial.

The statistician in charge of the sequential analysis will naturally plot the
results of the trial graphically. If this clear graphical representation is seen by
the clinicians involved in the trial, they will naturally have an idea of the likely
result of the trial. As a boundary is approached they will imagine that one drug
has superiority over another. This conviction will bias their attitude to the trial
and may lead to a demand that the trial be stopped. These problems may be
overcome by making sure that only a central monitoring committee has access
to the results during the course of the trial.

11.7.3 Disadvantages of the sequential trial

The sequential trial is not suitable for long-term studies or when secondary
objectives are important, and the method does not guarantee economy.

11.7.3.6 Other possible disadvantages
Cochran [148] stated, “In the sequential trial, at the beginning, the doctor is
forced to make some decisions about the desired sensitivity of the trial which
he can dodge in a fixed-size trial.” However, if the researcher is going to
estimate the number of subjects he will require for his fixed size trial, then he
must make the decisions in the same way as he would for a sequential trial. It is
hoped that no investigator will set out on a trial without prior consideration of
whether he will recruit sufficient patients for his purpose.
The investigator embarking on a sequential trial will be attracted by the
economy in both the subjects involved and the observations required. In addi­
tion to the sequential strategy he may also include a low figure for the power
term. This will reduce confidence when the boundary is crossed for a
nonsignificant difference between the treatments. Although the sequential trial
will not differ in this respect from a small nonsequential trial with limited
power, there will be a tendency for the trial employing a sequential design to
include a lower specification for power. The tendency to use a low-power
term in a sequential trial must be avoided. Armitage [143] pointed out that a
negative trial that is obviously low in power may inhibit further work, “either
because other investigators attach more importance to the first negative results
than they deserve ... or because they have less enthusiasm for repeating
previous work than for breaking entirely new ground.” Lastly, reporting of
the results of a sequential trial should not consist solely of a graphical repre­
sentation. A summary of the data as a whole must still be presented with
means and standard errors, comparisons between relevant subgroups of treat­
ments, and confidence limits.

11.7.3.1 Difficulty of use in long-term studies
The objective of many sequential trials is to bring the study to an end before
many treatments have been started. If the period of the observation is long in
comparison to the time taken to enter patients in the trial, then there is little
scope for limiting the number of patients who do enter the trial. A sequential
trial is most appropriate when the response is obvious soon after treatment is
started. Sequential trials are therefore more suitable for the treatment of acute
leukaemia than, say, Hodgkin’s disease, which has a more prolonged and
fluctuating course.

11.7.3.2 Loss of additional information
In the standard trial the larger number of patients may allow end points to be
reported with smaller confidence limits and also may allow more observations
on related aspects such as side effects or the more severe adverse effects of
treatment. A sequential trial is therefore not appropriate when important sec­
ondary objectives have been defined.

11.7.3.3 Economy may not be achieved
We must remember that the sequential plan is more economical on average
than a nonsequential trial. However, in exceptional circumstances the sequential
trial may require more patients and observations than a standard trial. If
one treatment is only moderately worse than the other the standard procedure
and analysis may be more likely to give a statistically significant result, one
final test being performed rather than a repeated sequence of tests (section 10.7.3).
11.7.3.4 Organisational problems
A sequential trial, by definition, will last for an unknown duration. It is there­
fore more difficult to estimate the total cost of the trial or to know how long to
employ staff working on the trial.



11.7.4 Conclusions

Despite the difficulties in the design and execution of a sequential trial, the
design should be utilized more extensively. If a sequential trial is started and if
one treatment proves greatly superior, much may be gained. When the trial
fails to reveal a significant difference between the treatments the investigator
can calculate the confidence limits of any possible beneficial effect. Armed with

this new knowledge he may or may not proceed to a fixed sample trial of
known duration.

The play-the-winner trial has been proposed to limit the number of patients
who receive an inferior drug during a clinical trial [149]. A simple example
would be to keep using one drug until it first fails and then switch to the
second drug until it fails and so on.
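The simple rule just described can be written out as a short Python sketch. The function names and the sequence of outcomes are hypothetical, and the sketch assumes, as the rule itself requires, that each patient's result is known before the next patient is treated.

```python
def next_treatment(current, success, treatments=("A", "B")):
    """Stay on the current drug after a success, switch to the other after a failure."""
    if success:
        return current
    position = treatments.index(current)
    return treatments[(position + 1) % len(treatments)]


def allocate(results, first="A"):
    """Return the drug given to each successive patient.

    results : list of booleans; results[i] is True when patient i responded
              to whichever drug that patient was allocated.
    """
    allocations = []
    current = first
    for success in results:
        allocations.append(current)
        current = next_treatment(current, success)
    return allocations


# Invented outcomes for six patients: two successes, a failure, a success,
# a failure, and a final success.
print(allocate([True, True, False, True, False, True]))
```

The allocation of every patient is fully determined by the previous result, so an investigator who sees the results can always predict the next treatment.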

future treatment on this basis, e.g., the so-called two-armed-bandit problem,
where the arms of a slot machine are the treatments and inserting a coin is
treating a patient [87, 152]. All these procedures assume that all patients are the
same; since this is not true, it is safer to randomise and have comparable
groups from which conclusions can be drawn. Despite this reservation it
would be of interest to see a play-the-winner or related trial performed, and
there could be ethical advantages for the investigators involved in this form of
trial.

11.8.1 Problems with a play-the-winner trial

11.9 CONCLUSIONS ON DIFFERENT TRIAL DESIGNS

To my knowledge this method has not yet been tried in a clinical trial and
Meier claimed that, “This is testimony to the triumph of good sense over
irrelevant theory” [150]. There are three main reasons why the method has not
been employed.

In this chapter the advantages and disadvantages of the standard parallel
groups and cross-over trial designs have been discussed. Cross-over trials tend
to be more efficient but the results can be difficult to interpret in the presence
of persistent carry-over effects. Factorial designs both within and between
subject have been discussed. These designs are very efficient and allow interactions
between treatments to be detected. Latin square and Graeco-Latin square
designs allow cross-over trials to be performed balanced for order and carry-over
effects. Lastly, the advantages and disadvantages of sequential trial designs
have been indicated and the concept of play-the-winner trials introduced.
When in doubt, it is safest to employ the standard trial design of one treat­
ment for each person and a fixed number of subjects. However, the numbers
required may be greatly reduced by a cross-over or sequential design. When
two active treatments are to be tested, a factorial design may well prove the
most economical.

11.8 PLAY THE WINNER

11.8.1.1. The result of the treatment must be unambiguous and known essentially at once

This is rarely true. In the sequential trial new pairs of patients can be started on
treatment prior to the results of the original comparisons being known. This
would not be possible in a play-the-winner trial [87].
11.8.1.2 The population of interest is limited to the group in the particular trial
Meier considered that if the number of patients receiving the inferior treatment
is minimised in the group involved in the trial, it may take longer to get
enough of them to come to a reasonably sure conclusion. During this pro­
longed interval, patients in other centres may receive the inferior treatment
and suffer the consequences that the trial seeks to avoid [150].

11.8.1.3 The investigator may select patients who would respond to a particular drug

Chalmers [151] suggested that playing the winner will mean that the inves­
tigator having successes with drug A will expect the next patient to receive
drug A even when he does not know what drug A is. Allocation to treatment
is, therefore, no longer blind and Chalmers stated, “It is very easy for a self-fulfilling mechanism to get started in which the winner is ahead and only the
winning patients are more and more accepted for the study, thus ensuring that
the leader is confirmed as the winner.” However, this argument assumes that
the investigator can select winning patients for the trial (that is, patients who
would respond well to a particular drug). Presumably these patients could
have a mild form of disease or other so-called winning characteristic.


11.8.2 Conclusions on play-the-winner trials
A play-the-winner trial is one method of adaptive allocation of patients where
the results during the trial determine the treatment to be given to the next
patient to enter the trial. A more optimal strategy would be to continuously
calculate the probability of success with the two treatments and to allocate

and publication of large trials. The persons involved in running the trial must
be stated and can be grouped as follows.
12.1.1 The steering committee

12. WRITING THE PROTOCOL

The steering committee will include the principal investigators and a chairman
should be named. This committee should meet with a predetermined fre­
quency and should control the running of the trial.

12.1.2 The coordinating centre
The coordinating centre will include staff with clearly identified respon­
sibilities: clerical staff and staff responsible for quality control, data processing,
analysis of results, preparation of reports, and distribution of any medications
that may be required. At the centre, persons involved in analysis or publication
may be grouped into units or subcommittees.

12.1.3 The clinical centres
The staff involved at the clinical centres must be included in the protocol and
acknowledged in any publications.

12.1.4 Advisory board

Most large trials have a panel of experts who constitute an advisory board and
whose advice may be sought when necessary.
The protocol, or as Bearman preferred, the manual of operations [153], may
have to serve many functions: raising monies from a funding agency; obtain­
ing the approval of an ethical committee; recruiting participants; providing a
detailed and specific list of instructions on how to perform the trial; and lastly
supplying a permanent record of what was intended in the trial. The same
document may serve all these functions, and in addition a section on finances
may be included when the document is used as a grant application.
The protocol should consist of clear statements on the following: where the
trial is being run and by whom; the background of the trial; objectives; num­
bers to be entered; eligible patients or subjects; procedures to be adopted
during the trial; duration over which the trial will be performed; handling of
dropouts; proposed analyses; criteria for stopping the trial; publication policy;
and financial considerations. A copy of all the documents to be used in the trial
should be attached as well.

12.1.5 Funding agency
The source of funds should be stated as should the names of persons responsi­
ble for administering the grant.
12.2 BACKGROUND TO THE TRIAL

An exhaustive review of the literature is not called for in the protocol but the
reasons for performing the trial should be stated and a few references cited in
support.
12.3 OBJECTIVES

The major and minor objectives must be clearly stated and enumerated (chap­
ter 4).
12.4 NUMBER OF PATIENTS REQUIRED

12.1 WHERE WILL THE TRIAL BE PERFORMED AND BY WHOM?

It may appear self-evident that the personnel involved and the site of the trial
should be stated. However, in the case of long-term, often multicentre trials,
many difficulties may arise. McFate Smith [154] discussed an organisational
model for a multicentre trial and considered that failure to specify the responsi­
bility of various committees has led to difficulties in the execution, analysis,

The number of patients or subjects to be entered into the trial in order to
achieve the objectives must be stated, together with the assumptions made in
arriving at this figure.
12.5 ELIGIBLE PATIENTS

The method of recruiting patients should be noted and the criteria for inclusion
and exclusion stated explicitly. It is also desirable to collect a limited amount of

upon. The protocol may therefore state how the final results should be pre­
sented to medical colleagues and to the general public.

information on those who were considered for the trial but not included and,
where possible, details on patients who were available but not considered. The
trial participants may then be viewed as a subset of those available and an
impression gained of how representative of the population the trial subjects are

12.12 FINANCIAL CONSIDERATIONS

Finances should be sought to cover all the costs of the trial. It is not fair to
expect a research institution to cover the extra costs of secretarial assistance,
computing, stationery, travel, and other overheads. The application for
financial assistance should make allowance for these costs and even for mone­
tary inflation if this is expected. However, funding agencies will not usually
expect to pay the salaries of the principal investigators or their secretaries; nor
will they often be willing to provide office accommodation, typewriters, tele­
phones, and other basic items of equipment that would be expected in a
standard office.

(section 19.5).
12.6 PROCEDURES TO BE ADOPTED DURING THE TRIAL


This section should include the information to be given to the subjects, the
method of obtaining and recording consent, details on any run-in period, how
randomisation will be achieved, the treatment schedules to be employed,
blindness, data to be collected, and the end points for the trial. Exact methods
must be described for any measurements that are to be made (chapter 9).
Methods of recruitment, and entry, exclusion, and withdrawal criteria must be
stated precisely.
12.7 THE DURATION OF THE TRIAL

12.13 ADDENDA

The following addenda should be included in the trial protocol.


An attempt should be made to estimate how long it will take to recruit the
required number of patients. The financial support necessary will depend on

12.13.1 Documents to be used in the trial

the expected length of the trial.

Examples of the recording documents should be attached to the protocol.

12.8 WITHDRAWALS FROM THE TRIAL

12.13.2 Details of methods

The protocol must state under what conditions patients may be withdrawn
from the trial and how they should be followed and treated thereafter. The
details will include those adverse reactions that require withdrawal and how
such reactions, and lesser side effects not leading to withdrawal, are to be
detected and treated. It is important to follow all patients who are withdrawn.

Full details of all the methods to be used in the trial must be included in the
protocol, usually as addenda. Bearman provided an example of a protocol
where it was stated that various biochemical measurements must be within
normal limits [153]. The protocol did not state the method of performing the
biochemical tests nor the normal limits to be expected. These details must be
provided.

12.9 ANALYSIS

The outline proposed for the analysis of the results should be stated and, most
importantly, when and how often this analysis will be performed (see section
10.7.3). Details on computer facilities may be required.
12.10 CRITERIA FOR STOPPING THE TRIAL


12.13.3 Quality control
The protocol must state how the precision and accuracy of various measure­
ments will be determined and followed during the trial. If a trial monitor is to
be appointed the protocol should state how this person is to perform his or her
duties. Details of the training of staff must also be provided.

The criteria under which the trial will be terminated should be stated in as
much detail as possible. An attempt should be made to foresee all possible
eventualities (section 3.11): for example the mortality from one condition in
the intervention group may be reduced with statistical significance, yet total
mortality may not be reduced.

For a large trial the protocol will become, of necessity, a bulky and unwieldy
document. Each section should be carefully summarised and the protocol be
introduced by a brief report of its contents.

12.11 PLANS FOR ORAL PRESENTATIONS AND PUBLICATION

12.15 CONCLUSIONS

In a large trial involving many investigators a case may be made for planning
oral presentations and publications in advance. It may be very destructive if
those who are aware of preliminary results leak this information before all the
investigators have been informed and before the final report has been agreed

The protocol must be written with great care. An inadequate document, with
insufficient information and sometimes containing errors, is unlikely to attract
financial support and may not meet with the approval of an ethical committee.
Also, the protocol cannot be employed as an adequate manual of operations if

12.14 SUMMARY OF THE PROTOCOL


it lacks the necessary detail. Many large research centres in the United States
employ professional writers to finalise protocols, especially when they are
used in a grant application.
Appendix 12.16 is based on the headings in chapter 12 and allows the inves­
tigator or reviewer to check that the protocol has covered most of the impor­
tant items in the design of a trial. Similar checklists have been prepared by
Spriet and Simon [155] and the Clinical Trial Unit of the London Hospital
[156]. Undesirable responses to the questions are in italics.

Appendix 12.16 (continued)
5.4
5.5

6.1

6.2

b.

1.1
1.2
1.3
1.4

Are all the participants named?
Are their addresses given?
Are steering and other committees necessary?
Is the constitution of these committees stated?


No
Yes
Yes
No
No
Yes
No
Yes
Not required

A2 Background to the trial


2.1

Has ethical-committee approval been given?

2.2

Are the authors aware of the literature in their
field?
Are the authors aware of similar trials that have
been completed?
or that are in progress?

2.3

No
Yes
Not required

Yes

No

Yes
Yes

No
No

Yes
Yes
Yes

No
No
No

A3 Objectives
3.1

3.2
3.3

Is the major objective clearly stated together with
its magnitude?
Is the objective realistic?
Does the trial answer an important question?

A4 Number of subjects required

4.1
4.2
4.3
4.4

Have the numbers been calculated?
Size of type I error
and type II
Is the type I error two-tailed?
Can the authors recruit this number of patients?

Yes

No

Yes
Yes

No
No


Is previous treatment allowed?
Are the diagnostic criteria clear?

No
No

6.3
6.4
6.5

Is the design: parallel groups?
cross-over?
interaction?
sequential?
single-blind?
double-blind?
If the trial is cross-over, are there data showing
no change in treatment effect with period?
Is there a washout period?
Will there be a pilot trial?
Will blindness be preserved?

Will the observers be adequately trained?
Are the end-point measurements valid
and repeatable?
6.8 Are randomisation procedures efficient?
too complicated?
open to manipulation?
6.9 Is the treatment fixed?
or to be titrated?
6.10 Are the doses reasonable?
6.11 Is the labelling and checking of any drug treatment
adequate?
or inadequate?
6.12 Is it clear which accessory treatments will be
allowed?
6.13 Will compliance with treatment be determined?
6.14 Is the subsequent treatment for patients who
complete the trial stated?
6.15 Will a trial monitor be appointed?

6.6
6.7

No
Yes
Yes
No
No
Yes
No
Yes
Not relevant
No
Yes

Yes

No

Yes

No

Yes
Yes

No
No

Yes
Yes

No
No

A7 Duration of trial

7.1
7.2

Duration for the individual patient
Duration of recruitment

I

wccks/inonths/years
| wccks/months/ycars

A8 Withdrawals from the trial

A5 Selection of subjects

5.1
5.2

Yes
Yes

A6 Conduct of the trial

APPENDIX 12.16
12.16.1 Checklist to assess the protocol for a randomised controlled trial.
An undesirable response is in italics.
Al Who will perform the trial and where?

Arc selection criteria (age, race, gender) well
thought out?
Do exclusion criteria cover all ethical problems?

Yes
Yes

No
No

8.1
8.2

Will patients who arc withdrawn be followed?
Will severe adverse reactions be detected quickly?

Yes
Yes

No
No

Appendix 12.16 (continued)

Appendix 12.16 (continued)

A9 Analyses

A13 Supporting documents

9.1
9.2

9.3

Are the proposed analyses sensible?
Will the trial be analysed on the intention-to-treat
principle?
the per-protocol principle?
both principles?
Arc any necessary computer facilities available?

Yes

No

Yes

No

Yes

No

Yes

No

Yes

No

A10 Criteria for stopping the trial
10.1
10.2

10.3
‘I

Are possible outcomes clearly stated?
Have decision rules for stopping the trial been
defined?
Will the data be reviewed efficiently as they
accumulate?

Are regular meetings planned?
Are the plans for presenting the results
acceptable?

Yes

No

Yes

No

A12 Financial considerations (n/r = not required)
Are the following staff available when required:
N/r
clinicians?
N/r
nurses?
pharmacist?
N/r
programmer? N/r
statistician?
N/r
monitor?
N/r
12.2 Are payments to collaborators
adequate?
excessive?
or inadequate?
12.3 Is the equipment to be ordered
adequate?
excessive?
or inadequate?
12.4 Are the costs of overheads, travel, and so on
adequate?
excessive?
or inadequate?

12.1
i(

li1'*'


=

!

Yes
Yes

No
No

Yes
Yes

No
No

A14 Summary

All Presentation of the results

11.1
11.2

13.1 Arc the trial documents clear?
or ambiguous?
efficient for data-proccssing?
or inefficient?
13.2 Is theinformation collected too
much?
sufficient? D
or too little?
13.3 Arc the methods of measurement well
described or not?
13.4 Arc they good methods?
13.5 Will quality control be
nonexistent?
mediocre?
or efficient?

Yes
Yes
Yes
Yes
Yes
Yes

No
No
No
No
No
No

14.1 Docs the summary do the trial justice?
14.2 Is the trial ethical?
14.3 Will (would) you fund this trial?

No
Possibly
Yes

13. INFORMATION TO BE COLLECTED DURING A TRIAL

Information must be recorded before a subject enters the trial, throughout the
course of the trial, and at the end. Before the start of a clinical trial it must be
documented that the patients have the condition under investigation, that there
are no contraindications to their entering the trial, and that informed consent
has been obtained. During the course of the trial both the benefits and adverse
effects of treatment must be recorded to demonstrate that the patients may
safely continue in the trial. At the end of the trial the final data must be
recorded. These will include full details on defaulters and the attempts made to
contact them. This chapter also considers the quantity of data to be collected,
the design of documents, the questions to be asked, and the various stages of
data preparation.

13.1 THE QUANTITY OF DATA TO BE COLLECTED

Hamilton has cautioned against collecting too much information: “It would be
better to resist the temptation to collect every kind of information and spend
the time first in thinking more carefully about what would be relevant, and to
devise hypotheses to be tested” [157]. Wright and Haybittle have agreed with
this assessment and cautioned that the investigator may be unwilling to enter a
patient in the trial if there is a lot of paperwork and also that the quality of the
paperwork may deteriorate as the quantity increases [158]. On the other hand,
data have to be carefully recorded on the outcome of the trial, both on the
benefits and disadvantages of treatment; the initial information must be
adequate to identify the patients in the trial and show that the different treatment
groups were similar with respect to important starting characteristics. If minor
objectives have been defined these will also require the collection of extra
information.

13.2 FACILITATING DATA ENTRY

Three strategies will reduce the time spent on data recording by the individual
investigator. The patients or ancillary staff may complete some of the
documents, and some information may possibly be transferred from one computer
to another.

13.2.1 The patient can complete certain documents
The patient can be asked to complete a form giving all the identifying and
demographic information discussed in section 13.3. This form can be extended
to include past medical history, past treatment, family history, occupation,
cigarette and alcohol consumption, and other items relevant to the trial.
During the course of the trial the patient can be asked to complete
self-administered questionnaires on symptomatic and general well-being (section
16.5). This strategy may save the investigator time and effort and may
improve the quality of the data when subjective symptoms have to be assessed.

13.2.2 Ancillary staff may complete certain documents

Clerical, secretarial, or nursing staff may prove more accurate and
conscientious than the investigator when transcribing investigation and other
results from the medical records to the trial documents. Whenever possible these
tasks should be completed uninterrupted by the urgency of patient consultations.
13.2.3 Direct transfer of computer-held
information to the computer-held records for the trial

Biochemical results, electrocardiographic tracings, and haematological
findings are often held on computer tapes and, in theory, could be transferred
directly to the trial data tapes for final analysis. In fact, however, this remains a
hope for the future rather than a common occurrence at the present. The
computers that serve these different functions usually originate from different
manufacturers and the problems of tape conversion may be formidable. The
details of patient identification may also vary between the different tapes and
pose extra problems when extracting information from one computer file and
including it in another.
13.3 INFORMATION IDENTIFYING THE PATIENT

The record forms used in a trial must be considered as confidential information
but in a clinical trial the degree of confidentiality need not be greater than with
the usual medical records. It is therefore common practice to include name,
address, and other identifying features on the first document to be completed.

Subsequent record forms may include extremely confidential information or
be sent through the post with the possibility that they are opened by the wrong
person or discarded in a public place. For example, a symptom questionnaire
has been mailed that included questions on sexual function in order to detect
drug side effects [159, 160]. It is prudent to identify these documents with a
trial number alone, and the code should be held by the investigator. The
patient should be assured of the confidentiality of the information; this is
important as documents have gone astray in hospital postal systems and been
discovered in discarded refuse (it is hoped only after the data have been
abstracted). The investigator must take great care of all confidential information
and the record forms should be shredded before disposal.
If the name and address are only included on the initial trial record form and
a code number used thereafter, the use of a single number alone may lead to
errors in that the wrong number may be entered on a particular record form.
For this reason and when sensitive information is not being recorded, many
investigators prefer to have both name and trial number on every document.
Similarly, two separate numbers for identification will also limit any difficulty
in identification.
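
The cross-check described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the register contents, the
function name, and the three result labels are hypothetical and do not come
from the text. The sketch shows why a second identifier helps: a wrong but
valid trial number is caught only when a name (or second number) accompanies it.

```python
from typing import Optional

# Hypothetical enrolment register: trial number -> surname recorded at entry.
REGISTER = {"030": "SMITH", "031": "JONES"}


def check_identity(trial_number: str, surname: Optional[str] = None) -> str:
    """Classify the identification on a returned record form.

    Returns "ok", "unknown number", or "mismatch".
    """
    expected = REGISTER.get(trial_number)
    if expected is None:
        return "unknown number"
    if surname is None:
        # A form carrying the trial number alone cannot reveal that a
        # wrong (but valid) number was entered.
        return "ok"
    return "ok" if surname.strip().upper() == expected else "mismatch"
```

With both identifiers present, a form marked 031 but bearing the name Smith is
flagged; with the number alone, the same error would pass unnoticed.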
13.3.1 Items required for identification

Items required include: (1) full name; (2) address; (3) trial number; and (4)
hospital number when appropriate.
13.3.2 Other demographic information required

Other information required includes: (1) sex; (2) date of birth; and (3) race.
13.3.3 Demographic information that may be required if
the patient is to be identified in central government records
If a patient in a long-term trial is lost to follow-up it is essential to determine
whether or not he has died. In the United Kingdom the Office of Population
Censuses and Surveys can inform a bona fide medical research worker, for a
small fee, whether a patient is dead or alive and if the person is dead, the office
can provide a copy of the death certificate. The tracing of such a patient will be
facilitated if the National Health Number of the patient has been documented.
Also useful for this purpose is the marital status of the patient and the name of
the patient’s general practitioner (primary care physician).

13.3.4 Other useful general information

13.3.4.1 Telephone number of patient
The patient’s telephone number must be obtained because the patient may
have to be contacted quickly.

13.3.4.2 Marital status of the patient
This will enable female patients to be addressed correctly but will only rarely
be relevant to the conduct or results of the trial.

13.3.4.3 Name and address of the patient’s primary care physician

When the trial is not being conducted by the primary care physician he must be
informed of the details of the trial. In the United Kingdom every patient has a
general practitioner and his agreement should be sought for the patient to take
part in a trial. He must be given the opportunity of objecting to his patient’s
taking part although his written consent is not usually necessary. In my
experience the cooperation of the primary care physician can be of great assistance in
a long-term clinical trial. Glaser considered that this cooperation is essential in
trials involving employees in a pharmaceutical company as the family doctor
may be aware of a condition or circumstance that makes it inadvisable for the
subject to take part in an experiment [26].
13.4 DATA TO SHOW THAT THE PATIENT IS ELIGIBLE FOR THE TRIAL

Prior to a patient’s entry into the trial it is important to document that he
satisfies all the entry criteria and has no contraindication to participation in the
trial. These criteria are discussed in detail in section 3.9. It is essential that the
trial records confirm that all exclusion criteria were in fact negative and all
inclusion criteria positive. Quality control cannot be carried out effectively
without this documentation.
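
As a minimal sketch of such a documentation check (in Python, with
hypothetical criterion names that are not taken from the text), each inclusion
criterion must be recorded as positive and each exclusion criterion as
negative; an unrecorded item is reported separately, because a blank is not
evidence either way.

```python
def eligibility_problems(inclusion: dict, exclusion: dict) -> list:
    """List the reasons why eligibility is not yet fully documented.

    Each dict maps a criterion name to True, False, or None (not recorded).
    An empty list means every inclusion criterion is recorded as positive
    and every exclusion criterion is recorded as negative.
    """
    problems = []
    for name, met in inclusion.items():
        if met is None:
            problems.append(f"inclusion criterion not recorded: {name}")
        elif not met:
            problems.append(f"inclusion criterion not met: {name}")
    for name, present in exclusion.items():
        if present is None:
            problems.append(f"exclusion criterion not recorded: {name}")
        elif present:
            problems.append(f"exclusion criterion present: {name}")
    return problems
```

A quality-control pass could run this over every entry form and query any
patient for whom the returned list is not empty.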
13.5 DOCUMENTATION OF INFORMED CONSENT

Normally the signed consent form will be included with the trial records; often
a copy is given to the patient. If written consent was not required then a note
must be made in the trial documents that verbal consent was obtained on a
certain date. If a third party was present at these discussions, this fact should be
noted.
13.6 DATA TO BE RECORDED DURING THE TRIAL

As discussed in section 13.1 it is very difficult to decide how much information
should be recorded during the course of the trial. Data relevant to the prime
objective of the trial have to be fully documented. Subsidiary objectives are
usually identified (section 4.3) and data relevant to these must also be recorded.
It is also essential to record adverse drug effects and symptom side effects.

13.7 DATA TO BE RECORDED AT THE COMPLETION OF THE TRIAL
At the end of the trial the most important end points are determined: for
example, the fasting serum cholesterol in a trial of lipid-lowering treatment. If
a patient does not appear for this final visit or any other trial consultation, he
must be contacted quickly and arrangements made for him to be seen as soon
as possible. In a trial of drug treatment it may be possible to give the patients
an extra supply of tablets so that if they miss one visit they can continue with
treatment until seen. It is of the greatest importance to trace defaulters as they
may have died or been withdrawn owing to some adverse effect of treatment
or, if in a control group, due to an adverse effect of inadequate treatment. Such
patients must be contacted and asked to return to see the investigator in order
that the reasons for default can be determined. The trial documents must
include all the details on these patients including the methods used to recall
them, the reasons for default and, in the event of death, the date and causes of
death.

In the last 3 months have you suffered from attacks
of the following:

sweating               No    Yes
palpitation            No    Yes
confusion              No    Yes
severe apprehension    No    Yes
hunger                 No    Yes
sleepiness             No    Yes

13.8 THE DESIGN OF THE DOCUMENTS

The investigator must be able to complete any forms quickly and easily. The
questions must be clear and unambiguous so that the answers recorded are
correct, and the documents should be designed so that the data can be
processed accurately and efficiently. The investigator should consider the layout of
the forms, the provision of instructions to those who complete them, and the
duplication of the materials for reasons of security.

13.8.1 Layout of the forms
All documents should be headed by the name of the institute in which the trial
is being performed. If there is any possibility of the form’s going astray, the
name and address of the investigator should also appear. When the form is to
be completed by the patient it is important that a brief note accompany the
questionnaire or be printed on it stating who is asking the questions, why they
are being asked, and assuring that the answers will be treated with complete
confidentiality. The patient must also be instructed how to complete the form
(see figure 13-2).
The first items on the documents must be those collected first, as the forms
should be completed in the order in which the data are obtained. For example,
in a trial of an antidiabetic drug, details of previous treatment will be available
first and entered first, results of clinical examination will be available next, and
the results of investigations, such as blood tests, will be available later and
entered on the end of the form.
When one investigator is completing his own trial documents, presumably
he will not be concerned with strategies that increase the proportion of
questionnaires that are completed. When patients or many different investigators
have to complete the documents, it is important to improve the layout in order
to maximise response. The reader is referred to standard texts on this subject
[161, 162].
The investigator may not be able to compete with the postal sales technique
to increase response, for example, where the recipient is told that he has won a
prize. To receive this gift he has only to stick a yes stamp into an exactly
matched space and return the document in a postage-paid envelope. (In order
to avoid purchasing another item, he may also have to take a psychologically
less attractive action, for example, refusing an additional and generous, but not
free, offer from his benefactor.) Sales techniques have not yet been widely
employed in randomised controlled trials, but if the investigator wishes a
questionnaire to be completed, he should consider the aesthetic layout of the
document, provide postage-paid envelopes, and consider whether the
questions should be answered by ticking yes or no boxes rather than deleting the
incorrect answer or entering a correct answer in freehand.
To avoid errors, a box to be ticked should be sited very close to the
appropriate answer.
A more common source of error is to shorten the questionnaire by
amalgamating questions into blocks as in figure 13-1. In this example the phrase
“In the last 3 months have you suffered from attacks of the following” does not
have to be repeated but patients may not answer all the questions in such a
sequence, often only indicating the positive responses. A longer section asking
each question separately would result in a higher completion rate for the
individual questions and avoid having to make the assumption that no answer
equals not present.

Figure 13-1. Section of a questionnaire given to diabetic patients [166]. The condensed format
led to some patients' indicating only positive information and omitting negative answers.

13.8.2 Instructions to person completing the form

The instructions for completing a form should appear on the document to be
completed and not on a separate sheet as the latter may not be consulted. When
appropriate, the instructions should be adjacent to the question being asked
and should state how the responses are to be recorded. For example, if the
correct answer has to be ticked, a cross may cause confusion [158].

13.8.3 Duplication of forms prior to use
A small trial may only involve typing the forms and photocopying the top
copies. However, in a larger trial many documents may be required and
printing will have to be considered as this has certain advantages over typing.
Wright and Haybittle [158] considered that printed words are easier to read
than typescript as typewriters do not allocate characters a space commensurate
with their size but give each letter equal space. These authors also advised a
print size comparable to newspaper text and they warned against the use of
capitals alone rather than both lowercase and capital letters. The use of capitals
on their own can increase reading time by 12 percent [163].
A stencil, photocopying, or printing will provide the forms to be completed
but there remains the problem of duplicating the documents after use.
No-carbon-required paper may be useful.

13.8.4 The use of no-carbon-required paper
The documents in a clinical trial are of great importance and the use of
no-carbon-required paper will allow all recordings to be made in several copies.
The copies should be filed separately as an insurance against damage by fire or
water, loss in the post, or theft. The loss of research data by theft will probably
be accidental, but imagine that your only set of trial documents might be
stolen along with your car! Great care must be taken of the information. I well
remember the fate of some data I had laboriously collected and recorded with a
fountain pen containing washable ink. Someone left a tap running over a
weekend in a laboratory on the floor above my office and following the
deluge of water through the ceiling, the data were more than difficult to
interpret! Collect all data in duplicate or triplicate and store the copies
separately.

13.9 THE FORM OF THE QUESTIONS AND ANSWERS
The characteristics of a good question for a patient are discussed in section
16.3. The present section considers the most desirable features for questions in
a trial document to be completed by an investigator: namely, lack of
ambiguity; use of positive terms; ease of comprehension; and necessity for a single
response. There is obviously a considerable degree of overlap between this
section and section 16.3.

13.9.1 The question and answer options must not be ambiguous
Wright and Haybittle provided a good example of question and answer
options [158]. In response to a question on the size of a tumour mass, an answer
option was: “Reduction of tumour diameter by less than 50%.” The authors
suggest the answer would be better phrased as, “smaller but not as small as
half the original diameter.” The original answer is a phrase that is not strictly
ambiguous but simply difficult to understand.

13.9.2 The question should use positive terms
Clark has shown that positive terms are easier to understand than negative
terms [164]. For example, the question, “Is the right first toe longer than the
right second toe?” is to be preferred to the question, “Is the right first toe
shorter than the right second toe?”

13.9.3 The question must be easily understood
The question must not include difficult words and must be understood by the
least intelligent observer. Never use a long word when a short one will do.
Abbreviations should be avoided if they are not familiar to all investigators and
the questionnaire should also be grammatically correct.

13.9.4 The question must only require a single response
If an observer is asked to record poor circulation to an extremity, he can be
asked, “Is a hand or foot white, blue or cold?” The answer options are yes or
no. Later, however, the investigators may regret that they did not record
whether it was one hand, both hands, a hand and a foot, or even whether only
certain fingers or toes were affected. Similarly they may wish to know
whether the affected extremities were white, blue, or cold. A series of
questions leading to single responses would be preferable. Alternatively, the
investigator can be asked to list the fingers or toes that are white and, similarly,
those that are blue and those that are cold. If the question is crucial to the
outcome of the trial, the temperature of each extremity should be recorded, as
an objective measurement is always to be preferred to a subjective assessment,
assuming the objective measurement is both valid and repeatable.

13.10 THE RESPONSE TO THE QUESTION

13.10.1 The observer should be asked to
tick a response rather than enter a coded reply
An investigator may be asked to indicate whether the patient is taking an
anxiolytic drug or not and, if so, which one. He could be asked to enter the
name of the drug, but alternatively he could be asked to tick the name on a
complete list of possible drugs or to examine the International Nonproprietary
Names for Pharmaceutical Substances Classification [165] and enter the
numerical code provided by this classification. The provision of a complete list of
drugs will enlarge the document considerably but will have the advantage of
reminding the investigator of the names of antianxiety drugs.
There are two problems in entering a numerical code from the international
classification. First, the form may have to be completed in a hurry in a busy
clinic with no time available for the document to be consulted; second, a
physician-recorder, who rarely performs coding duties, may be less accurate
than trained clerical personnel whose main job is to provide this service.
In general, the investigator should be asked to enter the response in freehand
for subsequent coding or to tick a list of options rather than enter a coded
response.

13.10.2 Enter responses in preprinted boxes
The provision of boxes may or may not improve the completion rate, but they
should be used for numerical data in order to facilitate later entry to computer
files. It has been shown that entering data directly into boxes may increase
writing time by ten percent and reduce legibility by three percent [158].
However, when the data are subsequently converted to a machine-readable form
time will be saved and some errors may be avoided. Even when the data
cannot be punched directly, as with a ticked response, it is often convenient to
include a box to hold this information. The boxes also indicate where an
answer is expected.

13.10.3 Order of yes and no answers
When several yes/no answers are required consecutively it is advisable not to
have a particular response always stated first as it has been suggested that some
respondents prefer to tick the first box irrespective of its contents.
Occasionally reversing the position of yes/no answer options may inhibit a repetitive
response but may lead to errors when most first answers are, say, yes and
suddenly one is no. The no may be ticked in mistake for a yes. The best
solution may be to roughly alternate the yes/no and no/yes answer options.

13.11 CODING THE RESPONSES
The information provided on the trial documents has to be analysed; in most
trials computer programs will be employed to limit the errors in calculation
and allow complex statistical computations to be made. If the arithmetic tasks
to be performed are not complex, analysis by computer program may take
longer than the use of a calculator as all the qualitative data such as gender have
to be transformed into numerical codes that can be read by a machine.
However, the investigator usually attempts so many analyses (for example, of
many variables in several subgroups) that the use of computer programs is
inevitable. The data must then be transferred to computer punch cards, paper
tape or directly entered to a computer file via a visual display unit and an
attached keyboard. The data must be coded to a numerical format and
identified as requiring transfer to a computer file. The position that the data
must occupy in that file must be indicated.
Figure 13-2 gives a section of a questionnaire to be completed by a patient
and figure 13-3 gives the instructions for coding the information when the
codes are to be written by the investigator in the part of the questionnaire
headed “For use of medical staff only.” The rest of the heading of figure 13-2
instructs the subjects in how to complete the form and confirms the
confidentiality of the document. The patients are asked to enter their full name
but only a trial number is entered into the computer file. Entering the full
name is time consuming, takes up space in the computer file, and raises
questions of confidentiality.

Please tick (✓) correct box or enter required number.      For use of medical
All information will be treated with complete              staff only.
confidentiality.

Please state your Full Name ..........                     Trial number

Is your father still living?          No      Yes          4

If 'Yes' how old is he now?           ..... Years          5-7

If 'No' how old was he when he died?  ..... Years          8-10

Figure 13-2. An example of part of a questionnaire, designed so that coding and subsequent
punching on computer cards is facilitated.

The coding document illustrated in figure 13-3 instructs the coder to enter
the trial number with leading zeros on the left, into boxes 1-3. The coder then
follows the instructions for boxes 4 to 10. The answer no to the question, “Is
your father still living?” is designated as the figure 1; the answer yes as 2; and
no answer is to be coded as 9. Similarly, instructions are given for the coding
of father’s age when alive, and when dead, his age at death. Alphabetical
information can be entered into a computer file but is not so easily analysed;
therefore the answer no is coded as 1 and not N.
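
The coding rules of figure 13-3 can be illustrated with a short Python sketch
that builds the ten-column record for one patient. It is an illustration only:
the function and constant names are hypothetical, the upper age limit of 150
is an assumed plausibility check, and coding both age fields as 999 when the
first question is unanswered is an assumption, since the coding document does
not cover that case.

```python
NO_ANSWER = "999"        # age not given
NOT_APPLICABLE = "888"   # question does not apply to this patient
LIVING_CODES = {"no": "1", "yes": "2", None: "9"}  # column 4


def _age(value):
    """Return an age right-justified in three columns, or 999 if missing."""
    if value is None:
        return NO_ANSWER
    if not 0 <= value <= 150:  # assumed plausibility limit
        raise ValueError(f"implausible age: {value}")
    return f"{value:03d}"


def code_record(trial_number, father_living=None, age_now=None,
                age_at_death=None):
    """Code one questionnaire into columns 1-10 of a fixed-width record."""
    if not 0 <= trial_number <= 999:
        raise ValueError("trial number must fit in three columns")
    cols_1_3 = f"{trial_number:03d}"          # leading zeros on the left
    col_4 = LIVING_CODES[father_living]
    if father_living == "yes":
        cols_5_7, cols_8_10 = _age(age_now), NOT_APPLICABLE
    elif father_living == "no":
        cols_5_7, cols_8_10 = NOT_APPLICABLE, _age(age_at_death)
    else:
        # Assumption: with no answer to the first question, neither age
        # can be classed as "not applicable", so both are coded as missing.
        cols_5_7 = cols_8_10 = NO_ANSWER
    return cols_1_3 + col_4 + cols_5_7 + cols_8_10
```

For patient 30 whose father is alive and aged 60, `code_record(30, "yes",
age_now=60)` gives the record `0302060888`. Keeping separate codes for "no
answer" and "not applicable" preserves a distinction that blank columns would
lose.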

Column (boxes)

1-3     Enter trial number, e.g. 30 is entered as 0 3 0

4       Is your father still living?
        No        - 1
        Yes       - 2
        No answer - 9

5-7     Age now if living, e.g. 0 6 0
        No answer                - 999
        Not applicable, i.e. dead  - 888

8-10    Age at death
        No answer                - 999
        Not applicable, i.e. alive - 888

Figure 13-3. Coding document for section of questionnaire illustrated in figure 13-2.

                                                  Column
Is your father still living?                      4       Yes (2)   No (1)

If 'Yes' how old is he now?                       5-6     ..... Years

If 'No' how old was he when he died?              7-8     ..... Years

Figure 13-4. A questionnaire with automatic coding of answers (see text).

13.12 IDENTIFICATION OF INFORMATION TO BE TRANSFERRED
AND THE POSITIONING OF THE DATA IN COMPUTER FILES

The person transferring the data may be instructed to punch or key all the
information included in boxes. In figure 13-2 these boxes are further identified
by their inclusion under the heading “For use of medical staff only.” The
positioning of the data in the computer files is indicated by the number
adjacent to the boxes. In this example punch cards are to be employed with a
certain number of columns and a punch card operator is asked to punch the
subject (trial) number in columns 1 to 3, the answer to the question, “Is your
father still living?” in column 4 and so on. The most commonly used punch
cards have 80 columns and can take 80 numbers.

13.13 DIFFERENT METHODS OF DATA CODING

The transformation of information into a numerical form for entry into a
computer file is known as coding. The data can be coded on the document (for
example, in the margin); coded and written on a separate sheet, or entered
directly to the computer file without a separate coding stage.

13.13.1 The information may be transferred
to a separate column on the questionnaire
Figure 13-2 illustrates this arrangement: a right-hand column on a patient
questionnaire being reserved for the use of medical staff. The coding is carried
out on the same piece of paper adjacent to the response, and the number of
transcription errors along with the time taken to code data should be reduced.

13.13.2 The information on the documents
can be transferred to a separate sheet
The data can be coded and written on a form or sheet that facilitates transfer to
the computer system. When cards are to be punched these sheets consist of 80
columns, one line per card. This method often leads to transcription errors and
takes the most time. However, the information on several trial documents can
be contained in one punching form and may be more easily entered as a punch
operator will spend less time turning over a number of separate trial
documents. With this system, and when using a patient questionnaire, the original
documents will lack the column headed “For the use of medical staff only.”
The questions may appear less complicated to the patient and may possibly be
completed more frequently.

Figure 13-4 illustrates how part of the questionnaire in figure 13-2 could be
modified for direct computer entry. The number over the top of the box gives
the column number to be entered on a card and the punch operator punches a
1 in column 4 if no is ticked and a 2 if yes is indicated; this instruction is given at
the right lower angle of the boxes. With the information on age, the punch
operator simply punches the age given in columns 5-6 or 7-8. When no
answer is given the punch operator would leave the columns blank but could
be instructed to enter 9’s for missing data. Unfortunately either way there
could be confusion between ‘not applicable’ and ‘not answered’. Also, if three
boxes are allowed for age then some patients will record | 7 | 6 |   | and others
|   | 7 | 6 | and this may lead to confusion. The punch operator has to be trained
to punch this information in the correct right-justified manner.
This system removes the coding phase of the data-processing operation but
the greatest problem with direct entry is that the keyboard operator, working
at speed, may not notice inconsistencies, errors, and scribbled comments.
Trial documents must be scrutinized for such problems, and this can be done
conveniently during a coding phase. If the coding phase is omitted, the trial
documents will still have to be briefly examined.

13.14 CONCLUSIONS
This chapter has outlined the information that should be documented for an
individual subject, before, during, and after the performance of the trial. A
compromise has to be made between collecting all the information that could
conceivably be of interest and limiting the effort involved by documenting
only the most essential data. The design of the documents was discussed,
including their layout and the form of any questions. The methods of coding
trial data were introduced, and strategies to facilitate data entry and subsequent
processing were discussed.

14. THE CONDUCT OF THE TRIAL

This chapter considers some of the practical difficulties in performing a trial
and the problems that may arise. These may be divided into those that are
foreseen and taken account of prior to the start of the trial and those that are
unexpected. Murphy’s law states, “If anything can go wrong it will.” This law
has been attributed to Captain E. Murphy, a development engineer, who applied
it first to an individual technician saying, “If there is any way to do it
wrong, he will” [167].
Owing to the probability of unforeseen difficulties it is desirable either to
have a separate pilot trial or to commence the trial in a pilot fashion using a
provisional protocol. If the pilot trial proves satisfactory and the protocol only
requires minor modifications, the trial may continue; otherwise the initial
design is abandoned as impracticable. After considering the pilot trial, this
chapter discusses the use of a run-in period, the problem of noncompliance by
the subject or the investigator, and the difficulties that arise in stopping a trial.

14.1 THE PILOT TRIAL
It is important that the protocol and design for a large or long-term trial are
tested either in a separate pilot trial or during a preliminary period of pilot
running during which the protocol is open to amendment. The performance
of a pilot trial allows the technique of measurement and treatment to be tested,
the optimal treatment schedule to be determined, the administration of the
trial to be tested, and the rate of recruitment to be assessed. Also, a preliminary
estimate of the treatment effect may be obtained and the wider implications of
the trial become more clear.

14.1.1 The examination and therapeutic techniques are tested
During the pilot trial an examination procedure may prove to be unsatisfactory
and can be modified. Similarly a therapeutic regimen can be altered if
necessary. In section 9.2 the alterations were discussed that proved necessary in
order to measure visual acuity during a trial.

14.1.2 The optimum treatment schedule may be identified
The pilot trial may lead to a modification of the dose schedule for a drug by
revealing that a large dose produces an unacceptable excess of side effects.
Alternatively the pilot trial may demonstrate that an insufficient dose of the
drug is being administered. This can be rectified subsequently in the main trial.
When two active drugs are being compared in a long-term trial the pilot trial
may confirm that they are being used in equipotent doses with respect to their
short-term effects. For example, if two drugs are being compared for the
prevention of gout over two years, it will be sensible to select doses that reduce
the serum uric acid by a similar amount.

14.1.3 The administration of the trial can be tested
The early stages of a trial will involve the entry of patients, randomisation, and
the provision of treatment. A coordinating centre may be responsible for the
processing of trial documents, randomisation, and the distribution of drugs.
All these administrative functions can be tested in the pilot study.

14.1.4 The rate of recruitment can be assessed
Muench’s third law states, “In order to be realistic, the number of cases promised
in any clinical study must be divided by a factor of at least ten.” The law
has two important corollaries: “the length of time estimated as necessary to
complete a study must be multiplied by a factor of at least ten” and “the sum
of money estimated as necessary to complete a study must be multiplied by a
factor of at least ten (without inflation)” [96]. There is more than a grain of
truth in Muench’s third law and the pilot trial may provide an assessment of
the recruitment rate that can be achieved in the main trial.

14.1.5 A preliminary estimate of the efficacy of treatment can be made
Although a trial will be designed to detect a certain magnitude of effect,
presumably based on prior information, the pilot trial may be useful in
confirming that this effect can be achieved under the conditions of the trial. A
pilot trial cannot be expected to detect a given reduction in mortality or
morbidity as a long period of observation may be required for such an observation.
However, if the trial is intended to alter a risk factor by a given amount
with a view to subsequently altering mortality, the change in the factor can be
confirmed in the pilot trial.
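
The assessment of recruitment described in section 14.1.4 reduces to simple arithmetic, sketched below. The function names and the figures used are hypothetical; the projection assumes that the rate observed in the pilot trial is maintained in the main trial, which Muench's third law suggests may be optimistic.

```python
import math


def months_to_recruit(target_subjects, pilot_subjects, pilot_months):
    """Months needed to reach the target at the rate seen in the pilot trial."""
    monthly_rate = pilot_subjects / pilot_months
    return math.ceil(target_subjects / monthly_rate)


def muench_cases(promised_cases, factor=10):
    """Muench's third law: divide the promised number of cases by at least ten."""
    return promised_cases / factor


# Hypothetical figures: 12 subjects entered in a 3-month pilot, 840 required.
print(months_to_recruit(840, 12, 3))
# Hypothetical promise of 500 cases from collaborating centres.
print(muench_cases(500))
```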

14.1.6 The pilot trial may enable the investigator
to consider the wider implications of the study

Glaser has stated, “A lesser study of an important question is usually of more
value than an excellent study of a trivial question” [26]. The reader may not
agree with this opinion, but in a pilot trial the investigator may realise that
more important questions could be answered by the trial. He may adjust the
protocol accordingly to include more subjects or observations than initially
envisaged.
14.1.7 The disadvantages of a pilot trial

A pilot trial may delay the start of the full trial, but this may be overcome by
starting the major trial with a period of pilot running. This preliminary investigation
may not lead to any major change in the trial protocol, but a minor
modification may be most important and occasionally the initial trial protocol
will be abandoned. However, the results of a pilot trial may be misleading.
Muench’s first law states, “No full-scale study confirms the lead provided by a
pilot study.” Other statements have been made as well, including, “Studies
can be called pilot studies and so avoid criticism for poor design” and “Pilot
studies are a waste of time, money and effort” [96]. It must be emphasized that
the results of a pilot study must be treated with great caution; after all, they are
only an initial reconnaissance expedition. However, pilot studies do lead to
important changes in protocol, are often invaluable, and are always to be
recommended. If the protocol is not altered in any major respect the results
obtained in the pilot trial may be incorporated in the final analysis, provided an
adjustment for repeated looks is made when necessary (section 10.7.3).
14.2 THE RUN-IN PERIOD

The run-in period for a trial is a period of observation during which subjects
are considered for entry to the trial. At the end of the period they are randomised
to a treatment group if they prove to be eligible. A run-in period is
not obligatory and certain advantages and disadvantages are given in table 14-1
and discussed in the sections below.
A run-in period may be appropriate if a preliminary period of observation is
required to prove that the patients have a certain condition or to establish other
baseline information. The time spent in the run-in period may also be used to:
allow any effect of a placebo to take place and thereby reduce this effect after
entry to the trial; detect noncompliance with therapeutic advice and exclude
such patients; and allow the patients time to consider whether or not they wish
to take part in the trial.

Table 14-1. The advantages and disadvantages of
having a run-in period prior to randomisation in a trial.

Advantages
1. Diagnosis can be ascertained with greater certainty.
2. Allows baseline measurements to be made.
3. Placebo response may be reduced after randomisation.
4. Some noncompliant patients may be identified and excluded.
5. Dropout rates are reduced after randomisation.

Disadvantages
1. Patients in the trial are less typical of the general population of patients.
2. Dropout during the run-in period may increase the total number of defaulters.
3. Ethical problems.
4. Expense.


14.2.1 Establishing that the patients have the condition under investigation
If a patient is only examined on a single occasion, it may be difficult to
establish a particular diagnosis with confidence. For example, it may be necessary
to repeat measurements on subsequent occasions to prove that the patient
has, say, a consistently high blood pressure, fasting serum cholesterol, or
blood sugar. In addition, it may be necessary to order further investigations to
prove that the condition is not secondary to a pathological process that requires
immediate attention and that would exclude the patient from the trial.

14.2.2 Establishing baseline measurements
Before commencing treatment in the trial, baseline measurements may have to
be made; this period of investigation can constitute a run-in period for the trial.
The establishing of a stable baseline is particularly important when regression
to the mean is a problem (section 9.1.4). Failure to establish a baseline measurement
is especially important when a placebo effect is present.

Figure 14-1. A hypothetical trial of the effect of an active treatment and placebo on diastolic
blood pressure. When active treatment was started at A, a 15-mm Hg fall in pressure was judged
satisfactory at C and the dose of active treatment was not increased. When active treatment was
started at B, following a run-in period, a 5-mm Hg fall was not judged satisfactory at C and
active treatment was increased (---).
14.2.3 Estimating the placebo effect
During the run-in period it may be appropriate to give a placebo in order to
assess the response to this treatment. If there is a large response to placebo, this
may mask the effect of active treatment. For example, in some trials the dose
of a particular treatment is not fixed and has to be increased in a stepwise
manner according to the response. This titration or increase in treatment may
be difficult if a large early placebo effect is occurring. This situation is illustrated
in figure 14-1 where active antihypertensive treatment has to be compared
with placebo and measurements of blood pressure are presented in a
schematic fashion for a patient whose initial casual diastolic pressure is 110 mm
Hg falling to 100 mm Hg after one month on placebo. This fall is due to a
combination of regression to the mean, becoming accustomed to the measurement,
and a possible effect of a placebo tablet. Figure 14-1 also provides the
pressure when active treatment is started at A with no run-in period, and at B,
after a placebo run-in period of one month. When the drug is started at A, the
investigator may be content with a 15 mm Hg fall in pressure after one month.
On the other hand, when active treatment is started at B the investigator is less
likely to be content with a 5 mm Hg fall and may increase the treatment at
point C. In this example, the presence of a placebo run-in period results in a
wider separation between the effect of placebo and active treatment than when
a run-in period is not employed.
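
The reasoning behind figure 14-1 can be set out as a short calculation. The sketch below is illustrative only: the function name and the 10 mm Hg threshold for a satisfactory fall are assumptions, and the pressure of 95 mm Hg at point C is the value implied by the falls of 15 and 5 mm Hg quoted in the text.

```python
def titration_decision(baseline_mm_hg, pressure_at_c_mm_hg, satisfactory_fall_mm_hg=10):
    """Judge the fall in diastolic pressure at point C against a chosen baseline.

    The 10 mm Hg threshold is a hypothetical value, not one given in the text.
    """
    fall = baseline_mm_hg - pressure_at_c_mm_hg
    action = "maintain dose" if fall >= satisfactory_fall_mm_hg else "increase dose"
    return fall, action


# Baseline taken at A (110 mm Hg, no run-in period).
print(titration_decision(110, 95))
# Baseline taken at B (100 mm Hg, after one month of placebo run-in).
print(titration_decision(100, 95))
```

The same pressure at C leads to different decisions because the run-in period removes the early nonspecific fall from the baseline.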
Trials with a placebo run-in period and a titration of antihypertensive medication
tend to reveal a larger fall in blood pressure on active treatment than
trials without a run-in period. The EWPHE trial with a run-in period reported
a difference between the active and placebo groups of 25 mm Hg systolic and
12 mm Hg diastolic [168], whereas the MRC trial without a placebo run-in
period observed a difference of 15 mm Hg systolic and 7 mm Hg diastolic
[106]. There are other differences between the two trials that may account for
these results, but the MRC Working Party reported, “Differences in mean
pressure between treated and control . . . were less than expected because the
fall in pressure in the controls was greater and more prolonged than expected.”
It is important to start the trial after any nonspecific treatment effect has
terminated and the baseline measurement is constant.
14.2.4 Noncompliance can be determined
Noncompliance with therapeutic advice may be determined when a placebo is
given during the run-in period (section 14.4). Similarly the willingness of the
patients to return for follow-up appointments is tested as well as their compliance
with biochemical, radiological, and other investigations. Noncompliant
patients may be excluded from entering the trial, but the trial population
will be less representative of the whole population as a result.


14.2.5 Dropout may be reduced following randomisation
Default from follow-up during the main course of the trial should be reduced
following a placebo run-in period as patients who are unwilling to return for
repeated visits or investigations will be excluded. Also some patients become
ill soon after starting placebo and attribute the illness to the placebo medication.
These patients will usually be excluded or will exclude themselves from
the trial. Many of these patients do not experience a coincidental physical
illness but are worried about the trial treatment and experience a psychological
reaction. Patients who react adversely to placebo medication (and therefore
can be expected to react adversely to other treatments) may therefore be excluded.
In addition, the run-in period gives the patients time to reconsider
whether they are really willing to participate in the trial. Default appears to be
more frequent in the initial stages of a trial and a run-in period should exclude a
high proportion of trial dropouts.
A run-in period may reduce the degree to which the patients entering the
trial are representative of the population as a whole. It may raise the overall
default rate, increase the expense of the trial, and pose additional ethical
problems.

14.2.6 The trial population may no longer be representative of all patients
Randomisation of subjects eligible for a trial will usually ensure that the different
treatment groups in a single trial are similar for important characteristics.
However, different trials will include dissimilar patients. Hampton has suggested
that similar trials of secondary prevention in myocardial infarction may
arrive at different results owing to the unequal characteristics of patients entering
the trial [169]. He pointed out that the mortality in the placebo group may
provide a clue as to how representative the trial patients are of the population
of patients as a whole since the usual survival of patients with myocardial
infarction is known from prospective population-based epidemiological studies.

ALL PATIENTS WITH MYOCARDIAL INFARCTION
  DO NOT ENTER A CCU: ? NUMBER
  11,125 ENTER A CCU*
    6,970 DO NOT MEET CRITERIA FOR INFARCTION
    4,155 MEET CRITERIA FOR INFARCTION
      508 DIE BEFORE EVALUATION
      3,647 EVALUATED FOR TRIAL
        666 HAVE CONTRAINDICATIONS TO TRIAL TREATMENT
        260 HAVE SERIOUS CO-EXISTENT DISEASE
        321 NEED TRIAL TREATMENT
        162 NEED ALTERNATIVE TREATMENT
        354 REFUSE, ETC.
        1,884 RANDOMISED TO TRIAL

Figure 14-2. Recruitment to the Norwegian Multicentre Trial of Secondary Prevention in
myocardial infarction using timolol [170].
*CCU = coronary care unit. Some patients were admitted and evaluated more than once.

It is possible that the results of a trial will depend on the severity of the
disease process; it may be difficult to recruit patients who represent the whole
community and have an average severity of the condition. Indeed, this may
not be desirable if only patients with a given degree of disease severity are
considered suitable for the treatment. A recent trial of a beta-adrenoceptor
blocking drug in the secondary prevention of myocardial infarction [170] provided
documentation both on patients who entered the trial and all patients
considered for the trial. This information may be very valuable. Figure 14-2
gives the details for this multicentre trial. An unknown number of Norwegian
subjects aged 20-75 years sustained a myocardial infarction, but about 11,000
were admitted to coronary care units during the period of recruitment for the
trial. Of these, less than half met the stringent trial criteria for myocardial
infarction and 508 died too quickly to enter the trial. The remaining 3,647
were evaluated for the trial but only 1,884 were randomised. The reasons for
not entering the trial were: contraindications to trial treatment; serious coexistent
disease; trial treatment necessary; alternative treatment required; and
administrative reasons such as the refusal of the patient. At the most optimistic
estimate, less than ten percent of the population sustaining a myocardial infarct
entered the trial. This shows how difficult it would be to recruit a representative
sample.
A run-in period may make the trial subjects even less representative, as the
information obtained during this interval may result in the exclusion of patients
who do not have all the features of a disease or who are unwilling to
attend regularly for supervision and investigation or who do not comply with
therapeutic advice. The trial population, unrepresentative of the total population
in the first place, is further narrowed to a group who are willing to take
advice and medication and to put themselves at considerable inconvenience for
the benefit of medical research. This is acceptable as clinical trials are usually
intended to be explanatory in nature and examine the effects of treatment in
those who receive the intervention. However, it is not surprising that sometimes
the results of clinical trials may not apply to the population as a whole.

14.2.7 A long run-in period may result in
an increased total number of defaulters
The run-in period may so increase the length of the trial that the default rate
from the trial as a whole is increased, even when the dropout rate is reduced
for the period of the main trial. However, dropout after randomisation is
much more important than default during the run-in period. The latter only
reduces the numbers available for the trial but default after randomisation may
reduce the comparability of the different treatment groups. If the run-in period
reduces the default rate after randomisation, this will be a major advantage
(section 4.2.5).

14.2.8 Increased expense
A run-in period will prolong the length of the trial and add to the expense. If a
placebo is to be taken during this period, then the cost of this medication and
its administration must be taken into account.

14.2.9 Ethical problems with run-in period
It may not be possible to leave patients untreated or to give only a placebo for a
period of time. This is discussed in section 3.8. When deciding whether or not
to have a run-in period the investigator must examine the advantages and
disadvantages for the trial under consideration. When one of the trial treatments
is a placebo, a single-blind, run-in period on placebo is usually
desirable.

14.3 THE PROBLEM OF WITHDRAWAL FROM A TRIAL

Subjects may be withdrawn from a trial because they default or do not comply
with the protocol. They may also die from causes unconnected with the trial
or its treatment. The most important patients are those whose withdrawal is
related to the trial end points.
At least four percent of patients taking part in a long-term trial may default
from follow-up every year despite attempts at recall [37]. The possible number
of defaulters can be estimated and the numbers required can be adjusted accordingly.
During the course of the trial it is very important to establish that
the pattern of default does not differ between the treatment groups. When the
reason for default or the proportion of defaulters does differ, omitting them
from the analysis may bias the results; the problems of such an analysis are
discussed in section 15.7. The results of such a trial must be analysed on an
intention-to-treat basis (that is, without omitting defaulters). The alternative
analysis, in which subjects who do not follow the protocol are omitted, has been
called a per-protocol analysis.
The numbers required for a trial will increase according to the proportion of
defaulters. With a trial to be analysed on the per-protocol basis the increase
will ensure sufficient numbers to demonstrate a given effect and with an
intention-to-treat analysis the proportion of defaulters will indicate the extent
to which the effect of treatment may be diluted.

14.3.1 Withdrawals due to reasons unrelated to the end point of the trial
Subjects may be withdrawn when they move address, emigrate, or develop
some condition that prevents their further participation in the trial (for example,
a fractured spine following a road traffic accident). When calculating
the numbers for a trial the number of dropouts must include an allowance for
these withdrawals. Such withdrawals should not bias the results of the trial and
should be equal in all groups.

14.3.2 Deaths from causes unrelated to the trial end point
The probable number of unrelated deaths can be determined from national
statistics. In a long-term trial these numbers should be incorporated in the
withdrawal rates, especially in trials of treatment in the elderly. When analysing
on the per-protocol basis the rates for unrelated death in the trial must be
scrutinised for differences between the treatment groups.
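
The allowance for withdrawals when computing the numbers for a trial can be sketched as follows. This is an illustration rather than a method given in the text: it assumes a constant annual withdrawal rate (default plus unrelated deaths) compounding from year to year and, for the dilution, that withdrawn subjects derive no benefit from treatment. The function names and example figures are hypothetical, apart from the four percent annual default rate quoted from [37].

```python
import math


def inflate_for_withdrawals(n_required, annual_withdrawal_rate, years):
    """Numbers to randomise so that n_required are expected to complete the trial."""
    proportion_retained = (1.0 - annual_withdrawal_rate) ** years
    return math.ceil(n_required / proportion_retained)


def diluted_effect(true_effect, annual_withdrawal_rate, years):
    """Effect expected on intention-to-treat if withdrawn subjects gain nothing."""
    proportion_retained = (1.0 - annual_withdrawal_rate) ** years
    return true_effect * proportion_retained


# Hypothetical trial needing 1,000 completers over five years, 4% default a year.
print(inflate_for_withdrawals(1000, 0.04, 5))
# Hypothetical treatment effect of 10 units under the same pattern of default.
print(round(diluted_effect(10.0, 0.04, 5), 2))
```

The first function corresponds to the per-protocol case, where extra subjects are entered to preserve the numbers completing the trial; the second indicates how far the effect may be diluted in an intention-to-treat analysis.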

14.3.3 Withdrawal due to criteria related to the trial end points
Withdrawal for reasons related to a trial end point may pose great problems in
analysis. The problem can be illustrated by trials of the long-term active
treatment of hypertension versus placebo: the end point here is stroke events.
We can consider two examples of withdrawals related to the end point: the
development of an adverse effect to active treatment and a deterioration in the
hypertensive disease while on placebo.

14.3.3.1 The development of an adverse effect to active treatment
It is possible that patients with either mild or severe disease are more prone to
get an adverse effect to active treatment and be withdrawn. In the EWPHE
trial [43] two treatments are employed in the actively treated group. Initially a
combination of diuretics is given and if blood pressure control is not satisfactory,
methyldopa is added. Therefore patients with severe hypertension in the
actively treated group may be withdrawn owing to adverse reactions to
methyldopa. In the placebo group severe hypertensives are less likely to be
withdrawn as a result of placebo methyldopa treatment. However, there are
other reasons why severe hypertensives may be differentially withdrawn from
the placebo group (section 14.3.3.2).
It is not always necessary to withdraw the patients because of an adverse
effect. For example, hypertensive patients on diuretic treatment may develop
diabetes mellitus. The trial protocol can stipulate withdrawal, antidiabetic
treatment, or substitution of the diuretic treatment by another active antihypertensive
drug.

14.3.3.2 Withdrawal of patients due to a deterioration in the disease process
A trial of antihypertensive treatment in the prevention of stroke may enter
untreated hypertensives with a diastolic pressure less than 100 mm Hg. However,
there is evidence that patients with diastolic pressures higher than 104
mm Hg require treatment. For ethical reasons it may be decided to withdraw
from the trial patients who develop diastolic pressures greater than 104 mm
Hg. Such patients will not (we hope) have sustained a stroke but they will not
have reached the important trial end point. Moreover, these patients will come
almost entirely from the placebo group [98]. As a result of such withdrawals
the placebo group may contain progressively fewer severely hypertensive patients
as the trial continues. The patients who are withdrawn cannot be said to
have reached the stroke end point and yet cannot remain in the trial as it is
believed that they may sustain a stroke. Randomisation should have provided
groups that were similar at entry to the trial but selective withdrawal will
destroy this equality. The problem is so great that one group of workers who
failed to observe a stroke in either placebo or actively treated groups wrote
[98]:

If withdrawal because of rising blood pressure is regarded as an unsatisfactory outcome
then there were many more of these cases in the control than the treated group, but we
are doubtful of the validity of this interpretation. Another consequence of these withdrawals
is to alter the comparability of the groups. However comparable the two
groups may have been at the outset, within 18 months this comparability had disappeared.
In the present series the main function of treatment seems to be to prevent the
rise of blood pressure, but it cannot be concluded from this evidence that treatment
effectively reduces the risks of death or morbid events due to cardiovascular disease.

14.3.4 Methods for reducing withdrawals
The withdrawal or dropout of subjects from a trial can be reduced by a run-in
period to detect patients who react badly to placebos or miss appointments; by
excluding patients who have a high probability of withdrawing; by not making
the withdrawal criteria too stringent; and by reducing the length of the
trial. Withdrawals may also be reduced by maintaining close contact with the
subjects during the trial, limiting demands on their time or patience, and
continuing efforts at recall when they first miss an appointment.

14.3.4.1 Employ a run-in period
As discussed in section 14.2 the run-in period may be employed to reduce the
number of defaulters after randomisation. The dropout rate appears to diminish
with time and often subjects agree to enter a trial but do not return for their
second visit. These early defaulters will be excluded during the run-in period
and if a placebo is given, those who react with many symptoms and cannot
take this medication will be excluded.

14.3.4.2 Exclude patients who may have to withdraw
Patients are often excluded from long-term trials if they have a serious coexistent
disease that may limit their survival or ability to take part in later phases of
the trial. Similarly, subjects who already know that they are going to move to
another district, emigrate, or change their occupation are likely to default as
the trial progresses; these subjects can be excluded from the trial.

14.3.4.3 The withdrawal criteria must not be too stringent

In a trial lasting for, say, five years, it would not be reasonable to withdraw all
patients who missed one assessment or failed to take their medication for a
one-week period. Absence of follow-up for six months or no trial treatment
for a total of three months would constitute more reasonable criteria for withdrawal.
Similarly, it may be reasonable to allow additional treatment to be
given during acute illness without withdrawal even if this interferes with the
effect of the trial treatment in the short-term. For example, an elderly patient
in a long-term trial of antihypertensive medication could be allowed diuretic
treatment during an episode of acute bronchitis and not be withdrawn from
the trial, even though this therapy will lower blood pressure in the short term.
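
The more lenient criteria suggested above could be written into a protocol as an explicit rule. The following is a minimal sketch only; the function and parameter names are hypothetical, and the limits of six and three months are the examples given in the text rather than recommended values for every trial.

```python
def meets_withdrawal_criteria(months_without_follow_up,
                              total_months_without_treatment,
                              follow_up_limit_months=6,
                              treatment_limit_months=3):
    """Return True when either of the example limits from the text is reached."""
    return (months_without_follow_up >= follow_up_limit_months
            or total_months_without_treatment >= treatment_limit_months)


# One missed assessment and a week off treatment (about 0.25 month): not withdrawn.
print(meets_withdrawal_criteria(1, 0.25))
# Six months without follow-up: withdrawn.
print(meets_withdrawal_criteria(6, 0))
```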

14.3.4.4 The trial must not be too long
The longer the trial the greater the possibility that the patient will drop out or
be withdrawn. The parallel-groups trial design has an advantage in this respect
over the necessarily longer cross-over trial.

14.3.4.5 Miscellaneous factors
The subjects must be free to withdraw from the trial at any stage and cannot be
asked to commit themselves to taking part for the whole duration. It is likely
that the continued enthusiasm and interest of the same investigator may motivate
the patient to continue. Factors that reduce default are not well documented,
but the patients should not be kept waiting for long periods to see the
investigator. They should be given their treatment and investigation without
charge and be helped, where necessary, with the cost of transport to the trial
centre.

14.3.5 What is the answer to groups made unequal by withdrawals?
Despite the measures discussed in section 14.3.4, withdrawals may occur and
the remaining patients may be unrepresentative of the original randomised
groups. During the course of the trial the withdrawn patients can be treated in
three different ways. First, they can be paired with patients in the other treated
group(s) who are also withdrawn; second, the patients withdrawn from one
treatment group can be transferred to another; and third, the withdrawn patients
can be considered to have reached an important positive or negative trial
end point. At the end of the trial, withdrawn patients may be excluded from a
per-protocol analysis.

14.3.5.1 Withdrawal of similar patients from the other group(s)
We can consider again the example of active treatment producing diabetes
mellitus. Patients developing this condition may be identifiable at entry to the
trial (for example, from a high fasting blood sugar). If this is the case, for every
patient removed from the active treatment group, a patient could be removed
from the placebo group who was matched for initial blood sugar and other
characteristics. This approach is full of difficulties as the initial blood sugar
may not predict subsequent events and, if it is a good predictor, it would be
better to exclude all patients with a high fasting blood sugar from entering the
trial.

14.3.5.2 Patients withdrawn from one group should transfer to another group

In a within-patient cross-over trial a patient having to be withdrawn from one
treatment can proceed immediately to the next but with loss of data in the first
phase of the trial. In a placebo-controlled trial the patients having both a
placebo and an unacceptable response would be transferred to active treatment
and those in the actively treated group with an adverse effect would transfer to
the placebo group. Although this is acceptable for a cross-over trial, with a
between-subject design the final treatment groups will become dissimilar with
respect to presenting characteristics and the procedure cannot be recommended.
However, the authors of the Co-operative Randomised Controlled
Trial quoted in section 14.3.3.2 suggested partial treatment for patients withdrawn
from the placebo group owing to severely high blood pressure: “One
possible solution might be to maintain the diastolic pressures of the patients in
the control group between 110 and 120 mm Hg, or whatever upper limit of
diastolic pressure was considered acceptable and compare the deaths and morbid
events in this group with those in whom the pressure is maintained below
100 mm Hg.” This strategy will compare full treatment in one group with
partial treatment in another, hardly a satisfactory solution.

14.3.5.3 Make withdrawal criteria important end points for the trial
In a trial with mortality as an end point it would be inappropriate to equate an
adverse effect of treatment with death (for example, the development of diabetes
mellitus is unlikely to be fatal). Also when stroke is the end point, not all
patients withdrawn because of a high blood pressure would have a stroke if left
untreated. However, some will have a stroke and if the incidence is known
from previous observational studies for a particular level of blood pressure, a
proportion of those withdrawn can be assumed to have had a stroke. Thus for
a given number of withdrawals a number of end points will accrue. To my
knowledge, the assumption that some withdrawals reach an end point has not
been utilised in the analysis of a trial, presumably as any conclusions would
rest on unconfirmed assumptions. However, the results of the trial can be
examined by assuming that the proportion of withdrawals leading to an end
point is zero, one, or various intermediate values.
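
The examination of results under different assumed proportions can be sketched as a small sensitivity analysis. The counts below are invented for illustration and the function names are hypothetical; the sketch simply adds the assumed fraction of withdrawals to the observed end points in each group.

```python
def assumed_event_rate(events, withdrawals, randomised, proportion):
    """End-point rate if a given proportion of withdrawals reached the end point."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between zero and one")
    return (events + proportion * withdrawals) / randomised


def sensitivity(placebo, active, proportions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Rows of (proportion, placebo rate, active rate, difference in rates)."""
    rows = []
    for p in proportions:
        placebo_rate = assumed_event_rate(*placebo, p)
        active_rate = assumed_event_rate(*active, p)
        rows.append((p, placebo_rate, active_rate, placebo_rate - active_rate))
    return rows


# Hypothetical counts as (events, withdrawals, randomised) for each group.
for row in sensitivity(placebo=(10, 40, 200), active=(6, 8, 200)):
    print(row)
```

If the conclusion of the trial is the same across the whole range of assumed proportions, it does not rest on the unconfirmed assumption.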

14.3.6 Conclusion on withdrawals
Patients may be withdrawn from a trial because they were ineligible to enter
and should have been excluded initially; they may be withdrawn for ethical
reasons or refuse to collaborate further and default from follow-up. The default
rate during a large clinical trial is unlikely to be zero even over a short
duration. In long-term trials the default rate will be at least four percent per
annum [37]; this fact should be taken into account when computing the numbers
required for such a trial.
The treatment may precipitate default and defaulters must not be omitted
from the first analysis. If analyses are conducted according to the initial randomisation
on the intention-to-treat basis, the groups will maintain their initial
comparability. However, the interpretation of the results may remain very
difficult if withdrawal results in a change of treatment. Also, the investigator
may be only interested in the effect of treatment in a subset of the subjects (for
example, compliant patients). It is obviously unsatisfactory to assess the effect
of dietary advice on, say, lowering blood fats when a proportion of patients
refuse to adhere to the diet. Most trials must therefore also be analysed on the
per-protocol basis.
The failure of patients to complete a trial may have serious consequences
when the investigator intends to complete a series of Latin or Graeco-Latin
squares so that order and carry-over effects can be balanced and calculated.
Often a patient will default and remove the possibility of an elegant and simple
analysis (section 11.5). The situation may be retrieved, however, by
substituting for missing values using multiple regression or other techniques [171].

14.4 THE PROBLEM OF PATIENT NONCOMPLIANCE

Noncompliance is the failure to adhere to therapeutic advice. Some
noncompliant patients may be willing to report this at interview or on a
self-administered questionnaire. Those who fail to admit to noncompliance cannot
be identified by the physician but may be detected by pill count or by more
objective measurements.
Failure to comply with therapeutic advice may have important
consequences in randomised controlled trials. If the treatment regime is only
adhered to by a proportion of subjects in the trial, the results for the patients
will relate to the attempt or intention to treat rather than the effect of adhering
to the particular intervention. If the patients in the trial are exceptionally
compliant the effect of treatment may be large whereas if they are less
compliant than expected the effect of treatment may be reduced, too few patients
may be recruited to determine the effect, and any dose-response relationship
may be underestimated. Lastly we must consider the misinterpretation of trial
results that may result from noncompliance and the use of noncompliant
patients as a control group.

14.4.1 Interview to detect noncompliance
An interview detects a proportion of noncompliant patients. When
noncompliant patients admit to failing to adhere to their treatment, they are almost
certainly not lying [172, 173] and an interviewer easily identifies a proportion
of such subjects. However, not all noncompliant persons are detected by a
simple interview [174, 175, 176], and this method overstates the degree of
adherence to a treatment regime. When analysing the results of a trial
according to compliance as assessed from an interview we are unable to distinguish
between patients who adhere to their treatment and noncompliant patients
who refuse to admit to this fact [173].

14.4.2 Self-administered questionnaire to detect noncompliance
The completion of a self-administered questionnaire has the same advantages
and disadvantages of the interviewer technique. However, whereas in a small
trial there is often ample opportunity for one interviewer to question the
patients, if a large number of subjects are involved it is easier and cheaper to
arrange for a standard questionnaire on compliance to be completed [177]. A
self-administered psychological scale to detect obsessionality has been shown
to produce a score inversely related to nonadherence. This may prove useful in
detecting patients who are noncompliant [177]. However, such a technique has
not yet been attempted in a clinical trial.

14.4.3 Physician’s assessment of patients
who do not admit to noncompliance
The physician’s assessment has been shown to be useless when considering
noncompliance with drugs [178]. The physician appears to have no way of
predicting who will be noncompliant in the future nor who has been
noncompliant during the course of a trial.

14.4.4 Pill count to detect noncompliance
For a pill count the patients are asked to bring their containers of tablets to each
visit. The pretext for returning these is to determine whether or not a further
prescription is required and the contents are counted out of sight of the patient.
The observed number of pills left is compared with the expected number. The
method is obviously open to manipulation by the patient and takes no account
of the patient’s giving other persons his treatment or otherwise destroying
some of the tablets [179, 180]. The use of a pill count may overstate the
compliance of the patient.

14.4.5 Other measurements of compliance with treatment
Compliance with a drug regime may be tested by measuring drug or
metabolite concentrations in the blood or urine. However, this method does not
necessarily estimate the day-to-day degree of noncompliance. For example, a
patient may usually take his or her tablets but may not have taken them at the
time of the blood test. Another patient may omit most of his medication but
have taken the treatment just prior to the time the blood sample was taken.
Patients may be more likely to take their medication on the day that they make
a visit to the investigator, and blood samples at this time may underestimate
noncompliance. The number of tablets taken cannot be determined as the
pharmacokinetics of many drugs are complex and it is difficult in an individual
to predict the concentrations of the drug or its metabolites that would be
expected from a given consumption of the drug. If a drug is difficult to detect
in the urine, the tablets may be labelled with a fluorescent dye and the urine
examined for fluorescence.
Certain drugs have marked effects on blood constituents and these changes
may be used to monitor noncompliance. For example, a thiazide diuretic may
lower serum potassium and raise serum uric acid. A patient whose serum
potassium is not lowered by treatment or in whom the serum uric acid does
not change in the expected direction can be suspected of noncompliance [106].
Similarly, in a trial of antismoking advice the carboxyhaemoglobin level in the
blood is an indication of whether the patient has stopped smoking or not.
Objective measurements of compliance are to be preferred to indirect
methods but may be expensive and relatively more difficult to organise than
other measures of compliance.
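
The arithmetic of the pill count (section 14.4.4) can be written out as a short sketch. The function name and the figures are hypothetical, and the result is only the apparent compliance: tablets given away or destroyed are counted as taken, so the figure may overstate true compliance.

```python
def pill_count_compliance(dispensed, returned, tablets_per_day, days):
    """Apparent compliance from a pill count.

    Compares the tablets apparently taken (dispensed minus returned) with the
    number that should have been taken over the interval between visits.
    """
    expected_taken = tablets_per_day * days
    apparently_taken = dispensed - returned
    return apparently_taken / expected_taken

# Hypothetical visit: 60 tablets dispensed, 12 returned after 28 days at
# two tablets daily (4 tablets would be expected to remain).
compliance = pill_count_compliance(dispensed=60, returned=12,
                                   tablets_per_day=2, days=28)
print(f"apparent compliance = {compliance:.0%}")
```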

14.4.6 Owing to noncompliance, later trials of a new
treatment may demonstrate a smaller effect than earlier trials
Early trials of new treatments (for example, a new pharmacological agent) will
be performed on volunteers or selected patients in a laboratory setting. These
subjects are likely to be more compliant than patients subsequently treated
outside the research-institute environment. Assuming that noncompliant
patients demonstrate a reduced drug effect, the average effect of a given dose of
drug will be smaller when these patients are included. Later trials (for example,
on outpatients) may therefore suggest a smaller pharmacological effect of the
treatment owing to noncompliance.

14.4.7 The numbers required for a randomised controlled
trial may be underestimated owing to noncompliance
When estimating the numbers required for a trial the investigator may employ
the results obtained from a pilot or early trial where compliance is high [181].
The numbers required for the trial will be calculated on the basis of the effect
shown in this trial; if the effect in the main trial is smaller due to
noncompliance, then insufficient numbers may be recruited for the trial.

14.4.8 The dose-response relationship may
be underestimated in noncompliant patients
If noncompliance is present, then the effect of a particular drug dose will be
underestimated and any estimate of the dose response will be incorrect. In
addition, noncompliant patients may appear to take large doses of the drug
with no adverse consequences. However, the difference between therapeutic
and toxic doses may be small, and when later compliant patients are given the
higher doses they may experience an adverse effect.

14.4.9 Noncompliance may lead to a smaller
response in one of two equal treatments
In a randomised trial we compare one strategy with another. If one strategy
proves more effective than another we may conclude that it is to be preferred,
but we cannot necessarily conclude that the pharmacological agent given is
more effective. For example, if two drugs are employed in a trial one may
cause side effects and another may not. It is possible that noncompliance will
be greater with the drug producing adverse effects than with the other. The
drug that induces noncompliance will have its pharmacological effect
underestimated and the other drug may appear more pharmacologically active.
However, the conclusion from the trial is correct in the sense that the strategy of
prescribing the drugs leads to the observed results.
Feinstein provides another example when dietary advice is given [18^]. The
strategy of dietary advice is compared with no dietary advice and the end point
of the trial is survival. If all the cases randomised to diet are compared to all
randomised controls, then the results of the trial should be clear and any
benefit from a strategy of suggesting the diet will be apparent. However, it is
possible that health-conscious people will comply with the dietary advice and
persons who are not worried about their health will not. In this instance,
noncompliant patients may continue to smoke and take no exercise and these
habits may adversely influence their survival. If the patients who are
noncompliant with dietary advice are omitted from the subsequent analysis of survival,
then we will compare health-conscious people (who took the dietary advice)
with the control patients who consist of both health-conscious and
health-careless persons. It would not be surprising if such an analysis proved the diet
to be beneficial. A similar example of antismoking advice was discussed in
section 8.3. Again, such trials must be analysed on an intention-to-treat basis.

14.4.10 The deliberate use of noncompliant patients as a control group
When a randomised controlled trial cannot be performed it is very difficult to
estimate the gain that results from treatment. For example, when considering
surgery for a resectable cancer a randomised controlled trial may not be
possible but a control group, consisting of persons who refuse to have the
operation, may be collected. However, the control group will consist of a very
biased and unusual sample; such a study lies outside the scope of this book.

14.4.11 An overall view of noncompliance
Noncompliance by patients during a trial may lead to a misinterpretation of
the results. However, the results of a trial that includes noncompliant patients
may be more applicable to the effects of the treatment in the community at
large. In a randomised controlled trial, procedures should be adopted to detect
noncompliance and the results analysed in two ways: first including the
noncompliant subjects, and second omitting them. A proportion of noncompliant
subjects may be detected by simple and inexpensive interview methods, and
urine or blood tests may provide more objective measures of compliance on a
single occasion. Noncompliance may also distort any dose-response
relationship and lead to inadequate numbers being recruited to a trial.
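
The way noncompliance leads to inadequate numbers (section 14.4.7) can be illustrated with a rough calculation. The sketch below rests on assumptions that are not taken from the text: the usual normal-approximation formula for comparing two means, a simplification in which noncompliant patients show no effect at all, and hypothetical figures for the effect and its standard deviation.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Approximate numbers per group to detect a mean difference delta
    (two-sided alpha, normal approximation, common standard deviation sigma)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def diluted_effect(delta, compliance):
    """Average effect if only a proportion 'compliance' of patients take the
    treatment and the remainder (by assumption) show no effect."""
    return delta * compliance

pilot_delta = 10.0   # hypothetical effect seen in a fully compliant pilot trial
sigma = 20.0         # hypothetical standard deviation

n_pilot = n_per_group(pilot_delta, sigma)
n_main = n_per_group(diluted_effect(pilot_delta, 0.7), sigma)
print(n_pilot, n_main)
```

Because the required numbers vary with the inverse square of the effect, 70 percent compliance roughly doubles the numbers needed, and 50 percent compliance roughly quadruples them.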

14.5 NONADHERENCE TO THE PROTOCOL BY INVESTIGATORS

Chapter 12 considered the importance of drawing up a detailed protocol. It is
important that all the participants agree to adhere exactly to the protocol.
Occasionally the investigator may have to deviate from the protocol in the
interest of an individual patient. Such an event will usually lead to the
termination of the trial for that particular patient. More important is consistent failure
of the investigator to adhere to the protocol. This may not be readily admitted
and has to be determined indirectly. Nonadherence can be accidental or
intentional and involves three most important areas: admission to the trial of
patients who should not be admitted, breaking a double-blind code, and the
prescription of additional treatment that is not allowed in the trial protocol.

14.5.1 Admission to the trial of patients
who do not fulfil the admission criteria
The coordinating office or the person randomising the patients should confirm
whether or not the patients are eligible for the trial. For example, a trial may
admit persons with a serum uric acid within a certain range and the
coordinating centre must refuse to randomise a patient who does not have a uric acid
within this range. An unintentional failure to adhere to the protocol was
observed in the University Group Diabetes Program trial where several
patients were admitted to the trial without the strict criteria of diabetes mellitus
[182]. This problem added to the considerable controversy about this trial,
which is discussed in section 19.6.
Patients are occasionally admitted to a trial owing to administrative errors,
for example, when a patient is admitted before an investigation result is
available. The patient may have fulfilled the other entry criteria to the trial but
when the result becomes known it may exclude the patient. Administrative
errors may also occur when noncompliant patients are to be excluded by pill
counts. A patient may repeatedly forget to return with his tablets and
compliance with medication cannot be confirmed.

14.5.2 Breaking a double-blind code
An investigator may deliberately break a double-blind code by examining the
details of treatment secured in a sealed envelope. This is unlikely to occur, but
he may guess the identity of the treatment from other information. For
example, in trials of beta-adrenoceptor blocking drugs the investigator who is
prescribing the treatment should not measure the pulse rate as he may detect
the marked slowing of the pulse that occurs with this treatment. For this
reason one investigator should prescribe and a second investigator assess the
results of treatment.

14.5.3 Treatment during the course of a trial
During a trial of drug or dietary treatment other treatments may interfere with
the results of the trial and may be prescribed either by accident or design.
When comparing active antihypertensive medication with placebo a protocol
may not allow the administration of other pressure-lowering drugs. If a
patient develops angina, he may require a beta-adrenoceptor blocking drug and
have to be withdrawn from the trial as this treatment also lowers blood
pressure. The investigator must adhere to the protocol and withdraw the patient.

14.5.4 Conclusions on deviations from protocol
Quality control is critical throughout the trial and should detect deviations
from the protocol. Monitors have been appointed in many trials whose task
has been to evaluate adherence to the protocol (section 9.2).

14.6 TERMINATING THE TRIAL

In a trial the duration of treatment for the individual patient is specified at the
design stage as is the total number of subjects to enter the trial. In the normal
course of events the trial will last as long as it takes to enter the required
number of patients and for the last patient to complete the study. The trial may
be abandoned early if adverse effects of treatment are observed, if a benefit is
demonstrated at the predetermined level of significance, or if it is apparent that
the required number of patients will never be recruited or that the response to
treatment is not as expected.
Decision rules for stopping the trial must be agreed at the design stage and in
a long-term trial a review committee must examine the results of the trial at
given points in time. The decision rules for stopping the trial will be discussed
together with the disadvantages of stopping too early or continuing too long.

14.6.1 Decision rules for stopping the trial
Decision rules are considered in section 3.11 and table 14-2 summarises their
general format. The table assumes that interim analyses are performed at
predetermined fixed intervals and that the level of significance is adjusted for
repeated looks (section 10.7.3).

Table 14-2. Decision rules for stopping a long-term
trial with morbidity or mortality as an end point.

RULE 1
A statistically significant increase in a serious adverse effect is observed in
an actively treated group.

RULE 2
A statistically significant decrease in morbidity or mortality is observed.
PLUS
A reduction in total morbidity and mortality is observed commensurate with there being no
transfer of morbidity or mortality from one cause (the end point of the trial) to another.

RULE 3
The predetermined number of patients has been admitted to the trial and followed for a given
length of time.

RULE 4
The number of patients recruited will not be adequate or the effect of treatment is not as great as
expected.

Rule 1 states that the trial must be terminated if a statistically significant and
biologically important adverse effect of treatment is demonstrated. Rule 2
states that a trial must be designed to terminate when any one important
benefit is demonstrated provided the effect is compatible with an overall
benefit to the patients. A reduction in total mortality and morbidity must be
apparent, even if not statistically significant, as a treatment may reduce, say,
myocardial infarction or stroke yet produce an excess of other serious morbid
or mortal events. An antihypertensive drug may reduce stroke events but
produce episodes of hypotension and an excess of injuries due to falls and other
accidents. When terminating such a trial because of a reduction in stroke
events, we must be certain that there has been no comparable excess of
morbidity from other causes.
It has been suggested that in trials of secondary prevention in myocardial
infarction both mortality from myocardial infarction and total mortality must
be reduced with statistical significance to stop the trial. However, such a trial
may arrive at a statistically significant reduction in myocardial infarction
events before total mortality is significantly reduced. It would be unethical to
continue the trial until total mortality is also reduced by a commensurate
amount.
Rule 3 states that a trial should be terminated when the intended number of
patients has been recruited and followed, even when there is a negative result
but with the power as specified in the trial design. Rule 4 states that a trial may
be aborted if there is no hope of recruiting the required numbers for a given
amount of time or money. The trial may also be stopped if the treatment is not
producing the effect anticipated in the design. For example, it may be intended
to assess the effect on myocardial infarction mortality of lowering serum
cholesterol by 1 mmol/l. A diet may be chosen for this purpose but prove to
lower serum cholesterol by only half the expected amount. In this event the
intended reduction in mortality will be unlikely to be observed and the trial
may have to be abandoned.

14.6.2 The disadvantages of stopping the trial too soon
The result of a trial may be accepted if it agrees with the preconceived notions
of the medical community and rejected if it agrees with the unexpected (chap­
ter 19). With a surprise outcome it is important to achieve a high level of
significance, preferably less than one percent. The pressure to terminate a trial
increases as a significant positive result approaches and the temptation must be
resisted until the desired level of significance is achieved. One large trial was
continued in an attempt to achieve a higher level of significance and will be
discussed in section 19.1.
If a trial is intended to prove the efficacy of a treatment that has appeared
beneficial in observational studies, then a lower level of significance may
suffice. For example, in the EWPHE trial [43] a five percent level of
significance has been considered sufficient to stop the trial if stroke incidence is
reduced by active antihypertensive treatment. This effect has been shown in
middle-aged patients and would not be surprising. However, a novel or
unexpected result should reach a higher level of significance.

Figure 14-3. The difference between two treatments is plotted against time. The trial will be
stopped when the MD is exceeded with statistical significance. Three analyses are performed: at
the first, the maximum acceptable difference is exceeded but not with any confidence; at the
second analysis, the result lies between the LD and the maximum acceptable difference; and at the
third analysis, MD is exceeded with statistical confidence and the trial is terminated.
MD = maximum acceptable difference; LD = least interesting difference. Adapted
from Meier, Clin. Pharmacol. Ther. 25: 649-650, 1979 with permission.

14.6.3 The disadvantages of stopping the trial too late
When a trial is terminated and a very high level of significance has been
achieved, the question arises whether the trial could have been terminated
earlier and the benefit of active treatment given to the control group at that
time. For example, in a recent trial of secondary prevention of myocardial
infarction there were 98 deaths on the active treatment (timolol) and 152 deaths
on placebo (P = 0.0003) [170]. It is theoretically possible that some of the
placebo deaths could have been prevented if the trial had been terminated
earlier when P = 0.01. However, the review committee may have only exam­
ined the data when P > 0.01 and later when P = 0.0003 and therefore had no
opportunity to stop the trial at an earlier stage.
14.6.4 Generalised decision rules for stopping a trial
Meier has formalised some general stopping rules for a clinical trial [150]. He
has defined the maximum acceptable difference (MD) as the largest true
difference between treatments that a subject in the trial should be expected to accept
and yet continue in the trial. He also defined the least interesting difference
(LD) as a true difference, although not terminating the trial, of sufficient
magnitude that when exceeded the difference would “be enough to justify a
decision in favour of the winning therapy.” Figure 14-3 illustrates the possible
use of such rules. At the first analysis the MD is exceeded but not with any
confidence. At the second analysis the result lies between MD and LD, but at
the third analysis the result exceeds MD with statistical confidence and the trial
is terminated.
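
The three analyses described above can be mimicked in a short sketch. It assumes, as one possible reading and not as Meier's own procedure, that "exceeded with statistical confidence" means the lower confidence limit of the observed difference lies above the MD. The function name, the figures, and the unadjusted multiplier of 1.96 are all hypothetical; in practice the level would be adjusted for repeated looks (section 10.7.3).

```python
def review(diff, se, md, ld, z=1.96):
    """Classify an interim result against MD and LD.

    diff is the observed difference between treatments and se its standard
    error. The trial stops only if the lower confidence limit exceeds MD.
    """
    lower_limit = diff - z * se
    if lower_limit > md:
        return "stop: MD exceeded with confidence"
    if diff > md:
        return "continue: MD exceeded, but not with confidence"
    if diff > ld:
        return "continue: result lies between LD and MD"
    return "continue: result below LD"

# Hypothetical interim analyses: (observed difference, standard error)
analyses = [(12.0, 5.0), (7.0, 3.0), (16.0, 2.0)]
for number, (diff, se) in enumerate(analyses, start=1):
    print(number, review(diff, se, md=10.0, ld=4.0))
```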

14.7 CONCLUSIONS
This chapter examined some problems that will be encountered during the
conduct of a trial. The investigator should first consider performing a pilot
trial or at least starting the trial in a pilot fashion. The advantages and
disadvantages of a run-in period were discussed together with the problems of
reducing dropout, detecting patient noncompliance, and the failure of
investigators to adhere to the protocol. Lastly, the decision rules for terminating a
trial were considered.
The problems in execution of a trial should be presented in the report and
information should be given both on procedures that worked well and those
that did not. The reader must understand that errors will occur in the
performance of trials and that the results cannot be discounted on the basis of minor
deviations from a protocol. Details of such errors should be reported together
with the number of patients considered for the trial, entered in the study, and
withdrawn.
The decision rules for stopping the trial must be outlined in advance,
although not every contingency can be foreseen and catered for.

15. ANALYSIS OF THE TRIAL RESULTS

Several authors have reviewed the statistical analyses appearing in reputable
medical journals. The reports reveal an appalling record of error and
incompetence [183-188]. This chapter considers these problems but it is beyond the
scope of this book to describe in any detail the statistical methods required to
analyse the results of randomised controlled trials. The reader is referred to
standard texts such as the books by Armitage [189], Snedecor and Cochran
[190], and Petrie [191].
Two strategies have been suggested to improve statistical analyses: first, a
statement may be made in the protocol of the analyses that are intended. This
can be scrutinised by fund-giving agencies and ethical committees [188]. The
investigator writing the protocol must have a grasp of elementary statistics or
seek the help of a statistician. The second strategy is to have all articles
scrutinised by a statistician prior to acceptance for publication. In the past it
appears that papers with a detailed description of statistical methods have been
referred for a statistical opinion whereas those with little or no mention of
these methods have not. The latter articles have proved to be those that
actually needed the statistical review [183].
When writing the protocol and requesting funding for a trial, care must be
taken to ensure that adequate arrangements are made for prompt data
processing and analyses as it may be unethical to proceed with a trial longer than
necessary. Peto and his colleagues also considered prolonged trials and stated,
“Collect as much data as possible at first presentation, only data which are
strictly necessary thereafter, and analyse the data you do collect very
thoroughly” [192]. The initial data determine whether randomisation has been
successful in producing equivalent groups and may also be of use in
determining prognostic factors.
The present chapter considers errors in analyses; checks on randomisation;
analysis of normally distributed data, proportional data, and survival data;
confidence limits; and the problems posed by dropouts.
15.1 COMMON ERRORS DURING ANALYSES

The errors frequently encountered in reports of clinical trials include:
confusion about the experimental unit in the trial; the failure to use a statistical test;
failure to state the statistical test employed when one is used; the inappropriate
use of t tests; the reporting of standard deviation instead of standard error of
the mean and vice versa; the use of one-sided significance tests for α; confusion
over the meaning of P; and failure to analyse dropouts in a reasonable manner.

Table 15-1. The results of a trial of the treatment of viral warts in
11 children. The apparently beneficial results of treatment A were
almost entirely due to the effect of this therapy in child number 6.

                        Treatment A     Treatment B
Warts randomised        34              32
Warts cured             21 (62%)        9 (28%)

15.1.1 Confusion about the experimental unit

It is not always entire individuals that get randomised in clinical trials; rather,
the subject may be a bodily part, for example an eye, limb, or joint. In
patients with rheumatoid arthritis it is possible to randomise joints to one of
two treatments and five patients may have, say, 16 joints randomised for
treatment. The most extreme example of more randomisations than subjects is
the single-person trial. In this type of trial episodes of illness are randomised
for treatment (for example, recurrent episodes of hay fever or asthma). Such a
trial has been conducted in a patient with suspected myasthenia gravis who
received, in random order, placebo, prostigmine, and D-amphetamine [193].
This trial was of diagnostic value in this patient but would a trial of treatment
in one person be likely to have any general applicability? For example, a trial of
hay fever treatment in one patient demonstrates that drug A is statistically
significantly more effective than drug B. What can we deduce from this trial?
We can only conclude that drug A is to be preferred in this one subject. We
cannot conclude that, in general, patients should be given drug A and we are
mainly concerned with the overall validity of our results (chapter 5).
When more than one patient is involved in a trial but there are more
randomisations than patients we may conceivably get the results given in table
15-1. In this hypothetical trial 11 children had a total of 66 warts but one child
had 22 warts and the others had seven or less [194]. Taking a wart as an
experimental unit, treatment A cured significantly more warts (62 percent)
than B (28 percent), P < 0.01 using a chi-squared test. However, the data
show that this result was due to the excellent result in the child with 22 warts.
If the experimental unit is taken to be a patient, then four children responded
more to A than B, two more to B than A, and five responded similarly,
hardly an impressive result. In clinical trials the experimental unit must be the
subject and the results arranged so that each subject is equally important. In
any statistical test the degrees of freedom should be slightly less and never
more than the number of subjects.
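
The contrast between the two choices of experimental unit can be checked with a few lines of arithmetic. The sketch below uses only the totals quoted in the text (21 of 34 warts cured on A, 9 of 32 on B; four children favouring A, two favouring B, five ties). The chi-squared test is computed without a continuity correction, and the sign test on children is an illustrative choice made here rather than an analysis taken from the source.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table
    [[a, b], [c, d]], with the P value for one degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared, 1 df
    return chi2, p

def sign_test_two_sided(wins_a, wins_b):
    """Two-sided sign test; ties are discarded before calling."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Wart as the unit: cured and not cured on A, then on B.
chi2, p_wart = chi_squared_2x2(21, 34 - 21, 9, 32 - 9)

# Child as the unit: four children favoured A, two favoured B (five ties).
p_child = sign_test_two_sided(4, 2)

print(f"warts as units:    chi-squared = {chi2:.2f}, P = {p_wart:.4f}")
print(f"children as units: sign test P = {p_child:.4f}")
```

With warts as units the difference appears significant; with children as units it does not approach significance.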

15.1.2 A statistical test is not employed

Contemporary reports of clinical trials usually employ a statistical test as most
authors are convinced of the necessity of supporting their conclusions with
tests of statistical significance. However, if the result of treatment is very
remarkable, such as the recovery of the first patients with tuberculous
meningitis to be given streptomycin, then neither a controlled trial nor statistical
analysis may be required. Unfortunately most treatments are not so effective,
the outcome of the untreated condition is more variable, and both controlled
trials and statistical evaluation are required.
15.1.3 The statistical test is not identified

Three reviews have reported that about a third of publications state a P value
without quoting the procedure used to test for statistical significance [183, 188,
195]. It is of great importance to state the method used to arrive at a conclusion
in order that the readers may fully evaluate the results and check the statistical
tests.
15.1.4 The statistical test is used inappropriately

Glantz [188] has recently demonstrated that at least a third of articles quoting
the results of more than one t test should have employed a test that allowed for
the use of multiple comparisons.
Figure 15-1. Results in two trials where multiple t tests should not be employed.
Figure 15-1a. Gives the mean results (x̄) for four treatments A, B, C, and D. t1 to t6 indicates
the multiple t tests that could be (but should not be) employed.
Figure 15-1b. Provides the mean results for one group of a within-patient trial given a single
treatment for four years. x̄0 is the mean baseline result, t1 to t4 are the four paired t tests that
could be performed on the change in the variable. Such multiple testing should be avoided (see
text).

Let us consider the results given in figure 15-1a. Four drugs are compared:
A, B, C, and D. Often six t tests are reported in order to test for differences
between the four mean results. By employing the six tests (t1 to t6) all results
are compared and yet we know that the extreme results must not be selected
and contrasted. When only one t test has to be performed in a test for a five
percent level of significance, we may be willing to accept a one in 20 chance
that we have incorrectly rejected the null hypothesis. If we perform six t tests
we approach a one in four probability of a wrong conclusion. The probability
is not as high as 6 × 5% = 30%, but is given by the formula:
\[
\text{Probability of making a mistake}
  = 1 - \text{probability of not making a mistake}
  = 1 - 0.95^{6} = 1 - 0.74 = 0.26
\tag{15.1}
\]
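
The arithmetic of equation 15.1 can be checked with a few lines of code. The sketch below is illustrative only: the function names are invented here, and, like the formula in the text, it treats the tests as independent, which pairwise t tests on the same groups are only approximately.

```python
def pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs of groups, i.e. possible two-group t tests."""
    return n_groups * (n_groups - 1) // 2


def chance_of_a_mistake(alpha: float, n_tests: int) -> float:
    """Equation 15.1: 1 - (probability of no false positive in any test)."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Two groups (one test), the four drugs of figure 15-1a, and ten groups.
    for groups in (2, 4, 10):
        tests = pairwise_comparisons(groups)
        risk = chance_of_a_mistake(0.05, tests)
        print(f"{groups} groups: {tests} tests, risk {risk:.2f}")
```

With four groups the six tests give a risk of about 0.26, as in the text; with ten groups there are 45 possible comparisons and, on the same assumption, a false-positive result somewhere becomes very likely.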

If there were ten drug groups it would be more obvious that we cannot select
the smallest and the largest result and compare the two using a straightforward
t test. The solution to the problem is to select a test that takes into account the
multiple comparisons performed: for example, a studentized range test [189],
or Scheffé's test [190], or, where appropriate, to perform an analysis of
variance [189-191].
Figure 15-1b illustrates the situation where one group in a trial is given a
particular treatment and the average result of a variable, say, weight, is x0 at
baseline and x1 to x4 after 1, 2, 3, and 4 years, respectively. The investigator
wishes to see if there has been any statistically significant change in weight
after any one of the time intervals and the change is best examined using a
paired t test. Unfortunately this gives rise to four tests of significance, and we
have the problem of multiple testing and selecting the extremes. Analysis of
variance would be an appropriate solution in order to examine the data for a
significant change after any one of the four years. The analysis can even be
extended to detect a tendency for weight to alter progressively and linearly
with time.
Multiple testing will also be a problem if we examine changes in, say, 20
variables, but confine the results to those at four years. Examining many
variables does increase the probability of a significant result, but each test is
independent of the others and there is only one way of coping with this
problem: demand a high level of significance, say P < 0.01.

15.1.5 One-tailed tests of significance

In biological work, the test for a type I (α) error must be two-tailed (chapter
10) as the result may either be in the expected or the unexpected direction. The
only exception to this rule is the decision-making trial (for example, a trial to
determine whether or not a pharmaceutical company should investigate a new
drug). In this instance the firm is not initially concerned with whether
or not the new treatment is worse than a control treatment, but only whether
the company should investigate the new treatment or not. However, in the
usual report of an explanatory trial, if the reader detects a P value based on a
one-tailed test this should be multiplied by a factor of two to provide a
two-tailed assessment.

15.1.6 The meaning of P



When a statistical test is performed on a difference between two groups, P is
the probability that the particular difference would be explained by sampling
error and observed if the null hypothesis is true and there is actually no
difference. P is therefore the confidence with which we reject the null hypothesis but
not the confidence with which we accept that the difference is exactly zero. For
example, if P = 0.45, we can only reject the null hypothesis with a probability
of 45 percent (that is, we cannot reject it). Forty-five repeat samples out of 100
would be as extreme as this or greater and 45 percent is the probability of
[...] null hypothesis.
[...] colleagues [187] examined 71 negative randomised controlled trials mainly
published in the Lancet, New England Journal of Medicine, and the Journal of the
American Medical Association. Half the trials had more than a 74 percent
probability of failing to detect a statistically significant 25 percent improvement with
therapy. When reporting the results of controlled trials the confidence limits
should be reported for any result (section 15.6).

15.1.7 Standard deviation (standard error) of the mean
The calculation of a standard deviation was described in section 10.4. The
standard deviation of a series of normally distributed results summarises the
spread or variability of the data. As Glantz stated, “When observations are
equally likely to be above or below the mean and more likely to be near the
mean than far away, about 95% of them will be within 2 standard deviations
on either side of the mean” [191]. A standard deviation therefore summarises
the data and is an accepted description of the spread or variability of the raw
data. The variability of an average result is given by the standard deviation of
the mean which is known as the standard error of the mean (SEM) and is
calculated by:

\[
\mathrm{SEM} = \frac{\text{standard deviation of the raw data}}{\sqrt{\text{number of observations}}}
\tag{15.2}
\]

The 95 percent confidence limits for the mean (section 15.6) are calculated
by taking almost exactly two standard errors on either side of the mean. Glantz
concluded “. . . the standard deviation, not the standard error of the mean,
should be used to summarise data.” This is true when describing the
information on subjects before entry to the trial but not necessarily for the results of
the trial. The outcome of a trial will usually consist of the difference between
the mean results of the separate treatment groups. The variability of this
difference will depend on the standard error of the means and not simply on
the standard deviations of the raw data.
Assume drug A produces a mean result of x̄_A, standard deviation SD_A, and
standard error of the mean SEM_A. Similarly drug B produces mean x̄_B, SD_B,
and SEM_B, and a comparison of the two means is made by a t test calculated as
follows:

\[
t = \frac{\bar{x}_A - \bar{x}_B}{\text{standard error of the difference}}
\tag{15.3}
\]

\[
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\mathrm{SEM}_A^{2} + \mathrm{SEM}_B^{2}}}
\tag{15.4}
\]

The value of t could be more accurately calculated by pooling the separate
variances [189, 190, 191]. A report of a trial should therefore include the
appropriate means and standard errors of the means or standard deviations and
the numbers of measurements involved so these calculations can be made.
In a graphical representation of the outcome, a mean result is often
represented as a point with a bar on either side, and the legend to the figure must
always state what the bars represent. Traditionally the figure indicates the mean
and one standard error of the mean on either side. It should be noted that this
figure does not provide the confidence limits for the mean and it is surprising
that such a device has achieved widespread acceptance.

15.1.8 The handling of data on dropouts and withdrawals from the trial
The inclusion or exclusion of withdrawals from the trial often leads to errors in
analysis and is discussed in section 15.7.

15.2 ANALYSES TO DETERMINE THAT RANDOMISATION
HAS PRODUCED EQUIVALENT GROUPS

All the characteristics of the patients at presentation should be compared
between the treatment groups in order to demonstrate that randomisation has
produced equivalent groups. For each normally distributed characteristic the
mean, number of measurements, standard deviation (rather than SEM as the
raw data are being described), and range should be presented. For discrete data
the proportions must be reported (for example, the percentage male, married,
or black). The appropriate statistical tests must be conducted to see if there are
differences between the groups at the start of the study.
The following points should be noted:

1. When a large number of characteristics are compared, one may well differ
between the groups by chance alone.
2. When large numbers of patients are entered into a trial, some differences
between the groups may be statistically significant but they are unlikely to
be large or of biological importance.
3. When the trial includes a small number of patients, significant differences
are likely to be of biological importance and even nonsignificant differences
may be large.
4. If more statistically significant differences between the groups are detected
than would be expected by chance, the randomisation process may have
failed and a biased selection into the different groups may have occurred.
5. Any random differences between the groups can be adjusted for
retrospectively in the analysis [192].

15.3 ANALYSIS OF THE RESULTS: QUANTITATIVE OR CONTINUOUS DATA

The data must be checked for outliers and distribution, transformed if
necessary, and subjected to the appropriate statistical test.
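
As one illustration of an appropriate test for two normally distributed group means, the sketch below follows equations 15.2 to 15.4 of section 15.1.7 and adds the large-sample 95 percent confidence limits of section 15.6.1. The class and function names and the example figures are invented for illustration; the unpooled standard error is used because that is the form given in equation 15.4.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Summary statistics for one treatment group (hypothetical structure)."""
    mean: float
    sd: float
    n: int

    @property
    def sem(self) -> float:
        # Equation 15.2: SD of the raw data over the root of the number of observations.
        return self.sd / math.sqrt(self.n)


def se_difference(a: GroupSummary, b: GroupSummary) -> float:
    # Denominator of equation 15.4.
    return math.sqrt(a.sem ** 2 + b.sem ** 2)


def t_statistic(a: GroupSummary, b: GroupSummary) -> float:
    # Equations 15.3 and 15.4.
    return (a.mean - b.mean) / se_difference(a, b)


def confidence_limits(a: GroupSummary, b: GroupSummary, multiplier: float = 1.96):
    # Large-sample limits: mean difference plus or minus multiplier x SE (difference).
    difference = a.mean - b.mean
    margin = multiplier * se_difference(a, b)
    return difference - margin, difference + margin


if __name__ == "__main__":
    drug_a = GroupSummary(mean=13.0, sd=2.0, n=100)  # invented figures
    drug_b = GroupSummary(mean=12.0, sd=2.0, n=100)
    low, high = confidence_limits(drug_a, drug_b)
    print(f"t = {t_statistic(drug_a, drug_b):.2f}, 95% C.L. {low:.2f} to {high:.2f}")
```

Reporting the means, standard deviations, and numbers for each group, as the text recommends, is exactly what allows a reader to repeat this calculation.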

15.3.1 Data checking
It is of utmost importance to rectify any data that have been incorrectly
recorded and to eliminate any results that constitute errors in measurement.
Great care must be taken that outliers are not removed in order to improve the
results and support the investigators’ preconceived notions.
The data are checked by examining the frequency distribution for impossible
or outlying values. In certain situations, consistency checks can be performed,
for example, to confirm that all patients who are pregnant or have
gynaecological complaints are female.
When computing, statistical packages such as SPSS (Statistical Package for
the Social Sciences) [197] can be used to derive a frequency distribution, mean,
and standard deviation. Healy and others have described rules for the detection
of outliers [198-200].
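
The checks described above can be sketched in a general-purpose language as well as in a statistical package. The example below is a minimal illustration: the record layout, field names, and plausible range are assumptions made for the example, not rules taken from the references cited.

```python
from collections import Counter


def frequency_distribution(values, width):
    """Count values in bins of the given width, keyed by the lower edge of each bin."""
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))


def out_of_range(records, field, low, high):
    """Identify records whose value lies outside a plausible range, for review."""
    return [r["id"] for r in records if not (low <= r[field] <= high)]


def pregnant_but_not_female(records):
    """Consistency check: every patient recorded as pregnant should be female."""
    return [r["id"] for r in records if r["pregnant"] and r["sex"] != "F"]


if __name__ == "__main__":
    records = [  # invented data containing two deliberate recording errors
        {"id": 1, "sex": "F", "pregnant": True, "systolic": 128},
        {"id": 2, "sex": "M", "pregnant": False, "systolic": 142},
        {"id": 3, "sex": "M", "pregnant": True, "systolic": 135},
        {"id": 4, "sex": "F", "pregnant": False, "systolic": 1400},
    ]
    print(frequency_distribution([r["systolic"] for r in records], 10))
    print(out_of_range(records, "systolic", 60, 300))
    print(pregnant_but_not_female(records))
```

The functions only flag records for inspection. Whether a flagged value is corrected or removed remains a decision to be made against the original documents, not against the hoped-for result.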

15.3.2 Tests to confirm that the data are normally distributed

Normally distributed variables are continuous quantitative measurements (for
example, blood sugar) that conform to various tests of distribution. The most
important test is the degree of skewness that can be tested for statistically [197]
and also examined by plotting the frequency distribution of the variable. An
example of a skewed distribution is provided by the frequency of plasma urea
that has a right-hand tail of high values. Plasma urea has to be transformed to
achieve a normal distribution, but the distribution of blood sugar is not
markedly skewed.
The normal distribution is also bell-shaped and a test for kurtosis indicates
whether the data have too flat (uniform) or too peaked a distribution.
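
Skewness and kurtosis can be computed directly from the moments of the data. The sketch below uses the simple moment coefficients, which is an assumption, since a statistical package may apply small-sample corrections. The urea-like values are invented to show a right-hand tail and the effect of the logarithmic transformation discussed under transformation of the data.

```python
import math


def central_moment(values, k):
    """k-th moment of the values about their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Zero for a symmetrical distribution; positive with a right-hand tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Zero for a normal distribution; negative if too flat, positive if too peaked."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    urea = [3.1, 3.8, 4.2, 4.5, 4.9, 5.3, 5.8, 6.4, 7.5, 9.8, 14.2, 22.0]  # invented
    log_urea = [math.log(v) for v in urea]  # the one-line transformation
    print(f"skewness before: {skewness(urea):.2f}, after: {skewness(log_urea):.2f}")
```

For these invented values the transformation reduces the skewness without abolishing it, which is why the distribution of the transformed variable should itself be examined.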

15.3.3 Transformation of the data

The most efficient ways of comparing continuous data are t tests and the
analysis of variance. These analyses require a normal distribution but data that
are positively skewed (to the right), such as plasma urea, can be rendered more
normal by a logarithmic transformation. One line within a computer program
can produce the desired transformation and the distribution of the transformed
variable can then be examined to confirm that the skewness has been reduced.

15.3.4 The data cannot be transformed to a normal distribution
If the data do not conform to a normal distribution, they can be analysed using
nonparametric statistical techniques, often with little loss of efficiency
[189-191]. However, statistical techniques based on the normal distribution are
most efficient and are to be recommended when possible. When the change in
a particular variable is to be analysed, the distribution of the original data may
not be important if the change in the variable is normally distributed.

15.3.5 The tests to be used with normally distributed data
The t test and analysis of variance should be employed where appropriate
(section 15.1.4). The Student’s t test is a well-known statistical test and the
analysis of variance (ANOVA) is becoming more familiar and better
understood by the average reader of trial results. ANOVA may prove that the result
of one or more treatment groups is significantly different from the other
groups. The problem will remain of determining which particular group is
significantly different from which other groups. The studentized range and other
tests can be used to determine exactly which between-group comparisons are
not compatible with the null hypothesis (section 15.1.4).

15.4 ANALYSIS OF THE RESULTS: PROPORTIONAL DATA

Proportional data, such as the percentage cured, improved, or dead, require
statistical tests such as the chi-squared (χ²) test. Figure 15-2a gives the
calculation of the usual chi-squared test which incorporates a continuity correction.
Tables of χ² must be consulted to determine whether or not the result exceeds
a given level of significance. When examining the change in a proportion over
a period of time in the same subjects we have to perform an analysis suitable
for paired data. Figure 15-2b gives the calculation for McNemar’s test [189],
and the result of the test has to be examined in tables of standardised normal
deviates. McNemar’s test is useful in determining whether any significant
change is occurring within one treatment group rather than comparing the
data for differences between groups.

15.5 ANALYSIS OF THE RESULTS: SURVIVAL DATA

The survival from a rapidly fatal disease in two groups can be compared using
a chi-squared test as in section 15.4. However, when the disease is not quickly
fatal we will wish to take into account, not only the fact of death, but also the
length of time before death. Also, certain patients may be lost to follow-up,
and some allowance has to be made for this fact in the calculations. The
appropriate analysis is the construction of a life table.

15.5.1 The life table
A life table is best represented as a graph in which the proportion of survivors
over time is plotted. The technique for constructing a life table is well
described in Armitage’s book on medical statistics [189] and in a useful article by
Peto and his colleagues [192]. The data required on each patient for such an
analysis include the date of randomisation (not the date on which they
developed the medical condition under investigation); the date of completion of the
study; whether the patient is dead or alive; if dead, the date of death; and if the
patient is lost to follow-up, the date the patient was last known to be alive.
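
The data items just listed are sufficient to build a life table. The sketch below is illustrative only and uses the product-limit form of the calculation, which is an assumption: the text refers the reader to Armitage [189] and Peto and colleagues [192] for the method itself. Function and parameter names are invented.

```python
from datetime import date


def follow_up(randomised, died, last_known_alive, study_end):
    """Return (days of follow-up from randomisation, whether the patient died)."""
    if died is not None:
        return (died - randomised).days, True
    # Alive at the end of the study, or lost to follow-up at the last contact.
    end = last_known_alive if last_known_alive is not None else study_end
    return (end - randomised).days, False


def life_table(observations):
    """observations: list of (time, died). Returns [(time, proportion surviving)]."""
    at_risk = len(observations)
    surviving = 1.0
    table = []
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, died in observations if time == t and died)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            # Multiply by the proportion of those still at risk who survive time t.
            surviving *= 1.0 - deaths / at_risk
            table.append((t, surviving))
        at_risk -= leaving  # deaths and withdrawals both leave the risk set
    return table


if __name__ == "__main__":
    end = date(1985, 1, 1)  # invented dates
    patients = [
        follow_up(date(1980, 1, 1), date(1980, 1, 31), None, end),
        follow_up(date(1980, 1, 1), None, date(1980, 3, 1), end),
        follow_up(date(1980, 1, 1), None, None, end),
    ]
    print(life_table(patients))
```

Patients lost to follow-up contribute to the number at risk up to the date they were last known to be alive and are then removed, which is the allowance for losses mentioned in section 15.5.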

a)
                 TREATMENT A    TREATMENT B    TOTAL
NO CHANGE             a              b          a+b
CURED                 c              d          c+d
TOTAL                a+c            b+d          n

\[
\chi^{2} = \frac{\left(|ad - bc| - \tfrac{1}{2}n\right)^{2} n}{(a+b)(c+d)(a+c)(b+d)}
\]

b)
                                   AFTER 1 YEAR
                            CONDITION +    CONDITION -
ENTRY TO      CONDITION +        k              r
TRIAL         CONDITION -        s              m
                                                     TOTAL T

McNemar's test:

\[
U = \frac{r - \tfrac{1}{2}n}{\tfrac{1}{2}\sqrt{n}} \qquad \text{where } n = r + s
\]

Figure 15-2. The analysis of proportional data. Figure 15-2a. Two different groups of patients
receive different treatments. The chi-squared statistic assesses the results of the trial and the
answer should be compared to tables of χ² on 1 degree of freedom.
Figure 15-2b. A group of T patients is examined to see if there is a change in a condition from
entry to trial to a one-year assessment. McNemar's test provides the approximate number of
standardised normal deviates (U) that can be examined in the appropriate table.
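
The two calculations of figure 15-2 can be written out as follows. This is a sketch under stated assumptions: the cell labels a, b, c, d, r, and s follow the figure, the function names and example counts are invented, and the value 3.84 is the familiar five percent point of χ² on 1 degree of freedom.

```python
import math

CHI_SQUARED_5_PERCENT_1_DF = 3.84  # tabulated 5% point, 1 degree of freedom


def chi_squared_with_continuity_correction(a, b, c, d):
    """Figure 15-2a: 2 x 2 table with cells a, b (no change) and c, d (cured)."""
    n = a + b + c + d
    numerator = (abs(a * d - b * c) - n / 2) ** 2 * n
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))


def mcnemar_u(r, s):
    """Figure 15-2b: r and s are the numbers of patients who changed in each direction."""
    n = r + s
    return (r - n / 2) / (0.5 * math.sqrt(n))


if __name__ == "__main__":
    chi2 = chi_squared_with_continuity_correction(30, 20, 20, 30)  # invented counts
    print(f"chi-squared = {chi2:.2f}, exceeds 5% point: {chi2 > CHI_SQUARED_5_PERCENT_1_DF}")
    print(f"McNemar U = {mcnemar_u(15, 5):.2f}")
```

Only the patients whose condition changed (r and s) enter McNemar's test; those whose condition was the same at entry and at one year carry no information about change.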

15.5.2 Statistical test of survival

A statistical test is required to compare the whole of the two survival curves
illustrated in figure 15-3. Peto and colleagues have come down firmly in
favour of the log-rank test [202, 203] and have described how to perform the
test [192]. Statistics such as the median and average survival times can be very
inaccurate unless nearly everyone dies and the data are extensive. Treatments
may also differ in their acute and long-term effects; for example, the life table
may show a benefit for one treatment after one year but this may be reversed
after, say, three years. The results in figure 15-3 are straightforward in that the
difference between the two groups is not significant after five years (0.3 < P < 0.5).
The confidence limits are discussed in section 15.6.2.
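
A minimal sketch of a log-rank calculation is given below. It is not taken from the references cited: it uses the common form in which observed minus expected deaths in one group are summed over the times at which deaths occur and divided by a hypergeometric variance, and all names and data are invented for illustration.

```python
def log_rank(group_a, group_b):
    """Each group is a list of (time, died). Returns (O - E for group A, variance, chi-squared)."""
    everyone = group_a + group_b
    death_times = sorted({time for time, died in everyone if died})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        risk_a = sum(1 for time, _ in group_a if time >= t)
        risk_b = sum(1 for time, _ in group_b if time >= t)
        deaths_a = sum(1 for time, died in group_a if time == t and died)
        deaths_b = sum(1 for time, died in group_b if time == t and died)
        n = risk_a + risk_b
        d = deaths_a + deaths_b
        # Expected deaths in A if the d deaths fell on the two groups in
        # proportion to the numbers at risk.
        observed_minus_expected += deaths_a - d * risk_a / n
        if n > 1:
            variance += d * (risk_a / n) * (risk_b / n) * (n - d) / (n - 1)
    chi_squared = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    return observed_minus_expected, variance, chi_squared


if __name__ == "__main__":
    treated = [(1, True), (2, True), (3, False)]   # invented follow-up times
    observed = [(2, False), (4, True), (5, False)]
    print(log_rank(treated, observed))
```

The resulting statistic is referred to tables of χ² on 1 degree of freedom. Because every death time contributes, the test compares the whole of the two curves and not merely the proportions surviving at a single point.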

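[Figure 15-3: survival curves for the observed group and the actively treated
group, plotted against years of follow-up (0 to 5).]

Figure 15-3. Life-table representation for 60 very elderly hypertensive subjects treated with
methyldopa and 60 persons who were simply observed [201].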

15.5.3 Life-table analysis to determine prognostic features

The life table may also be useful when subdividing the results according to
suspected prognostic features but within particular treatment groups. For
example, patients given placebo may have two survival curves constructed, one
for smokers and one for nonsmokers. The comparison of the curves will
indicate the prognostic effect of smoking.
15.6 CONFIDENCE LIMITS MUST BE GIVEN FOR ALL TRIAL RESULTS

Confidence limits are important when a statistically significant result is
reported and essential when a negative result is described. If a trial demonstrates
a positive result (say, a rise in haemoglobin averaging 1 gm/100ml, with the
result statistically significant at the 5% level) then we know that within the
95 percent bounds of probability the result was not compatible with a zero
increase in haemoglobin. However, the range of increase compatible with a
probability of 0.95 may be large: for example, +0.2 to +1.8 gm/100ml. The
limits of this range are known as the 95 percent confidence limits.

15.6.1 The confidence limits for normally distributed data
The confidence limits for normally distributed data are simple to calculate
because standard errors are calculated as part of the test of statistical
significance. The 95 percent confidence limits for the differences between two
treatments in a large number of patients are given by the following formula:

\[
95\%\ \text{C.L.} = \text{mean difference} \pm 1.96 \times \text{SE (difference)}
\tag{15.5}
\]

where SE (difference) is the standard error of the difference between the two
means. The 95 percent confidence limits are obviously wider than the 90%
confidence limits and the latter are calculated as follows:

\[
90\%\ \text{C.L.} = \text{mean difference} \pm 1.64 \times \text{SE (difference)}
\tag{15.6}
\]

The importance of reporting confidence limits was stressed by Wulff [204]
and supported recently by other authors [122, 205].

15.6.2 Confidence limits for proportional data
For data in a proportional form the standard error of the difference has to be
calculated. Baber and Lewis [206] recently reported the confidence limits on a
series of trials of the secondary prevention of myocardial infarction. They
calculated the SE (difference) and confidence limits as follows:
let n1 and n2 be the numbers of subjects in two treatment groups.
let p be the overall death rate and p1 and p2 be the death rates in the two
treatment groups.
let q be the overall survival rate = 1 − p

then

\[
p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}
\]

\[
\text{SE (diff)} = \sqrt{p\,q\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
\tag{15.7}
\]

and

\[
90\%\ \text{C.L.} = (p_1 - p_2) \pm \left[1.64 \times \text{SE (diff)} + \tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right]
\tag{15.8}
\]

where ½ (1/n1 + 1/n2) is a correction factor as the data are not continuous.
Confidence limits based on a life table are more difficult to calculate [203,
208, 209]. The 95 percent confidence limits for the negative results after five
years reported in figure 15-3 ranged from a benefit due to treatment of 12
percent to an increased mortality of 36 percent.

15.6.3 Confidence limits when the result of the trial is negative

When a trial reports a near-zero effect the confidence limits will indicate the
extent to which this result is compatible with a benefit from a treatment in one
direction and an adverse effect in the other direction. Of 16 trials with negative
results analysed by Baber and Lewis [206], 12 (75 percent) results were
compatible with a treatment effect in reducing mortality by 50 percent and eight (50
percent) were compatible with an increase in mortality of 50 percent. Gore has
also provided very useful examples of the place of confidence limits in
assessing the results of clinical trials, including a study where the confidence limits
demonstrated that the trial was needlessly large [207].

15.7 LUMPING AND SPLITTING

Lumping the data together and then analysing them can be contrasted with the
practice of splitting the data, presumably into more homogeneous subsections,
and then analysing these fractions. McMichael [210] stated, “The aim of a
statistical trial is to include all the unpredictable multitude of factors which can
influence the outcome by a comprehensive sample. Unless the treatment
shows a convincing difference in outcome for the whole group it is not
permissible to separate out afterwards a sub-division of better results. Any
sub-divisions should be done on other criteria before the trial begins.” Bradford
Hill would agree with the dangers of identifying a subdivision of better results,
but when there are factors that influence outcome, he stated “Surely it is our
job and duty, to see whether in the analysis we can identify them and thus
make them predictable.” He added, “It is better to have looked and lost than
never looked at all.”
The first stage of any analysis must be to examine the groups, as randomised
and without any exclusions. This has been termed an analysis on the
intention-to-treat principle. The next stage of the analysis may be to allow exclusions
and examine groups who followed the protocol, an analysis on the
per-protocol basis. Lastly, the data can be analysed according to prognostic and
other factors.
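
The first two stages can be illustrated with a small sketch. The record fields, function names, and patient numbers below are invented for the example. The confidence limits follow equations 15.7 and 15.8 of section 15.6.2, with the correction term taken to widen the interval on both sides.

```python
import math


def make_patients(arm, compliant, compliant_deaths, defaulters, defaulter_deaths):
    """Build invented records: who was randomised, who adhered, who died."""
    patients = [{"randomised_to": arm, "followed_protocol": True,
                 "died": i < compliant_deaths} for i in range(compliant)]
    patients += [{"randomised_to": arm, "followed_protocol": False,
                  "died": i < defaulter_deaths} for i in range(defaulters)]
    return patients


def intention_to_treat(patients, arm):
    """Everyone randomised to the arm, without exclusions."""
    return [p for p in patients if p["randomised_to"] == arm]


def per_protocol(patients, arm):
    """Only those randomised to the arm who followed the protocol."""
    return [p for p in intention_to_treat(patients, arm) if p["followed_protocol"]]


def death_rate(group):
    return sum(p["died"] for p in group) / len(group)


def difference_with_limits(group_1, group_2, multiplier=1.64):
    """Difference in death rates with limits as in equations 15.7 and 15.8."""
    n1, n2 = len(group_1), len(group_2)
    p1, p2 = death_rate(group_1), death_rate(group_2)
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    margin = multiplier * se + 0.5 * (1 / n1 + 1 / n2)
    return p1 - p2, p1 - p2 - margin, p1 - p2 + margin


if __name__ == "__main__":
    trial = (make_patients("active", 80, 8, 20, 6)
             + make_patients("placebo", 90, 18, 10, 2))
    for label, select in (("intention-to-treat", intention_to_treat),
                          ("per-protocol", per_protocol)):
        diff, low, high = difference_with_limits(select(trial, "active"),
                                                 select(trial, "placebo"))
        print(f"{label}: difference {diff:+.3f} (90% C.L. {low:+.3f} to {high:+.3f})")
```

In these invented data the defaulters on active treatment fare badly, so the per-protocol comparison looks more favourable than the comparison of the groups as randomised. This is the kind of selection effect that makes the intention-to-treat analysis the necessary first stage.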
15.7.1 Analysis on the intention-to-treat principle
An analysis on the intention-to-treat principle tests the strategy of offering a
certain treatment to a group of subjects irrespective of whether they receive or
persevere with the treatment. It is the safest way to analyse the results of a
clinical trial in order to avoid bias (section 8.6), but it has some disadvantages.

1. In a small trial, default, dropout, or noncompliance may affect only one or
two patients but their results may distort the overall result of the trial.

2. In a long-term trial of an active treatment versus placebo treatment, some
patients who are intended to receive active treatment may not receive it and
some who are intended to receive placebo will receive active treatment.
Analysis on the intention-to-treat principle ignores these problems.

In view of these problems analyses should be performed both on the
intention-to-treat and on the per-protocol basis.

15.7.2 Analysis on the per-protocol basis

An analysis on the per-protocol basis only considers those in the trial groups
who received the treatments as specified in the protocol. Patients who do not
adhere to the protocol are excluded and not transferred between groups. The
worrying effects of such a selection have been discussed in section 8.7, and one
trial that has been criticised for concentrating too heavily on this approach is
discussed in section 19.1. In this trial it was decided that all events between
randomisation and seven days could be ignored as an active treatment only
exerts its effect after seven days of treatment [48]. Similarly, events were
included for seven days after stopping treatment but not thereafter as the
treatment continues to act for only seven days. The results of the trial have not
been widely accepted owing largely to the selection of patients and their events
for analysis.

15.7.3 Examination of the data according to prognostic and other factors
It is essential for the full analysis of a trial that the data are examined to
determine the characteristics of patients in whom the treatment was most
effective and least effective. This can only be done for large trials where
different subgroups contain considerable numbers of patients. This analysis
can be performed even when the divisions were not considered at the design
stage and even when splitting the data was suggested by the results. However,
the authors should state why they chose certain subdivisions and may be able
to report the results with more confidence if they state their intention in the
trial protocol. An interesting example was provided by the Hypertension
Detection and Follow-up Program (HDFP) trial where patients in the United
States of America were randomised to specialist care for their hypertension or
referred to the usual community health services for treatment of their
hypertension. The overall intention-to-treat analysis showed a benefit from
specialist care [132] but when the results were broken down by sex and race, there
was no observable benefit for white women [211]. This is an example of a
reasonable subdivision of the data yielding important results that may throw
further light on the conclusions to be drawn from the trial.

15.8 AMALGAMATING TRIAL RESULTS

The results from different trials may be pooled under the following
circumstances.

1. When the condition being treated is similar.
2. When the active or new treatment is similar.
3. When the control treatment is similar (usually a placebo).
4. When randomisation has been employed.

Peto and associates [192] have described how such trials can be pooled by
considering each trial as a retrospective stratum of a single large trial. The
results for trials of anticoagulants in the secondary prevention of myocardial
infarction have been pooled to give a clearer idea of the results of treatment
[52].
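
One way of treating each trial as a stratum is sketched below. It is an assumption that this matches the procedure in reference [192] in detail: within each trial the observed deaths on treatment are compared with the number expected if treatment had no effect, and the differences and their variances are then summed across trials. The function name and the trial figures are invented.

```python
import math


def pooled_result(trials):
    """trials: list of (deaths_treated, n_treated, deaths_control, n_control).

    Returns (sum of observed minus expected deaths on treatment, summed variance, z).
    """
    total_o_minus_e = 0.0
    total_variance = 0.0
    for deaths_t, n_t, deaths_c, n_c in trials:
        n = n_t + n_c
        d = deaths_t + deaths_c
        expected_t = d * n_t / n  # expected deaths on treatment under no effect
        total_o_minus_e += deaths_t - expected_t
        total_variance += d * (n - d) * n_t * n_c / (n ** 2 * (n - 1))
    z = total_o_minus_e / math.sqrt(total_variance)
    return total_o_minus_e, total_variance, z


if __name__ == "__main__":
    trials = [(10, 100, 20, 100), (15, 150, 25, 150)]  # invented trials
    o_minus_e, variance, z = pooled_result(trials)
    print(f"O - E = {o_minus_e:.1f}, variance = {variance:.2f}, z = {z:.2f}")
```

Patients are compared only with controls from their own trial, so differences between the trials in the overall death rate do not distort the pooled comparison.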
15.9 CONCLUSIONS

The chapter has considered the common errors to be found in the analysis of
trial results, including errors concerning the experimental unit, failure to
employ a statistical test, the use of an unspecified or incorrect test, the use of
one-tailed tests, and the incorrect interpretation of a negative result. Also outlined
were analyses to demonstrate that randomisation has been effective and tests of
significance for quantitative and qualitative data. It is essential that the data be
checked for errors, the distribution of the data examined, and a transformation
performed where necessary. The calculation of confidence limits was discussed
and their importance stressed. The analysis of survival data was briefly
reviewed and the analysis of data on the intention-to-treat and per-protocol basis
was discussed. Other examples of splitting the data and analysing subgroups
were presented, and lastly, the amalgamation of results from different trials
was mentioned.

16. THE EVALUATION OF SUBJECTIVE WELL-BEING

16.1 INTRODUCTION

When trial end points include reports by the patients on their well-being, the
problems of validity and repeatability are greatly increased. Not only do the
patients report subjective assessments of their well-being that cannot be
confirmed, but different observers may interpret this information in varying
ways and record invalid and unrepeatable data.
The recording of a symptom as present will depend on the existence of an
organic or psychological cause; the recognition of the symptom by the patient;
the reporting of the symptom by the patient; and the recording of the
symptom by the observer. These four interrelated stages will be considered
separately.

16.1.1 The presence of an organic or psychological cause
The observer, especially if a physician, will usually be concerned about the
basis for a symptom. If a symptom does not appear to have an organic cause he
may ignore it. However, to the patient a symptom may be equally important
whether justified by organic disease or due to psychological problems.

16.1.2 The recognition of the symptom by the patient
Anxious or depressed patients often have a multitude of complaints. Some of
these may have an organic basis but would remain unnoticed when the patient
is well, or when recognised, would soon be forgotten. In order for a patient to
report a symptom it must be perceived and then remembered.

16.1.3 Reporting of the symptom
When a patient has recalled the presence of a symptom he still has to report the
fact to the observer. Both the patient’s attitude to the observer and the
observer’s relationship to the patient may influence whether a symptom is reported or
not. For example, if a patient is very grateful to a physician for the care he has
received he may be unwilling to report symptoms, viewing such complaints as
an expression of ingratitude. On the other hand, a patient with a grievance
may possibly list more symptoms.
The attitude of the observer is often crucial in the reporting of symptoms. A
sympathetic observer may appear relaxed and encourage conversation in the
form of initial small-talk that reassures the patient and leads to a fuller
disclosure of complaints. Also the observer may assiduously enquire whether or not
the patient has certain symptoms; this may lead to a high rate of reporting. The
formidable, brusque observer is unlikely to be told of so many symptoms.

16.1.4 The recording of the symptom by the observer
The observer must report the patients’ symptoms without bias. In a clinical
trial a physician may take little notice of a symptom that appears to be of a
psychological origin, especially if numerous such complaints are reported at
the same time. The variation in the frequency of reporting symptom side
effects in drug trials is illustrated in table 16-1. The proportion of patients who
reported sleepiness in various trials of the antihypertensive but sedative drug,
methyldopa, varied between ten percent and 83 percent. Such variability is
obviously not acceptable but does occur with current methods of assessment.
Table 16-1 also gives details of the control treatment and the percentage
complaining of sedation on this treatment. One treatment, clonidine,
produced sedation more often than methyldopa but the other control treatments
were not known to produce sedation. The difference between the proportion
complaining in the methyldopa group and the proportion in the control group
provides an estimate of the percentage of complaints attributable to
methyldopa. The attributable percentage varied from seven percent to 62 percent and
did not appear to be affected by whether the symptom was elicited by direct
questioning or arose as a spontaneous report. However, with such variability
it is important to standardise the method of collecting these data.
The recording of symptoms in clinical trials is difficult but also of the
greatest importance. In a trial to compare two drugs the objective effects may
be similar but one drug may produce symptom side effects, thus precluding its
use in clinical practice. The remainder of this chapter will discuss standardised
methods for eliciting such data. Trials of treatment in psychiatric patients will
be reviewed as well. Lastly, the importance of measuring the overall quality of
life will be considered and methods for quantifying this concept will be
discussed.

Table 16-1. The percentage of hypertensive patients complaining of sedation when taking methyldopa; symptoms assessed by interviewer.

Columns: control treatment; percentage complaining on methyldopa = (1);
percentage complaining on control treatment = (2); percentage complaining
due to methyldopa = (1) - (2); direct question (DQ) or spontaneous report (SR);
reference.

Percentage complaining on methyldopa (1): 10, 15, 24, 25, 37, 47, 75, 83
Control treatments: clonidine; diuretic/placebo; bethanidine; debrisoquine;
clonidine; guanethidine; guanethidine/bethanidine; same

answers. Any tendency on the part of respondents to always agree either to the
first answer or to a positive or negative statement must be discouraged.
Certain characteristics of good questions in general were reviewed in 13.9;
that section should also be consulted.

16.3.1 The question must be clear
Medical personnel often overestimate their patients’ understanding of medical
terms and no ambiguous terms should appear in the question. Most trials will
include some subjects of lower than average intelligence and the questions
should be designed so that these persons can understand and answer. Long
words and double negatives should be avoided and whenever possible a
difficult long word should be replaced by a short one. Bennett and Ritchie
[162] have reviewed the qualities of good questions and considered that leading
questions should be avoided where possible and that vague terms such as
occasionally or often must be replaced by precise numerical terms. It is also
important to limit the time over which the symptom should be recalled.
Consider the following questions [159].
“Since your last visit have you often felt sleepy during the day?”
Answers: Yes/No
“Have you, in the last three months, noticed weakness in the limbs?”
Answers: Yes/No

The first question uses the term often and also leaves the duration rather vague;
however, it would be suitable in a trial with a fixed interval between visits.
The second question could be used when the visits were three months apart or
longer.
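
The advice on vague terms can be applied mechanically when a questionnaire is being drafted. The following is a minimal sketch only: the word list and the function name `vague_terms_in` are assumptions made for illustration, and only "occasionally" and "often" come from the text above.

```python
import re

# Hypothetical word list: "occasionally" and "often" are named in the text;
# the remaining entries are illustrative additions.
VAGUE_TERMS = ("occasionally", "often", "sometimes", "frequently", "rarely")


def vague_terms_in(question: str) -> list[str]:
    """Return the vague frequency terms that appear in a question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [term for term in VAGUE_TERMS if term in words]


# The two example questions quoted in section 16.3.1.
q1 = "Since your last visit have you often felt sleepy during the day?"
q2 = "Have you, in the last three months, noticed weakness in the limbs?"

print(vague_terms_in(q1))
print(vague_terms_in(q2))
```

A check of this kind can only flag candidate words; whether a flagged term is acceptable in a given trial remains a matter of judgement.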

16.3.2 The question must have a high repeatability
The measurement of repeatability has been discussed in section 9.1 and indices
have been provided that can be applied to the questionnaire answers. To assess
repeatability, a group of persons should be asked the same question twice,
with the occasions separated by a short interval, say, two weeks. The subjects
answering the questions should be from a similar background to those who
will be recruited to the trial. The questions must not be repeated after a very
short interval, lest the subjects remember their first replies; a number of
questions can be tested simultaneously in order to limit this recall. A question
that is highly repeatable may be well understood but not necessarily valid: the
question may not measure what is intended.
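
The indices of section 9.1 are not reproduced here. As an illustration only, the sketch below computes two commonly used measures for a yes/no question asked on two occasions: the observed proportion of agreement and Cohen's kappa, which allows for agreement expected by chance. The choice of kappa, the function name, and the answers are all assumptions.

```python
def agreement_and_kappa(first, second):
    """Observed agreement and Cohen's kappa for paired yes/no answers.

    first, second: equal-length lists of booleans (True = "Yes") giving each
    subject's answer on the first and second occasion.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty answer lists")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    p_yes_1 = sum(first) / n
    p_yes_2 = sum(second) / n
    # Agreement expected by chance if the two occasions were independent.
    expected = p_yes_1 * p_yes_2 + (1 - p_yes_1) * (1 - p_yes_2)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical answers from ten subjects questioned two weeks apart.
first = [True, True, False, False, True, False, False, True, False, False]
second = [True, False, False, False, True, False, True, True, False, False]

observed, kappa = agreement_and_kappa(first, second)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

With these invented answers eight of the ten subjects give the same reply twice, but part of that agreement would be expected by chance, so kappa is lower than the raw proportion.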

16.3.3 The question must be valid

Validity has been discussed in chapter 5 and can be defined as the extent to
which the question measures what it is supposed to measure. An example can
be provided by a series of questions that were designed to detect the symptoms
that occur with a marked fall in blood pressure. Certain antihypertensive drugs
produce these symptoms when the subject stands up. Three of the questions
were as follows [159]:
“Since your last visit have you suffered from unsteadiness, light-headedness or
faintness?”
Answers: Yes/No
“Does the unsteadiness or faintness occur only when you are standing?”
Answers: Yes/No
“For how many hours in the day are you troubled by unsteadiness or
faintness?”
Answers: Less than one hour/one-two hours/more than two hours

When the blood pressure of the respondents was measured standing and
lying, a larger than average fall in pressure on standing was observed when
positive answers to the first two questions were reported together with a
duration of less than one hour.
This series of questions had a degree of validity but one question that did not
appear valid on a preliminary analysis was included in a series of questions to
detect a previous history of stroke. The question was
“Have you ever had, without warning, sudden loss of power in an arm?”
Answers: Yes/No

Most of the patients who responded positively to this question had not had a
cerebrovascular accident at any time. The false positive rate was therefore high
and this question was not acceptably valid. Perhaps the question, “Have you
ever had a stroke?” would be more suitable. However, this direct question
may still have a high false positive rate owing to confusion with other episodes
of illness and a high false negative rate if minor cerebrovascular accidents are to
be detected. Medical terms are frequently not understood [220] but may be
better understood by those who have the condition that is to be detected [221].
In general, medical terms should be avoided.
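
The false positive and false negative rates mentioned above can be made concrete by tabulating the answers against a reference diagnosis. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and the stroke study itself reported no figures that are reproduced here.

```python
def validity_summary(tp, fp, fn, tn):
    """Summarise a yes/no question against a reference diagnosis.

    tp: answered yes, condition present    fp: answered yes, condition absent
    fn: answered no, condition present     tn: answered no, condition absent
    Assumes every denominator below is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Share of "yes" answers given by people without the condition: the
        # sense in which the text describes the false positive rate as high.
        "false_positives_among_yes": fp / (tp + fp),
        # Share of people with the condition who answered "no".
        "false_negatives_among_cases": fn / (tp + fn),
    }


# Hypothetical counts for 200 respondents, 10 of whom have had a stroke.
summary = validity_summary(tp=8, fp=32, fn=2, tn=158)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this invented example most people with the condition answer yes, yet four out of five positive answers come from people without it, which is the pattern described for the question on loss of power in an arm.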
16.3.4 The question must not be ambiguous
A question was designed to detect the presence of diarrhoea and was phrased as
follows [159]:
“Are your motions often loose or liquid?”
Answers: Yes/No

Faeces are often referred to as motions in colloquial English, but this attempt
to use simple words may have led to ambiguity. The question could be
interpreted as an enquiry about physical mobility and should be modified to
prevent the possible error.

16.3.5 The question should only make one enquiry
The first question in section 16.3.3 asks whether the patients have had
unsteadiness, light-headedness or faintness. Patients answered yes if they had
suffered from any one of these symptoms or indeed all three. It would be
preferable to ask about each symptom separately as it is possible that faintness
results from postural hypotension (the condition that is to be detected)
whereas unsteadiness may be due to arthritis of the legs or vertigo. If the latter
is true, the inclusion of the symptom unsteadiness in the question may increase
the false positive rate.

16.3.6 The question should be grammatically correct
It is more important that the question is well understood by the man in the
street than whether or not it is grammatically correct. A standardised,
repeatable, and widely used questionnaire concerning angina [222] was
criticised on grammatical grounds by London Civil Servants who were planning
to administer it. However, the extensive use of the questionnaire in other
studies precluded any changes, as comparisons with other populations would
then be impossible and further studies on repeatability and validity would have
to be carried out on the grammatically improved questionnaire.

16.3.7 The question must be answered
When a question produces embarrassment or offence it may not be answered,
even with full guarantees of confidentiality. For example, it may be important
to ask questions concerning sexual activity, but older subjects, whether
sexually active or not, may be less willing to answer than younger patients.
Similarly, religious beliefs may prevent a question’s being answered. A Muslim
may be as embarrassed if asked whether he drinks alcohol as a Christian would
be if asked if he commits adultery.

16.3.8 Should open or closed questions be employed?
An open question is one where the respondent writes his answer in his own
words or an interviewer records the exact reply. With a closed question, the
respondent chooses between the answer options provided. The question in
section 16.2, “Have you any problems?” is an open question and the seven
questions in section 16.3 are examples of closed questions. With an open
question the subject has to recall something whereas with a closed question he
is asked to agree with a statement. This may constitute a fundamental
difference between the two varieties of question [223].
Open questions should be used during the initial stages of questionnaire
design even when closed questions are finally intended [162]. Mellner
considered that the failure to employ open questions may lead to a loss of
information [224], but Belson and Duncan suggested that more information may
be derived from closed questions [225]. Closed questions certainly minimise the
problem of recall and are most suitable when there are a limited number of
possible responses [226]. Responses to open questions may be very difficult to
code and analyse [219].

16.3.9 A response set must be discouraged
A response set is the tendency for a respondent to select a particular answer or
to give a certain reply [227]. The individual may tend to select a particular
answer whether correct or not: for example, the first answer, the positive
response, the negative answer, or the neutral response such as “I don’t know.”
The effect of any response set may be limited by varying the position of any
answer options [228] and by providing a greater variety of answers than yes
and no [229].
The problem of a response set may tend to be less with open questions
and an interviewer than with closed questions and a self-administered
questionnaire.

16.4 THE INTERVIEWER-ADMINISTERED QUESTIONNAIRE

There are certain advantages and disadvantages of having the questions read by
an interviewer rather than having the respondent complete a self-administered
questionnaire (table 16-2). The interviewer must be taught to standardise his
or her interview so that a question is always asked in the same manner, in the
same order, and after an identical introduction. This training may be difficult
and does not ensure that two interviewers will get the same responses.
However, interviewers have certain advantages. They will have an impression of
whether or not the question is understood and can make additional information
available if required. Extra clarifying statements must be stipulated in
advance and printed on an interviewer’s form. With an interviewer, the
subjects will also be able to answer the questions even if they cannot read or
have forgotten their glasses. In addition, when a series of questions has been
asked as a consequence of an initial (usually positive) response, the questions
are easier to ask using a trained observer, as complicated instructions have to
be added to a self-administered questionnaire. For example, the instruction, “If
the answer is ‘No’ please proceed to question ...” may confuse a respondent.
Finally an interviewer can ensure that an answer is obtained to nearly every
question, whereas with a self-administered questionnaire some subjects will
not answer all the questions. However, the use of an interviewer is expensive
and the results subject to observer variation despite rigorous training.

Table 16-2. The advantages of interviewer- and self-administered questionnaires.

Advantages of the interviewer-administered questionnaire
1. When a question is not understood, subsidiary information can be made
   available to the subject.
2. Subject does not need to be able to read nor have his glasses available.
3. Multipart questions that depend on an initial positive response are more
   easily administered.
4. Completion rate is higher for the individual questions.

Advantages of the self-administered questionnaire
1. The conditions of giving the questionnaire can be completely standardised,
   thus removing observer variability.
2. The time taken to train an interviewer and for the interviewer to administer
   the questionnaire is saved. Therefore the self-administered questionnaire is
   relatively inexpensive.
3. Can be sent through the post, avoiding interviewer’s travel and other
   expenses.
4. May be less embarrassing and answers may be more true.

16.5 THE SELF-ADMINISTERED QUESTIONNAIRE

The use of a self-administered questionnaire removes the effect of observer
variation. The method also tends to be much less expensive as the questionnaire
has only to be handed or posted to the subject and an interviewer does
not have to be trained and employed. However, the subjects have to be able to
read, and if they can, some will not have their reading glasses available when
given the self-administered questionnaire and others may fail to complete
some of the questionnaire. Where possible, the self-administered questionnaire
should be administered under standardised conditions: for example, in a
waiting room prior to a clinical investigation.
In a comparison of results from an interviewer and self-administered
questionnaire it was found that the self-administered questionnaire gave a
higher proportion of positive responses to sensitive questions [159]. For
example, male patients were asked about impotence by a male observer, and a
self-administered questionnaire including the same questions suggested a higher
rate for this complaint (47 percent against 28 percent). It appeared that the
patients were reluctant to admit so readily to this embarrassing symptom
when asked by an interviewer. In this study, less sensitive questions were not
affected by the method of collecting the information. With very personal
questions, the self-administered questionnaire may have an advantage.
The self-administered questionnaire is also particularly useful in multicentre
international trials where interviewer training and standardised conditions
would be difficult to achieve. When there are differences in language between
the centres, the questions, whether self-administered or not, must be
translated from the original language into the new and then translated back
into the original language by someone who has never seen the first questions.
If the back translation of a question is not close to the original, the first
translator must try again until satisfactory matching is achieved.
Whether a questionnaire is self-administered or not, the origin and purpose
of the questionnaire must be fully explained to the respondent and the
questions must follow a logical sequence. It must be remembered that the
respondent will expect some relationship between adjacent questions; the order
of administration may affect the responses [230].

16.6 RANDOMISED CONTROLLED TRIALS IN PSYCHIATRIC PATIENTS

Randomised controlled trials in psychiatric patients are often concerned with
subjective changes such as symptoms and may involve greater difficulties
when compared with trials in other patients. In psychiatry, new drugs often
have to be evaluated directly on patients rather than volunteers, diagnostic
difficulties are extreme, and informed consent may be difficult to obtain. The
response to treatment may be difficult to measure, treatment may have to be
prolonged to exert an effect, habituation may occur to drug treatment, a
variable dose may be required, and the drugs in use may produce many side
effects.

16.6.1 The assessment of new drugs for use in psychiatry
A new drug for use in cardiovascular medicine may have demonstrable effects
in animals and human volunteers and the clinical efficacy of this type of drug
can be studied to some extent in these groups. Antiinflammatory and other
drugs can be studied in animal models of disease, but animal models of
schizophrenia, depression, and anxiety are less well developed. Early in their
development, drugs for use in psychiatry have to be tried on patients.
16.6.2 Diagnostic difficulties in psychiatry
Defining a psychiatric condition may present great difficulty. A Medical
Research Council Trial of treatments for depression employed an operational
definition based on clinical impression, the presence of certain symptoms, a
short duration of illness, and lack of previous treatment [231]. However,
international differences on whether a mentally disturbed patient has
schizophrenia, manic depression, or another condition may exist so that it is
very important to carefully define the type of patient who may enter a trial.

16.6.3 Obtaining informed consent from the patient
This problem arises mostly with schizophrenia, severe depression, and mental
subnormality. If informed consent cannot be obtained from the patient, it
must be obtained from the closest relative or guardian.

16.6.4 Measurement of response
Hamilton [232] has defined four categories of improvement: subjective
changes; objective changes; improvement in personal relations; and working
capacity.

16.6.4.1 Subjective changes
Symptomatic improvement can be determined by an interviewer or a
self-administered questionnaire. When a self-administered questionnaire has
been used in psychiatry to diagnose or quantify the degree of depression or
anxiety, the questionnaires have been termed self-rating scales. Discussion of
these scales is beyond the scope of this book and not all are useful for
detecting the response to therapy that may be observed in a trial. For example,
the well-known Eysenck Personality Inventory [233] measures whether or not the
subject has a neurotic personality. This trait may be constant and not
susceptible to short-term fluctuations.
Hamilton has reviewed the self-rating scales that may be employed to assess
the state of anxiety. He found that at least one self-administered questionnaire
can detect drug effects, namely, Taylor’s Manifest Anxiety Scale [234].
Hamilton also considered that three interviewer-administered scales can be
useful: the Brief Psychiatric Rating Scale [235] for all psychiatric symptoms;
the Symptom Rating Test [236]; and Hamilton’s rating scale for anxiety states
[237]. A trial in psychiatry must employ standardised methods of assessment,
and one or more rating scales should be used where appropriate. The greater
the difficulty in assessing a response, the greater the importance of using
standardised methods that can be reproduced by another investigator.

16.6.4.2 Changes in personal relationships
Changes in personal relationships may be difficult to assess but the number
and duration of contacts with others can be estimated and information sought
on the subject’s relationships with family, friends, employers, and colleagues
at work.

16.6.4.3 Objective changes
The ability to resume full-time work could be an important end point of a trial
as could discharge from hospital, discharge from care, readmission to hospital
and, in depression, the frequency with which electroconvulsive treatment has
to be employed.
Any change in behaviour that can be documented must be carefully defined
at the onset of the trial. Hamilton [232] provided two examples: outbursts of
temper that could be observed in a hospital ward and the frequency of going
out of doors in a patient with a fear of open spaces (agoraphobia).

16.6.5 Problems of drug trials in psychiatric patients
A long duration of treatment may be necessary, the dose of drug may be
varied, habituation may occur, and side effects may prove very troublesome.
Shepherd [238] considered that the highly lipid soluble drugs that act on the
brain may be metabolised at more variable rates than other drugs, making a
variable dose schedule of greater importance. Habituation to sedative
treatment often results in the side effect of sleepiness being lost after a week
or so; therefore, Hamilton suggested starting with very small doses and
increasing slowly, even though this will prolong a trial [232]. He also pointed
out that a patient habituated to one drug may possibly not respond to a second
and cautioned against the use of cross-over trials in psychiatry.
The difficulty of conducting trials in psychiatry should not inhibit such
investigations as even greater problems may arise from a failure to perform
randomised controlled trials. The randomised controlled trial is essential in the
field of psychiatric investigation.

16.7 THE IMPORTANCE OF MEASURING
THE QUALITY OF LIFE DURING A TRIAL

Treatment may produce side effects that interfere with a variety of aspects of
the quality of life. For example, antihypertensive medication can produce
gastrointestinal effects that interfere with the enjoyment of food and drink,
and side effects on the cardiovascular system may reduce sporting activities.
Similarly, pharmacological effects may prevent the enjoyment of sex, sedation
may interfere with both work and play, and some drugs may produce
depression and interfere with personal relationships and social contacts. In a
trial, it is not sufficient to demonstrate that an antihypertensive drug lowers
blood pressure. It has also to be proved that side effects are not severe and
that the quality of life does not deteriorate. It is to be hoped that a
treatment improves general well-being and that this effect can be demonstrated.
The factors influencing the quality of life, how this quality can be measured,
and the usefulness of such procedures in randomised controlled trials have been
discussed recently [239].

16.7.1 Factors affecting the quality of life
The severity of the condition to be treated may affect the quality of life and if
the patient is cured an improvement in well-being is to be expected.
Unfortunately, certain chronic medical conditions are associated with little in
the way of symptoms or disability prior to starting treatment and the latter
may result only in an unchanged or reduced quality of life. As discussed
earlier, symptomless hypertension is one of these conditions; hyperlipidaemia
and mild diabetes mellitus provide other common examples. During a trial of
treatment in these conditions any adverse effects of the disease or treatment
should be measured. However, it is well recognized that symptomatic complaints
are not usually due to the abnormality being treated or the treatment being
given but are associated with anxiety and depression [240]. The age, sex, and
race of the subject are also associated with the volume of symptomatic
complaints [159, 241].
16.7.2 Measurement of the quality of life
An example has been published of the (admittedly retrospective) assessment of
the quality of life in a randomised controlled trial [239], namely, the Veterans
Administration trial of treatment for hypertensive patients [37]. In brief,
estimates were made of any disablement that would prevent a patient’s mobility
or ability to work, any disability interfering with other aspects of a subject’s
life, and any discomfort such as that produced by minor symptom side effects.
Table 16-3 lists ten states of well-being and the scores that may be attached to
each state. Following the suggestion of Fanshel and Bush [242], total
well-being was arbitrarily allocated a score of one and an eleventh state,
death, a score of zero. The scores were based on the assumption that a patient
is prepared to trade a certain number of years of life in a reduced state of
health for a smaller number of years of life in an improved state of health
[239, 242]. It must be admitted that the scores were somewhat arbitrary and
open to discussion. For example, the scores were calculated on the assumption
that a person aged 40 would consider a further 40 years of life in a disabled
state to be equal to 25 years of life in a state of total well-being. The
outcome for each patient was calculated from the health-status score multiplied
by the number of years lived in that state. Thus an overall score was computed
for each treatment and termed a health status index. As expected, the benefits
observed in the actively treated group (an increased survival and reduction in
cardiovascular events) proved to be greater than the disadvantages of the
symptom side effects and other adverse effects of treatment.

Table 16-3. Health states or states of well-being (after Fanshel and Bush [242]).

Health state                                                           Score
S1   Total well-being                                                  1.0
S2   Minor dissatisfaction: very slight but significant deviation      0.975
     from well-being (e.g., caries, glasses for reading).
S3   Discomfort: subject has a symptomatic complaint with no           0.875
     significant reduction in efficiency.
S4   Minor disability: daily activities continue but with a            0.8
     significant reduction in efficiency.
S5   Major disability: patients show a severe reduction in             0.75
     efficiency of usual functions.
S6   Disabled: unable to go to work but can get about in the           0.625
     community.
S7   Confined: in an institution.                                      0.375
S8   Bedridden                                                         0.125
S9   Isolated: for example, in intensive care.                         0.025
S10  Comatose                                                          0
S11  Dead                                                              0

During the course of a randomised controlled trial the health state must be
determined by collecting certain information. A subject who has no symptom
or disability may be placed in health state S2 (minor dissatisfaction) as state S1
can probably only apply to a newborn child. Enquiry should be made into the
presence of symptoms and when present, the health state will be reduced to S3
(discomfort).
A questionnaire may be employed to determine health status during the
course of a trial, and possible questions are reproduced in table 16-4. Sixteen
questions are given that may prove effective in identifying health states S3 to
S7. It is hoped that the scores may prove useful in evaluating the results of a
randomised controlled trial where one treatment leads to a prolonged life in a
poor state of health and a second treatment to a shortened life in a better state
of health.

Table 16-4. Sixteen questions that may prove useful in identifying health
states S3 to S7 during the course of a randomised controlled trial.

Health state S3 (Discomfort.)
1. Positive response to any question on symptoms, provided symptom
   experienced at least once a day on more than half the days under
   consideration.

Health state S4 (Minor disability suggested if 2, 3, or 4 is true.)
2. How far can you walk without stopping?
   (Answer: less than 1 mile.)
3. How many flights of stairs can you climb at one go?
   (Answer: less than 2.)
4. Has your health or treatment interfered with your hobbies?
   (Positive answer.)

Health state S5 (Major disability suggested if 5, 6, 8, or 10 is true.)
5. Answer less than half a mile to Q.2.
6. Answer none to Q.3.
7. Have you been going out to work during the last n months?
   (Positive answer.)
8. If yes, how many days have you been off sick in the last n months?
   (Answer: 5 or more days.)
9. Have you been able to do all your usual jobs around the house in the last
   n months?
   (Positive answer.)
10. If yes, for how many days in the last n months were you unable to do these
    jobs through illness?
    (Answer: 5 or more days.)

Health state S6 (Disabled state suggested when 11, 12, or 13 is true.)
11. Negative response to Q.7 when subject did not work because of illness.
12. Negative response to Q.9.
13. Can you travel by bus on your own?
    (Negative answer.)

Health state S7 (Confined state suggested if 14, 15, or 16 is true.)
14. Are you able to go out and about?
    (Negative answer.)
15. Do you require assistance with bathing?
    (Positive answer.)
16. Do you require assistance with dressing?
    (Positive answer.)

n = number of months.
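
The calculation of the health status index described in section 16.7.2 can be sketched as follows. The scores are those listed in table 16-3. The function names, the patient histories, and the use of a simple mean to combine patients within a treatment group are assumptions made for illustration, as the text does not state how individual outcomes were combined.

```python
# Scores for health states S1 to S11 as listed in table 16-3.
SCORES = {
    "S1": 1.0, "S2": 0.975, "S3": 0.875, "S4": 0.8, "S5": 0.75, "S6": 0.625,
    "S7": 0.375, "S8": 0.125, "S9": 0.025, "S10": 0.0, "S11": 0.0,
}


def patient_outcome(history):
    """Sum of (score x years lived in that state) for one patient.

    history: list of (health_state, years) pairs, e.g. [("S3", 10), ("S6", 5)].
    """
    return sum(SCORES[state] * years for state, years in history)


def health_status_index(histories):
    """Mean patient outcome for a treatment group (the mean is an assumption)."""
    return sum(patient_outcome(h) for h in histories) / len(histories)


# The trade-off quoted in the text: 40 years in the disabled state (S6)
# count the same as 25 years of total well-being (S1).
disabled_40_years = patient_outcome([("S6", 40)])
well_25_years = patient_outcome([("S1", 25)])
print(disabled_40_years, well_25_years)

# Hypothetical treatment group of two patients.
treated = [[("S3", 10), ("S2", 5)], [("S2", 12)]]
print(health_status_index(treated))
```

Two treatments can then be compared on their indices, which is the situation envisaged above where one treatment prolongs life in a poor state of health and another shortens life in a better state.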
16.8 CONCLUSIONS

In randomised controlled trials attention must not only be directed to objective
measurements of outcome but also to the symptomatic well-being of the
subjects. This chapter considered how the presence of symptoms should be
determined from spontaneous reports and interviewer- and self-administered
questionnaires. The characteristics of a good question were discussed
including comprehension, validity, and repeatability. Section 16.6 discussed the
importance and difficulty of performing randomised controlled trials of
treatment in psychiatric patients where the outcome is often subjective.
Finally, section 16.7 extended the assessment of symptoms to other aspects of a
patient’s life and considered how, in crude terms, the quality of life may be
measured.

17. EARLY TRIALS ON NEW DRUGS

This chapter considers the ways in which early trials of new drugs differ from
trials of established treatments. Greenwood and Todd [243] have defined three
phases of early trials: trials to determine safety and early clinical pharmacology
(phase I); trials to determine clinical efficacy and further clinical pharmacology
(phase II); and trials for the early clinical development of the drug (phase III). A
regulatory authority may be involved in these early trials and provide the
approval necessary for the general release or marketing of a drug.

17.1 APPROVAL BY A REGULATORY AUTHORITY FOR EARLY TRIALS

Until recently the usual form of authorisation in the United Kingdom was via
a clinical trial certificate. The manufacturer of the new drug applied to the
Medicines Division of the Department of Health and Social Security (DHSS)
giving the chemistry, pharmacology, and the details of animal experiments
with the drug [244]. The DHSS division reported to the Committee on Safety
of Medicines (CSM) with advice on whether or not a clinical trial certificate
should be issued.
Outside the United Kingdom, it is often only necessary to inform the
regulatory authority of the intention to perform trials on a new substance. No
certificate is issued, but the regulatory agency can object. In the United States a
notice of Claimed Investigational Exemption for a New Drug is filed with the
Food and Drug Administration (FDA). Simon and Jones summarised the
countries requiring only notification, and these included most European
countries. They also listed the countries requiring a formal detailed submission
and approval by a regulatory authority, for example, Australia, Canada, India,
Israel, and South Africa [244].
The manufacturer of a new drug is very concerned with the brain-to-bottle
time and the regulations in the United Kingdom led to long delays (sometimes
over eight months). The number of clinical trial certificates issued fell from
over 170 per year in 1972-1974 to 87 per year in 1980 [245]; early trials were
conducted in other countries. In 1981 a new scheme was introduced where
exemption from the need to obtain a clinical trial certificate may be granted
when the licensing authority receives the following:
1. certified summaries of the basic data
2. a copy of the trial protocol
3. confirmation that a medical adviser to the company, working in the United
   Kingdom, is satisfied that the trial is reasonable.
If the licensing authority objects within 35 days, the pharmaceutical
company can still apply for a clinical trial certificate as before [245]. The
relaxation of regulations would appear reasonable on two counts: first,
exemption has always been possible for doctors and dentists conducting trials
on their own initiative and second, phase I trials appear to be very safe.
17.2 PHASE I TRIALS

Phase I trials tend to be open, single-dose studies and, when not a randomised
controlled trial, fall outside the scope of this book. However, a randomised
controlled trial may be appropriate at this stage. For example, a tranquillizer or
antihypertensive drug may be expected to have sedative properties and the
dose may be increased stepwise in a trial to determine whether the therapeutic
effect occurs at a lower dose than the side effect of sedation. In such a trial,
with both subjective and objective assessments, a randomised controlled trial
is appropriate with placebo control but only single-blinding. Such a trial
differs from studies on established drugs as all adverse effects will be unknown
and the trial must be carefully supervised, not double-blind, conducted in a
hospital or clinical laboratory, and usually accompanied by haemodynamic
and biochemical monitoring. The laboratory must be equipped with all those
items that would be required for emergency resuscitation and the staff trained
to use this equipment. Written informed consent is, of course, essential.
Phase I studies determine whether the new drug has a pharmacological
action that may be useful in treatment, and phase II trials examine whether this
action proves a benefit in patients with disease [246].
17.3 PHASE II TRIALS

Phase I studies, having provided data on the safety and clinical pharmacology
of the new drug, are extended in phase II in order to determine clinical efficacy
in patients and the doses to be employed. These studies may well be randomised
controlled trials, possibly with double-blinding and conducted on
outpatients. Careful monitoring for biochemical and other adverse effects will
be necessary.
17.4 PHASE III TRIALS


Greenwood and Todd [243] have considered certain objectives for phase III
trials: definition of those patients who would benefit from the use of the drug;
comparison of the new drug with existing drugs; detection of less common
adverse effects; determination of any tolerance to the drug’s effect; detection of
interactions with other drugs, tobacco, and alcohol; the use of the drug in
geriatric and paediatric patients; and further studies on the mode of action.
Trials in phase III may differ little from the standard randomised controlled
trial.
17.5 REGULATIONS GOVERNING WHETHER
A NEW DRUG CAN BE GENERALLY RELEASED

Before the drug regulatory authorities of different countries will authorise the
release of a new drug they must be satisfied about the efficacy, safety, and
quality control of the product. Norway also requires a medical need for the
new drug to be demonstrated. The activities of these authorities have been
reviewed by Lumbroso [247].
Stringent clinical trials are now required by all authorities, but in the past
countries have varied widely in their requirements. Regulatory authorities
were mainly established after therapeutic disasters; for example, in 1937 the
Food and Drug Administration (FDA) was created in the United States of
America following deaths from a sulphanilamide elixir containing diethylene
glycol. In France the regulations were strengthened in 1952 following deaths
from a preparation of diethyl tin diiodide. The thalidomide disaster in
1959-1960 led to regulatory authorities being established in many countries. The
Committee on Safety of Drugs in the United Kingdom was established in 1964
and involved a voluntary system that became law between 1968 and 1971.
Interestingly, Lumbroso points out that in Western Germany (where
thalidomide was developed) regulations were first imposed by the EEC in
1972.
The regulatory authorities still vary in the amount and type of information
required. One of the most cautious authorities is the FDA, whose deliberations
may delay the introduction of new drugs by more than three years. The FDA
also requires a copy of all record forms completed during the course of clinical
trials. The strict regulations are designed to prevent the introduction of a
potentially dangerous drug but it is admitted that, in the process, the public
may be deprived of a beneficial drug.
Some countries also require research to be replicated in their country.
Lumbroso [247] feared that this requirement could be misused for a commercial
protectionist purpose but pointed out that the replication of studies can bring
to light new therapeutic indications and clarify the action of new drugs.
However, this additional information may better be sought from different and
specially designed studies. The difficulty of devising an internationally
acceptable standard trial protocol for new drugs arises from differences in attitudes
and legal standards between countries rather than disagreements on scientific
merit [248]. Whether a drug will be marketed in a particular country depends
not only on the absolute safety in, say, deaths per 10,000 treatments, but also
on the country’s attitude to such deaths. Where death can follow an infectious
disease, an inexpensive antibiotic with some risk is better than a treatment that
is safer but too expensive to be provided. However, the promotional activity
of pharmaceutical companies may determine which drug is prescribed rather
than considerations of cost effectiveness [249]. It would be naive to assume
that, at the present, permission to sell a relatively dangerous drug in one
country and not another rests on sound humane principles.
17.6 POSTMARKETING TRIALS

Clinical trials on a new drug do not cease with registration for sale. Very
important trials may be started after this event in order to examine the
long-term efficacy of treatment. Also, prior to registration, it is difficult or
impossible to observe a sufficient number of patients for a long enough period to
detect rare adverse reactions. It is hoped to overcome this difficulty by
monitoring adverse effects in postmarketing surveillance schemes. However, the
large randomised controlled trial provides the best opportunity of detecting
adverse drug effects, as in the trials of clofibrate and oral antidiabetic drugs
discussed in chapter 19. Other important randomised controlled trials after
marketing include further trials of efficacy in comparison with other drugs,
more trials to detect interactions, further trials in certain groups such as the
young and old, and more trials to determine the optimum dose and dose
frequency. Surprisingly, often little is known about these aspects at the time of
registration.
Postmarketing trials have also been termed phase IV studies. One particular
variety has been called promotional and has been devised to familiarise
clinicians in general with the use of the drug. Such trials must be supported when
the drug represents an important new treatment. However, they are usually
employed to introduce a further sedative, antiinflammatory, or
beta-adrenoceptor blocking drug and the purpose of the trials is to sell the product.
The trials are often carried out in general practice and inducements have been
provided to persuade a large number of doctors to use the new drug. At the
end of such a study the patient may continue to receive the drug at the patient’s
or government’s expense and the pharmaceutical company may recover more
than the cost of the trial. An article in the Sunday Times of January 29, 1978
summed up the situation.

Patients put at risk as doctors aid drug firms in sales drive
The Sunday Times interviewed 39 GP’s. Four admitted that they do not tell the patient
or ask permission when they are testing a new drug. Twenty-seven said taking part in
the trial had influenced their choice of drug, and some said they would never have
chosen the drug for the patient if they had not been asked to test it.
This article questioned the ethical position of certain doctors involved in
these trials and then criticised the trials for failing to collect important data on
adverse events and for paying the doctors to take part. The advantages and
disadvantages of promotional trials deserve further attention and their
usefulness may depend on the drug being investigated.
Lionel and Herxheimer [250] have stressed that a “good proportion of the
drugs available are of little importance in terms of essential health care and are
marketed mainly because they can be sold and not because they benefit the
health of the population.” We should concentrate the limited and valuable
resources for randomised controlled trials on fewer drugs.
The clinical investigator should be less willing to investigate a new drug
when it is closely similar to many that are already available. These compounds
have been termed me-too drugs and lack of interest in them on the part of
investigators would help to regulate the provision of such drugs by the
pharmaceutical companies. Resources may be better employed in examining more
original drugs but it must be admitted that a few me-too drugs prove to have a
unique place and to represent a real advance.




17.7 LIMITATIONS OF TRIALS IN NEW DRUGS

Small trials during the early phases of drug development may fail to detect
severe adverse reactions and new drugs may be released and subsequently
withdrawn from the market. This happened recently with tienilic acid and
previously with practolol, even though the latter drug had been used in one
large randomised controlled trial involving over 3,000 patients [254].
17.8 CONCLUSIONS

Randomised controlled trials are necessary for the evaluation of new drugs and
this chapter considered the type of trial appropriate for each phase in the
investigation of a new compound. The contribution of regulatory authorities
was discussed as they may have to approve new drugs for use in trials or for
general release.
Clinical investigators can be more discriminating in the type of trial in
which they become involved. They should avoid promotional trials of me-too
drugs which originate from the marketing departments of pharmaceutical
companies and should take part in trials of potentially important new drugs or
trials of established drugs where an attempt is being made to answer an important
question.

18. THE DETECTION OF ADVERSE DRUG REACTIONS

An adverse drug reaction (ADR) has been defined by the World Health
Organisation [251] as “one which is noxious, unintended and occurs at doses used
in man for prophylaxis, diagnosis or therapy.” The detection of symptom side
effects was discussed in chapter 16; in this section we shall consider only
life-threatening events.
An ADR will be more easily detected when it is an event known to be
associated with drug treatment and is relatively rare in the absence of such
treatment.
18.1 IS THE EVENT KNOWN TO BE PRODUCED BY DRUGS?

Figure 18-1 gives the hypothetical steps in the recognition of an ADR. If a
condition is suspected to be an adverse event and is uncommon in the absence
of drug treatment, then an ADR will be detected. If an ADR mimics a common
event, usually unassociated with drug treatment, it will be unlikely to be
detected. If an ADR is suspected but common in the absence of treatment, it
may still be detected, as will a previously unsuspected ADR if it is rare in the
absence of treatment. The large randomised controlled trial gives the best
opportunity for detecting previously unsuspected ADRs provided data on all
events are collected during the trial and subsequently analysed.

Figure 18-1. Flow chart to illustrate the detection of an adverse drug reaction (ADR) in a trial.
Detection is greatest when an event is recognised as a possible ADR and is uncommon in the
absence of drug treatment. An ADR may easily be missed when it is not known to be an adverse
effect of treatment and is common in the absence of treatment. The large randomised controlled
trial provides the best opportunity of detecting such an ADR.

18.2 FREQUENCY OF THE ADVERSE EVENT

Figure 18-2 provides a schematic representation of the chance of detecting an
adverse event during a trial and the alternative, of finding a reduction in this
event in the treatment group (a benefit from treatment). An ADR will be
detected when the frequency of the event is low in the absence of a particular
drug treatment and high when the drug treatment is employed. Conversely, if
the drug frequently reduces the incidence of an event that is common without
treatment, then a benefit may be detected.
The large randomised controlled trial is the ideal method for detecting
ADRs that occur with a frequency greater than one in 300 treated patients
and not in controls [252]. However, trials will never detect rare ADRs such as
aplastic anaemia with phenylbutazone (one in 33,000) or thrombocytopenia
with diuretics (one in 15,000) [253]. There was also a notable failure of one
large trial to detect the more common oculocutaneous syndrome with practolol
[254], possibly due to failure to collect the appropriate information during
the trial [255]. The randomised controlled trial is not the appropriate method
for detecting very rare ADRs. It is to be hoped that
clinical intuition, specially designed surveillance programs, and the routine
examination of vital statistics will lead to the discovery of rare events. Lewis
has prepared a table of the number of patients who would have to be surveyed
in a treated group according to the background incidence of an event and the
additional incidence due to the drug. With a background incidence of one in
1,000 and a doubling of this incidence due to the drug, 32,000 patients would
have to be surveyed [256].

Figure 18-2. Schematic representation of the chance of detecting an adverse drug reaction
(ADR) in a randomised controlled trial. When the ADR is common in the treated group and rare
in the control group, an ADR may be detected. Conversely, an event that is less common in the
treated group will represent a benefit of treatment. The central area represents the results when
neither an advantage nor disadvantage of treatment is detected.
[Axes: frequency of event with drug treatment (vertical) against frequency of event without
drug treatment (horizontal). Regions: ADR detected; no difference detected; benefit detected.]

18.3 THE SMALL TRIAL AND THE ADR

A small clinical trial including fewer than 50 patients stands little chance of
detecting an ADR with any certainty. However, small trials have produced
clues to the presence of adverse drug reactions, and large trials have succeeded
where the ADR is a common condition. Most life-threatening ADRs occur
with a frequency of less than one in 50, an exception being provided by
primary pulmonary hypertension with aminorex fumarate which occurred
with a frequency of one in ten patients.
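
The arithmetic behind these statements can be sketched in a few lines of Python. This is an illustration only, under assumptions that are not in the text: patients are treated as independent, the incidence is taken as constant, and the event is assumed not to occur at all in controls for the first two functions. The function names, the 95 percent target, and the significance level and power used in the third function are hypothetical choices; Lewis's own assumptions are not stated here, so the third function should not be read as his method.

```python
import math
from statistics import NormalDist


def prob_at_least_one(n_patients, incidence):
    """Chance of seeing one or more events among n treated patients,
    assuming independent patients and a constant incidence."""
    return 1.0 - (1.0 - incidence) ** n_patients


def patients_for_one_case(incidence, prob=0.95):
    """Smallest number of treated patients that gives the stated chance
    of seeing at least one event (same assumptions as above)."""
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - incidence))


def patients_per_group(p_control, p_treated, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided
    comparison of two proportions. alpha and power are assumed values."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2.0
    numerator = (
        z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_b * math.sqrt(p_control * (1.0 - p_control)
                          + p_treated * (1.0 - p_treated))
    ) ** 2
    return math.ceil(numerator / (p_treated - p_control) ** 2)


if __name__ == "__main__":
    # A 50-patient trial and an ADR with a frequency of one in 50.
    print(prob_at_least_one(50, 1 / 50))
    # Patients needed to see at least one case of a one-in-300 ADR.
    print(patients_for_one_case(1 / 300))
    # Background of one in 1,000 doubled by the drug.
    print(patients_per_group(0.001, 0.002))
```

Seeing a single case is a much weaker requirement than attributing it to the drug, so the first two figures are lower bounds on what a trial needs. With the assumed significance level and power, the third calculation gives a number per group of a similar order to the 32,000 quoted above.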

Adverse drug reactions may be detected in small clinical trials, but the results
are often difficult to interpret. Vere discussed this problem [257] and reviewed
four trials of chlorpromazine that reported in 1954 when the first case reports
of jaundice with this drug were appearing. In these trials 17, 22, 24, and 27
patients were given chlorpromazine. The trial with 27 patients did report one
case of jaundice but this could not be attributed definitely to the drug.
Similarly, in a trial of the antihypertensive drug guanoxan, two of 160 patients
developed obstructive jaundice and one serum jaundice. Seventy percent of the
160 patients showed rises in serum transaminase concentrations during treatment
against forty percent before treatment [258]. The authors commented,
“Although a rare direct toxic effect . . . upon the liver cannot be excluded . . .
this explanation of the remaining abnormalities appears to be most unlikely
. . . no case of jaundice has been described which can, with certainty, be
ascribed to therapy with these compounds.” Many would now consider that
the episodes of jaundice were due to the drug.
Trials can also provide evidence of serious toxicity without the end result of
the toxic process being observed. For example, a small trial showed evidence
of an overall reduction in peak expiratory flow rate with a drug known to
produce bronchospasm in susceptible subjects [160]. Similarly, the fact that a
drug produces hepatitis may be detected in a small trial in which liver function
tests are monitored; and a tendency to produce serious marrow depression
may be detected by a significant reduction in platelet or white cell count.
18.4 THE LARGE TRIAL AND THE ADR

Many methods of detecting ADRs depend on case reports, either published or
incorporated in central registers. These reports tend to involve drugs that are
already under suspicion and also to concentrate on events that have been
previously shown to result from drugs—for example, aplastic anaemia and
hepatitis. When an adverse effect mimics or is identical to a common disease, it
is less likely to be recognised as an ADR. The clinical trial, with its integral
control group, is capable of detecting an excess of common events when
associated with drug treatment.
Table 18-1 lists some adverse drug effects that have been detected in three
large controlled trials. In the Coronary Drug Project Research Group trial [45]
there was a one in 50 excess of nonfatal myocardial reinfarctions in those given
high-dose conjugated oestrogens for one year, an excess of arrhythmia in those
given niacin for a year, and an excess of one in 500 in gallbladder disease in
those given clofibrate [259]. In the University Group Diabetes Program trial
[46], there was an excess of cardiovascular deaths in patients given tolbutamide
and an excess of one in 60 in those given phenformin [260]. In the World
Health Organization clofibrate trial [50] there was an excess of total mortality
in patients given clofibrate. Oliver has argued that the excess mortality may

Table 18-1. Adverse drug reactions detected in three large controlled trials. The frequency represents
the difference between the rate observed in the actively treated group and the appropriate control group.

Drug                      ADR                              Frequency   Number of patients in  Reference
                                                                       trial receiving drug

CONJUGATED OESTROGEN      Nonfatal myocardial infarction   1 in 50     1,119                  Coronary Drug Project [45]
5 mg daily for 1 year
CLOFIBRATE                Gallbladder disease              1 in 500    1,103                  Same [259]
1.8 g daily for 1 year
NIACIN                    Arrhythmia                       1 in 100    1,119                  Same [259]
3.0 g daily for 1 year
TOLBUTAMIDE               Cardiovascular death             1 in 70     204                    University Group Diabetes Program [46]
1.5 g daily for 1 year
PHENFORMIN                Cardiovascular death             1 in 60     204                    Same [206]
100 mg daily for 1 year
CLOFIBRATE                Total mortality                  1 in 900    5,331                  WHO clofibrate trial [50]
1.6 g daily for 1 year

tion in glucose tolerance with diuretic treatment before the occurrence of
clinical diabetes mellitus [54].
18.5.4 Other investigations
With certain treatments it may be appropriate to monitor other bodily functions:
for example, using repeated electrocardiograms, chest radiographs, peak
ventilatory flow measurements, or examinations of the optic fundi.
18.6 CONCLUSIONS

The methodology for detecting an ADR in a clinical trial must be improved so
that we can be confident that drugs monitored in large long-term trials are safe
and to be preferred to those not so evaluated. However, some ADRs will be
very difficult to detect, especially changes in affect such as depression or
anxiety. How will we determine that a new drug produces schizophrenia in
one-half of one percent of cases? The large randomised controlled trial with a full
documentation of all disease episodes provides the best hope.

19. FAILURE TO ACCEPT THE RESULTS
OF RANDOMISED CONTROLLED TRIALS

The result of a trial may not be accepted for several reasons: the result may be
at variance with preconceived ideas; an unusual group of patients may have
been recruited; the treatment groups may not be identical in important
respects; too few patients may have been recruited and the power of the trial may
be too low; the results of the trial may not have been interpreted correctly; the
trial result may not be consistent across different strata of patients; the trial
may provide a result that conflicts with the results of other trials; the treatment
may be difficult to administer or have too many adverse effects; and finally,
the trial may originate from a group with a vested interest in demonstrating
the observed result (for example, a pharmaceutical company). Before discussing
each of these reasons we shall illustrate them by describing three trials
whose results have not been completely accepted and also three series of trials
on related drugs, the collective results of which are difficult to interpret. After
discussing these trials, we shall return to the reasons for rejecting the results of
a randomised controlled trial.
19.1 THE ANTURANE REINFARCTION TRIAL

The Anturane Reinfarction Trial was a randomised, double-blind, multicentre
trial comparing sulphinpyrazone (Anturane, 200 mg four times a day) with a
placebo for the secondary prevention of myocardial infarction. The trial
started in September 1975 and an interim analysis in July 1977 revealed a
statistically significant (P = 0.02) reduction in cardiac deaths after an average
follow-up period of 8.4 months. The results were published in February 1978
[48] and are given under the heading First Report in table 19-1. Recruitment to
the trial was stopped at the time of this report but the investigators disclosed
the short-term results to the subjects in the trial and sought their individual
consent to continue. All but seven patients agreed to continue; a second report
was published in January 1980 [262] and the results are also given in table 19-1.
A 49 percent reduction in cardiac deaths was reported in the first article and a
32 percent decrease in the second. Unexpectedly, the reduction in deaths was
not due to the postulated decrease in further episodes of myocardial infarction
but to a reduction in sudden deaths, which were possibly related to
arrhythmias. The trial has been criticised and the adverse comments concern the
definitions employed in the trial, the organisation of the trial, and the manner
in which the trial was published.

Table 19-1. Results of the Anturane (sulphinpyrazone)
trial, made available in two reports [48, 242].

                              Placebo   Sulphinpyrazone   %
                              group     group             reduction   P =

First report
Number patients               742       733
Total mortality (n)           60        40                33          Not specified
All cardiac deaths            ?         ?                 ?
Cardiac deaths analysed       44        24                49          0.02
Sudden deaths analysed        29        13                57          0.02

Second report
Number patients               783       775
Total mortality (n)           85        64                25          Not specified
All cardiac deaths            78        59                24          Not specified
Cardiac deaths analysed       62        43                32          0.06
Sudden deaths analysed        37        22                43          0.04
Sudden deaths (2-7 months)    24        6                 74          0.0003
Randomised but ineligible     33        38
Withdrew from study           220       195

19.1.1 Definitions employed in the trial

19.1.1.1 Ineligible patients
Ineligible patients were those who were randomised into the trial but were
excluded from analysis by the policy committee as the patients did not meet
the criteria of the investigation protocol. It appears that some patients in this
group were excluded after they had died and therefore the definition of an
ineligible patient may crucially affect the results.

19.1.1.2 Nonanalyzable deaths
Nonanalyzable deaths included those occurring either within the first seven
days of starting treatment or more than seven days after stopping treatment.
Nonanalyzable deaths therefore included all deaths in patients who dropped
out and deaths among patients who did not comply with instructions to take
their medication. Nonanalyzable deaths also included those “attributed
directly to surgery in which no association could be established with a nonfatal
event while the patient was on study treatment” [262]. The nonanalyzable
deaths were excluded from an analysis on the per-protocol principle (section
15.7).

19.1.1.3 Sudden death
A sudden death was one that was either not observed or one that occurred
within 60 minutes of the onset of symptoms.

19.1.2 Organisation of the trial
The organisation of the trial has been criticised and included the following
features:

19.1.2.1 Coordinating centre
The coordinating centre was situated at the Ciba-Geigy Corporation and its
Operations Committee was responsible for the execution of trial procedures
and the reporting of data to a policy committee. The data were verified by
independent university departments of epidemiology and the trial procedures
were similarly audited.

19.1.2.2 Policy, Audit, and Electrocardiographic Committees
These committees were independent of the pharmaceutical company.

19.1.2.3 Financing of the trial
The Ciba-Geigy Corporation, the manufacturers of Anturane, financed the
trial.

19.1.3 Manner in which the trial was reported
Table 19-1 provides the results of the trial as published in the two reports.
There was fair agreement between the two reports on the effect of Anturane on
sudden deaths; the reduction was 57 percent and 43 percent in the first and
second reports, respectively. In the second article it was stated that this
reduction all occurred between the second and seventh months of treatment
(74 percent reduction in sudden deaths).

19.1.4 Comments that have been published concerning the trial
Four important reviews are summarised in the following sections.

19.1.4.1 Editorial in the Revue d'Épidémiologie et de Santé Publique
Armitage wrote this editorial in 1979 [263]. First, he considered the problem of
repeated looks (section 10.7.3) and commented, “The ART made no explicit
allowance for such repeated testing, but it seems likely that for the
design used, whereby the first examination on the data occurred after a
substantial proportion of the total patient intake, the adjustment needed would
not be great.” This statement was made before publication of the second
report when, presumably, further looks were undertaken.
When discussing the selection of some end points as either analyzable or
sudden deaths (the definitions in 19.1.1) Armitage stated, “The investigators
have chosen to discard the safe ‘pragmatic’ approach in favour of an
‘explanatory’ approach which may be more sensitive to the presence of real effects but
may also suffer from bias.”
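
The problem of repeated looks can be illustrated by a small simulation. The sketch below is not a reconstruction of the ART analysis: it assumes a hypothetical trial with no true treatment effect, an outcome that is a standard normal difference with known variance, equally spaced looks, and an unadjusted two-sided test at the 0.05 level at every look. The function name, the number of patients added per look, and the random seed are all illustrative choices.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks, n_per_look=20, n_trials=4000,
                        alpha=0.05, seed=1):
    """Proportion of simulated null trials declared 'significant' at
    any look when each look uses an unadjusted two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = 0
    for _ in range(n_trials):
        total = 0.0  # running sum of treatment differences (true mean 0)
        n = 0        # observations accumulated so far
        for _ in range(n_looks):
            for _ in range(n_per_look):
                total += rng.gauss(0.0, 1.0)
            n += n_per_look
            z = total / n ** 0.5  # z statistic with known unit variance
            if abs(z) > z_crit:
                hits += 1         # stop at the first 'significant' look
                break
    return hits / n_trials


if __name__ == "__main__":
    for looks in (1, 2, 5):
        print(looks, false_positive_rate(looks))
```

With a single look the rate stays close to the nominal 0.05; it rises as more unadjusted looks are taken, which is the reason for making an explicit allowance for repeated testing.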

19.1.4.2 Editorial in the New England Journal of Medicine
In an editorial in January 1980 Braunwald commented [264], “It would certainly
be desirable to repeat the sulfinpyrazone study to confirm its results . . .”
but “. . . despite the great desirability of learning more about this drug, the
information available suggests that sulfinpyrazone should be approved for use
after infarction and made available to the American public at the earliest
possible time.”
The Food and Drug Administration considered whether or not to license the
drug for use in the secondary prevention of myocardial infarction and did not
agree [265].
19.1.4.3 Article in Science entitled “FDA Says No to Anturane”

Kolata [265] reported that the U.S. Food and Drug Administration refused to
approve sulphinpyrazone for the secondary prevention of myocardial infarction
in April 1980 on the grounds that the case for this drug was not persuasive.
She quoted some very important comments. Paul Meier made a remark
concerning nonanalyzable deaths: “The idea of nonanalyzable deaths is an
innovation in the analysis of clinical trials that we can do without.” Meier also
commented that the exclusion of ineligible patients magnified the differences
between the groups in the trial. In defence, Sol Sherry (chairman of the trial’s
policy committee) pointed out that bias was prevented by the double-blind
nature of the trial and that the exclusion of certain patients was done
prospectively and not retrospectively after seeing the data.
Kolata quoted an epidemiologist as saying that the report of the study “was
orchestrated [by Ciba-Geigy] for presentation in the scientific and public arena
so as to create an impression that there was an unequivocable clear-cut,
dramatic result. What happened was almost a con job.” Meier was also reported
as saying, “It was an interesting but not a convincingly positive result. It was
made into a break-through by PR [public relations].”
Most importantly, Kolata reports that Robert Temple, head of the Food and
Drug Administration’s cardiorenal division, audited the study and found the
following:
19.1.4.3.1 THE CLASSIFICATION OF MANY SUDDEN DEATHS WAS INCORRECT.
A reclassification removed the deficiency of sudden deaths in the
sulphinpyrazone group. The definition of sudden death “wasn’t very well thought out.
It turned out to be more crucial than anyone would have anticipated.”
It has subsequently been suggested by a member of the trial’s policy
committee that Temple was biased in his reclassification.
19.1.4.3.2 THE RESULTS WERE NOT GREATLY AFFECTED WHEN NONANALYSABLE
DEATHS WERE ALSO INCLUDED IN THE ANALYSIS. A 32 percent reduction in
analysable cardiac deaths with the use of sulphinpyrazone was not altered to a
great extent and was only lowered to a 24 percent reduction by including
nonanalysable deaths.
19.1.4.3.3 THE RESULTS WERE ALTERED WHEN INELIGIBLE PATIENTS WERE
INCLUDED. Some patients were ruled ineligible after they had died. Temple
considered “everyone in this business knows [such exclusions of dead patients]
just are not done” [265].
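
The sensitivity of the result to the choice of deaths analysed can be checked roughly from the counts in the second report of table 19-1. The sketch below computes crude percentage reductions in risk from those counts. It is an illustration only: the function name is hypothetical, and the crude proportions need not reproduce the published percentages exactly, since the published figures may have been derived from rates that allow for length of follow-up (an assumption, not something stated in the text).

```python
def percent_reduction(events_placebo, n_placebo, events_treated, n_treated):
    """Crude percentage reduction in risk in the treated group relative
    to the placebo group, ignoring length of follow-up."""
    risk_placebo = events_placebo / n_placebo
    risk_treated = events_treated / n_treated
    return 100.0 * (1.0 - risk_treated / risk_placebo)


# Counts from the second report in table 19-1: (placebo, sulphinpyrazone).
N_PLACEBO, N_TREATED = 783, 775
SECOND_REPORT = {
    "total mortality": (85, 64),
    "all cardiac deaths": (78, 59),
    "cardiac deaths analysed": (62, 43),
}

if __name__ == "__main__":
    for label, (placebo, treated) in SECOND_REPORT.items():
        reduction = percent_reduction(placebo, N_PLACEBO, treated, N_TREATED)
        print(f"{label}: {reduction:.1f} percent reduction")
```

On these crude figures the reduction is smaller when all cardiac deaths are counted than when only the analysed deaths are counted, which is the direction of the change described in section 19.1.4.3.2.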

19.1.4.4 Ret'ieu' in the British Medical Journal

Mitchell pointed out that the numerous exclusion criteria resulted in a highly
selected trial population and that the results may not be applicable to other
groups of patients with myocardial infarction [266]. He was also worried
about the analysis of the trial and considered that the only acceptable analysis
of outcome is one based on the intention-to-treat (see section 15.7). Mitchell
concluded, “For the present, my verdict on the claim that the report of the
ART has altered the state of the art must be ‘not proven’.”
19.1.5 Conclusions on the Anturane Reinfarction Trial
The basic design of the trial was sound and although the data should have been
analysed on both the intention-to-treat and per-protocol basis, I have not seen
evidence that an erroneous conclusion was reached. However, anxiety remains
over the borderline level of significance achieved and the trial should be re­
peated. A repeat study would be ethical as substantial proof of benefit is
lacking. Section 19.1 illustrates many of the reasons for not accepting the
results of trials, and these are summarised in section 19.7.

19.2 A MULTICENTRE TRIAL OF STREPTOKINASE
The European Co-operative Study Group for Streptokinase Treatment conducted
a multicentre trial of 24-hour treatment with streptokinase in patients
suffering from acute myocardial infarction [267]. Entry to the trial had to be
within 12 hours of the onset of chest pain and only 13.5 percent of 2,338
patients with suspected infarction could be entered into the trial. After six
months, 48 control patients and 24 treated with streptokinase had died. Table
19-2 gives the deaths according to the time after treatment and reveals the
largest benefit from 21 days onwards.
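
The pattern in table 19-2 can be checked with a few lines of arithmetic. The sketch below uses only the four death counts quoted in the table; the ratio of deaths between the groups is a fair summary only on the assumption that the two groups were of roughly equal size, which is not stated in this section.

```python
# Deaths from table 19-2, by time after treatment.
deaths = {
    "0-21 days": {"control": 28, "streptokinase": 18},
    "21-183 days": {"control": 20, "streptokinase": 6},
}

def death_ratio(period):
    """Streptokinase deaths divided by control deaths in one period.

    Interpretable as a relative mortality only if the groups are of
    similar size (an assumption, not a figure taken from the text).
    """
    row = deaths[period]
    return row["streptokinase"] / row["control"]

# Column totals should agree with the 48 and 24 deaths quoted in the text.
totals = {
    arm: sum(row[arm] for row in deaths.values())
    for arm in ("control", "streptokinase")
}

for period in deaths:
    print(period, round(death_ratio(period), 2))
print(totals)
```

The ratio is lower in the later period than in the first 21 days, which is the feature of the results that the critics found hard to accept.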

Table 19-2. Results of the European Co-operative Study Group for
Streptokinase Treatment trial according to the time after treatment [267].

Time after       Deaths in the    Deaths in the
treatment        control group    streptokinase group

0-21 days             28               18
21-183 days           20                6

19.2.1 Critical leading article
The results were criticised in a leading article in the British Medical Journal [268]
for the following reasons:


1. The infusion of a lytic agent for the first 24 hours could not influence late
mortality, and this was the mortality most affected.
2. Random allocation failed to balance out all the highly relevant risk factors.
In the streptokinase group, fewer patients had suffered a myocardial infarction
in the past and fewer developed dysrhythmias in the coronary care
unit. (The latter may possibly have been due to streptokinase treatment.)
3. There was a low randomisation rate into the trial and many patients were
excluded.
4. The leading article went on to consider the merits of the trial and to discuss
the possibility that the results were real. However, the trial results were
criticised for not fulfilling the author’s expectations and producing results
that were difficult to implement. To quote, “. . . the clinical and laboratory
complexities inherent in any effective and well-controlled lytic regimen
will limit the practical impact of the study on doctors ...” [268].

19.2.2 Conclusions on the multicentre trial of streptokinase
The criticisms of the multicentre trial do not appear too serious. Low randomisation
rates are a common and usually unavoidable problem when early treatment
is required, and the assumption that streptokinase could not affect late
mortality cannot be supported as acute treatment may limit the size of the
infarct. Random allocation often results in groups that are unequal in a small
number of respects, but an adjustment can be made during analysis. Finally,
although lytic treatment is a difficult procedure there is now a vogue for
infusing streptokinase directly into the coronary arteries, a much more
onerous task.

19.3 TRIALS OF ASPIRIN IN THE SECONDARY
PREVENTION OF MYOCARDIAL INFARCTION

Elwood [269] summarised the results of six randomised controlled trials of
aspirin against placebo in the secondary prevention of myocardial infarction.
Table 19-3 is derived from his work. Five of the trials demonstrated a reduction
in total mortality of between 15 and 30 percent whereas the sixth and largest
found an increase in total mortality of 11 percent [275]. However, this large
trial (the Aspirin Myocardial Infarction Study) did find a decrease in nonfatal
infarctions of 22 percent.
The average reduction in mortality with aspirin was 15 percent (8 percent if
the result is adjusted according to the numbers in the trials). Aspirin would
appear to have a small effect that is difficult to detect in randomised controlled
trials.
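
The gap between a simple average and one adjusted for the numbers in each trial arises whenever the largest trial disagrees with the smaller ones. The sketch below shows the mechanism only. The reductions and trial sizes are HYPOTHETICAL values chosen to mimic the pattern described (five smaller trials showing benefit, one large trial showing an 11 percent increase); they are not the figures of table 19-3, and weighting by trial size is an assumed form of the adjustment.

```python
def unweighted_mean(values):
    """Simple average: every trial counts equally."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Average in which each trial counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# HYPOTHETICAL percentage reductions in mortality (negative = increase)
# and HYPOTHETICAL trial sizes; illustrative only, not from table 19-3.
reductions = [30, 25, 20, 15, 15, -11]
patients = [600, 700, 800, 900, 1000, 4000]

print(round(unweighted_mean(reductions), 1))
print(round(weighted_mean(reductions, patients), 1))
```

Because the single large trial carries half of all the patients in this invented example, it pulls the weighted figure well below the simple average.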


19.4 TRIALS OF ANTICOAGULANTS IN THE SECONDARY
PREVENTION OF MYOCARDIAL INFARCTION

An international anticoagulant review group combined the results of nine trials
of long-term anticoagulant administration after myocardial infarction [52].
The pooled results are given in table 19-4 for males and females separately.
Total mortality was 20 percent lower in men given anticoagulants (P < 0.01)
but only 8 percent lower in women. It is easy to understand why anticoagulant
therapy was abandoned for women in most countries, but why was this treatment
rejected for men? Anticoagulant therapy for myocardial infarction has
continued in the Netherlands and is used for both men and women, even the
elderly [276]. In most other countries the gains from therapy were not thought
to be worth the difficulty of administering anticoagulants. To quote Mitchell
[266], “. . . even if the claims [for anticoagulants] were valid the apparent
benefit was too small to justify the hassle of conventional anticoagulant
regimens.”
A 20 percent reduction in mortality has not been considered worth continuous
anticoagulant therapy. This treatment involves repeated estimations of
blood coagulability, and bleeding may occur as an adverse effect of too much
treatment. The occasional patient may therefore die as a result of treatment and
it may be unacceptable to the prescribing clinician to cause one death through
treatment even though he may witness several deaths that may (but often may
not) have been prevented by treatment.
Of the 16 members of the International Anticoagulant Review Group [52],
14 considered “. . . the findings warranted a conclusion that anticoagulant
therapy probably prolonged survival at least over two years but that benefit
was largely restricted to patients with a history of anterior or previous
infarction.”
The remaining two members “were not convinced that long-term therapy
prolonged survival.”

19.5 TRIALS OF BETA-ADRENOCEPTOR BLOCKING DRUGS IN
THE SECONDARY PREVENTION OF MYOCARDIAL INFARCTION

Vedin [277] gave an address at an international conference and considered the
results of five trials of beta-adrenoceptor blocking agents in the secondary
prevention of myocardial infarction. Table 19-5 reproduces the results of
Vedin’s review in which he successively pooled the level of significance for these
trials and concluded that beta-adrenoceptor blocking drugs (or at least
alprenolol and practolol) were effective in reducing total mortality after a
myocardial infarct. Moreover, he suggested that these data were so conclusive
that it may be unethical to allow any further placebo-controlled trials on this
subject. “Nearly 20,000 patients round the world are either already enrolled in
or will be enrolled in prospective secondary prevention trials with
beta-blockers. This massive program, costing an estimated 30 million dollars a year,
is unlikely to benefit either patients or science ...”
Reviewing the same data, Hampton [282] stated, “In 1965, Snow described
a clinical trial of propranolol in patients with acute myocardial infarction; he
found a considerable reduction in mortality . . . This was the first post-infarction
trial of a beta-blocker, and none of the many subsequent trials have
demonstrated such marked benefit.” Baber and Lewis reviewed the trials on
beta-adrenoceptor blocking drugs and published the 90 percent confidence
limits [123]. Of 18 trials, eight had confidence limits encompassing a 50 per­
cent increase in mortality, and 14 a decrease in mortality of 50 percent.
Trials now being reported are larger and support the concept that beta-blocking
drugs are helpful in the secondary prevention of myocardial infarction
[170, 283]. Some of these drugs may be more useful than others in
secondary prevention but this would only partly explain the divergent results.
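
The width of the confidence limits from a small trial can be illustrated as follows. This is a minimal sketch using the usual large-sample approximation for the logarithm of a relative risk; it is not claimed to be the method used by Baber and Lewis, and the death counts are HYPOTHETICAL.

```python
import math

def relative_risk_limits(deaths_treated, n_treated,
                         deaths_control, n_control, z=1.645):
    """Relative risk (treated / control) with approximate confidence limits.

    Uses the large-sample standard error of log(relative risk);
    z = 1.645 gives approximate 90 percent limits.
    """
    rr = (deaths_treated / n_treated) / (deaths_control / n_control)
    se = math.sqrt(1 / deaths_treated - 1 / n_treated
                   + 1 / deaths_control - 1 / n_control)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# HYPOTHETICAL small trial: 10 deaths among 100 treated patients,
# 15 deaths among 100 controls.
rr, lower, upper = relative_risk_limits(10, 100, 15, 100)
print(f"relative risk {rr:.2f}, 90% limits {lower:.2f} to {upper:.2f}")
```

With numbers of this size the limits stretch from a reduction in mortality of more than half to an increase of roughly a quarter, so such a trial can neither confirm nor exclude a worthwhile benefit.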

19.6 THE UNIVERSITY GROUP DIABETES PROGRAM TRIAL

The University Group Diabetes Program study was a trial of treatment in
newly diagnosed diabetic patients who did not require insulin and had a good
prognosis for a five-year survival [46]. Patients were randomly allocated to
placebo (PLBO), the oral hypoglycaemic drug tolbutamide (TOLB), a vari­
able dose of insulin (IVAR), or a standard amount of insulin (STD). All
patients were given dietary advice. An additional group was randomised to
receive phenformin but they were not recruited at the start of the trial and
cannot be expected to be identical to the other groups. Table 19-6 gives the
results of the trial excluding those in the phenformin group [260]. There was
a statistically significant excess of cardiovascular deaths in the group treated
with tolbutamide. Total deaths were also increased in this group but the excess
did not reach statistical significance. Not surprisingly since tolbutamide is a
popular treatment for diabetes, this trial has been criticised.

19.6.1 Criticisms of the University Group Diabetes Program trial

The conclusion that tolbutamide therapy was associated with an excess of
cardiovascular deaths has been criticised on the grounds of inadequate data
collection, admission of ineligible patients, administration of a fixed dose of
tolbutamide, failure to detect a statistical increase in total mortality, inequality
of the groups after randomisation, an abnormal outcome in the placebo group,
excess deaths not being observed in every clinic, and patients being transferred
from one group to another.
19.6.1.1 Inadequate data collection

No data were collected on important risk factors such as smoking or the
duration of diabetes prior to entering the trial [284, 285]. There was no way of
knowing whether the groups were comparable for these important factors.

19.6.1.1.1 DEFENCE. Randomisation and the large numbers in the trial make
biologically important differences between the groups unlikely.
19.6.1.2 Admission of ineligible patients

Certain patients were admitted to the study who should have been excluded on
the basis of a poor prognosis for a five-year survival. These patients may have
been unevenly distributed between the groups. On the other hand, in many
cases, diabetes was very mild or its presence questionable. These patients
should only have been eligible for dietary advice and should not have entered
the trial. Sixty-nine patients were admitted without meeting a glucose tolerance
test criterion [284, 285].
19.6.1.2.1 DEFENCE. Again, randomisation and the large numbers in the trial
would make an important and unequal distribution between the groups unlikely.
It can be agreed that patients with borderline diabetes should only
receive a diet, but that statement can only be made with confidence now that
we know the result of the trial. Also, the dilution of the treatment groups with
patients who do not have classical diabetes should not bias the results. However,
ineligible patients, as defined by exclusion criteria, should not have been
included.

19.6.1.3 Administration of a fixed dose of tolbutamide

A fixed dose of tolbutamide was given to all subjects without regard to their
individual needs [285].
19.6.1.3.1 DEFENCE. The result of the trial is relevant to a fixed dose. A
variable dose may produce a different result but this remains to be proved.
19.6.1.4 No definite increase in total mortality

There was no statistically significant increase in total mortality in the tolbutamide
group and noncardiovascular deaths were reduced in this group.
19.6.1.4.1 DEFENCE. Total mortality was still higher in the tolbutamide
group.

19.6.1.5 Inequality of the groups after randomisation
The tolbutamide group had more patients with a high serum cholesterol or
major electrocardiographic abnormalities, more males, more obese patients
and more with a history of angina [284]. Similar remarks were made about the
group randomised to phenformin [286].
19.6.1.5.1 DEFENCE. The tolbutamide group included fewer hypertensive
patients. Moreover, an adjustment for baseline differences did not materially
affect the results [287].

19.6.1.6 Abnormal outcome in the placebo group

There were no deaths from myocardial infarction in the placebo group [284].
19.6.1.6.1 DEFENCE. This could have been a chance occurrence because the
mortality from myocardial infarction was very low with this

19.6.1.7 The excess deaths were not observed in every clinic
Most of the excess deaths in the group given tolbutamide occurred in only
three of the 12 clinics. Schor remarked, “It would appear to any reasonable
statistician that for some reason or other the randomisation procedure broke
down in these three clinics over some period of time ...” [284].
19.6.1.7.1 DEFENCE. No evidence has been provided that randomisation
broke down. Treatment was assigned by the coordinating centre and not the
treating clinic.

19.6.1.8 Transfer of patients from one group to another

Some patients transferred from one group to another and analyses were only
performed on the intention-to-treat principle and not by the per-protocol
method.
19.6.1.8.1 DEFENCE. There were very few transfers, but both kinds of
analysis should have been presented.

19.6.2 Conclusions
The University Group Diabetes Program trial was well designed, and the
analysis has not been seriously faulted. When a trial shows a possibly beneficial
result of treatment it can be repeated; however, it would not be ethical to
repeat a trial to confirm a suspected adverse effect. The results must therefore
be accepted at this stage, but as patients are still being given hypoglycaemic
drugs, it is hoped that observational studies will clarify any adverse consequences
of such treatment and indicate which drugs, if any, may be safely
prescribed. It may then be possible to arrange a randomised controlled trial on
these compounds. A trial could also be restricted to those patients who do not
wish to take insulin and do not diet effectively. These subjects could ethically
be randomised to tolbutamide or placebo provided they had symptoms from
hyperglycaemia that could benefit from treatment and were also informed of
the result of the University Group Diabetes Program trial prior to giving
informed consent. This trial would be of possible benefit to the patient and
could be justified (section 3.12).
The Committee for the Assessment of Biometric Aspects of Controlled
Trials of Hypoglycemic Agents [288] concluded, “We consider that in the light
of the UGDP findings it remains with the proponents of the oral hypoglycemics
to conduct scientifically adequate studies to justify the continued use of
such agents.”

19.7 REASONS FOR NONACCEPTANCE OF
RANDOMISED CONTROLLED TRIAL RESULTS

The reasons for nonacceptance include: results at variance with preconceived
ideas; errors in performance of the trial; errors in analysis; an atypical selection
of patients; failure of randomisation to produce equivalent groups; failure to

results within different groups in the same trial and between different trials; the
adverse effects of treatment; and the fact that the trial originates from a group
with a vested interest in a particular result.

19.7.1 Preconceived ideas not in agreement with the results
The Anturane Reinfarction Trial claimed that Anturane produced a large reduction
in sudden deaths after myocardial infarction. This was not expected by
most medical practitioners. A reduction was anticipated only for recurrent
myocardial infarctions, and the unexpected result was one reason why the
result of the Anturane Reinfarction Trial has not been accepted (section 19.1).
The European Trial on Streptokinase treatment showed a reduction in late
mortality. This was unexpected and the trial criticised (section 19.2).

19.7.2 Errors in the performance of the trial

Feinstein has criticised the University Group Diabetes Program trial for failing
to define terms such as congestive heart failure; for having vague selection
criteria; for failure to obtain important baseline data and information on the quality
of life during the trial; for the quantity of missing data; for the difficulties in
standardising the protocol (four clinics initially employed serum rather than
whole blood determinations of glucose); and for discontinuing the use of
tolbutamide before stopping other treatments [285].

19.7.3 Errors in the analysis of the trial

Errors in analysis were discussed in section 15.1. Three very important errors
are discussed next: failure to analyse on the intention-to-treat principle; the
effects of repeated looks; and an inadequate classification of end points.

19.7.3.1 Failure to analyse on the intention-to-treat principle
The analysis of the Anturane Reinfarction Trial provides a classic example of
the failure to analyse on the intention-to-treat principle (section 15.7, 19.1).
Sulphinpyrazone takes seven days to exert a full effect and the effect will be
absent seven days after stopping the drug. The investigators therefore excluded
patients in whom the drug could not have been active. They also
excluded the corresponding placebo treated patients but discarded the safe
intention-to-treat approach to analysis in favour of the per-protocol approach
which may suffer from bias. Conventionally, patients should be retained in
their groups after randomisation [124], and this was the approach adopted by
Elwood in his trials of aspirin following myocardial infarction [270, 273].

19.7.3.2 Effect of repeated looks on the significance of the statistical tests reported

It appears that the problem of repeated looks (section 10.7.3) was not considered
initially in the Anturane Reinfarction Trial.

19.7.3.3 Classification of end points not well defined

The definition of sudden deaths in the Anturane Reinfarction Trial was inadequate
(section 19.1.4), and a reclassification of these deaths may have altered
the conclusions to some extent.

19.7.4 Restricted selection of patients at entry

Patients entering a trial do not usually represent patients in general. This fact
has led to criticism of many trials of the secondary prevention of myocardial
infarction, including the Anturane Reinfarction Trial (19.1.4). Many trials will
fail to achieve general validity (chapter 5), but in the Anturane Reinfarction
Trial, patients originally diagnosed as having a myocardial infarction were
removed from the trial after randomisation, apparently distorting the results
(19.1.4). However, the exclusion of patients before randomisation only reduces
the general applicability of the results and does not produce bias.
The European trial of streptokinase treatment was criticised as only 13.4
percent of available patients were entered into the trial. Most of the patients
were excluded due to inability to give the treatment within 12 hours of the
onset of chest pain. It was reasonable to exclude these patients from the trial as
the treatment was only thought to exert an effect in the first 12 hours. It would
be more sensible to recalculate the inclusion rate with the numbers presenting
within 12 hours as the denominator.

19.7.5 Treatment groups not identical at entry

Randomisation worked effectively in the Anturane Reinfarction Trial to give
similar groups, but this does not always occur. In the European trial of
streptokinase and in the University Group Diabetes Program trial the groups were
not identical in some important respects.

19.7.6 Too few patients in the trial

The small numbers of patients in many of the trials of beta-adrenoceptor
blocking drugs has led to several being reported as negative but with very low
power. Baber and Lewis [123] have graphically illustrated the low power in
these trials by providing the 90 percent confidence limits.
A trial may give a negative result not only if too few patients are entered but
also if the true effect of treatment is very small. If a treatment confers only a
small benefit, the trials have to be larger to prove this with any certainty. For
example, if the true reduction in mortality with active treatment is greater than
50 percent, fewer patients will be required to prove this effect than to demon­
strate a 20 percent reduction. In the secondary prevention of myocardial in­
farction, the effects of certain treatments may be small; for example, reduc­
tions in total mortality of up to 15 percent for aspirin and 20 percent for
anticoagulants.
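
The dependence of trial size on the size of the effect can be made concrete. The sketch below uses a standard normal-approximation formula for comparing two proportions; the control mortality of 10 percent, the two-sided significance level of 5 percent, and the power of 90 percent are all assumptions chosen for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def patients_per_group(control_rate, relative_reduction,
                       alpha=0.05, power=0.90):
    """Approximate number of patients needed in each group to detect
    a given relative reduction in mortality (two-sided test)."""
    treated_rate = control_rate * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + treated_rate * (1 - treated_rate))
    difference = control_rate - treated_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / difference ** 2)

# ASSUMED control mortality of 10 percent.
n_50 = patients_per_group(0.10, 0.50)  # 50 percent reduction
n_20 = patients_per_group(0.10, 0.20)  # 20 percent reduction
print(n_50, n_20)
```

Under these assumptions a 20 percent reduction needs several times as many patients per group as a 50 percent reduction, which is why trials of treatments with modest effects so often lack power.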

19.7.7 Faulty interpretation of the trial results

An example of faulty interpretation is provided by trials of antihypertensive
agents where baseline blood pressure is determined before the start of the trial.
As the patients become accustomed to the procedures adopted during the trial,
their average blood pressure tends to fall. This order effect may thus be
superimposed on any treatment effect, inflating the apparent effect of treatment.
Douglas-Jones and Cruickshank [289] examined the use of atenolol as an
antihypertensive agent and compared three different doses in a cross-over,
random order, double-blind fashion. Unfortunately, baseline blood pressure
was always determined prior to the commencement of the trial. The authors
were able to conclude that there was no difference between the blood pressure
on the three doses, but they should have been more cautious in concluding,
“Atenolol effectively decreased lying and standing blood pressures” as the fall
in pressure from the start of the trial may have been enhanced by giving the
active doses last. The baseline assessment should have consisted of a double-blind
period of placebo treatment given in random order during the body of
the trial. However, the authors appear to have determined a correct baseline
pressure, neither inflated by observer bias nor order effect, as the result of this
trial agrees quantitatively with other trials where baseline blood pressure was
determined correctly [290].

19.7.8 Trial results not consistent in different subgroups

It is desirable to examine the results of the trial in subgroups that are obviously
important such as the two sexes, different races, and various age groupings. It
is less desirable to invent subgroups after examining the data. For example, the
best results may be observed in unmarried Chinese women over the age of 70
but the trial may include few such persons and a report on a small selected
subgroup may be misleading.
After analysing the results in different groups, the effect of treatment may be
shown to be inconsistent; this may raise doubts as to the generality of any
conclusions. In an important trial of specialised care for hypertensive patients,
the Hypertension Detection and Follow-up Program Trial, small improvements
in blood-pressure control were associated with an overall decrease
in mortality. The result was not found in white women when analysed separately
[132, 211]. This subgroup was not small and raises the question of the
generality of the results. Similarly, it appears that men, but not women, may
benefit from aspirin in the prevention of venous thromboembolism [291]. In
the trials of beta-adrenoceptor blocking drugs, different effects have been
reported in the elderly [281] that were not confirmed in other trials [170, 283].
In one trial of these drugs, there was a better response with an anterior
myocardial infarction [251] whereas in a second trial a greater treatment effect was
observed with a posterior infarct [170]. Great caution has to be employed in
subgroup analysis (section 15.7).

19.7.9 Different trials give different results

In three sections on trials in the secondary prevention of myocardial infarction
(section 19.3 on the use of aspirin; 19.4 on anticoagulants; and 19.5 on
beta-adrenoceptor blocking drugs) different results were apparent, some trials
showing benefit and others none. It is a small wonder that the positive results
have not been widely accepted. However, combining the results can provide a
more clear picture of any overall pattern [52].

19.7.10 The treatment is too difficult or has too many adverse effects

If a treatment is uniformly effective and capable of being provided, it will
presumably be offered irrespective of difficulties of administration and laboratory
control. However, treatments are not uniformly effective; only a proportion
of patients will benefit and the costs and difficulties have to be taken into
account. The complexity of treatment was given as a reason for not implementing
the results of the European Trial of Streptokinase in acute myocardial
infarction (section 19.2).
The trials of anticoagulant treatment following myocardial infarction (section
19.4) indicated a reduction in male mortality of 20 percent, but the complexity
of treatment was such that its use has declined except in the Netherlands,
where it is still employed.

19.7.11 Vested interest of originating group

The Anturane Reinfarction Trial was criticised on account of being funded and
analysed by the pharmaceutical company making Anturane (section 19.1.4),
but this trial employed independent policy and audit committees and the criticism
should not be taken seriously. However, pharmaceutical companies are
responsible for a number of promotional trials and some of these require close
attention. These trials are usually concerned with the acceptability of their
products and a comparison of these with those of their competitors. The
motivation for these trials arises from competitive marketing. An example was
provided in section 1.3.

19.8 CONCLUSIONS

This chapter considered 11 reasons why the result of a particular trial may be
rejected and gave several examples to support these assertions. As trial design,
execution, and analysis improve it is hoped that the proportion of results that
are rejected will be reduced. Results will still be falsely positive on occasions,
and little can be done to reduce the strength of a reader’s preconceived ideas,
avoid inconsistency of the treatment effect in different subgroups, or prevent
different results emerging from separate trials. However, errors of perfor­
mance, analysis, and interpretation can be minimised, sufficient representative
patients can be recruited, and trials sponsored by the pharmaceutical companies
can be monitored and analysed by independent bodies. Even if a result
is accepted it may not lead to any change in clinical practice. A treatment will
not be employed if it is considered too difficult, has too many adverse effects,
or is too expensive.
As the standard of randomised controlled trials improves, so will the quality
of critical appraisal. We must admit that in the future the results of trials may
be as frequently rejected as they are today.

20. THE ADVANTAGES AND DISADVANTAGES
OF RANDOMISED CONTROLLED TRIALS

This chapter reviews the advantages and disadvantages of performing ran­
domised controlled trials. Observational studies may fail to include controls or
may use an inappropriate comparison group. The advantage of the ran­
domised controlled trial lies in the greater confidence with which we can
accept the results. Randomised controlled trials have prevented the introduc­
tion and continued use of useless and dangerous treatments.
On the debit side, although we may have confidence in a result, the result
may nevertheless be incorrect. Also, it is possible that the performance of a
randomised controlled trial may delay the introduction of a useful treatment.
Further, when a patient is randomised to the treatment that proves least effec­
tive, that individual may suffer as a consequence. Although this is true, a
randomised controlled trial should only be used when there is genuine doubt
about the efficacy of the treatments and the investigator cannot know
if patients will suffer. A placebo may not prove to be the less desirable
treatment as in some trials this has proven to be the most beneficial therapy
[46, 50]. Lastly, it is expensive to perform a randomised controlled trial and
when the effect of treatment is very marked it may be unnecessary.
20.1 ADVANTAGES OF RANDOMISED CONTROLLED TRIALS

20.1.1 Randomised controlled trials as the best
estimate of a beneficial effect of treatment

The benefit to be derived from treatment must be established by comparing
the results with either no treatment or some alternative therapy. Such control
groups may either consist of patients studied in the past or subjects included in
a randomised controlled trial where randomisation provides a comparable
control group that is treated simultaneously. Attempts to establish that historical
controls are adequate [292] have not been supported by the literature [5,
21, 88]. Retrospective data are not valid for comparative purposes if the pattern
of disease changes with time or the type of patient differs between two
periods of observation. For example, an investigator may study a series of
patients and then announce that he intends to evaluate a new treatment in the
future. Patients that are referred to him for the new treatment may be more or
less severely affected than the control group. However, an anonymous author
concluded, “If the change is rapid . . . then a randomised trial may be
invalidated just as much as any other” [293]. This is difficult to accept as the
control group will be treated simultaneously with the treatment under investigation
and the result of a randomised controlled trial will be much less influenced
by any change than a comparison with historical controls.

20.1.2 Randomised controlled trials as proof that
a supposedly beneficial treatment is dangerous
One of the largest contributions to scientific knowledge resulting from the use
of randomised controlled trials has been the discovery, not only that a treatment
may be useless, but that it may be dangerous. This was discussed in
chapter 18, and it cannot be stressed too strongly that even if a treatment
appears to have a beneficial action in the short term, the long-term effects must
still be assessed in a randomised controlled trial. For example, clofibrate, a
drug that lowers serum cholesterol, was compared with placebo in the primary
prevention of ischaemic heart disease. Although serum cholesterol was
lowered by treatment and some heart attacks were prevented, overall mortality
was increased by this drug treatment, an unexpected finding that could
not have been detected without a large randomised controlled trial.

20.2 DISADVANTAGES OF RANDOMISED CONTROLLED TRIALS
20.2.1 Falsely negative results
The most common example of a misleading answer is the false negative result.
The term misleading is used rather than incorrect as a trial provides an estimate
of a treatment effect and the authors and readers should consider the
confidence limits for that result. A treatment effect may not be statistically
significant but if the 95 percent confidence limits lie between a deterioration of
30 percent and an improvement of 50 percent, the conclusion of no proven
effect may be misleading but not erroneous. Put another way, if a trial is too
small to detect a true benefit, this is not a criticism of randomised controlled
trials in general but only of a particular trial. Several small trials may sometimes
be combined to give an overall estimate of the effect of a treatment
(section 15.8). The methods for combining the results from many trials have
been published [294, 295] and such analyses may be able to utilise the results
from small trials.
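
The point about confidence limits and the pooling of small trials can be illustrated with a short sketch. It is not the method of the cited papers [294, 295]; it assumes a simple normal approximation for the difference in event rates and a fixed-effect, inverse-variance pooled estimate. The four trials and all the function names are hypothetical and chosen only for illustration.

```python
import math

def risk_difference_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Return (difference, lower, upper, se) for control risk minus treated risk.

    Uses the normal approximation; z=1.96 gives 95 percent confidence limits.
    """
    p_t = events_t / n_t
    p_c = events_c / n_c
    diff = p_c - p_t
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff, diff - z * se, diff + z * se, se

def pool_fixed_effect(results, z=1.96):
    """Inverse-variance (fixed-effect) pooling of (estimate, se) pairs."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * est for w, (est, _) in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Hypothetical small trials: (events treated, n treated, events control, n control)
trials = [(8, 50, 12, 50), (9, 60, 14, 60), (7, 40, 10, 40), (10, 55, 15, 55)]

single = [risk_difference_ci(*t) for t in trials]
for diff, lower, upper, _ in single:
    print(f"single trial: {diff:+.3f} (95% limits {lower:+.3f} to {upper:+.3f})")

pooled, p_lower, p_upper = pool_fixed_effect([(d, se) for d, _, _, se in single])
print(f"pooled:       {pooled:+.3f} (95% limits {p_lower:+.3f} to {p_upper:+.3f})")
```

With these invented figures every single trial has limits that include zero, so each alone would be reported as showing no proven effect, yet the pooled limits are much narrower than those of any one trial.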

20.2.2 Falsely positive results
It appears rare for a trial to give a false positive result. At a recent symposium
on clinical trials Sir Richard Doll identified one example [296] and Maxwell
has quoted another [297]. Maxwell considered that with results achieving the 5
percent level of significance, one in 20 would not be expected to be in error as,
“Ethical considerations demand a sound scientific reason for believing that the
null hypothesis [no difference between treatments] is certainly not likely to be
true. Thus, in clinical research this error is much rarer than 5 percent and its
detection even rarer still—also for ethical reasons” [298]. We can agree that a
trial with a positive outcome in favour of a treatment is unlikely to be repeated
for ethical reasons and therefore a false positive result may not be detected. But
a five percent level of significance implies a one in 20 chance of a false positive
result when a drug truly has no advantage. However, when the treatment is
effective then, by definition, a false positive result will not be observed. A
randomised controlled trial is no worse than any other method of investigation
in rejecting the null hypothesis when it is correct and the absence of bias in
well-conducted randomised controlled trials will reduce the number of false
positive results.
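
The two views above can be reconciled with a little arithmetic. The 5 percent level is the chance of a significant result when the treatment truly has no advantage, whereas the proportion of significant results that are in error also depends on how often the treatments tested are really effective. The sketch below is an illustration only: the power of 80 percent and the prior proportions of effective treatments are assumptions, not figures taken from the text, and the function name is hypothetical.

```python
def false_positive_share(prior_effective, alpha=0.05, power=0.80):
    """Proportion of 'significant' trial results that are false positives.

    prior_effective: assumed proportion of tested treatments that truly work.
    alpha: significance level (chance of a false positive under the null).
    power: assumed chance of a significant result when the treatment works.
    """
    true_positives = prior_effective * power
    false_positives = (1 - prior_effective) * alpha
    return false_positives / (true_positives + false_positives)

# Hypothetical prior proportions of truly effective treatments
for prior in (0.0, 0.1, 0.5, 0.9, 1.0):
    share = false_positive_share(prior)
    print(f"prior {prior:.1f}: {share:.3f} of significant results are false")
```

When no treatment tested is effective every significant result is false, and when every treatment is effective none is. In between, the proportion falls as the prior rises, which is the sense in which sound prior reasons for a trial make false positives rarer than one in 20 among published positive findings.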
When early trials differ in reporting a benefit then a series of trials may be
published. Pooling of the results often suggests that the benefit from treatment
is small (sections 19.3 and 19.4). Unfortunately, there may be a tendency not
to publish the results from trials that prove negative. Both authors and editors
will be inhibited from publication and Peto [69] and others [299] have observed
that smaller trials tend to show greater positive effects than larger trials.
Presumably a large trial, requiring more effort and providing a more dependable
result, tends to be published whether negative or otherwise, whereas
small trials may be published more frequently when a positive result is observed.
Therefore Maxwell has suggested that journal editors “assist the
identification of negative results by publishing such work by title only—rather
than not at all” [298]. Such papers would have to be reviewed and registered
with the editor even if they were not printed.
20.2.3 Delay of a valuable new treatment

An editorial in the British Medical Journal discussed early randomised controlled
trials with a new treatment and stated [293], “Our current insistence on randomised
controlled trials has undoubtedly had a salutary effect on loose thinking
but more than once this has been at the expense of progress.” Unfortunately,
an example was not provided to substantiate the latter claim but it is
theoretically possible that the performance of randomised controlled trials will
delay the introduction of a new and effective treatment. Fortunately, an insistence
on randomised controlled trials may also prevent progress with a useless
or dangerous treatment.

20.2.7 No detection of rare adverse effects

A very rare adverse effect may not be detected even by a large randomised
trial. Clinical impression, monitoring systems, and national vital statistics
provide the only hope that a very rare adverse reaction will be discovered
[252].
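
A rough calculation shows why even a large trial may miss a rare reaction. The sketch assumes that each treated patient independently has the same small chance of the adverse effect; the rate of 1 in 10,000 and the trial size of 1,000 treated patients are hypothetical, as are the function names.

```python
import math

def prob_at_least_one(rate, n_treated):
    """Chance of seeing one or more cases among n_treated patients."""
    return 1 - (1 - rate) ** n_treated

def n_needed(rate, confidence=0.95):
    """Smallest number of treated patients giving the stated chance of at least one case."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))

rate = 1 / 10_000          # hypothetical frequency of the adverse effect
trial_size = 1_000         # hypothetical number of patients given the drug

print(f"chance of at least one case in the trial: {prob_at_least_one(rate, trial_size):.3f}")
print(f"treated patients needed for a 95% chance: {n_needed(rate)}")
```

Under these assumptions the trial has less than a one in ten chance of observing a single case, and roughly three divided by the rate patients would need to be treated to be reasonably sure of seeing one.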

20.2.4 Best treatment denied to control group

Fincke discussed a hypothetical trial of a new anticancer drug and concluded
that if more patients die in the control group, then the doctors are guilty of
manslaughter [300]. However, it can be argued that if a randomised controlled
trial is not performed and a new treatment is not discovered to be associated
with more deaths than the control treatment, then if the treatment is widely
introduced, many doctors will be guilty of manslaughter and many more
patients will be victims.
Ethical problems are avoided if there is genuine doubt about the efficacy of a
new treatment. Also, when informed consent is sought the patient knows he
may be randomised to a control group and gives his consent to take part in the
trial. A leading article in the Lancet discussed the problems of not obtaining
informed consent and quoted a double-blind placebo-controlled trial of the
side effects of oral contraceptives. Six pregnancies occurred on placebo when
the subjects were not aware that a placebo was being employed, although they
were advised to use a spermicidal cream [301, 302]. To avoid ethical problems,
the subjects must be fully informed of the nature of the trial. This poses
difficulties in paediatrics, psychiatry, and sometimes in the treatment of patients
with cancer [303].
When a disease is nearly always fatal and no effective treatment is available,
then a randomised controlled trial may be unnecessary since any improvement
is obvious [301]. When the outcome is not always predictable a sequential trial
or a trial with variable allocation of subjects to treatment may avoid some of
the ethical pitfalls of withholding effective treatment from large numbers of
patients (section 11.7).

20.2.8 Effect obscured by several confounding factors
An outcome may be affected by many factors other than treatment. Black has
argued that if attempts are made to restrict randomisation according to many
factors, effective randomisation becomes impossible [304]. The answer is not
to restrict randomisation but to allow for any difference retrospectively, in the analysis.
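
One common way of allowing for an imbalance after the event is a stratified analysis. The sketch below uses the Mantel-Haenszel pooled odds ratio as an example of such an adjustment; it is offered as an illustration, not as the specific method recommended in this book. The two strata (mild and severe disease) and all the counts are invented, with more severe cases falling by chance in the treated group.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: treated events, treated non-events,
    control events, control non-events."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts per stratum:
# (treated events, treated non-events, control events, control non-events)
strata = {
    "mild": (4, 36, 12, 48),
    "severe": (24, 36, 24, 16),
}

crude_table = tuple(sum(column) for column in zip(*strata.values()))
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(strata.values())

print(f"crude odds ratio:    {crude:.2f}")
print(f"adjusted odds ratio: {adjusted:.2f}")
```

In this invented example the treatment has the same odds ratio in both strata, but the crude comparison understates the benefit because the treated group contains more severe cases. The stratified estimate recovers the within-stratum figure without any restriction having been placed on the randomisation.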
20.3 CONCLUSIONS

This chapter in itself is not intended to persuade the reader that the advantages
of randomised controlled trials outweigh the disadvantages. Rather, the entire
book is directed to this end. It may be fitting to conclude with Cochrane’s
finding that, in the Northern Hemisphere, randomised controlled trials have
failed to spread to the Catholic South and Communist East. Cochrane considered
that the main explanation is the lower extent to which medical students
are scientifically educated in these regions. However, he pointed out that
Sweden, although situated in the North and West, is an exception. There are
few randomised controlled trials carried out in Sweden compared to the number
of meticulous observational studies performed. Cochrane admits that
other factors may influence the performance of randomised controlled trials:
the memory of war crimes in Germany, the extent of private practice, and the
authoritarian structure of Soviet medicine [305].
The main advantage of randomised controlled trials lies in the confidence
with which we can view the results; the disadvantages are trivial in comparison.
I therefore hope that more randomised controlled trials will be performed
in the future and their use extended much further in the fields of surgery,
obstetrics, orthopaedics, psychiatry, physiotherapy, and sociology. This book
is dedicated to any person who embarks on a randomised controlled trial as a
result of reading it.

20.2.5 Expense of randomised controlled trials
The performance of a randomised controlled trial is more expensive than
uncontrolled observational studies. However, when a comparison group is
required to demonstrate a treatment effect, historical controls are not adequate
and a randomised controlled trial should be performed. The additional expense
will be amply repaid by the improved quality of the data.

20.2.6 Undetectable small treatment effect
Randomised controlled trials do have their limitations and will not detect small
effects of treatment. A trial to detect a 10-20 percent reduction in a common
event such as myocardial infarction will require an impossibly large number of
subjects studied for a long period of time. A large trial may detect a 25-50
percent reduction in a common event but not an equivalent reduction in a rare
event.
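
The scale of the problem can be sketched with the usual normal-approximation formula for comparing two proportions. This is a standard textbook approximation and not necessarily the method used elsewhere in this book. The control event rates (10 percent for a common event, 0.1 percent for a rare one), the two-sided 5 percent significance level, and the power of 80 percent are all assumptions made for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate subjects needed in each group to detect the given
    relative reduction in an event rate (two-sided test)."""
    p_treated = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)

# Hypothetical scenarios: (label, control event rate, relative reduction)
scenarios = [
    ("common event, 15% reduction", 0.10, 0.15),
    ("common event, 50% reduction", 0.10, 0.50),
    ("rare event, 50% reduction", 0.001, 0.50),
]
for label, p_control, reduction in scenarios:
    print(f"{label}: about {n_per_group(p_control, reduction)} subjects per group")
```

On these assumptions a halving of a common event needs a few hundred subjects per group, a 15 percent reduction needs several thousand, and a halving of a rare event needs tens of thousands.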


21. REFERENCES

1. Bradford Hill, A. The clinical trial. Br. Med. Bull. 7:278-282, 1951.
2. Cochrane, A.L. Effectiveness and efficiency: Random reflections on health services. London, The
Nuffield Provincial Hospital Trust, 1972.
3. Medical Research Council Investigation. Streptomycin treatment of pulmonary tuberculosis.
Br. Med. J. 2:769-782, 1948. Material reproduced with permission.
4. Doll, R., and Peto, R. Randomised controlled trials and retrospective controls. Br. Med. J.
280:44, 1980.
5. Rose, G. Bias. Br. J. Clin. Pharmacol. 13:157-162, 1982.
6. L'Etang, J.C. Historical aspects of drug evaluation. In: The principles and practice of clinical
trials, Harris, E.L. and Fitzgerald, J.D. (eds.). Edinburgh and London, E. and S. Livingstone,
1970.
7. Bull, J.P. The historical development of clinical therapeutic trials. J. Chronic Dis. 10:218-248, 1959.
8. Bradford Hill, A. Statistical methods in clinical and preventive medicine. Edinburgh and London,
E. and S. Livingstone, 1962.
9. Maitland, C. Account of inoculating the smallpox. London, J. Robertson, 1972.
10. Williams, W. Masters of medicine. London, Pan Books, 1954. P. 23.
11. Jenner, E. An inquiry into the cause and effects of the variolae vaccinae. London, S. Low, 1798.
12. Pearson, G. An inquiry concerning the history of cowpox. London, J. Johnson, 1798.
13. Waterhouse, B. A prospect of exterminating the small pox. Boston, Cambridge, 1800.
14. Haygarth, J. Of the imagination as a cause and cure of disorders of the body, new ed. Bath, R.
Crutwell, 1801. P. 3.
15. Sutton, H.G. Cases of rheumatic fever treated for the most part by mint water. Guy's Hosp.
Rep. 2:392-428, 1865.
16. Laplace, P.S. Theorie analytique des probabilites. Paris, 1812.
17. Louis, P.C.A. Essay on clinical instruction. Translated by Martin P. London, S. Highley, 1834.
18. Louis, P.C.A. Recherches sur les effects de la saignee. Paris, De Mignaret, 1835.
19. Bartlett, E. An essay on the philosophy of medical science. Philadelphia, Lea and Blanchard, 1844.
20. Lister, J. On the effects of the antiseptic system upon the salubrity of a surgical hospital.
Lancet 1:4 and 40, 1870.
21. Pocock, S.J. Randomised controlled trials [letter]. Br. Med. J. 1:1661, 1977.
22. Vallery-Radot, P. Pasteur 1822-1895. Paris, 1922.
23. Fibiger, J. Om serumbehandling af Difteri. Hosp. Tid., Kjøbenh. 4:6, 309, 337, 1898.
24. Porrit, A.E., and Mitchell, G.A.G. An investigation into the prophylaxis and treatment of
wound infection. In: Penicillin therapy and control in 21 army group. London, 21 Army Group,
1945. P.7.
25. Amberson, J.B., McMahon, B.T., and Pinner, M. Clinical trials of sanocrysin in pulmonary
tuberculosis. Am. Rev. Tuberc. 24:401-435, 1931.
26. Glaser, E.M. Ethical aspects of clinical trials. In: The principles and practice of clinical trials,
Harris, E.L. and Fitzgerald, J.D. (eds.). Edinburgh and London, E. and S. Livingstone, 1970.
27. Wade, O.L. Human experiment 2, clinical aspects. In: Dictionary of medical ethics, Duncan,
A.S., Dunstan, G.R., and Welbourn, R.B. (eds.). London, Darton Longman and Todd,
1977.
28. Mitscherlich, A., and Mielke, F. Doctors of infamy: The story of the Nazi medical crimes. New
York, Shuman, 1949. Pp. xxiii-xxv.
29. World Medical Association. Human experimentation, code of ethics of the World Medical
Association. Declaration of Helsinki. Br. Med. J. 2:177, 1964.
30. Bradford Hill, A. Medical ethics and controlled trials. Br. Med. J. 1:1043-1049, 1963. Material
reproduced with permission of author and editors of Br. Med. J.
31. Medical Research Council. Responsibility in investigations on human subjects. Br. Med. J. 2:
178-180, 1964.
32. Committee Appointed by the Royal College of Physicians of London: Supervision of the
Ethics of Clinical Investigations in Institutes. Br. Med. J. 2:429-430, 1967. Later report
published by the college: Report of committee on the supervision of the ethics of clinical investigation
in institutions. London, 1973.
33. Report of a Committee Appointed by Governor Dwight H. Green of Illinois. Ethics governing
the service of prisoners as subjects in medical experiments. JAMA 136:457-458, 1948.
34. Zelen, M. A new design for randomized clinical trials. N. Engl. J. Med. 300:1242-1245,
1979.
35. Beecher, H.K. Ethics and clinical research. N. Engl. J. Med. 274:1354-1360, 1966.
36. Beecher, H.K. Pain, placebos and physicians. Practitioner 189:141-155, 1962.
37. Veterans Administration Co-operative Study Group on Anti-Hypertensive Agents. Effects
of treatment on morbidity in hypertension. II Results in patients with diastolic blood pressure
averaging 90 through 114 mm Hg. JAMA 213:1143-1152, 1970.
38. Beecher, H.K. Surgery as placebo: A quantitative study of bias. JAMA 176:1102-1107, 1961.
Material reproduced with permission. Copyright 1961, American Medical Association.
39. Cobb, L.A., et al. An evaluation of internal-mammary-artery ligation by a double-blind
technique. N. Engl. J. Med. 260:1115-1118, 1959.
40. Dimond, E.G., Kittle, C.F., and Crockett, J.E. Evaluation of internal-mammary artery
ligation and sham procedure in angina pectoris. Circulation 18:712-713, 1958.
41. Adams, R. Internal-mammary-ligation for coronary insufficiency: An evaluation. N. Engl. J.
Med. 258:113-115, 1958.
42. Fish, R.C., Grymes, T.P., and Lovell, M.G. Internal-mammary-artery ligation for angina
pectoris: Its failure to produce relief. N. Engl. J. Med. 259:418-420, 1958.
43. Amery, A., et al. Antihypertensive therapy in elderly patients: Pilot trial of the European
Working Party on high blood pressure in the elderly. Gerontology 23:426-437, 1977.
44. McPherson, K. Statistics: The problem of examining accumulating data more than once. N.
Engl. J. Med. 290:501-502, 1974.
45. Coronary Drug Project Research Group: The Coronary Drug Project. JAMA 214:1303-1313, 1970.
46. University Group Diabetes Program: A study of the effects of hypoglycemic agents on
vascular complications in patients with adult-onset diabetes. Diabetes 19, suppl. 2:747-830,
1970.
47. Veterans Administration Co-operative Study Group on Anti-hypertensive Agents: Effects of
treatment on morbidity in hypertension. Results in patients with diastolic blood pressure
averaging 115 through 129 mm Hg. JAMA 202:1028-1034, 1967.
48. The Anturane Reinfarction Trial Research Group: Sulfinpyrazone in the prevention of cardiac
death after myocardial infarction. N. Engl. J. Med. 298:289-295, 1978.
49. Bulpitt, C.J., Semmence, A., and Whitehead, T. Blood pressure and biochemical risk factors.
Acta Cardiol. 33:109-110, 1978.
50. Committee of Principal Investigators: A Cooperative trial in the primary prevention of
ischaemic heart disease using clofibrate. Br. Heart J. 40:1069-1118, 1978.
51. Elwood, P.C., et al. A randomised controlled trial of acetyl salicylic acid in the secondary
prevention of mortality from myocardial infarction. Br. Med. J. 1:436-440, 1974.
52. An International Anticoagulant Review Group. Collaborative analysis of long-term anticoagulant
administration after acute myocardial infarction. Lancet 1:203-209, 1970.
53. A progress report of the European Working Party on High Blood Pressure in the Elderly
(EWPHE). Cardiac and Renal Function with increasing Age in Elderly Hypertensives. In:
Mild hypertension: Natural history and management, Gross, F. and Strasser, T. (eds.). Tunbridge
Wells, Pitman Medical, 1979. Pp. 181-197.
54. Amery, A., et al. Glucose intolerance during diuretic therapy: Results of trial by the European
Working Party on Hypertension in the Elderly. Lancet 1:681-683, 1978.
55. Fowler, F.G. and Fowler, H.W. The Pocket Oxford Dictionary of Current English, 5th ed.
Oxford, Clarendon Press, 1969.
56. Rose, G., and Hamilton, P.J.S. A randomised controlled trial of the effect on middle-aged
men of advice to stop smoking. J. Epidemiol. Community Health 32:275-281, 1978.
57. Veterans Administration Co-operative Study Group on Antihypertensive Agents. Effects of
treatment on morbidity in hypertension: II. Influence of age, diastolic pressure, and prior
cardiovascular disease: Further analysis of side effects. Circulation 45:991-1004, 1972.
58. Hypertension Detection and Follow-up Program Cooperative Group. Five-year findings of
the hypertension detection and follow-up program: II. Mortality by race, sex and age.
JAMA 242:2572-2577, 1979.

59. Meier, P. Terminating a trial: The ethical problem. Clin. Pharmacol. Ther. 25:633-640,
1979.
60. Stamler, J. When and how to stop a clinical trial: Invited remarks. Clin. Pharmacol. Ther.
25:651-654, 1979.
61. Benedict, G.W. LRC Coronary Prevention Trial. Baltimore. Clin. Pharmacol. Ther. 25:685-687, 1979.
62. Schoenberger, J.A. Recruitment to the Coronary Drug Project and the Aspirin Myocardial
Infarction Study. Clin. Pharmacol. Ther. 25:681-684, 1979.
63. Croke, G. Recruitment for the National Co-operative Gallstone study. Clin. Pharmacol.
Ther. 25:691-694, 1979.
64. Prout, T.E. Patient recruitment: Other examples of recruitment and solutions. Clin. Pharmacol.
Ther. 25:695-696, 1979.
65. Zelen, M. The randomisation and stratification of patients to clinical trials. J. Chron. Dis.
27:365-375, 1974.
66. Wright, I.S., Marple, C.D., and Beck, D.F. Myocardial Infarction. Its clinical manifestation and
treatment with anticoagulants. New York, Grune and Stratton, 1954. Pp. 8-10.
67. Weinstein, M.C. Allocation of subjects in medical experiments. N. Engl. J. Med. 291:1278-1285, 1974. Material reproduced with permission.
68. Ederer, F. Patient bias, investigator bias and the double-masked procedure in clinical trials.
Am. J. Med. 58:295-299, 1975.
69. Peto, R. Clinical trial methodology. Biomedicine Special No: 24-36, 1978.
70. Peto, R., et al. Design and analysis of randomised clinical trial requiring prolonged observation
of each patient. I. Introduction and design. Br. J. Cancer 34:585-612, 1976.
71. Roethlisberger, F.G., and Dickson, W.J. Management and the worker. An account of a research
program conducted by the Western Electric Company, Hawthorne Works, Chicago. Cambridge,
Mass., Harvard University Press, 1946.
72. Beecher, H.K. The powerful placebo. JAMA 159:1602-1606, 1955.
73. Keats, A.S., Beecher, H.K., and Mosteller, F.C. Measurement of pathological pain in distinction to experimental pain. J. Appl. Physiol. 3:34-44, 1950.
74. Keats, A.S., D'Alessandro, G.L. and Beecher, H.K. Report to the council on pharmacy and
chemistry. JAMA 147:1761-1776, 1951.
75. Beecher, H.K., et al. Field use of methadone and levo-iso-methadone in a combat zone.
U.S. Armed Forces Med. J. 2:1269-1276, 1951.
76. Lasagna, L., et al. Study of placebo response. Am. J. Med. 16:770-779, 1954.
77. Beecher, H.K., et al. Effectiveness of oral analgesics (morphine, codeine, acetylsalicylic acid)
and problem of placebo ‘reactors’ and ‘non reactors’. J. Pharmacol. Exp. Ther. 109:393-400,
1953.
78. Travell, J., et al. Comparison of effects of alpha-tocopherol and matching placebo on chest
pain in patients with heart disease. Ann. N.Y. Acad. Sci. 52:345-353, 1949.
79. Evans, W., and Hoyle, C. Comparative value of drugs used in continuous treatment of
angina pectoris. Q. J. Med. 2:311-338, 1933.
80. Greiner, T., et al. Method for evaluation of effects of drugs on cardiac pain in patients with
angina of effort; study of khellin (visamin). Am. J. Med. 9:143-155, 1950.
81. Bulpitt, C.J. Heparin as an analgesic in myocardial infarction. A double-blind trial. Br. Med.
J. 3:279-281, 1967.
82. Jellinek, E.M. Clinical tests on comparative effectiveness of analgesic drugs. Biomet. Bull.
2:87-91, 1946.
83. Gravenstein, J.S., Devloo, R.A., and Beecher, H.K. Effect of antitussive agents on experimental and pathological cough in man. J. Appl. Physiol. 7:119-139, 1954.
84. Hillis, B.R. The assessment of cough-suppressing drugs. Lancet 1:1230-1235, 1952.
85. Shapiro, A.K. Factors contributing to the placebo effect. Their implications for
psychotherapy. Am. J. Psychother. 18, suppl. 1:73-88, 1964.
86. Dollery, C.T. A bleak outlook for placebos (and for science). Eur. J. Clin. Pharmacol. 15:219-221, 1979.
87. Byar, D.P., et al. Randomised clinical trials. Perspectives on some recent ideas. N. Engl. J.
Med. 295:74-80, 1976.
88. Christie, D. Before-and-after comparisons: a cautionary tale. Br. Med. J. 2:1629-1630, 1979.
89. Doll, R., and Peto, R. Mortality in relation to smoking: 20 years' observation on male
British doctors. Br. Med. J. 2:1525-1536, 1976.
90. Ballintine, E.J. Randomized controlled clinical trial. National Eye Institute workshop for
ophthalmologists. Objective measurements and the double-masked procedure. Am. J.
Ophthalmol. 79:763-767, 1975.
91. Pozdena, R.F. Versuche über Blondlots ‘Emission Pesante’. Ann. Physik 17:104, 1905.
92. Seabrook, W. Doctor Wood. New York, Harcourt, Brace, 1941. P. 234.
93. Fletcher, C.M. Criteria for diagnosis and assessment in clinical trials. In: Controlled clinical
trials, Hill, A.B. (ed.). Springfield, Ill., Charles C. Thomas, 1960.
94. Kahn, H.A., et al. Serum cholesterol: its distribution and association with dietary and other
variables in a survey of 10,000 men. Isr. J. Med. Sci. 5:1117-1127, 1969.
95. Wilson, E.B. An introduction to scientific research. New York, McGraw-Hill, 1952.
96. Bearman, J.E., Loewenson, R.B., and Gullen, W.H. Muench's postulates, laws, and corollaries
or Biometricians’ views on clinical studies. Biometrics note 4, National Eye Institute,
Bethesda, Md., 1974.
Bethesda, Md., 1974.
97. Foulds, G.A. Clinical research in psychiatry. J. Ment. Sci. 104:259, 1958.
98. Report of a co-operative randomised controlled trial. Control of moderately raised blood
pressure. Br. Med. J. 3:434-436, 1973. Material reproduced with permission.
99. Knowelden, J. In: Prophylactic trials, medical surveys and clinical trials, Witts, L.J. (ed.). London,
Oxford University Press, 1959.
100. National Diet-Heart Study Research Group. The National Diet-Heart Study. Circulation 37,
suppl. 1:1253-1259, 1968.
101. Pearson, R.M., Bulpitt, C.J., and Havard, C.W.H. Biochemical and haematological changes
induced by tienilic acid combined with propranolol in essential hypertension. Lancet 1:697-699, 1979.
102. U.S. Department of Health, Education and Welfare. National Institutes of Health. Cold
study reveals some vitamin C influence; more research needed. Bethesda, Md. NIH Record 25:4,
1973.
103. Heaton-Ward, W.A. Influence and suggestion in a clinical trial (Niamid in mongolism). J.
Ment. Sci. 108:865-870, 1962.
104. Abraham, H.C., et al. A controlled clinical trial of imipramine (Tofranil) with outpatients.
Br. J. Psychiatry 109:286-293, 1963.
105. Report to the Medical Research Council by its clinical psychiatry committee: Clinical trial of
the treatment of depressive illness. Br. Med. J. 1:881-886, 1965.
106. Report of Medical Research Council Working Party on Mild to Moderate Hypertension.
[...] of treatment for mild hypertension: design and pilot trial. Br. Med. J. [...]
107. Report by the Management Committee. Initial results of the Australian Therapeutic Trial in
mild hypertension. Clin. Sci. 57:449s-452s, 1979.
108. Cooper, G.R. The World Health Organization Centre for Disease Control Lipid Standardization
Program. In: [...] chemistry. Berlin, Walter de Gruyter and Co., 1976. Pp. 97-105.
109. Rogot, E. and Goldberg, I.D. A proposed index for measuring agreement in test-retest
studies. J. Chron. Dis. 19:991-1006, 1966.
110. Bulpitt, C.J., Dollery, C.T., and Carne, S. Change in symptoms of hypertensive patients on
referral to hospital clinic. Br. Heart J. 38:121-128, 1976.
111. Galton, F. Regression towards mediocrity in hereditary stature. J. Anthropological Inst.
15:246-263, 1886.
112. Ferris, F.L. and Ederer, F. External monitoring in multiclinic trials: Application from
ophthalmologic studies. Clin. Pharmacol. Ther. 25:720-723, 1979.
113. Wright, B.M., and Dore, C.F. A random-zero sphygmomanometer. Lancet 1:337-338, 1970.
114. Rose, G.A., Holland, W.W., and Crowley, F.A. A sphygmomanometer for epidemiologists. Lancet 1:296-300, 1964.
115. Kahn, H.A., et al. Standardising diagnostic procedures. Am. J. Ophthalmol. 79:768-775, 1975.
116. [...] framework for the assurance of clinical data. Clin. Pharmacol. Ther. 25:[...], 1979.

117. Schwartz, D., Flamant, R., and Lellouch, J. Clinical trials. Translated by M.J.R. Healy.
London, Academic Press, 1980. Pp. 29-33.
118. Hamilton, M., Thompson, E.M., and Wisniewski, T.K. The role of blood pressure control
in preventing complications of hypertension. Lancet 1:235-238, 1964.
119. Clark, C.J. and Downie, C.C. A method for the rapid determination of the number of
patients to include in a controlled clinical trial. Lancet 2:1357-1358, 1966.
120. Coronary Drug Project Research Group. The Coronary Drug Project: Design, methods and
baseline results. Circulation 47 and 48, suppl. 1:12-137, 1973.
121. National Diet-Heart Study Report. Appendix Aa-c. Sample size estimates for medical trials
Circulation 37 and 38, suppl.: 1279-1308, 1968.
122. [...] Beta-blockers in immediate treatment of myocardial infarction. Br. Med. J.
280:[...], 1980.
123. Baber, N.S. and Lewis, J.A. Beta-blockers in the treatment of myocardial infarction (letter).
Br. Med. J. 3:59, 1980.
124. Schwartz, D. and Lellouch, J. Explanatory and pragmatic attitudes in therapeutical trials. J.
Chron. Dis. 20:637-648, 1967.
125. Halperin, M., et al. Sample sizes for medical trials with special reference to long-term
therapy. J. Chron. Dis. 21:13-24, 1968.
126. George, S.L., and Desu, N.M. Planning the size and duration of a clinical trial studying the
time to some critical event. J. Chron. Dis. 27:15-24, 1974.
127. Sondik, E.J., Brown, B.W., and Silvers, A. High risk subjects and the cost of large field
trials. J. Chron. Dis. 27:177-187, 1974.
128. Cochran, W.G. Sampling techniques. New York, Wiley, 1963. P. 14.5.
129. Nam, J.M. Optimum sample sizes for the comparison of the control and treatment. Biometrics 29:101-108, 1973.
130. Gail, M., et al. How many controls? J. Chron. Dis. 29:723-731, 1976.
131. Dunnett, C.W. Multiple comparison procedure for comparing several treatments with a
control. J. Am. Stat. Assoc. 50:1096, 1955.
132. Hypertension Detection and Follow-up Program Cooperative Group. Five-year findings of
the hypertension detection and follow-up program. 1. Reduction in mortality of persons
with high blood pressure including mild hypertension. JAMA 242:2562-2571, 1979.
133. Hills, M. and Armitage, P. The two-period cross-over clinical trial. Br. J. Clin. Pharmacol.
8:7-20, 1979.
134. Meier, P. and Free, S.M. Further consideration of methodology in studies of pain relief.
Biometrics 27:576-583, 1961.


135. Fisher, R.A. In: The design of experiments. Edinburgh, Oliver and Boyd, 1947.
136. Wilson, C., Pollock, M.R., and Harris, A.D. Diet in the treatment of infective hepatitis.
Therapeutic trial of cysteine and variation of fat-content. Lancet 1:881-883, 1946.
137. Aenishanslin, W., et al. Antihypertensive therapy with adrenergic beta-receptor blockers and
vasodilators. Eur. J. Clin. Pharmacol. 4:177-181, 1972.
138. Pearson, R.M., et al. Trial of combination of guanethidine and oxprenolol in hypertension.
Br. Med. J. 1:933-936, 1976.
139. Chalmers, J., et al. Effects of timolol and hydrochlorothiazide on blood-pressure and plasma
renin activity. Double-blind factorial trial. Lancet 2:328-331, 1976.
140. Lynch, P., et al. Objective assessment of anti-anginal treatment: a double-blind comparison
of propranolol, nifedipine and their combination. Br. Med. J. 1:184-187, 1980.
141. Williams, E.F. Experimental designs balanced for the estimation of residual effects of treat­
ments. Aust. J. Sci. Res. Assoc. 2:149-168, 1949.
142. Cochran, W.G., and Cox, G.M. In: Experimental designs, 2nd ed. New York, Wiley & Sons,
1957. P. 133.
143. Armitage, P. Sequential medical trials. Oxford, Blackwell Scientific Publications, 1960.
144. Wald, A. Sequential analysis. New York, Wiley, 1947.
145. Robertson, J.D., and Armitage, P. Comparison of two hypotensive agents. Anaesthesia
14:53-64, 1959.
146. Snell, E.S., and Armitage, P. Clinical comparison of diamorphine and pholcodine as cough
suppressants, by a new method of sequential analysis. Lancet 1:860-862, 1957.
147. Anscombe, F.J. Fixed sample-size analysis of sequential observations. Biometrics 10:98-100,
1954.
148. Cochran, W.G. Newer statistical methods. In: Quantitative methods in human pharmacology and
therapeutics, Lawrence, D.R. (ed.). London, Pergamon, 1959. Pp. 119-143.
149. Zelen, M. Play the winner rule and the controlled clinical trial. J. Am. Stat. Assoc. 64:131-146,
1969.
150. Meier, P. Terminating a trial: the ethical problem. Clin. Pharmacol. Ther. 25:633-640, 1979.
151. Chalmers, T.C. When and how to stop a clinical trial: Invited remarks. Clin. Pharmacol.
Ther. 25:649-650, 1979.
152. Hill, C., and Sancho-Garnier, H. The two-armed bandit problem, a decision theory approach
to clinical trials. Biomedicine 28:42-43, 1978.
153. Bearman, J.E. Randomized controlled clinical trial. National Eye Institute Workshop for
Ophthalmologists. Writing the protocol for a clinical trial. Am. J. Ophthalmol. 79:775-778,
1975.
154. McFate Smith, W. Problems in long-term trials. In: Mild hypertension: Natural history and
management, Gross, F. and Strasser, T. (eds.). Pitman Medical, Tunbridge Wells, England,
1979. Pp. 244-255.
155. Spriet, A., and Simon, P. Questions à se poser pour vérifier un protocole d'essai thérapeutique avant d'en entreprendre l'exécution. Thérapie 32:633-642, 1977.
156. Clinical Trials Unit, Department of Pharmacology and Therapeutics, London Hospital
Medical College. Aide-memoire for preparing clinical trial protocols. Br. Med. J. 1:1323-1324, 1977.
157. Hamilton, M. Computer programmes for the medical man: A solution. Br. Med. J. 2:1048-1050, 1965.
158. Wright, P., and Haybittle, J. Design of forms for clinical trials (1), (2) and (3). Br. Med. J.
2:529-530, 590-592, 650-651, 1979.
159. Bulpitt, C.J., Dollery, C.T., and Carne, S. A symptom questionnaire for hypertensive
patients. J. Chron. Dis. 27:309-323, 1974.
160. Nicholls, D.P., et al. Comparison of labetalol and propranolol in hypertension. Br. J. Clin.
Pharmacol. 9:233-237, 1980.
161. Survey Control Unit. Central Statistical Office. Ask a silly question. Government Statistical
Service HMSO, 1976.
162. Bennett, A.E., and Ritchie, K. Questionnaires in medicine: A guide to their design and use.
Oxford: Nuff, Prov. Hosp. Trust. 1975.
163. Tinker, M.A. In: Bases for effective reading. Minneapolis, Minnesota Press, 1965.
164. Clark, H.H. Psychol. Rev. 76:387. 1969.
165. International nonproprietary names (INN) for pharmaceutical substances: WHO, 1976.

166. Bulpitt, C.J., et al. The symptom patterns of treated diabetic patients. J. Chron. Dis. 29:571-583, 1976.
167. ... Methuen Paperbacks. London, Magnum Books.
168. Amery, A., et al. Antihypertensive therapy in patients above age 60: Third interim report of the European Working Party on High Blood Pressure in Elderly (EWPHE). Acta Cardiol. 33:113-134, 1978.
169. Hampton, J.R. Presentation and analysis of the results of clinical trials in cardiovascular disease. Br. Med. J. 282:1371-1373, 1981.
170. Norwegian Multicentre Study Group. Timolol-induced reduction in mortality and reinfarction in patients surviving acute myocardial infarction. N. Engl. J. Med. 304:801-807, 1981.
171. Wilkinson, G.N. Estimation of missing values for the analysis of incomplete data. Biometrics 14:257-286, 1958.
172. Gordis, L. Conceptual and methodologic problems in measuring patient compliance. In: Compliance in health care, Haynes, R.L., Taylor, R.W., and Sackett, D.L. (eds.). Baltimore and London, John Hopkins University Press, 1979. Pp. 23-65.
173. Sackett, D.L. A compliance practical for the busy practitioner. In: Compliance in health care, Haynes, R.L., Taylor, R.W., and Sackett, D.L. (eds.). Baltimore and London, John Hopkins University Press, 1979. Pp. 286-294.
174. Feinstein, A.R., et al. A controlled study of three methods of prophylaxis against streptococcal infection in a population of rheumatic children. II. Results of the first three years of the study, including methods for evaluating the maintenance of oral prophylaxis. N. Engl. J. Med. 260:697-702, 1959.
175. Park, L.C., and Lipman, R.S. A comparison of patient dosage deviation reports with pill counts. Psychopharmacologia 6:299-302, 1964.
176. Gordis, L., Markowitz, M., and Lilienfeld, A.M. The inaccuracy in using interviews to estimate patient reliability in taking medications at home. Med. Care 7:49-54, 1969.
177. Bulpitt, C.J., Clifton, P., and Hoffbrand, B.I. Factors influencing over- and under-consumption of anti-hypertensive drugs. Acta Int. Pharmacodyn. Ther. suppl.:243-250, 1980.
178. Mushlin, A.I., and Appel, F.A. Diagnosing potential noncompliance. Physicians' ability in a behavioral dimension of medical care. Arch. Intern. Med. 137:318-321, 1977.
179. Roth, H.P., Caron, H.S., and Hsi, B.P. Measuring intake of a prescribed medication. A bottle count and a tracer technique compared. Clin. Pharmacol. Ther. 11:228-237, 1970.
180. Bergman, A.B., and Werner, R.J. Failure of children to receive penicillin by mouth. N. Engl. J. Med. 268:1334-1338, 1963.
181. Goldsmith, C.H. The effect of compliance distributions on therapeutic trials. In: Compliance in health care, Haynes, R.L., Taylor, R.W., and Sackett, D.L. (eds.). Baltimore and London, John Hopkins University Press, 1979. Pp. 297-308.
182. Feinstein, A.R. Biostatistical problems in 'compliance bias'. Clin. Pharmacol. Ther. 16:846-857, 1974.
183. Schor, S., and Karten, I. Statistical evaluation of medical journal manuscripts. JAMA 195:1123-1128, 1966.
184. Schoolman, H.M., et al. Statistics in medical research: principles versus practices. J. Lab. Clin. Med. 71:357-367, 1968.
185. Lionel, N.D.W., and Herxheimer, A. Assessing reports of therapeutic trials. Br. Med. J. 3:637-640, 1970.
186. Gore, S.M., Jones, I.G., and Rytter, E.C. Misuse of statistical methods: critical assessment of articles in B.M.J. from January to March 1976. Br. Med. J. 1:85-87, 1977.
187. Freiman, J.A., et al. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 'negative' trials. N. Engl. J. Med. 299:690-694, 1978.
188. Glantz, S.A. Biostatistics: How to detect, correct and prevent errors in the medical literature. Circulation 61:1-7, 1980.
189. Armitage, P. Statistical methods in medical research. Oxford and Edinburgh, Blackwell Scientific, 1971.
190. Snedecor, G.W., and Cochran, W.G. Statistical methods. Ames, Iowa, Iowa State University Press, 1967.
191. Petrie, A. Lecture notes on medical statistics. Oxford and Edinburgh, Blackwell Scientific, 1978.
192. Peto, R., et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II: Analysis and examples. Br. J. Cancer 35:1-39, 1977.
193. Hogben, L., and Sim, M. The self-controlled and self-recorded clinical trial for low-grade morbidity. Br. J. Prev. Soc. Med. 7:163-179, 1953.
194. Bulpitt, C.J. The design of clinical trials. Br. J. Hosp. Med. 13:611-620, 1975.
195. Feinstein, A.R. Clinical biostatistics. A survey of the statistical procedures in general medical journals. Clin. Pharmacol. Ther. 15:97-107, 1974.
196. Zar, J.H. In: Biostatistical analysis. Englewood Cliffs, N.J., Prentice-Hall, 1974. Pp. 130-131.
197. Nie, N.N., et al. Statistical package for the social sciences. New York, McGraw-Hill, 1975.
198. Healy, M.J.R., and Whitehead, T.P. Outlying values in the national quality control scheme. Ann. Clin. Biochem. 17:78-81, 1980.
199. Barnett, V. The study of outliers: Purpose and model. Appl. Stat. 27:242-250, 1978.
200. John, J.A. Outliers in factorial experiments. Appl. Stat. 27:111-119, 1978.
201. Sprackling, M.E., et al. Blood pressure reduction in the elderly: a randomised controlled trial of methyldopa. Br. Med. J. 283:1151-1153, 1981.
202. Mantel, N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother. Rep. 50:163-170, 1966.
203. Breslow, N.E. Analysis of survival data under the proportional hazards model. Int. Stat. Rev. 43:45-57, 1975.
204. Wulff, H.R. Letter: Confidence limits in evaluating controlled therapeutic trials. Lancet 2:969-970, 1973.
205. Leading article. Interpreting clinical trials. Br. Med. J. 2:1318, 1978.
206. Baber, N.S., and Lewis, J.A. Beta-blockers in treatment of myocardial infarction. Br. Med. J. 2:59, 1980.
207. Gore, S.M. Statistics in question: Assessing methods - confidence intervals. Br. Med. J. 283:660-662, 1981.
208. Ederer, F. A parametric estimate of the standard error of the survival rate. J. Am. Stat. Assoc. 56:111-118, 1961.
209. Cox, D.R. Regression models and life tables. J. R. Stat. Soc. B 34:187-202, 1972.
210. McMichael, J. Anticoagulants: Another view. Br. Med. J. 2:1007, 1964.
211. Hypertension Detection and Follow-up Program Cooperative Group. Five-year findings of the hypertension detection and follow-up program. II. Mortality by race-sex and age. JAMA 242:2572-2577, 1979.
212. Amery, A., et al. Hypotensive action and side effects of clonidine-chlorthalidone and methyldopa-chlorthalidone in treatment of hypertension. Br. Med. J. 4:392-395, 1970.
213. McMahon, F.G. Efficacy of an antihypertensive agent. Comparison of methyldopa and hydrochlorothiazide in combination and singly. JAMA 231:155-158, 1975.
214. Gibb, W.E., et al. Comparison of bethanidine, alpha-methyldopa and reserpine in essential hypertension. Lancet 2:275-277, 1970.
215. Hefferman, A., et al. A within-patient comparison of debrisoquine and methyldopa in hypertension. Br. Med. J. 1:75-78, 1971.
216. Conolly, M.E., et al. A crossover comparison of clonidine and methyldopa in hypertension. Eur. J. Clin. Pharmacol. 4:222-227, 1972.
217. Oates, J.A., et al. The relative efficacy of guanethidine, methyldopa and pargyline as antihypertensive agents. N. Engl. J. Med. 273:729-734, 1965.
218. Prichard, B.N.C., et al. Bethanidine, guanethidine and methyldopa in treatment of hypertension: A within patient comparison. Br. Med. J. 1:135-144, 1968.
219. Schooler, K.K. A study of errors and bias in coding responses to open end questions. Dissertation Abstr. 16:2542, 1956.
220. Young, D.W. Evaluation of a questionary. Methods Inf. Med. 11:15-19, 1972.
221. Collen, M.F., et al. Reliability of a self-administered medical questionnaire. Arch. Intern. Med. 123:664-681, 1969.
222. Rose, G.A. The diagnosis of ischaemic heart pain and intermittent claudication in field surveys. Bull. WHO 27:645-658, 1962.
223. Maccoby, E.E., and Maccoby, N. The interview: A tool of social science. In: Handbook of social psychology, Lindzey, G. (ed.). Reading, Mass., Addison-Wesley, 1954.
224. Mellner, C. The self-administered medical history. Theoretical possibilities and practical limitations of the usefulness of standardized medical histories. Acta Chir. Scand. Suppl. 406:1+, 1970.


225. Belson, W., and Duncan, J.A. A comparison of the check-list and the open response questioning systems. Appl. Stat. 11:120-132, 1962.
226. Cannel, C.F., and Kahn, R.L. The collection of data by interviewing. In: Research Methods in Behavioral Sciences, Festinger, L., and Katz, D. (eds.). New York, Dryden, 1953.
227. Cronbach, L.J. Further evidence on response sets and test design. Educ. Psych. Meas. 10:3, 1950.
228. Anderson, J., and Day, J.L. New self-administered medical questionary. Br. Med. J. 4:636-638, 1968.
229. Anastasi, A. In: Psychological testing. New York, Macmillan, 1961.
230. Parten, M.B. In: Surveys, polls and samples. New York, Harper, 1950.
231. Shepherd, M. Implications of a multi-centred clinical trial of treatment of depressive illness. In: Anti-depressant drugs, Garattini, S., and Dukes, M.N.G. (eds.). Amsterdam, Excerpta Medica, 1967. P. 332.
232. Hamilton, M. Evaluation of psychotropic drugs (3) sedatives. In: Principles and practice of clinical trials, Harris, E.L., and Fitzgerald, J.D. (eds.). Edinburgh and London, E. and S. Livingston, 1970. Pp. 217-225.
233. Eysenck, H.J. Manual of the Maudsley personality inventory. London: University of London Press, 1959.
234. Taylor, J.A. A personality scale of manifest anxiety. J. Abnorm. Soc. Psychol. 48:285, 1953.
235. Overall, J.E., and Gorham, G.R. The brief psychiatric rating scale. Psychol. Rep. 10:799, 1962.
236. Kellner, R., and Sheffield, B.F. The use of self-rating in a single-patient multiple cross-over trial. Br. J. Psychiat. 114:193-196, 1968.
237. Hamilton, M. The assessment of anxiety states by rating. Br. J. Med. Psychol. 32:50, 1959.
238. Shepherd, M. Evaluation of psychotropic drugs. In: Principles and practice of clinical trials, Harris, E.L., and Fitzgerald, J.D. (eds.). Edinburgh and London, E. and S. Livingston, 1970. Pp. 208-216.
239. Bulpitt, C.J. Quality of life in hypertensive patients. In: Hypertensive cardiovascular disease: Pathophysiology and treatment, Amery, A., et al. (eds.). The Hague, Martinus Nijhoff, 1982.
240. Bulpitt, C.J., Dollery, C.T., and Hoffbrand, B.I. The contribution of psychological features to the symptoms of treated hypertensive patients. Psychol. Med. 77:661-665, 1977.
241. Bulpitt, C.J., et al. The symptom patterns of treated diabetic patients. J. Chron. Dis. 29:571-583, 1976.
242. Fanshel, S., and Bush, J.W. A health status index and its application to health services outcomes. Operations Res. 18:1021-1065, 1970.
243. Greenwood, D.T., and Todd, A.H. From laboratory to clinical use. In: Clinical trials, Johnson, F.N., and Johnson, S. (eds.). Oxford, Blackwell Scientific Publications, 1977. Pp. 13-
244. Simon, T.R.M., and Jones, G. Safety of medicines: The control of clinical trials. In: Clinical trials, Johnson, F.N., and Johnson, S. (eds.). Oxford, Blackwell Scientific Publications, 1977. Pp. 1-2.
245. Griffin, J.P., and Long, J.R. New procedure affecting the conduct of clinical trials in the United Kingdom. Br. Med. J. 283:477-478, 1981.
246. Dollery, C.T. Clinical trials of new drugs. J. Roy. Coll. Phys. 11:226-233, 1977.
247. Lumbroso, A. The introduction of new drugs. In: Pharmaceuticals and health policy, Blum, R., et al. (eds.). London, Croom Helm, 1981.
248. Crout, J.R. Quoted in Lumbroso, A. The introduction of new drugs. In: Pharmaceuticals and health policy, Blum, R., et al. (eds.). London, Croom Helm, 1981.
249. Silverman, M., and Lydecker, M. The promotion of prescription drugs and other puzzles. In: Pharmaceuticals and health policy, Blum, R., et al. (eds.). London, Croom Helm, 1981. Pp. 78-92.
250. Lionel, W., and Herxheimer, A. Coherent policies on drugs: Formulation and implementation. In: Pharmaceuticals and health policy, Blum, R., et al. (eds.). London, Croom Helm, 1981. P. 240.
251. WHO. Tech. Rep. Ser. 425:5, 1969.
252. Bulpitt, C.J. Screening for adverse drug reactions. Br. J. Hosp. Med. 18:329-334, 1977.
253. Bottiger, L.E., and Westerholm, B. Drug-induced blood dyscrasias in Sweden. Br. Med. J. 3:339-343, 1973.
254. Multicentre international study: Reduction in mortality after myocardial infarction with long-term beta-adrenoceptor blockade. Supplementary Report. Br. Med. J. 2:419-421, 1977.
255. Skegg, D.C.G., and Doll, R. The case for recording events in clinical trials. Br. Med. J. 2:1523-1524, 1977.
256. Lewis, J.A. Post marketing surveillance: How many patients? Tips (April):93-94, 1981.
257. Vere, D.W. Controlled trials to detect efficacy and toxicity: Training to meet tomorrow's needs. In: Principles and practice of clinical trials, Harris, E.L., and Fitzgerald, J.D. (eds.). Edinburgh and London, E. and S. Livingston, 1970. Pp. 242-249.
258. Mann, R.D., et al. The significance of variations in the serum transaminases in the assessment of two new drugs. In: Experimental studies and clinical experience - the assessment of risk. Proc. Eur. Soc. for the Study of Drug Toxicity, vol. VI. Amsterdam, Excerpta Medica Foundation, 1965.
259. The Coronary Drug Project Research Group. Clofibrate and niacin in coronary heart disease. JAMA 231:360-381, 1975.
260. Knatterud, G.L., et al. Effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes. IV. A preliminary report on phenformin results. JAMA 217:777-784, 1971.
261. Oliver, M.F. Serum cholesterol - the knave of hearts and the joker. Lancet 2:1090-1095, 1981.
262. The Anturane Reinfarction Trial Research Group: Sulfinpyrazone in the prevention of sudden death after myocardial infarction. N. Engl. J. Med. 302:250-256, 1980.
263. Armitage, P. Trials of antiplatelet drugs: some methodological considerations. Rev. Epidemiol. Sante Publique 27:87-90, 1979.
264. Braunwald, E. Treatment of the patient after myocardial infarction. The last decade and the next. N. Engl. J. Med. 302:290-292, 1980.
265. Kolata, G.B. FDA says no to Anturane. Science 208:1130-1132, 1980.
266. Mitchell, J.R.A. Secondary prevention of myocardial infarction. The present state of the ART. Br. Med. J. 2:1128-1130, 1980.
267. European Cooperative Study Group for Streptokinase Treatment in Acute Myocardial Infarction. Streptokinase in acute myocardial infarction. N. Engl. J. Med. 301:797-802, 1979.
268. Leading Article. Fibrinolytic therapy in myocardial infarction. Br. Med. J. 2:1017-1018, 1979.
269. Elwood, P.C. Aspirin, dipyridamole and secondary prevention. In: The clinical impact of beta-adrenoceptor blockade, Burley, D.M., and Birdwood, G.F.B. (eds.). Horsham, England, Ciba Laboratories, 1980. Pp. 25-26.
270. Elwood, P.C., et al. A randomised controlled trial of acetyl salicylic acid in the secondary prevention of mortality from myocardial infarction. Br. Med. J. 1:436-440, 1974.
271. Coronary Drug Project Research Group: Aspirin in coronary disease. J. Chron. Dis. 29:625-642, 1976.
272. Uberla, K. In: Acetylsalicylic acid in cerebral ischaemia and coronary heart disease. IV. Colfarit Symposium Berlin 1977. Stuttgart and New York, Schattauer Verlag, 1978. P. 157.
273. Elwood, P.C., and Sweetnam, P.M. Aspirin and secondary mortality after myocardial infarction. Lancet 2:1313-1315, 1979.
274. Persantine and aspirin in coronary heart disease. The Persantine-Aspirin Reinfarction Study Research Group. Circulation 62:449-461, 1980.
275. Aspirin Myocardial Infarction Study Group: A randomized, controlled trial of aspirin in persons recovered from myocardial infarction. JAMA 243:661-669, 1980.
276. Sixty Plus Reinfarction Study Research Group: A double-blind trial to assess long-term oral anticoagulant therapy in elderly patients after myocardial infarction. Lancet 2:989-993, 1980.
277. Vedin, J.A. Analysis presented at the VIIIth European Congress of Cardiology. Paris, 1980.
278. Reynolds, J.L., and Whitlock, R.M. Effects of beta-adrenergic receptor blocker in myocardial infarction for one year from onset. Br. Heart J. 34:252-259, 1972.
279. Ahlmark, G., and Saetre, H. Long-term treatment with β-blocker after myocardial infarction. Eur. J. Clin. Pharmacol. 10:77-83, 1976.
280. Wilhelmsson, C., et al. Reduction of sudden deaths after myocardial infarction by treatment with alprenolol. Preliminary results. Lancet 2:1157-1160, 1974.
281. Andersen, M.P., et al. Effect of alprenolol on mortality among patients with definite or suspected acute myocardial infarction. Preliminary results. Lancet 1:865-868, 1979.

282. Hampton, J.R. Evidence on use of beta-blockers. In: Current themes in cardiology, Birdwood, G.F.B., and Russel, J.G. (eds.). Horsham, England, Geigy Pharmaceuticals, 1981. Pp. 86-

283. Beta-blocker heart attack study group. The beta blocker heart attack trial. JAMA 246:2073-2074, 1981.
284. ... The University Group Diabetes Program. A statistician looks at the mortality ...
285. ...
287. Cornfield, J. ...
... Drug Trade News, 23 August 1971.
... Design and conduct of clinical trials. In: The hypertensive patient, ..., D.W. (eds.). Tunbridge Wells, Pitman Medical, 1980. Pp. 494-505.
291. ...
292. ... Do retrospective controls make clinical trials inherently fallacious? Br. Med. J. ...
293. ... Randomised controlled trials? Br. Med. J. 4:1244-1245, 1979.
294. Woolf, B. On estimating the relationship between blood group and disease. Ann. Hum. Genet. 19:251-253, 1955.
295. Mantel, N., and Haenszel, W. Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer Inst. 22:719-748, 1959.
296. Mittra, B. Potassium, glucose and insulin in treatment of myocardial infarction. Lancet 2:607-609, 1965.
297. Hargreaves, M.A., and Maxwell, C. The speed of action of desipramine: a controlled trial. Int. J. Neuropsychiat. 3:140-141, 1967.
298. Maxwell, C. Clinical trials, reviews, and the journal of negative results. Br. J. Clin. Pharmacol. ...
299. Rogers, S.C., and Clay, P.M. A statistical review of controlled trials of imipramine and placebo in the treatment of depressive illness. Br. J. Psychiat. 127:599-603, 1975.
300. Fincke, M. Arzneimittelprüfung: Strafbare Versuchsmethoden. Heidelberg/Karlsruhe, 1977.
301. Leading Article. Controlled trials: Planned deception? Lancet 1:534-535, 1979.
302. Goldzieher, J.W., et al. A double-blind cross-over investigation of the side effects attributed to oral contraceptives. Fertil. Steril. 22:609-623, 1971.
303. Pryce, I.G. Clinical research upon mentally ill subjects who cannot give informed consent. Br. J. Psychiat. ..., 1978.
304. Black, D. The paradox of medical care. J. Roy. Coll. Phys. Lond. 13:57-65, 1979.
305. Cochrane, A.L. Attitudes to controlled trials. Paper presented at the 9th International Scientific Meeting of the IEA. Edinburgh, August 1981.

INDEX

Abraham HC, 73
accepted treatment, 2
accuracy of a measurement, 81-83
Adams R, 23
adverse drug reactions
detection, 26-27, 214-220 (see detection of
an ADR)
recompensation, 13
terminating a trial, 26
Aenishanslin W, 125
Ahlmark G, 229
Allessandro GRD, 57
allocation of treatment
according to date, 54
hospital number, 54
name, 54
preceding results, 54
wishes of patient, 48, 54
free of bias, 45-46
random (see randomisation)
Amberson JB, 10
American Heart Association Anticoagulants
Trial, 45-46
Amery A, 23, 34, 162-163, 166, 176, 196, 220
analysis
amalgamating trial results, 192-193
experimental unit, 180-181
errors, 180-185

for effective randomisation, 185
intention-to-treat, 191-192
lumping and splitting, 191-192
meaning of P, 183-184
nonparametric, 186
no statistical test, 181
per-protocol, 192
proportional data, 187
quantitative data, 185
survival data, 187-189
wrong statistical test, 181-182
Anastasi A, 201
Anderson J, 201
Andersen MP, 229
angina pectoris, 21
ANOVA, 197
Anscombe FJ, 132
anthrax, 10
anticoagulants, 2, 31-32, 45-46, 78, 218, 228
antidepressant drugs, 63-64
antihypertensive treatment, 22-26, 30, 36-37, 74-75, 76-78, 102-104, 112-113, 121, 160-162, 165-166, 205, 236
Anturane Reinfarction Trial, 26, 31-32, 64, 221-225
Appel FA, 171
Armitage P, 5, 120, 128-131, 179, 183, 185-187, 223-224
aspirin, 31-32, 226-227


Aspirin Myocardial Infarction Study, 40-42, 226-227
Australian Therapeutic Trial in mild hypertension, 77
Baber NS, 104, 190, 230, 235
Bacon Francis, 61
Ballintine, 61
Bartlett E, 9
Bearman JE, 63, 136, 139, 158-159
Beck DF, 45
Beecher HK, 21, 23, 57, 73
Belson W, 200
Benedict GW, 40-41
Bennett AE, 148, 198, 200
Bergman AB, 171
Bernard C, 19
beta-adrenoceptor blocking drugs, 228-230
beta blocker heart attack trial, 230
bias, due to
analysis, 64
faulty methods, 66
inadequate control group, 56-59
noncompliance, 59-61
observer, 61-64
patient, 61
placebo effect (see placebo effect)
regression to the mean, 58
time trends, 58
uneven allocation, 56
withdrawal, 64-66
biological importance, 29-30
Black D, 243
Blondlot, 61-62
blood pressure measurement, 62-63, 82, 87-90, 94-95
Bottiger LE, 215
Bradford Hill A, vii, 1, 6, 11, 15-16, 22, 56, 74, 76-77, 191
Braunwald E, 224
Breslow NE, 188, 190
Brown BW, 112
Bull JP, 5-6, 9-10
Bulpitt CJ, 30, 57, 85, 126, 146, 166, 170, 180,
198-199, 202, 205-206, 215, 236, 243
Bush JW, 206
Byar DP, 59, 134-135

Cannel CF, 201
carcinoma, bronchus, 2
Carne S, 85, 146, 198-199, 202, 205
Caron HS, 171
carry-over effect, 119, 121, 126-128
case-saving rule, 115
Chalmers J, 126

Chalmers TC, 134
checking
diagnoses, 91
documents, 90-93
measurements, 91-92, 186
outliers, 186
range, 93, 186
chi-squared test, 187-189
cholesterol measurements, 81
Christie D, 59, 240
Claimed Investigational Exemption, 209
Clark CJ, 104, 115-116
Clark HH, 151
Clay PM, 241
Clifton P, 170
clinical trial certificate, 209
clofibrate, 31-32, 217-218, 240
Cobb LA, 23
Cochrane AL, 2, 3, 243
Cochran WG, 115, 127-128, 133, 179, 183-186
coding of data, 152-156
coefficient of variation, 81
Collen MF, 199
conduct of the trial
nonadherence of investigators, 173-174
noncompliance of subjects, 170-173
pilot trial, 157-159
run-in period, 159-164
terminating, 175-178
withdrawals, 165-170
confidence limits
calculation, 189-191
rather than power, 104
confounding variables, 46-47
conjugated oestrogens, 26
Conolly ME, 196
consent
avoidance of necessity for, 20
full information, 17-20, 75-76
written, 17, 20
control group
concurrent, 2, 7, 9-10
historical, 10-11, 58-59
similar to treatment group, 44-55
Cooper GR, 81
Co-operative Randomised Controlled Trial,
166-170
coordinating centre, 137
Cornfield J, 232
Coronary Drug Project Trial, 26, 104, 114-115, 217-218, 227
Cox DR, 190
Cranberg L, 240
Crockett JE, 23
Croke G, 42
Cronbach LJ, 201
cross-over trial, 44
effect of order, 121-123
effect of treatment-period interaction, 121
in psychiatry, 204
reduction in numbers required, 113
reduction in variability, 94
to detect an interaction, 125-126
when to use, 121-123
Crout JR, 212
Crowley FA, 90
Cruickshank JM, 236

data transformation, 186
Day JL, 201
decision-making trial, 108-109
decision rules for stopping a trial, 25-26
Declaration of Helsinki, 14
definition of a randomised trial, vii
designs for trials
cross-over, 121-123
factorial, 123-126
graeco-latin square, 128
latin square, 126-128
play-the-winner, 134-135
sequential, 128-134
standard, 119-120
design of documents, 148-150
duplication, 150
instructions to subjects, 149-150
layout of forms, 148-149
no-carbon-required paper, 150
Desu MM, 108
detection of an ADR
according to frequency, 214-216
in a large trial, 217-219
in a small trial, 217
methods, 219-220
when previously recognised, 214
Devloo RA, 57
dextrothyroxine, 26
diabetes mellitus, 3, 26, 230-233
dietary advice, 66-69, 172-173
different trial designs, 118-135
digit preference, 89
Dimond EG, 23
diphtheria, 10
Doll R, 3, 59-60, 215, 219, 241
Dollery CT, 58, 77, 85, 93, 146, 198-199, 202, 205, 210
Dore CF, 89
double-blinding technique
alternatives, 69-71
breakdown in, 72-74
definition, 61
disadvantages, 71-72
laboratory results, 70
method, 67-68

when difficult, 68-69
when undesirable, 68
double-dummy technique, 71-72
Douglas-Jones AP, 236
Downie CC, 104, 115-116
dropout (see withdrawal)
drug interactions, 17
Duncan JA, 200
Dunnett CW, 115

early trials on new drugs, 17-19, 209-213
phase I, II and III trials, 210-211
Ederer F, 57, 61, 64, 87, 190
Elwood PC, 31-32, 226-227
end point
continuously distributed variable, 30
ease of measurement, 29
frequent, 31
identification, 28-29
infrequent, 31
qualitative, 30-31
error
alpha and beta, 99-112
analytical, 180-185
clerical, 93
diagnostic, 91
gamma, 107
in entry criteria, 91
in withdrawal of patients, 92-93
measurement, 91-92
one-tailed test, 183
types I and II, 99-112
type III, 107-109
Etang JC L', 5
ethics, 12-27
declarations, 13-17
definition, 12
legal considerations, 13
nontherapeutic research, 14-15, 17
research combined with care, 14
research ethical committees, 16-17
sequential trials, 131
encephalitis, 58
European Working Party on Hypertension in
the Elderly, 23-25, 34, 161-162, 166,
176
Evans W, 57
Exclusion criteria, 23-24
explanatory trial, 61
Eysenck HJ, 204
factorial trials, 123-126
failure to accept results
adverse effects, 237
faulty analysis, 223, 234-235
faulty definitions, 222-223, 235
faulty performance, 230, 232-234
faulty reporting, 223
preconceived ideas, 226, 230-234
restricted selection, 225, 235
results not consistent
in subgroups, 236
in different trials, 230, 237
too few patients, 230, 235
treatment groups different, 226, 232, 235
treatment too difficult, 226, 228, 237
treatment effect too small, 227-228, 235
vested interests, 224, 237
Fanshel S, 206
false negative rate, 92, 199
false positive rate, 92, 199
Feinstein AR, 170, 172-174, 181, 230-234
Ferris FL, 87
Fibiger J, 10
financial considerations, 139
Fincke M, 242
Fish RC, 23
Fisher RA, 5, 123, 125
Flamant R, 97, 106, 109-110
Fletcher CM, 63
Foulds GA, 63-64
Free SM, 121
Freiman JA, 179, 184
Gail M, 115
Gallstone Study (National Co-operative), 42-43
Galton F, 86
general practice, trials in, 4
George SL, 108
Gibb WE, 196
Glantz SA, 179, 181, 184
Glaser EM, 12, 18-19, 147, 159
Goldberg ID, 85
Goldsmith CH, 172
Goldzieher JW, 242
Gordis L, 170
Gore SM, 179, 191
Gorham GR, 204
Graeco-Latin square, 128-129, 169
Gravenstein JS, 57
Green Governor DH, 18-19
Greenwood DT, 209, 211
Greiner T, 57
Griffin JP, 210
Grymes TP, 23
Gullen WH, 63, 158-159

Haenszel W, 241
Halperin M, 107
Hamilton M, 102-104, 144, 203-204
Hamilton PJS, 35, 59-60
Hampton JR, 162-163, 230
Hargreaves MA, 241
Harris AD, 125
Harris WH, 236
Havard CWH, 70, 126
Hawthorne effect, 57
Haybittle J, 144, 150-152
Haygarth J, 8
Healy MJR, 186
Heaton-Ward effect, 73
Hefferman A, 196
Herxheimer A, 179, 213
Higgin's law, 87
Hill C, 135
Hillis BR, 57
Hills M, 120
history of trials, 5-11
Hoffbrand BI, 170, 205
Hogben L, 180
Holland WW, 90
Hoyle C, 57
Hsi BP, 171
Hunter J, 7
Hypertension Detection and Follow-up Program, 37, 39, 117, 192, 236
hypoglycaemic drugs, 2, 26-27

inclusion criteria, 23
information to be collected, 144-156
demographic, 146
facilitating data entry, 145
identifying the patient, 145-146
primary care physician, 147
quantity, 144-145
to facilitate follow-up, 146
intention-to-treat principle, 20, 46, 61, 173, 191-192, 234
interaction between treatments, 123-126
interim analyses, 25-26
internal mammary artery ligation, 23
International Anticoagulant Review Group, 32, 78, 228
International Pharmaceuticals Substances Classification, 151
Jellinek EM, 57
Jenner E, 7
Jesty B, 7
John JA, 186
Jones G, 209-210
Jones IG, 179

Kahn HA, 63, 91
Kahn RL, 201
Karten I, 179, 181
Keats AS, 57
Kellner R, 204
Kittle CF, 23
Knatterud GL, 217
Knowleden J, 67
Kolata GB, 224-225

Laplace PS, 9
Lasagne L, 57
latin square
carry-over effects, 217
order effects, 127
randomisation in, 126-127
treatment effect, 127
withdrawals, 169
Lellouch J, 99, 106, 108-110
Lewis JA, 104, 190, 216, 230, 235
life table, 187-189
Lilienfeld AM, 170
Lind J, 6, 45
Lionel NDW, 179
Lionel W, 213
Lipman RS, 170
Lister J, 9-10
Loewenson RB, 63, 158-159
London Hospital Clinical Trials Unit, 140
Long JR, 210
Louis PCA, 9
Lovell MG, 23
LRC Coronary Prevention Trial, 40-42
Lumbroso A, 211-212
Lydecker M, 212
Lynch P, 126

Maccoby EE, 200
Maccoby N, 200
McFate Smith W, 136-137
McMahon BT, 10
McMahon FG, 196
McMichael J, 191
McNemar’s test, 187-188
McPherson K, 26, 111-112
Maitland C, 7
Mann RD, 217
Mantel N, 188, 241
marketing trials, 4
Markowitz M, 170
Marple CD, 45
masking, 61
Maxwell C, 241
Medical Research Council, 3, 5, 10-11, 16,
22-23, 73, 77, 162, 171, 203
Meier P, 39, 121, 134, 177-178, 224
Mellner C, 200
methods of recruitment
medical records, 40-41
own patients, 40
screening, 41-42
via colleagues, 40
methyldopa, 4, 195-196
minor objectives
in long-term trials, 33-34
limiting factors, 34
Mitchell GAG, 10
Mitchell JRA, 225, 228
Mittra B, 241
monitor, 93-94, 174
Mosteller FC, 57
Muench's laws, 43, 63, 158-159
Murphy’s law, 157
Mushlin AI, 171
myocardial infarction, 2, 113-114, 176-177
Nam JM, 115

National Diet Heart Study, 68, 73, 104, 108, 113-115
Nebuchadnezzar's trial, 5
new treatment, 2, 17-19
Nie NN, 186
Nicholls DP, 146, 217
nonadherence to protocol by investigator, 174-175
noncompliance
assessment by interview, 170
assessment by pill count, 171
assessment by physician, 171
assessment by self-administered questionnaire, 170
assessment from blood and urine, 171
consequences, 172-173
detection, 162, 170-171
producing bias, 59-61, 172-173
Norwegian Multicentre Study Group, 163-164, 177
number of subjects required, 96-117
aids to calculating, 115-116
explanatory trial, 106-108
pragmatic trial, 108-110
relationship to
asymmetric α, 110-111
drop out, 114
error, 109-112
power, 109-110
treatment effect, 113-114
variance, 112-113, 116-117
unequal groups, 114-115
with noncompliance, 172
Nuremberg Code, 13
Oates JA, 196

objectives of a trial, 28-34
identification, 28-29, 97-99
importance, 29
likelihood to be achieved, 29
major, 28-33, 96-99
quantification, 30-31
traditional, 97
observer bias, 89
observer training, 88-89
Oliver MF, 217-219
order effect, 122-123
organization of a trial, 137
Overall JE, 204
Paré A, 5
Park LC, 170
Pasteur, 10
Parten MB, 202
patient as his own control, 44
Pearson RM, 70, 126
penicillin, 10, 21
Perkin's tractors, 8
Persantine-Aspirin Reinfarction Study, 227
Peto R, 3, 43, 53, 58-60, 114-115, 179-180, 185, 187-188, 193, 241
Petrie A, 179, 183, 185-186
phase I, II & III trials, 209-210
phenformin, 26
physiotherapy, 2
pilot trial
advantages, 158-159
disadvantages, 159
Pinner M, 10
placebo effect, 22, 57-58
in hypertensive patients, 74-75
placebo run-in period, 76
placebos
advantages of, 74-75
disadvantages of, 77-78
doctor-patient relationship, 77
ethical considerations, 20-22, 77-78
in hypertension, 22, 65
in nineteenth century, 8
in pain relief, 21, 57-58
medico-legal problems, 77-78
number of trials required, 78
use of, 19, 64-65, 74-78
play-the-winner trial, 54, 134-135
Pocock SJ, 10, 240
Pollock MR, 125
Porrit AE, 10
postmarketing trials, 212-213
postoperative pain, 21, 57
power (see error, type II)
Pozdena RF, 62
precision of a measurement, 81-83
Prichard BNC, 196
primary prevention trial, 31
protocol
adherence to, 89-90
deviations from, 87-88
writing, 136-143
Prout TE, 43
Pryce IG, 242
psychiatry
diagnosis, 203
informed consent, 203
measurement of response, 203
self-rating scales, 203
new drugs, 203
psychotherapy, 2

quality control,
a clear protocol, 87-88
and double-blindness, 174
checking of documents, 90-93
entry criteria, 91, 174
training of observers, 88-90
trial monitor, 93-94, 174
use of different observers, 90
quality of life
factors influencing, 205
health status index, 206
measurement, 205-207
questionnaire
interviewer administered, 201-202
self-administered,
advantages, 202
back translation, 202
validity, 92
questionnaire design
characteristics of a good question, 150-151
characteristics of a good response, 151-152
coding response, 152-154
order of yes/no response, 152
responses in boxes, 152
questions
clear, 198
grammatically correct, 200
high response rate, 200
only one enquiry, 200
open or closed, 200-201
repeatable, 198
valid, 198-199

random number tables, 49-50
randomisation, 2, 10-11, 44-45
and acceptance of trial results, 47
and valid statistical tests, 47
advantages of, 45-47
alternatives to, 54
balanced block, 50
block, 49
definition, 44
disadvantages of, 47-48
ethical problems with 4M
restricted, 47, 49-51
stratified, 48, 53
tests for, 185
timing of, 49
unequal, 53-54, 114-115
variable block, 51-52
recruitment of subjects, 39-43
methods (see methods of recruitment)
period of recruitment, 40
related factors, 42-43
selection criteria, 39-40
reduction in variance
by measuring change, 112-113
by replication, 112-113
in cross-over trial, 113
regression to the mean, 58, 86
regulatory approval
for early drug trials, 209-210
for general release, 211-212
repeatability indices, 83-86
repeatability of a measurement, 80-87
and bias, 86
and regression to the mean, 86-87
and validity, 87
qualitative data, 83-86
quantitative data, 80-83
symptom reports, 85
repeated looks, 25-26, 111-112, 234

SPSS, 186
square-root rule, 115
Stamler J, 39
standard deviation and standard error, 184-185
steering committee, 137
streptokinase, 225-226
streptomycin, 3, 10, 16, 23
subjective well-being, 194-208
sulphinpyrazone, 31-32
Sunday Times, 4, 212-213
Sutton HG, 8
Sweetnam PM, 227
symptoms
recognition, 194-197
spontaneous reporting, 197
synergism, 125
Taylor JA, 204
Temple R, 224-225
terminating a trial
decision rules, 25-26, 175-178
due to an adverse effect, 26
due to benefit, 26
stopping too late, 177
stopping too soon, 176
Thompson FM, 102-104
Todd AH, 209, 211
tolbutamide, 26, 230-233
tonsillectomy, 3
toxicity, detection of, 26
Travell J, 57
treble blindness, 61
Twenty-one Army Group, 10
two-armed-bandit problem, 135
Uberla K, 227
unequal groups (see randomisation)
University Group Diabetes Project, 26-27, 230-233
vaccination against smallpox, 7-8
validity
and repeatability, 87
definition, 35
extrapolation of results, 36-38
of questions, 198-199
variability of results, 80-95, 105-106 (see accuracy, precision and repeatability)
variance
qualitative data, 105-106
quantitative data, 105
Vedin JA, 228-230
Vere DW, 217
Veterans Administration trials, 22, 26, 30-31, 36-37, 165, 205
Vitamin C, 73
volunteers,
prisoners, 18-19
recruitment, 18
remuneration, 17-18
Wade OL, 13
Wald A, 129
wash-out period, 121
Weinstein MG, 47-48, 53
Werner RJ, 171
Westerholm B, 215
Whitehead T, 30, 186
Whitlock RM, 229
Wilhelmsson C, 229
Wilkinson GN, 170
Williams EF, 127
Williams OD, 93
Williams W, 7
Wilson C, 125
Wilson EB, 63
Wisniewski TK, 102-104
withdrawal from a trial, 24-25, 138, 165-169
due to adverse treatment effect, 166
due to death, 165
due to worsening of condition, 166-167
in actively treated group, 65-66
in run-in period, 64-65, 162, 167
methods of analysing, 165, 168-169
methods of reducing number, 167-169
related to end point, 165-166
unrelated to end point, 165
Wood, Doctor, 62
Woolf B, 241
World Medical Association, 14-15
Wortley, Lady M, 7

Wright BM, 89
Wright IS, 45
Wright P, 144, 150-152
Wulff HR, 190

Young DW, 199

Zar JH, 182
Zelen M, 20, 44, 47, 50, 134
