
CRHP

Distribution : limited

Paris, September 1978
Original : English

UNITED NATIONS EDUCATIONAL
SCIENTIFIC AND CULTURAL ORGANIZATION

TRAINING SEMINARS ON STATISTICAL METHODS
FOR PROJECTING SCHOOL ENROLMENT

Basic Background Material

Book 1

Unit I    Basic Mathematics

Unit II   Basic Statistics

(Provisional version)

Education Projections Unit
Division of Statistics on Education
Office of Statistics
TRAINING SEMINARS ON STATISTICAL METHODS
FOR PROJECTING SCHOOL ENROLMENT

Basic Background Material (1)

Book 1

Unit I    Basic Mathematics
          by Robin Shannon

Unit II   Basic Statistics
          by Robin Shannon

(Provisional version)

(1) This work has been prepared within the project "National Training on Methods with Special Reference to Projecting School Enrolment" (INT/76/P22), being carried out by the Unesco Office of Statistics, Paris, with financial support from the United Nations Fund for Population Activities (UNFPA).

PREFACE

One of the major problems which Unesco statisticians have faced in
conducting national training seminars on statistical methods for projecting
school enrolment (nine of which have been organized in developing countries
over the last three years), has been the incomplete knowledge of many participants as regards basic concepts in subjects that are pre-requisites for
drawing full benefit from the seminar’s programme. This has been mainly the
case in areas such as mathematics, statistics, demography and data collection
methodology.
It should be recognized that it is often difficult, and expensive, to find suitable textbooks covering the minimum ground in the above-mentioned areas which also make particular reference to problems in education and population. As has been indicated, such reference materials are indispensable for a sound approach to quantification in education and population.

For these reasons, the Unesco Office of Statistics has undertaken to
make available a set of relatively short, clear and practically oriented
background units with the major objective of giving participants the possibility of revising their knowledge on selected items before as well as after
the seminars, thus enabling them to maximize their involvement in this training
programme.
The present volume (Book 1) contains two units, i.e. Unit I Basic
Mathematics, and Unit II Basic Statistics. It is issued in a provisional version
with a view to incorporating possible improvements in the final version to be
issued at a later stage. Readers may kindly address any comments to the Division
of Statistics on Education, Office of Statistics, Unesco, 75700 Paris (France).
This background material is therefore a general complement to serve as
introduction to the basic paper which is normally prepared for each seminar,
demonstrating how to analyze and project available population and education
data for the country concerned by means of simple statistical techniques. As
such, each Unit is conceived as a self-contained document. Attention should be
drawn to the fact that the various subjects treated in this background series
are directed towards the officials involved in practical work, and therefore
the theoretical foundations (already extensively covered in authoritative
textbooks) of the areas are in no case given priority.
Similar units are under preparation covering such topics as Basic
Demography, Educational Statistics, Education Projection methods and Statistics
on educational finance.

We hope in this way to facilitate the work of educational statisticians
and planners in developing countries.

Division of Statistics on Education
Unesco Office of Statistics
Paris, September 1978


UNIT I - BASIC MATHEMATICS

by Robin Shannon, Lecturer
Department of Economics,

University of Newcastle-Upon-Tyne

(United Kingdom)

CONTENTS
                                                                      Page
Introduction                                                           iii

SECTION 1   SOME FUNDAMENTAL CONCEPTS AND OPERATIONS                     1
            1.1  Symbols                                                 1
            1.2  Brackets                                                3
            1.3  Positive and negative numbers                           4
            1.4  Factors                                                 6
            1.5  Powers of numbers                                       7
            1.6  Simple equations                                        9

SECTION 2   FIRST STEPS IN DATA ANALYSIS                                12
            2.1  Absolute numbers                                       12
            2.2  Rates and ratios                                       13
            2.3  Significant figures and rounding                       15
            2.4  Proportions                                            18
            2.5  Percentages                                            19
            2.6  Changes in variables over time                         19
            2.7  Rates of growth over time

SECTION 3   LOGARITHMS                                                  25
            3.1  The idea of logarithms                                 25
            3.2  Tables of logarithms                                   27
            3.3  Tables of antilogarithms                               29
            3.4  Use of logarithms in multiplication and division       29
            3.5  Use of logarithms in finding the powers and
                 roots of numbers                                       32

SECTION 4   THE AVERAGE ANNUAL GROWTH RATE                              35
            4.1  Calculation (continued)                                35
            4.2  Use of average annual growth rates                     35
            4.3  Time taken for a variable to increase by a
                 given magnitude or proportion                          36

SECTION 5   MATHEMATICAL FUNCTIONS                                      37
            5.1  Introduction                                           37
            5.2  Linear functions                                       40
            5.3  Non-linear functions                                   42

SECTION 6   GRAPHS                                                      42
            6.1  Rectangular co-ordinates                               42
            6.2  Plotting the graph of a linear function                43
            6.3  Interpretation of coefficients                         47
            6.4  Plotting the graph of a quadratic function             50
            6.5  Plotting graphs of observed data                       51
            6.6  Ten practical hints on drawing graphs                  56

EXERCISES AND ANSWERS                                     Exercise   Answers
Exercise 1   (suggested to reader on p. 9)                   59-60      73-4
Exercise 2   (suggested to reader on p. 12)                     61        75
Exercise 3   (suggested to reader on p. 18)                     62        76
Exercise 4   (suggested to reader on p. 19)                     63        77
Exercise 5   (suggested to reader on p. 21)                     64        78
Exercise 6   (suggested to reader on p. 32)                     65     79-80
Exercise 7   (suggested to reader on p. 34)                     66      81-2
Exercise 8   (suggested to reader on p. 36)                     67        83
Exercise 9   (suggested to reader on p. 37)                     68        84
Exercise 10  (suggested to reader on p. 42)                     69        85
Exercise 11  (suggested to reader on p. 48)                     70      86-7
Exercise 12  (suggested to reader on p. 51)                     71      88-9
Exercise 13  (suggested to reader on p. 57)                     72      90-2

ANNEXE
Table of common logarithms                                              93
Table of common antilogarithms                                          94

Introduction

Many people find mathematics a forbidding, even frightening, subject. Strange and incomprehensible words and notation seem to abound. Numbers seem sometimes to appear from nowhere, only to disappear again for no apparent reason. Complicated relationships may be presented which seem designed to mystify the reader rather than to enlighten him. In the face of this the puzzled reader may decide simply to abandon his efforts to understand. This is a sad irony, since the essential reason for using mathematics is to further understanding.

The purpose of this chapter is to demonstrate to the reader that the techniques of basic mathematics presented here, assuming careful study of the text and diligent completion of the exercises, are well within his or her understanding. However, it must immediately be stressed that this chapter should not be regarded as a complete substitute for a general introductory mathematical textbook. It is rather a presentation of the basic mathematical concepts and techniques which an educational planner or administrator (indeed anyone concerned with the development and monitoring of educational systems) will almost certainly encounter in his or her professional work. This means that a number of mathematical methods covered by many general textbooks will not be found here. For example, a general introductory textbook would almost certainly introduce the reader to the solution of quadratic and simultaneous equations, the ideas behind irrational and complex numbers, linear algebra, concepts of limits and continuity, the calculus, and other subjects.

Certainly, some of these more advanced mathematical techniques are utilised in relatively sophisticated analyses of educational data. Use of all of them, however, assumes a mastery of more fundamental methods, many of which are presented in this text.

Thus, this is a presentation designed essentially for practical professional men and women in the field of educational planning and administration. It is not designed to make the reader a trained mathematician!

It is assumed that the only previous training in mathematics which the reader will bring to this text is a knowledge of the basic operations of addition, subtraction, multiplication and division. A number of readers, no doubt, will find sections of this chapter already familiar. Such readers may safely omit these sections, and spend their time on any unfamiliar, or only half-understood, concepts and methods. But all readers are urged: do the exercises! Practice is essential in developing a proper understanding. There is no substitute for it.

The nature of mathematics

A considerable part of educational planning, monitoring and control is concerned with various types of relationships between quantities. Some relationships may be purely definitional: for example, a "pupil-teacher ratio" is a number stemming directly from its definition. Other relationships may be behavioural: for example, the number of students applying to follow a particular course of higher education will depend, in ways which may be specified, on the behaviour of individuals with respect to various factors. And some relationships may be purely technical: for example, the maximum number of schools that could possibly be constructed in a given time period will depend on the limited resources of labour, capital and finance available.

The role of mathematics is essentially to analyse the structure and logical consequences of such relationships. These relationships may be considered singly, or they may need to be considered together. For instance, in attempting to project the number of pupils who will be enrolled at a particular level of education at some time in the future, a considerable number of relationships between quantities will have to be considered simultaneously. For example, how many children of a particular age-group do we expect there to be at a specified time? Of these, how many do we expect to have entered the system? How will these children be distributed across grades? What number of pupils will be promoted from grade to grade, how many will repeat their grades, and how many will leave the school system entirely (dropout or graduate)?

Mathematics can help us to uncover the logical consequences of such relationships. If our initial observations, or assumptions, are wrong, mathematics alone cannot help us. For it is above all a tool of analysis: an indispensable tool in planning and interpretation. It cannot, any more than any other tool, conceptual or physical, tell us what we should do. But it does help us to discover what we have done, and what we can do.

1. SOME FUNDAMENTAL CONCEPTS AND OPERATIONS

1.1 Symbols

1.1.1 In many instances in this text we shall use symbols rather than definite numbers. The use of symbols will allow us to deal with general expressions and general results. We shall, in fact, be utilising the basic concepts of algebra. Algebra is best understood as a generalisation and extension of arithmetic. As pointed out in the Introduction, it is assumed that the reader is familiar with the fundamental operations of arithmetic. These are the operations of addition, subtraction, multiplication and division. All these operations are used in algebra in essentially the same manner as in arithmetic. However, in algebra, as new processes are developed, new symbols are introduced to help the operations.

1.1.2 To clarify the use of symbols, we consider the following extremely simple example. Assume that in a certain primary school there were 559 pupils distributed across 13 classes. What is the average class size? In this particular school, it is clearly:

$$\frac{559}{13} = 43 \text{ pupils}$$

In order to generalise this simple example of the calculation of average class size to any school, let us assign symbols to the three different numbers.

Let the letter "P" represent "the total number of pupils"
Let the letter "C" represent "the total number of classes"
Let the letter "A" represent "the average class size"

Thus we may write:

$$A = \frac{P}{C} \qquad (1)$$

This is our general formula for defining average class size. In our particular example above, A = 43, P = 559 and C = 13.

The general, algebraic formulation (1) applies to any school. Average class size A may always be calculated, given values for P and C.
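Formula (1) translates directly into a small function. The sketch below uses Python purely for illustration; the function name `average_class_size` is hypothetical, and the guard against a school with no classes is an added assumption, since division by zero is undefined.

```python
def average_class_size(total_pupils: int, total_classes: int) -> float:
    """Formula (1): A = P / C, the average number of pupils per class."""
    if total_classes <= 0:
        # Assumption: reject zero classes, as P / 0 has no meaning.
        raise ValueError("a school must have at least one class")
    return total_pupils / total_classes


# The school in 1.1.2: P = 559 pupils, C = 13 classes.
A = average_class_size(559, 13)
print(A)  # 43.0
```

The same function works for any school once P and C are known, which is exactly the point of replacing the definite numbers by symbols.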

1.1.3 The reader may verify for himself or herself that the four above-mentioned elementary operations of arithmetic may be applied to formula (1), provided that the same operation is applied to both sides. For example, let us multiply the formula by 2 on both sides of the equality sign:

$$2 \times A = 2 \times \frac{P}{C}$$

$$2A = \frac{2P}{C} \qquad (2)$$

Notice that the usual multiplication sign, "×", is often conveniently omitted in algebra. Sometimes it is replaced by a dot, but in this text the convention is adopted that the absence of any sign between two adjacent symbols implies that they are multiplied together.

1.1.4 Continuing with our example, note that the term 2A above denotes a multiple of A and the number 2. This number 2 is known as the coefficient of A. A coefficient may be a definite number, as here, or may itself be written as a letter representing the number. Thus we may write:

$$bA = \frac{bP}{C}$$

where b, the coefficient, equals 2 in our particular example.
1.2 Brackets

1.2.1 It is often the case that an algebraic expression, or part of an expression, has some operation (e.g. multiplication) to be performed on it as a whole. For example, we might wish to write, in algebraic symbols, "Three times the sum of w and y", w and y symbolising particular variables. Note that a variable is a symbol which can take on any of a set of prescribed values.

1.2.2 If we wrote the following expression:

$$3 \times w + y \qquad (3)$$

it would not be clear whether 3 should multiply w alone, or both w and y. Thus we use "brackets" to enclose the part which is to be operated on as a whole. We write:

$$3(w + y) \qquad (4)$$

Brackets tell us the order in which operations are to be performed. For example:

$$2(x + y) - z$$

means that first we calculate the sum of x and y, then multiply by 2, before finally subtracting z from the total.

1.2.3 Brackets are essentially a convenient way of grouping symbols or numbers in order to perform operations upon them. In transforming expressions with brackets into expressions without, certain rules must be observed carefully. Thus, in removing the brackets from the following expression:

$$x(y + z) \qquad (5)$$

each term within the brackets must be multiplied by x:

$$x(y + z) = xy + xz \qquad (6)$$

Similarly,

$$(w + x)(y + z) \qquad (7)$$

is equivalent to:

$$w(y + z) + x(y + z) = wy + wz + xy + xz \qquad (8)$$

Similarly, when division is performed, each term within the bracket must be divided by the number outside the bracket, thus:

$$\frac{y + z}{x} = \frac{y}{x} + \frac{z}{x} \qquad (9)$$

Note that the division of (y + z) by x is precisely the same as the multiplication of (y + z) by $\frac{1}{x}$.
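Rules (6), (8) and (9) can be spot-checked by substituting numbers for the symbols. The Python sketch below is only an illustration: the helper name `check_bracket_rules` is hypothetical, exact fractions are used so that no rounding noise creeps into the comparisons, and x is kept non-zero because rule (9) divides by it. A numerical check is not a proof, but it is a quick way to catch a slip when removing brackets.

```python
from fractions import Fraction
import random


def check_bracket_rules(w, x, y, z):
    """Return True if rules (6), (8) and (9) hold for the given values."""
    rule_6 = x * (y + z) == x * y + x * z
    rule_8 = (w + x) * (y + z) == w * y + w * z + x * y + x * z
    # Rule (9) divides by x, so x must not be zero.
    rule_9 = (y + z) / x == y / x + z / x
    return rule_6 and rule_8 and rule_9


rng = random.Random(1978)
non_zero = [n for n in range(-50, 51) if n != 0]
for _ in range(1000):
    # Exact fractions avoid floating-point rounding in the comparisons.
    w, y, z = (Fraction(rng.randint(-50, 50), rng.randint(1, 9)) for _ in range(3))
    x = Fraction(rng.choice(non_zero), rng.randint(1, 9))
    assert check_bracket_rules(w, x, y, z)
print("rules (6), (8) and (9) held in every trial")
```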
1.2.4

In performing the operations of addition and subtraction in conjunction with the removal of brackets, two important rules must be observed. Firstly, when a positive sign goes in front of the brackets, the signs of the terms within the brackets remain the same. Secondly, when a negative sign goes in front of the brackets, the signs of the terms within the brackets change. Thus, for example:

$$x + (y + z) = x + y + z$$

$$x + (y - z) = x + y - z$$

$$x - (y + z) = x - y - z$$

$$x - (y - z) = x - y + z$$

1.3 Positive and negative numbers

1.3.1 Corresponding to every positive number (signed +) there is a negative number (signed -). In effect, a negative number is a number which in its meaning and effect is opposite to a positive number. All that we need to know for our purposes are the fundamental rules for the operations of addition, subtraction, multiplication and division.

1.3.2 In the addition and subtraction of positive and negative numbers, where x represents any number,

$$+(+x) = +x \qquad +(-x) = -x$$

$$-(+x) = -x \qquad -(-x) = +x$$

It can be readily remembered that like signs give a positive result, unlike signs a negative result. Thus, as examples of the above general rules:

$$(+4) + (+3) = +4 + 3 = +7$$

$$(+4) + (-3) = +4 - 3 = +1$$

$$(+4) - (+3) = +4 - 3 = +1$$

$$(+4) - (-3) = +4 + 3 = +7$$

1.3.3 In the multiplication and division of positive and negative numbers, where x and y represent any pair of numbers, if two numbers have the same sign the result is a positive number. If the signs differ, the result is a negative number. Thus in multiplication:

$$(+x) \times (+y) = +xy$$

$$(+x) \times (-y) = -xy$$

$$(-x) \times (+y) = -xy$$

$$(-x) \times (-y) = +xy$$

and in division (where y is not zero):

$$(+x) \div (+y) = +\frac{x}{y}$$

$$(+x) \div (-y) = -\frac{x}{y}$$

$$(-x) \div (+y) = -\frac{x}{y}$$

$$(-x) \div (-y) = +\frac{x}{y}$$

Hence, for example,

$$(-4) \times (-3) = +12$$

and

$$(-4) \div (+2) = -2$$
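The sign rules of 1.3.2 and 1.3.3 can be confirmed with the worked examples above. This Python sketch is illustrative only; `same_sign` is a hypothetical helper and is meant for non-zero numbers, since zero is neither positive nor negative.

```python
# Addition and subtraction rules of 1.3.2, using the worked examples.
assert (+4) + (+3) == +7
assert (+4) + (-3) == +1
assert (+4) - (+3) == +1
assert (+4) - (-3) == +7


def same_sign(a, b):
    """True when non-zero a and b are both positive or both negative."""
    return (a > 0) == (b > 0)


# Multiplication and division rules of 1.3.3: like signs give a
# positive result, unlike signs a negative one.
for x in (4, -4):
    for y in (3, -3):
        assert (x * y > 0) == same_sign(x, y)
        assert (x / y > 0) == same_sign(x, y)

print((-4) * (-3))  # 12
print((-4) / (+2))  # -2.0
```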
1.4 Factors

1.4.1 We have seen the value of symbols, their grouping into brackets, and the rules which need to be applied in operations on positive and negative numbers. We may now consider an important operation in algebra: finding the factors of algebraic expressions. To understand what is meant by this, consider again the expression (5) in 1.2.3 above. It is the multiplication of two factors, x and (y + z):

$$x(y + z) \qquad (5)$$

This equals:

$$xy + xz \qquad (6)$$

Now we ask the question: if we started with xy + xz, how do we factorise it? To do this, we note that x is a factor in each term. We therefore say that it is a factor of the whole expression. To find the other factor we divide each term by x and add the quotients, y + z. Hence,

$$xy + xz = x(y + z) \qquad (10)$$

We have thus factorised expression (6) into its two factors, x and (y + z).

1.4.2 Similarly the reader should consider expression (8) in 1.2.3 above:

$$wy + wz + xy + xz \qquad (8)$$

How might this expression be factorised? Note that the first two terms have a common factor, w. The second two have a common factor, x. Dividing the first two terms in (8) by w, we obtain the factor (y + z). Thus the factorisation of the first two terms gives w(y + z). Similarly we see that the factorisation of the second two terms in (8) gives x(y + z). We now have:

$$w(y + z) + x(y + z) \qquad (11)$$

We note that there is a common factor, (y + z), in both terms of (11). Taking this common factor out, we obtain the two factors of expression (8), and get:

$$(w + x)(y + z) \qquad (7)$$

1.4.3 Not all expressions can be factorised. For example:

$$xy + vw$$

has no factor common to both of its terms, and so cannot be factorised. Further, other expressions require more advanced methods for their factorisation than are necessary for our purposes.

1.5 Powers of numbers

1.5.1 The product of equal numbers is called a power. Hence:

(2 × 2) is called the second power of 2, or the square of 2;
(2 × 2 × 2) is called the third power of 2, or the cube of 2; and so on.

Generalising, where y is any number:

(y × y) is the second power of y, or the square of y, and may be written $y^2$;
(y × y × y) is the third power of y, or the cube of y, and may be written $y^3$;
if n factors, each equal to y, are multiplied together, where n is any positive whole number, we obtain the nth power of y, which may be written $y^n$.

The symbol n is known as the index or exponent.

1.5.2 In multiplying two powers of a number, the rule to follow is that the index of the product is the sum of the indices. Thus, for example:

$$y^2 \times y^3 = y^{2+3} = y^5$$

This may be seen because:

$$(y \times y) \times (y \times y \times y) = (y \times y \times y \times y \times y) = y^5$$

1.5.3 In dividing two powers of a number (y must not be zero), the rule to follow is to subtract the index of the divisor from the index of the dividend. Thus:

$$y^5 \div y^3 = y^{5-3} = y^2$$

This may be seen because:

$$\frac{y \times y \times y \times y \times y}{y \times y \times y} = y \times y = y^2$$

Note that, for example,

$$y^3 \div y^5 = y^{3-5} = y^{-2}$$

where -2 is a negative index. This is simply the reciprocal of the positive power of the number; thus

$$y^{-2} = \frac{1}{y^2}$$

or generally,

$$y^{-n} = \frac{1}{y^n}$$
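The rules of 1.5.2 and 1.5.3, including the negative index, can be checked for many bases and indices at once. The Python sketch below is illustrative: `check_index_laws` is a hypothetical helper, exact fractions keep negative powers such as $3^{-2} = 1/9$ exact, and zero is excluded as a base because division by zero is undefined.

```python
from fractions import Fraction


def check_index_laws(y, m, n):
    """Check the rules for powers of a non-zero number y."""
    product_rule = y**m * y**n == y**(m + n)     # 1.5.2: add the indices
    quotient_rule = y**m / y**n == y**(m - n)    # 1.5.3: subtract the indices
    reciprocal_rule = y**(-n) == 1 / y**n        # negative index
    return product_rule and quotient_rule and reciprocal_rule


y = Fraction(3)  # exact arithmetic, so negative powers stay exact
print(y**2 * y**3, y**5)    # 243 243
print(y**3 / y**5, y**-2)   # 1/9 1/9

assert all(check_index_laws(Fraction(b), m, n)
           for b in (-3, 2, 5)
           for m in range(-4, 5)
           for n in range(-4, 5))
```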

1.5.4 The square of any non-zero number is positive, whether the number is positive or negative. If the operation is reversed, and the square root of a number is required, it follows that the square root may be either positive or negative. To understand this, note that

$$(+y) \times (+y) = y^2 \quad \text{and} \quad (-y) \times (-y) = y^2$$

If we require the square root of $y^2$ we use the sign "plus or minus", thus:

$$\sqrt{y^2} = \pm y$$

Sometimes, rather than the square root sign, $\sqrt{\ }$, the index $\frac{1}{2}$ is used. Thus:

$$\sqrt{x} = x^{1/2}$$

In general, the notation for the nth root of a number is written:

$$\sqrt[n]{x}$$

or equivalently:

$$x^{1/n}$$
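The two square roots of 1.5.4 and the index notation $x^{1/n}$ can be illustrated as follows. This Python sketch is an assumption-laden illustration: the helper names are hypothetical, and it deals only with non-negative numbers, because Python's `**` applied to a negative number with a fractional index returns a complex number, which lies outside this text.

```python
import math


def square_roots(value):
    """Both square roots of a non-negative number: +r and -r (1.5.4)."""
    if value < 0:
        raise ValueError("a negative number has no real square root")
    r = math.sqrt(value)
    return (r, -r)


def nth_root(x, n):
    """The positive nth root of a non-negative number x, i.e. x**(1/n)."""
    if x < 0:
        raise ValueError("only non-negative x is handled in this sketch")
    return x ** (1 / n)


print(square_roots(49))   # (7.0, -7.0)
print(nth_root(8, 3))     # approximately 2.0 (floating point)
```

Because $1/n$ is stored as a binary fraction, results such as `nth_root(8, 3)` are best compared with a tolerance (for example `math.isclose`) rather than with `==`.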

Exercise 1

The reader should now do Exercise 1, on page 59.

1.6 Simple equations

1.6.1 In 1.1.2 above we introduced a simple formula (1), $A = \frac{P}{C}$. This formula is in fact an equation: a statement of equality of the left hand and right hand sides. The concept of an equation is a central one in mathematics, and it is important that the reader should be fully familiar with the various operations which may be made on an equation in order to solve it. By "solving" an equation, or finding the "solution", is meant the process of finding the value of an unknown which satisfies the equation (maintains the equality of both sides).

1.6.2 Let us take a simple example. Suppose you were informed that a headmaster, whose salary was £500 per month, was paid four times the salary of a newly-qualified teacher. What is the newly-qualified teacher's salary?

Let the salary (the unknown number) be symbolised as S. We can now formulate a simple equation:

$$4S = 500 \qquad (12)$$

The solution of this equation requires simply that we divide both sides of the equation by the coefficient of S, and obtain:

$$S = \frac{500}{4} = 125 \qquad (13)$$

The solution to the equation is that the newly-qualified teacher's salary is £125 per month.
1.6.3 Generally equations are not so very simple as this one! Equations may consist of complicated expressions on both sides of the equality sign. However, correct use of various operations will enable us to find the value of the unknown symbol. The reader should learn how to apply two basic rules in the manipulation of equations:

i) if the same number is either added to, or subtracted from, both sides of an equation, the two sides remain equal;

ii) if both sides of an equation are multiplied or divided by the same number (other than zero), the two sides of the new equation will remain equal. If the multiplier or divisor is negative, both sides change signs.

1.6.4 Let us use these simple rules to solve an equation. Assume we wish to solve the following equation for the unknown, x:

$$3x + 7 = 5x - 5 \qquad (14)$$

The basic method used is to collect terms involving the unknown on the left hand side, and other terms on the right. Thus, by using the first rule in 1.6.3 above, we may subtract 5x from both sides and also subtract 7 from both sides of (14), obtaining:

$$3x - 5x = -5 - 7$$

$$-2x = -12 \qquad (15)$$

Now dividing each side of (15) by -2, we obtain the solution to equation (14):

$$x = 6$$

To verify that this is indeed the correct solution, we may substitute x = 6 into the original equation (14):

$$3(6) + 7 = 5(6) - 5, \quad \text{i.e.} \quad 25 = 25$$

confirming that the value x = 6 does satisfy the equation (14).
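The procedure used for equations (12) and (14) works for any equation of the form ax + b = cx + d. The Python sketch below is illustrative: the function name `solve_linear` and its four-coefficient form are assumptions, and the coefficients are expected to be whole numbers (or fractions) so that the answer is exact.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x, following the rules of 1.6.3.

    Subtracting c*x and b from both sides (rule i) gives
    (a - c)*x = d - b; dividing both sides by (a - c) (rule ii) gives x.
    """
    coefficient = a - c
    if coefficient == 0:
        # The unknown cancels out, so there is no unique solution.
        raise ValueError("x cancels out: no unique solution")
    return Fraction(d - b, coefficient)


# Equation (12): 4S = 500, i.e. a = 4, b = 0, c = 0, d = 500.
print(solve_linear(4, 0, 0, 500))   # 125

# Equation (14): 3x + 7 = 5x - 5.
x = solve_linear(3, 7, 5, -5)
print(x)                            # 6
assert 3 * x + 7 == 5 * x - 5       # substitute back, as in 1.6.4
```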

1.6.5 We have seen therefore that transferring terms from one side of equation (14) to the other changed their signs in the process (the first rule in 1.6.3). Division of both sides by the negative number, -2, changed the signs on both sides of the equation (15) (the second rule in 1.6.3).


Exercise 2

The reader should now do Exercise 2, on page 61.

2. FIRST STEPS IN DATA ANALYSIS

2.1 Absolute numbers

2.1.1 The data that emerge from the data-gathering processes (censuses, surveys, etc.) are very often expressed in actual, or absolute, numbers. Simple presentation of the actual absolute data is of course sufficient for some purposes. For example, we may wish to know how many births occurred in a particular country, or how many pupils entered the school system, in a particular year. Many such questions might be envisaged.

A basic method in the preliminary analysis of data is presentation in the form of a time-series. A time-series may be defined as a set of ordered observations on a quantitative characteristic of an individual or collective phenomenon taken at different points of time. Although it is not essential, it is common, and helps interpretation, for these points to be equidistant in time. For example, much educational data are published on an annual basis. Example 1 presents an annual time-series of the absolute numbers of pupils enrolled at the primary level in Niger over the period 1972-1976.

Example 1: An annual time-series of absolute numbers

Absolute numbers of pupils enrolled at the primary level, Niger, 1972-1976

     1972       1973       1974       1975       1976
   94,500    100,892    110,437    120,984    142,182

2.1.2 It can be seen immediately from the ordered time-series in Example 1 that primary enrolment was increasing in Niger over the period 1972-1976. However, these absolute numbers alone cannot inform us how other variables (1) relate to, or explain, this increase. In data analysis we often need to consider how a particular absolute number, or set of numbers, relates to other numbers. This introduces the general concept of rates.
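The observation in 2.1.2 that enrolment rose in every year of 1972-1976 can be checked mechanically once the time-series is stored as ordered (year, value) pairs. A minimal Python sketch; the list name `niger_primary` and the helper `always_increasing` are hypothetical.

```python
# Example 1: primary enrolment in Niger, as (year, pupils) pairs in time order.
niger_primary = [
    (1972, 94_500),
    (1973, 100_892),
    (1974, 110_437),
    (1975, 120_984),
    (1976, 142_182),
]


def always_increasing(series):
    """True if each observation is larger than the one before it."""
    values = [value for _, value in series]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


print(always_increasing(niger_primary))  # True
```

Keeping the years alongside the values preserves the ordering that defines a time-series; the check says nothing about *why* enrolment rose, which is the question taken up with rates.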
2.2 Rates

2.2.1 There is a great variety of different types of rates. Most readers will have heard, for example, of currency exchange rates, rates of growth, rates of tax, rates of interest and (if involved in making educational projections) rates of promotion, repetition and dropout. The reader can doubtless think of many other examples.

(1) For a definition of variable, see above, para. 1.2.1.

2.2.2 What do all these apparently very different specific uses of the word "rate" have in common? The answer is: the idea of a ratio. A ratio is a quotient which indicates the relative size of one number to another.
Example 2: Ratios

Ratio of enrolment at all levels of education to population aged 7-24, Indonesia

                                      1971          1973          1975
1. Enrolment at all levels      18,411,827    19,869,957    21,872,075
2. Population aged 7-24         45,530,688    48,832,242    52,282,907
3. Ratio: line 1 / line 2           0.4044        0.4069        0.4183

2.2.3 Example 2 shows how, over the period 1971-1975, the so-called "overall enrolment ratio" (1) developed. It was 0.4044 in 1971 and had risen to 0.4183 by 1975. It can be seen that this particular type of ratio, like all ratios, is obtained by division: here, of line 1 by line 2. Thus in 1971 the ratio of 0.4044 was obtained by the following quotient:

$$\frac{\text{enrolment at all levels in 1971}}{\text{population aged 7-24 in 1971}} = \frac{18{,}411{,}827}{45{,}530{,}688} = 0.4044$$

The ratios for 1973 and 1975 were similarly obtained (the reader should check the answers).

2.2.4 If in fact the reader calculates the ratio 18,411,827/45,530,688, he or she will find that the calculated ratio equals 0.4043828 when rounded to 7 "decimal places". Note that the number of "decimal places" refers to the number of figures after the decimal point. The figure given in this example, 0.4044, has been reduced from seven to four decimal places: we say that it has been "rounded" to four decimal places.

(1) Enrolment ratios are usually expressed in the form of percentages, a concept explained in 2.5 below.
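The ratios of Example 2 can be recomputed directly from lines 1 and 2, showing each ratio to seven and to four decimal places. The Python sketch below is illustrative; the dictionary names are hypothetical, and Python's format specification `:.7f` / `:.4f` does the rounding for display.

```python
# Example 2: enrolment at all levels and population aged 7-24, Indonesia.
enrolment = {1971: 18_411_827, 1973: 19_869_957, 1975: 21_872_075}
population_7_24 = {1971: 45_530_688, 1973: 48_832_242, 1975: 52_282_907}

for year in sorted(enrolment):
    ratio = enrolment[year] / population_7_24[year]   # line 1 / line 2
    # Seven decimal places first, then the four used in the table.
    print(year, f"{ratio:.7f}", f"{ratio:.4f}")
```

Printing both widths makes it easy to see how much information is given up when a ratio is rounded for presentation.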

2.3 Significant figures and rounding

2.3.1 In principle much of the data which we use in educational planning could be made perfectly accurate: in practice, errors from various sources enter into all the stages of data collection and processing. Perfect accuracy would be extremely costly to attain, and the high costs would not, in general, be justifiable. And even if we could overcome all the practical problems of accurate measurement, we would still frequently prefer to approximate our data. For example, instead of saying that the population aged 7-24 in Indonesia in 1975 was 52,282,907 (see Example 2 above), we may say that it was approximately 52 million. In making such an approximation, we should define the degree of approximation. Thus we could express the above result as:

52,000,000 ± 500,000

or

52 million to the second "significant figure",

both expressions meaning the same thing. Where a decimal point is involved, the zeros needed to locate the decimal point are not counted as significant figures: thus 0.0052 has two significant figures.

2.3.2 Numbers which result from accurate countings are exact and so have an unlimited number of significant figures. Given the existence of errors in data collection, however, a number such as 52,282,907 may have an uncertain number of significant figures. There can be no absolute rule, therefore, for deciding on the "correct" number of significant figures. The appropriate number to work with will often depend on the particular circumstances: your knowledge concerning the accuracy of the sources of the data, and the uses to which they will subsequently be put.

2.3.3 The main danger in using approximate figures lies in giving the impression of a greater degree of accuracy than is actually justified. If, for example, we were to add the following numbers, representing, perhaps, populations:

        762   (accurate)
      1,900   (to the second significant figure)
    123,000   (to the third significant figure)
    -------
    125,662

the answer, 125,662, is misleading. For the second figure, 1,900, could have been anywhere between 1,850 and 1,950; and the third, 123,000, anywhere between 122,500 and 123,500. So, rather than exactly 125,662, the answer could have been anywhere between a minimum and a maximum:

    Minimum    Maximum
        762        762
      1,850      1,950
    122,500    123,500
    -------    -------
    125,112    126,212

The difference between the two extreme possibilities is 1,100. The original answer should have been better expressed:

125,662 ± 550

2.3.4 Whether we wish to approximate our data because of known inaccuracies or because we simply wish large numbers to be more readily digestible, we must decide on a method for the process of "rounding". Assume, for example, that we wish to round a number such as 3.67 to one decimal place. The result of rounding is 3.7, since 3.67 is nearer to 3.7 than to 3.6. Similarly, 103.8135, after rounding to the nearest hundredth, that is to two decimal places, would become 103.81, since 103.8135 is closer to 103.81 than to 103.82. If however we were faced with the number 103.815 and wished to round it to 2 decimal places, we would be in something of a dilemma: for 103.815 is just as close to 103.82 as it is to 103.81. It has become a useful convention in such cases to round so that the last digit retained (the digit preceding the 5) is even. This practice is useful in reducing cumulative rounding errors when a large number of operations is involved. Thus 103.815 is rounded to 103.82; 103.825 is also rounded to 103.82, and 103.835 is rounded to 103.84. Rounding to the nearest million, 16,500,000 would be 16,000,000; 17,500,000 would be 18,000,000.
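The "round half to even" convention of 2.3.4 is available in Python's standard `decimal` module as `ROUND_HALF_EVEN`. In the sketch below the helper name `round_half_even` is hypothetical. Numbers are passed as strings because binary floating-point numbers cannot hold values such as 103.815 exactly, which would make the "exactly half-way" cases unreliable.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(number: str, step: str) -> Decimal:
    """Round `number` to a multiple of `step` using the convention of 2.3.4.

    A value lying exactly half-way is rounded so that the last retained
    digit is even. Strings keep values such as 103.815 exact.
    """
    return Decimal(number).quantize(Decimal(step), rounding=ROUND_HALF_EVEN)


print(round_half_even("3.67", "0.1"))        # 3.7
print(round_half_even("103.8135", "0.01"))   # 103.81
print(round_half_even("103.815", "0.01"))    # 103.82
print(round_half_even("103.825", "0.01"))    # 103.82
print(round_half_even("103.835", "0.01"))    # 103.84
# Rounding to the nearest million: a step of 1E6.
print(int(round_half_even("16500000", "1E6")))   # 16000000
print(int(round_half_even("17500000", "1E6")))   # 18000000
```

Only exact ties are affected by the convention; a value such as 7.5001 is simply nearer to 8 and is rounded up in the ordinary way.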

2.3.5 Adding a set of rounded numbers, as we saw in 2.3.3, inevitably involves a degree of error in the final result. This cannot be avoided, but should never be entirely overlooked. Percentage figures are very often rounded to one or two decimal places. The (percentage) figures in the first column below have been rounded to one decimal place in the second column:

         %         %
     40.55      40.6
     30.35      30.4
     29.10      29.1
    ------     -----
    100.00     100.1

We see that the original correct total of 100% has become 100.1% in the second column, due to the rounding process. This cumulative rounding error inevitably occurs fairly frequently. The second column total should not be written, incorrectly, as 100.0%, but a footnote or comment placed in the table mentioning the occurrence of rounding error.
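Both effects discussed in 2.3.3 and 2.3.5 can be reproduced in a few lines. The Python sketch below is illustrative: it assumes, as the text does, that a figure given to the second significant figure (1,900) is uncertain by ±50 and one given to the third (123,000) by ±500. It then shows how rounding each percentage separately (with the half-to-even convention) makes the column add to 100.1.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# 2.3.3: each figure paired with the half-width of its uncertainty.
figures = [
    (762, 0),         # accurate
    (1_900, 50),      # to the second significant figure
    (123_000, 500),   # to the third significant figure
]
total = sum(value for value, _ in figures)
minimum = sum(value - error for value, error in figures)
maximum = sum(value + error for value, error in figures)
half_range = (maximum - minimum) // 2
print(f"{total:,} +/- {half_range:,} (range {minimum:,} to {maximum:,})")

# 2.3.5: rounding each percentage to one decimal place separately
# can make the rounded column add to something other than 100.
percentages = ["40.55", "30.35", "29.10"]
rounded = [Decimal(p).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)
           for p in percentages]
print(sum(Decimal(p) for p in percentages), sum(rounded))  # 100.00 100.1
```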

Example 3: Rounding numbers

Each of the following numbers has been rounded to the indicated accuracy. (Remember the convention that, when the part to be dropped is exactly 5, the number is rounded so that the last digit retained is even.)

Original number    Rounded to             Result of rounding
7.5001             nearest unit           8
48.6               nearest unit           49
-3.674             nearest hundredth      -3.67
7.9283             nearest thousandth     7.928
9,499              nearest thousand       9,000
9,500              nearest thousand       10,000
-10,500            nearest thousand       -10,000
16,500,001         nearest million        17,000,000

Exercise 3

The reader should now do Exercise 3, on page 62.

2.4 Proportions

2.4.1 A proportion is a ratio relating the magnitude of a part to its whole.

Hence a proportion, P, must lie between zero and unity:

$$0 \le P \le 1$$

For example, a part of an enrolment total might be expressed as a proportion of its whole. The example below shows, for each year, the proportion of all children enrolled at the primary level in Gabon over the period 1972-1976 who were female. The proportions have been rounded to four decimal places.

Example 4: Proportions

Female proportions of total primary level enrolment, Gabon

                                    1972      1973      1974      1975      1976
1. Female enrolment               50,505    53,401    55,354    58,995    62,736
2. Total enrolment               105,601   110,466   114,172   121,407   128,552
3. Proportions (line 1/line 2)    0.4783    0.4834    0.4848    0.4859    0.4880
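The proportions in Example 4 can be recomputed with a short Python sketch; the dictionary names `female` and `total` are illustrative only.

```python
# Primary level enrolment in Gabon, 1972-1976, as given in Example 4.
female = {1972: 50_505, 1973: 53_401, 1974: 55_354, 1975: 58_995, 1976: 62_736}
total = {1972: 105_601, 1973: 110_466, 1974: 114_172, 1975: 121_407, 1976: 128_552}

# A proportion is part / whole, so it must lie between 0 and 1.
proportions = {year: female[year] / total[year] for year in female}

for year, p in proportions.items():
    assert 0 <= p <= 1
    print(year, f"{p:.4f}")   # four decimal places, as in the table
```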

2.5 Percentages

2.5.1 It is common practice to express proportions in percentage form. A percentage is a proportion expressed per hundred. Thus, to convert proportions to percentages, the proportion is multiplied by 100.

Example 5: Percentages

The proportion comprised by girls in primary level enrolment in Example 4 is given for 1976 as 0.4880. By multiplying this proportion by 100, it may be expressed in percentage form:

$$0.4880 \times 100 = 48.80 \text{ percent}$$

often written 48.80%.

Exercise 4

The reader should now do Exercise 4, on page 63.

2.6 Changes in variables over time

2.6.1

In analysing educational data, educational planners and statisticians very frequently use time-series analyses. We have already been introduced to the concept of a time-series: it is the presentation of a series of data ordered over time. Observing the manner in which particular variables grow, stay constant, or decline over time can help in describing and explaining how educational systems have behaved in the past. Such observations also play their part in predicting how variables might behave in the future.

2.6.2 An important extension of the fundamental idea of a rate is that of a rate of change over time. In Example 1 we saw that primary level enrolment in Niger was growing over the period 1972-76. At what rate did it grow? Did it grow at a constant rate or at a variable rate? Can we develop concepts to express simply and clearly how variables change in magnitude over time? We now discuss the basic techniques available to demonstrate changes over time.

2.6.3 An initial distinction should be made between absolute and relative changes over a period of time. We shall use the following symbols in analysing the data already presented in Example 1:

$P_0$ = primary enrolment in the initial year
$n$ = number of years in the period
$P_n$ = primary enrolment in the nth year


Example 6:

The absolute growth in primary enrolment from year 0 (1972) to year 4 (1976) is simply the difference between enrolment in the two years:

$$P_n - P_0 = 142{,}182 - 94{,}500 = 47{,}682$$

The relative, or percentage, growth in enrolment may be seen as the ratio of the absolute growth over the period to the initial enrolment figure, expressed as a percentage:

$$\frac{P_n - P_0}{P_0} \times 100\% = \frac{47{,}682}{94{,}500} \times 100\% = 50.46\% \quad \text{(to two decimal places)}$$

2.6.4

The reader should note carefully that the percentage growth over the period is the absolute growth expressed as a percentage of the figure in the initial year of the period, not in the final (or any other) year. Thus, for example, a 100% growth over a period would be correctly interpreted as meaning that the absolute figure at the end of the period was double that at the commencement.
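The two measures of growth, and the reason the denominator must be the initial-year figure, can be illustrated with a minimal Python sketch using the Niger figures from Example 6. The function names are our own.

```python
def absolute_growth(p0: float, pn: float) -> float:
    """Absolute growth over the period: P_n - P_0."""
    return pn - p0

def percentage_growth(p0: float, pn: float) -> float:
    """Growth as a percentage of the INITIAL-year figure P_0."""
    return (pn - p0) / p0 * 100

p0, pn = 94_500, 142_182                      # Niger primary enrolment, 1972 and 1976
print(absolute_growth(p0, pn))                # 47682
print(round(percentage_growth(p0, pn), 2))    # 50.46

# Dividing by the final-year figure instead gives a different (wrong) answer.
print(round((pn - p0) / pn * 100, 2))         # 33.54

# A 100% growth means the final figure is double the initial one.
print(percentage_growth(50, 100))             # 100.0
```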

Exercise 5

The reader should now do Exercise 5, on page

2.7 Rates of growth over time

2.7.1 We have introduced the ideas of absolute and relative growth in variables over a period of time. We now consider the question of rates of growth over time. That is, we ask how we can measure the rate of growth per time-period.

2.7.2 Returning to Example 6 above, we saw that the absolute growth in primary enrolment in Niger over the 4-year period 1972-1976 was 47,682 pupils. We may define the average annual absolute growth as:

$$\frac{P_n - P_0}{n}$$

which, in this example, is:

$$\frac{142{,}182 - 94{,}500}{4} = 11{,}920.5 \text{ pupils per year}$$
The reader should note, of course, that this average annual absolute growth was never actually observed between any years. It is an average: a statistical artefact. Nevertheless, like any summary statistic (1), its value lies in its comparability with other similarly calculated statistics, for example, over different time periods or across different countries.

2.7.3 If a variable, such as a population, grew by a constant absolute amount each time-period (say, a year), it would, if plotted on a graph, describe a straight line (i.e. it would display "linear" (2) growth). In general, variables such as populations do not grow (or decline) in a linear fashion. Any constancy in the pattern of change is more likely to be seen in a relative sense: populations are more likely to show a constant proportionate, or percentage, growth per time-period. Thus the planner or statistician may often be interested to discover by what percentage, on average, a variable grows per year over a period of time. In addition, percentages are more readily comparable with one another. This is true both for comparisons made for different time periods for the same variable and, more importantly, when comparing the growth rates of different variables, which may not even be measured in the same absolute units.

(1) See the discussion of summary statistics in Section
(2) Linear and non-linear functions are discussed in Section
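A minimal Python sketch of the contrast drawn in 2.7.3: two hypothetical growth paths that both run from 94,500 to 142,182 pupils in four years (the Niger end-points of Example 6), one adding a constant absolute amount each year and one adding a constant proportion r, where r is computed with the formula derived in 2.7.5. The intermediate values are an illustration, not observed data.

```python
p0, pn, n = 94_500, 142_182, 4

# Linear path: the same absolute amount is added every year.
step = (pn - p0) / n                        # 11,920.5 pupils per year
linear = [p0 + step * t for t in range(n + 1)]

# Constant-percentage path: the same proportion r is added every year.
r = (pn / p0) ** (1 / n) - 1                # about 0.1075
geometric = [p0 * (1 + r) ** t for t in range(n + 1)]

for t in range(n + 1):
    print(1972 + t, round(linear[t]), round(geometric[t]))

# Yearly increments: constant for the linear path, rising for the other one.
print([round(b - a) for a, b in zip(linear, linear[1:])])
print([round(b - a) for a, b in zip(geometric, geometric[1:])])
```

Both paths start and end at the same figures, but only the second shows absolute increments that grow each year.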

2.7.4 The measure which expresses the average percentage by which a variable grows per year over a period of time is the average annual growth rate. It is important to note immediately that this figure is not obtained by dividing the percentage growth over the whole period by the number of years in the period. It would be wrong, for example, to say that the average annual growth rate of the enrolment data used in Example 6 above could be obtained by dividing 50.46% by 4. The reason for this will become clear after we have examined how the average annual growth rate is in fact calculated. To see this, we shall use the previous notation $P_0$, $P_n$ and $n$, and introduce $r$, the average annual growth rate, expressed in proportional terms.

2.7.5 Assume a population grows by a constant proportion r per year. If, for example, it grew by one tenth (i.e., 10%) each year, r would equal 0.10. At the beginning of the period under consideration, the size of the population is $P_0$. After one year has elapsed, the population has grown to size $P_1$:

$$P_1 = P_0 + P_0 r = P_0(1 + r)$$

After another year has elapsed, the population has grown to size $P_2$:

$$P_2 = P_1 + P_1 r = P_1(1 + r)$$

Substituting $P_1 = P_0(1 + r)$:

$$P_2 = P_0(1 + r)(1 + r) = P_0(1 + r)^2$$

Continuing in this way, after $n$ years of constant annual proportional growth $r$, the size of the population will be:

$$P_n = P_0(1 + r)^n$$

If we know $P_0$ and $P_n$ (and of course $n$), the question now becomes: how do we find $r$? A little algebraic manipulation shows us that:

$$(1 + r)^n = \frac{P_n}{P_0}$$

$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
Example 7: Average annual growth rate

Using the enrolment data in Example 6, let us calculate the average annual growth rate, $r$, over the four-year period 1972-1976. We have:

$$P_0 = 94{,}500, \quad P_n = 142{,}182, \quad n = 4$$

Hence, applying the formula above, we find:

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$
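With a modern computer the root can be taken directly. A minimal Python sketch (the function name is our own) gives r to full precision; the text, working with four-figure logarithms in Example 10, obtains 0.108, and the small difference is the rounding effect that footnote (2) of 2.7.6 warns about. The last lines show why the shortcut rejected in 2.7.4 overstates the rate.

```python
def average_annual_growth_rate(p0: float, pn: float, n: int) -> float:
    """r such that pn == p0 * (1 + r) ** n, i.e. r = (pn / p0) ** (1 / n) - 1."""
    return (pn / p0) ** (1 / n) - 1

r = average_annual_growth_rate(94_500, 142_182, 4)
print(round(r, 4))                     # 0.1075, i.e. about 10.8%

# Check: four years of growth at rate r reproduce the 1976 figure.
print(round(94_500 * (1 + r) ** 4))    # 142182

# The wrong shortcut of 2.7.4: total percentage growth divided by 4.
naive = (142_182 - 94_500) / 94_500 / 4
print(round(naive, 4))                 # 0.1261, noticeably too high
```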

2.7.6 How do we calculate this number? The problem we face is that it involves calculating the "fourth root" of $P_n / P_0$. The fourth root of this ratio is that number which, when four copies of it are multiplied together, equals the ratio (1). For example, the fourth root of 16 is 2, because 2 x 2 x 2 x 2 = 16. But this is an easy example. Methods for calculating roots do exist, but are very time-consuming unless we use the technique of "logarithms". We shall therefore explore this technique before we return to finish the problem of calculating $r$ in Example 7 above (2).

3 LOGARITHMS

3.1 The idea of logarithms

3.1.1 To the reader who has never used logarithms, the sight of a table of logarithms, with column upon column of dry numbers, often seems forbidding. Yet a little practice in their use has a great pay-off in terms of time saved in future calculations. It is, the reader may be assured, a small and extremely worthwhile investment of time to master their use. Readers familiar with logarithms may omit this section.

3.1.2 Let us proceed by example. Consider the number 100. This, of course, equals 10 x 10, which we have seen (in 1.5 above) may be written as $10^2$. Similarly, consider the number 1000. This equals $10 \times 10 \times 10 = 10^3$. We now recall what happens to the powers when we multiply 100 by 1000. From above, we see that this may be written:

$$10^2 \times 10^3$$

which equals:

$$10^5 \; (= 100{,}000)$$

The reader will note that we have added the powers.

(1) See paras. 1.5.1-1.5.4 above.
(2) Readers with access to electronic calculators should note that results obtained using calculators may differ slightly from those obtained using logarithms. This is because of rounding errors involved in using four-figure logarithms, and rounding performed by calculators.
3.1.3 Now let us define the logarithm ("to the base 10") (1) of 100 to be the power to which we must raise 10 to give us 100: that is, 2. Thus we have:

$$\log_{10} 100 = 2.0000$$

We shall consider here "four-figure" logarithms, the four figures being those after the decimal point (the mantissa).

Similarly, if we ask ourselves to which power we must raise 10 to give us 1000, the answer, as we have seen, is 3; hence:

$$\log_{10} 1000 = 3.0000$$

Now consider what happens when we add these two logarithms:

$$\log_{10} 100 + \log_{10} 1000 = 2.0000 + 3.0000 = 5.0000$$

What number has a logarithm of 5.0000? What, in other words, is the "antilogarithm" of 5.0000? The answer is $10^5$, that is, 100,000. Thus we have performed the multiplication of two numbers by adding their respective logarithms and then taking the "antilogarithm" of the result.
3.1.4 We see that the great advantage of the method of logarithms is that the simple process of addition replaces the relatively complex process of multiplication. And, as we shall see, the method also copes with division by the relatively simple process of subtraction.
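The principle of 3.1.3 and 3.1.4 can be sketched in Python with the standard `math.log10` function: multiply by adding logarithms, divide by subtracting them, and take the antilogarithm as a power of 10. This sketch works only for positive numbers, and floating-point results may differ from the exact answer in the last few digits. The helper names are our own.

```python
import math

def multiply_by_logs(a: float, b: float) -> float:
    """a * b via log10(a * b) = log10(a) + log10(b); a and b must be positive."""
    return 10 ** (math.log10(a) + math.log10(b))

def divide_by_logs(a: float, b: float) -> float:
    """a / b via log10(a / b) = log10(a) - log10(b); a and b must be positive."""
    return 10 ** (math.log10(a) - math.log10(b))

print(math.log10(100), math.log10(1000))   # the logarithms 2 and 3
print(multiply_by_logs(100, 1000))          # antilog of 5: 100,000
print(divide_by_logs(1000, 10))             # antilog of 2: 100
```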
(1) Logarithms may in principle be based on any positive number other than 1. Those with a base of 10 are known as "common" logarithms.

3.2 Tables of logarithms

3.2.1 To master the use of logarithms requires a little practice and, of course, a table of logarithms. The reader should now turn to the annexe, where he or she will find a table of common logarithms followed by a table of antilogarithms.

3.2.2 We have seen that the logarithm (to the base 10) of 100 is 2.0000. This is easy to see, because we all know that $100 = 10^2$. But what is the logarithm, purely as an example, of 3.642? Or of 49.43? Or of 111.1? It is here that the table of logarithms is indispensable. Most such tables are four-figure tables, and these are adequate for the majority of purposes an educational planner would have.

3.2.3 Consider the meaning of the logarithm of 3.642. The question we are asking is: what is the power to which we must raise 10 to give us 3.642? That power is, by definition, the logarithm (to the base 10) of 3.642; that is:

$$10^{\log_{10} 3.642} = 3.642$$

The table of logarithms provides us with the answer. Turn to the table and look down the first column: it is a column of two-figure numbers from 10 to 99. To find the logarithm of 3.642, look down the column until you come to "36". Now move along this row, across the columns, until you come to the first column headed "4"; you will find that the four-figure number in row "36", column "4", is 5611. We have not quite finished yet, as we wish to allow for our final figure, 2. You will see a second set of columns headed 1-9 to the right. Look under column 2 against the same row (36). The number "2" appears. This should be added to the figure we have so far obtained, 5611. We have now obtained the logarithm of 3.642:

$$\log_{10} 3.642 = 0.5613$$

That is to say, from our definition of logarithms to the base 10:

$$10^{0.5613} = 3.642$$

3.2.4

Let us now find the logarithm of 49.43. Proceeding in the same way, you should find:

$$\log_{10} 49.43 = 1.6940$$

The reader will ask: why does the figure "1" appear before the decimal point? This figure must be supplied by you, the user of the tables. The figure is known as the "characteristic", or "index", of a logarithm. The rule to be adopted is this:

The characteristic of any number greater than or equal to one is zero or positive, and is one less than the number of figures to the left of the decimal point.

Thus, in our first example, the number 3.642, the characteristic of the logarithm is 0, because there is only one figure to the left of the decimal point. In the second example, 49.43, the index is 1. In our third example, 111.1, the index is 2. The reader should confirm for himself or herself that:

$$\log_{10} 111.1 = 2.0457$$

3.2.5

How do we deal with a number with no figures to the left of the decimal point, for example, 0.2327? The rule to follow is this:

The characteristic of a number less than one is negative, and its magnitude is one greater than the number of zeros which immediately follow the decimal point.

Thus the reader should satisfy himself or herself that:

$$\log_{10} 0.2327 = \bar{1}.3668$$

and, for example, that:

$$\log_{10} 0.0037 = \bar{3}.5682$$

(The characteristics $\bar{1}$ and $\bar{3}$ are to be spoken as "bar one" and "bar three".)
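Both rules for the characteristic come to the same thing: the characteristic is the whole number obtained by rounding $\log_{10} x$ down, and the mantissa is what remains, always between 0 and 1. A minimal Python sketch follows; the function names are our own, and the `bar` prefix in its output is our plain-text stand-in for the bar over a negative characteristic.

```python
import math

def characteristic_and_mantissa(x: float) -> tuple[int, float]:
    """Split log10(x) into an integer characteristic and a four-figure mantissa."""
    if x <= 0:
        raise ValueError("logarithms exist only for positive numbers")
    log = math.log10(x)
    char = math.floor(log)             # rounded DOWN, so negative for x < 1
    mantissa = round(log - char, 4)    # four figures, as in the tables
    if mantissa == 1.0:                # e.g. 9.99996 rounds up to the next power of ten
        char, mantissa = char + 1, 0.0
    return char, mantissa

def bar_notation(x: float) -> str:
    """Write log10(x) as in the tables, e.g. 'bar1.3668' for 0.2327."""
    char, mantissa = characteristic_and_mantissa(x)
    digits = f"{mantissa:.4f}"[1:]     # drop the leading '0', keep e.g. '.3668'
    return f"bar{-char}{digits}" if char < 0 else f"{char}{digits}"

for x in [3.642, 49.43, 111.1, 0.2327, 0.0037]:
    print(x, bar_notation(x))
```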

3.3 Tables of antilogarithms

3.3.1 We have seen that when multiplying numbers, we add their logarithms. We thus obtain another logarithm: what number does this represent? To find out, we may use the table of "antilogarithms". Take, for example, 2.0457. Of what number is this the logarithm? To find this, ignoring for the moment the index figure 2 (for this simply tells us the position of the decimal point in the number we shall find), scan down the first (two-figure) column of the table of antilogarithms. On reaching "04", move across the table until you reach the first column headed "5". You will see the four-figure number 1109. You still have to allow for the final digit, 7. So move across to the column headed "7" in the second set of columns, where you see the number 2. As in the table of logarithms, this is added to the four-figure number you have already obtained, 1109. Thus you have found that the antilogarithm of 2.0457 is:

$$111.1$$

Note the position of the decimal point. It has three figures in front of it because the characteristic is positive and equals 2.

3.4 Use of logarithms in multiplication and division

3.4.1 When multiplying numbers, we add their logarithms. When dividing, we subtract their logarithms.

For example:

$$\frac{1000}{10} = 10^3 \times 10^{-1} = 10^2$$

So in order to divide 1000 by 10, we subtract their logarithms:

$$\log_{10}\left(\frac{1000}{10}\right) = \log_{10} 1000 - \log_{10} 10 = 3.0000 - 1.0000 = 2.0000$$

Looking up the antilogarithm of 2.0000, we find that it equals 100.0, which, of course, is the correct answer.

Example 8: Use of logarithms in multiplication

We shall multiply together the first three numbers we discussed above, that is, find the product of:

$$3.642 \times 49.43 \times 111.1$$

To calculate this product using logarithms, it is good practice to set out the data in two columns as below:

number      logarithm
3.642       0.5613
49.43       1.6940
111.1       2.0457
(add)       4.3010

Adding the respective logarithms, we see that they total 4.3010. Of what number is this the logarithm? Look up 0.3010 in your table of antilogarithms. You will find the number "2000". Where should the decimal point be placed? The characteristic is 4; this means, following our rule above, that there should be five figures before the decimal point. Therefore the answer is:

$$20000.0$$

3.4.2 Let us now consider the following product:

$$0.2327 \times 0.0037$$

Recalling our rule about the indexes of numbers less than 1, we may write:

number      logarithm
0.2327      $\bar{1}.3668$
0.0037      $\bar{3}.5682$
(add)       $\bar{4}.9350$

Proceeding as before, you should now look up 0.9350 in the table of antilogarithms, and find the number 8610. The characteristic is $\bar{4}$; you must therefore place three zeros after the decimal point before the first digit. The answer is therefore:

$$0.0008610$$

3.4.3 It should always be remembered that the "bar" over a characteristic means that the characteristic should be treated as a negative number. Thus if we wish to find the product of:

$$106.4 \times 0.0039$$

we should write:

number      logarithm
106.4       2.0269
0.0039      $\bar{3}.5911$
(add)       $\bar{1}.6180$

(The characteristics are added as signed numbers: 2 + (-3) = -1.) The product is found by looking up the antilogarithm of $\bar{1}.6180$:

$$0.4150$$

Example 9: Use of logarithms in division

Let us divide 106.4 by 0.0039:

$$\frac{106.4}{0.0039}$$

We must now subtract the logarithm of the denominator (the figure in the lower part of the ratio) from the logarithm of the numerator (the upper figure):

number        logarithm
106.4         2.0269
0.0039        $\bar{3}.5911$
(subtract)    4.4358

(Here 2.0269 - (-3 + 0.5911) = 4.4358.) We find our answer is antilog(4.4358):

$$27280.0$$
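The procedure of Examples 8 and 9 can be imitated in Python by rounding each logarithm to four decimal places (as a four-figure table does) and each antilogarithm to four significant figures. This is a minimal sketch under those assumptions; it stores a logarithm such as $\bar{1}.3668$ as the ordinary negative number -0.6332, which is the same value. The function names are our own.

```python
import math

def four_figure_log(x: float) -> float:
    """log10(x) kept to four decimal places, like a four-figure table."""
    return round(math.log10(x), 4)

def four_figure_antilog(log: float) -> float:
    """10 ** log kept to four significant figures, like an antilogarithm table."""
    return float(f"{10 ** log:.4g}")

def product(*numbers: float) -> float:
    # Multiplication: add the logarithms, then take the antilogarithm.
    return four_figure_antilog(sum(four_figure_log(x) for x in numbers))

def quotient(numerator: float, denominator: float) -> float:
    # Division: subtract the logarithm of the denominator.
    return four_figure_antilog(four_figure_log(numerator) - four_figure_log(denominator))

print(product(3.642, 49.43, 111.1))   # Example 8
print(product(0.2327, 0.0037))        # 3.4.2
print(product(106.4, 0.0039))         # 3.4.3
print(quotient(106.4, 0.0039))        # Example 9
```

Under these assumptions the four results should match the text: 20000.0, 0.000861, 0.415 and 27280.0.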

Exercise 6

The reader should now do Exercise 6 on page

3.5 Use of logarithms in finding the powers and roots of numbers

3.5.1 Another extremely valuable use of logarithms is in rapidly finding the powers of a number. Consider a simple exercise: what is the value of $10^3$? We note that $\log_{10}(10^3) = 3 \log_{10} 10$:

$$\log_{10}(10^3) = \log_{10}(10 \times 10 \times 10) = \log_{10} 10 + \log_{10} 10 + \log_{10} 10 = 3(\log_{10} 10) = 3(1.0000)$$

We find $\operatorname{antilog}_{10} 3.0000 = 1000$.

Similarly, if asked to find, for example, the value of $3.724^4$, we would recognise that:

$$\log_{10}(3.724^4) = 4(\log_{10} 3.724) = 4(0.5710) = 2.2840$$

Looking up $\operatorname{antilog}_{10} 2.2840$, we find that the answer is:

$$192.3$$
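Raising to a power by logarithms amounts to multiplying the logarithm by the power and taking the antilogarithm. A minimal Python sketch (the function name is our own), with a direct check using the `**` operator:

```python
import math

def power_by_logs(x: float, k: float) -> float:
    """x ** k computed as antilog(k * log10(x)); x must be positive."""
    return 10 ** (k * math.log10(x))

print(power_by_logs(10, 3))                # very close to 1000
print(round(power_by_logs(3.724, 4), 1))   # 192.3
print(round(3.724 ** 4, 1))                # 192.3, checked directly
```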

3.5.2

How would we find, for example, the nth root of a number?

By dividing the logarithm of that number by n, and proceeding to
find the antilogarithm.

Example 10: Use of logarithms in calculating roots

What, for example, is the value of $r$ in Example 7 above?

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$

We proceed by finding the logarithms of the numerator and denominator and subtracting the latter from the former, giving us the figure 0.1775. This figure is now divided by 4. We obtain 0.0444.

number           logarithm
142,182          5.1529
94,500           4.9754
(subtract)       0.1775
(divide by 4)    0.0444

Taking the antilogarithm, we find the answer:

$$r = \operatorname{antilog}_{10}(0.0444) - 1 = 1.108 - 1 = 0.108$$
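Example 10 in Python: a minimal sketch that divides the logarithm by n and takes the antilogarithm, as described in 3.5.2 (the function name is our own). Because Python keeps about 15 significant figures rather than four, r comes out at roughly 0.10752, which agrees with the text's 0.108 to three decimal places.

```python
import math

def root_by_logs(x: float, n: int) -> float:
    """The nth root of a positive x: divide log10(x) by n, then take the antilogarithm."""
    return 10 ** (math.log10(x) / n)

r = root_by_logs(142_182 / 94_500, 4) - 1
print(round(r, 3))   # 0.108, as in Example 10
print(round(r, 5))   # 0.10752 at full precision
```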

3.5.3 For our final example of the use of logarithms, we calculate the root of a number lying between 0 and 1 (note that the logarithm of 1 is zero, since $10^0 = 1$, and that logarithms of negative numbers do not exist). We have seen that logarithms of numbers between 0 and 1 have negative characteristics. How do we divide a negative characteristic by the given value of the root? The technique involves splitting the characteristic into two parts. The first part is negative and is chosen to be exactly divisible by the value of the given root. The second, compensating, part is positive and is added to the mantissa of the logarithm.

3.5.4 In the first of the two examples in Example 11 below, $\bar{2} + 1$ is written for $\bar{1}$, so that the negative part can be divided exactly by 2. In the second example, $\bar{3} + 2$ replaces $\bar{1}$ in order that the negative part should be exactly divisible by 3.
Example 11: Use of logarithms in calculating the roots of numbers between 0 and 1

1. Calculation of the square root of 0.56

The logarithm of 0.56 is $\bar{1}.7482$. In order to divide this by 2, we rewrite it thus: $\bar{2} + 1.7482$. After division by 2, we obtain $\bar{1} + 0.8741$, which we may write as $\bar{1}.8741$. Taking the antilogarithm, we find the answer:

$$\sqrt{0.56} = 0.7484$$

2. Calculation of the cube root of 0.29

The logarithm of 0.29 is $\bar{1}.4624$. In order to divide this by 3, we rewrite it thus: $\bar{3} + 2.4624$. After division by 3, we obtain $\bar{1} + 0.8208$, which we may write as $\bar{1}.8208$. Taking the antilogarithm, we find the answer:

$$\sqrt[3]{0.29} = 0.6619$$

Exercise 7

The reader should now do Exercise 7 on page
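The splitting of a negative characteristic in 3.5.3 and 3.5.4 can be mimicked in Python with the built-in `divmod`, which returns a non-negative remainder for a positive divisor: `divmod(-1, 2)` gives `(-1, 1)`, matching $\bar{1} = \bar{2} + 1$, and `divmod(-1, 3)` gives `(-1, 2)`, matching $\bar{1} = \bar{3} + 2$. This is a minimal sketch with a function name of our own; because the final antilogarithm is not rounded to table precision, the results may differ from the text in the last figure (about 0.7483 rather than 0.7484 for the square root of 0.56).

```python
import math

def root_with_bar_characteristic(x: float, k: int) -> float:
    """k-th root of 0 < x < 1, splitting the negative characteristic as in 3.5.3."""
    log = math.log10(x)
    char = math.floor(log)                       # e.g. -1 (bar 1) for 0.56
    mantissa = round(log - char, 4)              # e.g. 0.7482
    new_char, remainder = divmod(char, k)        # -1 = 2 * (-1) + 1  ->  (-1, 1)
    new_mantissa = (remainder + mantissa) / k    # (1 + 0.7482) / 2 = 0.8741
    return 10 ** (new_char + new_mantissa)       # antilog of bar-1.8741

print(round(root_with_bar_characteristic(0.56, 2), 4))   # square root of 0.56
print(round(root_with_bar_characteristic(0.29, 3), 4))   # cube root of 0.29
```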

4 THE AVERAGE ANNUAL GROWTH RATE

4.1 Calculation (continued)

4.1.1 We may now return to Example 7. Using the formula developed in 2.7.5, we found that the average annual growth rate of primary level enrolment in Niger over the period 1972-76 was given by the expression:

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$

After introducing logarithms, we saw in Example 10 that r was calculated to be 1.108 - 1 = 0.108. This tells us that over the time-period, enrolment grew at an average annual proportional rate of 0.108.

4.1.2 Usually this figure is expressed in percentage terms. The reader will recall that a percentage figure is obtained by multiplying a proportion by 100. Thus, the average annual percentage growth rate of primary level enrolment in Niger over the four-year period 1972-76 was:

$$r = 10.8\%$$
4.2 Use of average annual growth rates

4.2.1 Such growth rates are widely used in comparing the growth (or decline) both of one variable over different time-periods and of different variables. It is worth reminding the reader again that a constant rate of growth, r, does not mean that the variable will increase by a constant absolute quantity each year. It means that it augments itself by a constant proportion (or percentage) each year. For a positive r, this implies that the absolute increment each year will increase over time, as the base on which the constant proportional growth is calculated each year is itself steadily increasing. (The reader may benefit by carefully re-reading paras. 2.7.3-2.7.5 above.)

Exercise 8

The reader should now do Exercise 8 on page

4.3 Time taken for a variable to increase by a given magnitude or proportion

4.3.1 Frequently, the following sort of question may arise in educational planning: if a variable, for example population, continues to grow at its current average annual rate, how long will it be until the variable is half as big again? Or has doubled? The same sort of question could be asked of enrolment. At the present rate of growth, how long until enrolment has doubled? Or trebled?

4.3.2

The answer to these questions may be obtained by considering again the formula developed in 2.7.5:

$$P_n = P_0(1 + r)^n$$

Previously, we knew $P_0$, $P_n$ and $n$ and sought to calculate $r$. Now the problem is that we know $P_0$, $P_n$ and $r$, and seek to find $n$, the length of time it takes $P_0$ to grow to size $P_n$ at a given rate $r$.

4.3.3 To find the value of $n$ in terms of $P_0$, $P_n$ and $r$, the most straightforward method is simply to take logarithms of both sides of $P_n = P_0(1 + r)^n$:

$$\log P_n = \log P_0 + n \log(1 + r)$$

Re-arranging (recall the rules described earlier):

$$n \log(1 + r) = \log P_n - \log P_0$$

$$n = \frac{\log P_n - \log P_0}{\log(1 + r)} = \frac{\log\left(P_n / P_0\right)}{\log(1 + r)}$$

Example 12: Calculation of n

In the ten years from 1965 to 1975, the total number of persons enrolled in Africa at the first, second and third levels of education rose from 29.9 million to 52.9 million. This represented an average annual increase of 5.87%. If growth were to continue at this constant rate, how long would it be after 1975 before African enrolment was 50% greater? To answer this, note that when enrolment is 50% greater in $n$ years' time:

$$\frac{P_n}{P_0} = 1.5$$

Therefore, applying the formula for $n$ above:

$$n = \frac{\log 1.5}{\log 1.0587} = \frac{0.1761}{0.0248} = 7.10 \text{ years}$$

4.3.4 Example 12 shows us that, in 7.10 years' time from 1975, enrolment at the first, second and third levels of education in Africa would be 50% greater, assuming a constant average annual growth rate over the period of 5.87%.
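The formula for n in 4.3.3 translates directly into Python. A minimal sketch follows (the function name is our own). It uses natural logarithms via `math.log`; since n is a ratio of two logarithms, the base cancels and common logarithms would give the same n. At full precision n comes out at about 7.11 years rather than the 7.10 obtained with four-figure logarithms.

```python
import math

def years_to_grow(ratio: float, r: float) -> float:
    """n such that (1 + r) ** n equals ratio = P_n / P_0."""
    return math.log(ratio) / math.log(1 + r)

# Average annual rate implied by the 1965 and 1975 totals (in millions).
r_data = (52.9 / 29.9) ** (1 / 10) - 1      # about 0.0587
r = 0.0587                                   # the rate used in Example 12

for label, ratio in [("50% greater", 1.5), ("doubled", 2.0), ("trebled", 3.0)]:
    print(label, round(years_to_grow(ratio, r), 2), "years after 1975")
```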

Exercise 9
The reader should now do Exercise 9, on page

5 MATHEMATICAL FUNCTIONS

5.1 Introduction

5.1.1 Why should a practical person interested in educational planning be concerned with seemingly highly theoretical and abstract matters such as "non-linear functions" and other concepts discussed below? The reasons are not, perhaps, immediately obvious, but they are fundamental. The planner (or the analyst of events in the past) is essentially concerned with relationships between variables. One initial problem is to specify these relationships. To "specify" means to decide which variables we believe depend on which other variables. A second problem is actually to "estimate" these hypothesised relationships. Just how, for example, does population depend on the other specified variables? What are the magnitudes of their separate contributions to changes in population? This issue of actually measuring statistical relationships between variables is a topic best dealt with in a statistical context, and is more fully discussed in


5.1.2 A variable is a symbol, such as X, Y, g, K, u, which can take on any of a prescribed set of values. This set of values is called the domain of the variable.
5.1.3 Thus, for example, a promotion rate from one school grade to another is a variable, and could be written "p". It could (in principle) take on any value between 0 and 1, which is therefore its domain. We could write its domain as follows:

$$0 \le p \le 1$$

where "$\le$" means "less than or equal to". Similarly, "$\ge$" means "greater than or equal to". The symbols "<" and ">" mean "less than" and "greater than" respectively.
5.1.4

If a variable can theoretically take on any value between

two given values, it is called a continuous variable.

Otherwise it

is called a discrete variable.
5.1.5 The concept of a function is an extremely important one, since it is the mathematician's (and statistician's) way of expressing relationships between variables. Formally, if the values of a variable Y depend on the values of a variable X, we may say "Y is a function of X", and write in general:

$$Y = f(X)$$

This expresses the idea that, to each value which a variable X can take on, there corresponds one or more values of a variable Y. The reader may come across letters other than "f", such as F, etc. They have just the same meaning. There is a multitude of possible instances of functional dependence. We have already seen, for example, that primary level enrolment in Niger may be seen as a function of time, t; we may write:

$$E = f(t)$$

That is to say, enrolment is a variable which is "functionally dependent" on the variable time. Note that we do not write it the other way round:

$$t = f(E)$$

We do not attach any meaning to the idea that time depends on enrolment levels!

5.1.6 Formally, when we write $Y = f(X)$, we define Y as the dependent variable and X as the independent variable. Sometimes, especially in statistical analysis, X is called the explanatory variable.

5.2 Linear functions

5.2.1 Functional dependence between two variables is frequently suggested by a table (as in Example 1 above). Where there is an exact mathematical correspondence, it may be shown by an equation connecting the variables, for example the linear equation:

$$Y = 3X - 4$$

This equation is a particular linear function connecting Y and X. It gives, in effect, the rules which govern the linear relationship between X and Y. If we know the values of the variable X, the equation shows us how to find each corresponding value of Y. This function tells us: "given the value of X, first multiply it by 3, then subtract 4, and there results the corresponding value of Y". Thus, for example, when X = 6, we may write:

$$Y = f(6)$$

which is to say,

$$Y = 3(6) - 4 = 14$$

In other words, the value of the function is Y = 14 when X = 6. Similarly, as the reader should verify:

when X = 0, Y = -4;
when X = -2.9, Y = -12.7;
when X = 100, Y = 296;

and so on.
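A function is, in effect, a rule for turning each value of X into a value of Y, which is exactly what a function in a programming language does. A minimal Python sketch of Y = 3X - 4 reproducing the values above (the name `f` simply mirrors the notation Y = f(X)):

```python
def f(x: float) -> float:
    """The linear function Y = 3X - 4: multiply X by 3, then subtract 4."""
    return 3 * x - 4

for x in [6, 0, -2.9, 100]:
    print(f"X = {x}: Y = {f(x):g}")
```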
5.2.2 There are many varieties of functions, and no attempt can be made here to investigate any other than the most basic. The linear function in two variables, which we looked at, is perhaps the simplest. It is "linear" because if drawn on a graph (Figure 2 below) it is seen to be a straight line.

5.2.3

The general equation of a linear function (a straight line) is usually written:

$$Y = a + bX$$

In our example above ($Y = 3X - 4$), $a = -4$ and $b = 3$. There is an infinite number of possible linear equations, since both a and b can in principle take on an infinite number of values. The constants a and b are known as the coefficients of the equation. We shall see later, in considering graphs of functions and in the chapter on statistics, that a and b have important practical interpretations in analysing relationships in the field of education.

5.2.4 The concept of a linear function may be extended to two or more independent variables. For example:

$$Y = f(X, Z, K)$$

is a function with 3 independent variables. That is, it stands for a situation in which Y depends on three different variables. A particular instance of this function could be:

$$Y = 4 + 2X - 3Z + 4K$$

Again, this may be seen as a set of rules for finding Y, given particular values of the variables X, Z and K.

5.3 Non-linear functions

5.3.1 The general equation of a non-linear function of the "second degree" is:

$$Y = a + bX + cX^2$$

The presence of a squared term in X (with $c \neq 0$) makes the graph of the function curved (in fact it is a parabola). Equations in which the highest power of an independent variable is 2 are known as quadratic equations. Perhaps the simplest non-linear equation is:

$$Y = X^2$$

which is shown graphically below, after the reader has been introduced to the concepts of rectangular coordinates and graphs.

Exercise 10

The reader should now do Exercise 10, on page

6 GRAPHS

6.1 Rectangular co-ordinates

6.1.1 Functions may be depicted graphically. A linear function

$$Y = a + bX$$

in which there is one explanatory variable may be easily drawn, given the values of a and b. Consider two mutually perpendicular lines, X'OX and Y'OY, intersecting at O in Figure 1. These lines are called the X and Y axes, respectively. They should be scaled as appropriate for the variables under consideration.

6.1.2 Point O is called the origin. By convention, the Y axis is the vertical axis and the X axis the horizontal one. Again by convention, the X axis is scaled negatively to the left of the origin and positively to the right. The Y axis is scaled negatively below the origin and positively above.

6.1.3 Consider any point, P. If perpendiculars are dropped from the point to the axes, the values of X and Y where the perpendiculars meet the axes are called the rectangular coordinates of P (or often simply the coordinates of P). The coordinate X is sometimes called the abscissa, and Y the ordinate, of the point. Hence, for example, if we look at the point $P_1$ in Figure 1, we see that:

the abscissa is 2
the ordinate is -4
the coordinates are therefore (2, -4)

Similarly, the coordinates of points $P_2$, $P_3$ and $P_4$ are (3, 3), (-2, 1) and (-5, -4) respectively. Notice that the coordinates are always written (X, Y), the X (abscissa) value coming first, by convention. The great usefulness of this technique is that, given the coordinates of a point, we can "plot" it on the figure.

6.2 Plotting the graph of a linear function

6.2.1 Let us see how the linear function we have discussed may be plotted:

$$Y = 3X - 4$$

We shall plot the graph (see Figure 2) over the domain:

$$-5 \le X \le 5$$

[Figure 1: rectangular co-ordinates, with points P1 to P4. Figure 2: graph of Y = 3X - 4.]

This is a purely arbitrary choice of domain; we could plot the graph over any domain of X we chose. The function, it will be recalled, tells us the values of Y which correspond to each value of X in X's domain. In order to find the coordinates to plot on the graph, therefore, we should draw up a table of corresponding values of X and Y. Here we have chosen 7 arbitrary points in X's domain:
Function: $Y = 3X - 4$
Domain: $-5 \le X \le 5$

   X      Y
  -5    -19
  -3    -13
  -1     -7
   0     -4
   1     -1
   3      5
   5     11

The reader should verify for himself or herself that these are indeed the correct corresponding values of X and Y, by substituting X = -5, -3, ... and so on into the equation.
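The substitution step can be sketched in Python. This is an illustration only, not part of the original method: the function name `linear` is hypothetical, and it writes the function in the general form Y = a + bX (discussed in 6.3) with a = -4 and b = 3.

```python
def linear(x, a=-4, b=3):
    """Return Y = a + bX; the defaults give the function Y = 3X - 4."""
    return a + b * x

# The seven arbitrary points chosen in X's domain, -5 <= X <= 5.
xs = [-5, -3, -1, 0, 1, 3, 5]

# Substitute each X into the equation to find the corresponding Y.
table = [(x, linear(x)) for x in xs]

for x, y in table:
    print(f"{x:>4} {y:>5}")
```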
6.2.2 We now have 7 coordinates: 7 pairs of (X, Y). These points may now be plotted on a graph (see Figure 2). All seven points have been plotted, from (-5, -19) through to (5, 11). The reader may have realised, of course, that because the function is linear, only two points strictly need be plotted; the straight line connecting them will represent the linear function. More points are plotted here simply to familiarise the beginner with the techniques involved.
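The claim that two points are enough can be checked numerically. In the sketch below (the helper name `line_through` is illustrative, not from the text), the coefficients of the line are recovered from the first and last plotted points only, and the remaining points are then confirmed to lie on that same line. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (a, b) for the straight line Y = a + bX through points p and q."""
    (x1, y1), (x2, y2) = p, q
    b = Fraction(y2 - y1, x2 - x1)  # change in Y divided by change in X
    a = y1 - b * x1                 # value of Y when X = 0
    return a, b

points = [(-5, -19), (-3, -13), (-1, -7), (0, -4), (1, -1), (3, 5), (5, 11)]

# Use only two points to fix the line.
a, b = line_through(points[0], points[-1])

# Every other point satisfies Y = a + bX as well.
on_line = all(y == a + b * x for x, y in points)
print(a, b, on_line)
```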


6.3 Interpretation of coefficients

6.3.1 Let us consider again the general linear function,

$$Y = a + bX$$

In the function we have plotted, $Y = 3X - 4$,

$$a = -4, \quad b = 3$$

There is only one straight line with this particular pair of coefficients, a and b. The reader should ask himself or herself: what is the interpretation of a and b on the graph?

6.3.2 First, consider "a". A little thought will show that, when $X = 0$, then:

$$Y = a + b(0) = a$$

That is, a represents the value of Y where the function cuts, or intercepts, the Y axis; a is in fact sometimes called the intercept. In this case, as can be seen, $a = -4$.

6.3.3 Let us now consider the meaning of "b". b tells us that when X increases (or decreases) by 1, the corresponding value of Y increases (or decreases) by b times 1, or, more simply written, b. That is, b represents the ratio

$$\frac{\text{change in } Y}{\text{change in } X}$$

This is sometimes called "the rate of change of Y with respect to X". To understand this, consider first the function $Y = 3X - 4$. When, for example, $X = 3$, then $Y = 5$. Now increase X by 1, to 4: Y now equals 8. It has increased by 3 (from 5 to 8). Therefore,

$$\frac{\text{change in } Y}{\text{change in } X} = \frac{3}{1} = 3 = b$$

To verify that this ratio is constant and equals 3, the reader should change X by a variety of amounts and observe that the change in Y always stands in the same constant proportion to the change in X.
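The suggested check can be sketched as follows. The starting value X = 3 comes from the text; the list of step sizes is an arbitrary choice for illustration, and `Fraction` is used so that the fractional step gives an exact ratio.

```python
from fractions import Fraction

def f(x):
    return 3 * x - 4  # the linear function Y = 3X - 4

x0 = 3  # starting value of X, as in the text (Y = 5)
steps = [1, 2, 5, -1, -4, Fraction(1, 2)]  # arbitrary changes in X

ratios = []
for dx in steps:
    dy = f(x0 + dx) - f(x0)
    ratio = Fraction(dy) / dx  # change in Y divided by change in X
    ratios.append(ratio)
    print(f"change in X = {dx}, change in Y = {dy}, ratio = {ratio}")
```

Whatever step is chosen, the printed ratio is the same, which is what makes b a single, well-defined slope for a straight line.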
6.3.4 b, the coefficient of X, is generally referred to as the slope or gradient of the line. Now look at the graph (Figure 2). It has a constant, upward gradient. The gradient or slope of a line is in fact defined as the

$$\frac{\text{change in } Y}{\text{change in } X}$$

and this is shown on the graph. With a linear function, it is of course constant.
6.3.5 When b is positive, the line slopes upwards from left to right, and changes in X and Y are in the same direction: both up, or both down, together. When b is negative, the line slopes downwards from left to right: as X increases, Y decreases, and vice versa. When b is zero, the line is horizontal, for in that case:

$$Y = a + (0)X$$
$$Y = a$$

and the line is a horizontal one intercepting the Y axis at the value a.
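The three cases can be summarised in a tiny sketch; the function name `direction` and its wording are illustrative only.

```python
def direction(b):
    """Describe how the line Y = a + bX runs, judging only by the sign of b."""
    if b > 0:
        return "slopes upwards from left to right"
    if b < 0:
        return "slopes downwards from left to right"
    return "horizontal (Y = a for every X)"

for b in (3, -2, 0):
    print(b, direction(b))
```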

Exercise 11

The reader should now attempt Exercise 11, on page 70.


6.4 Plotting the graph of a quadratic function

6.4.1 Let us now consider a quadratic function. We have seen that the general expression for this is:

$$Y = a + bX + cX^2$$

This function has 3 coefficients, a, b and c. We have already seen above the particular quadratic function

$$Y = X^2$$

In this particular function,

$$a = 0, \quad b = 0, \quad c = 1$$

Let us draw the graph of this function (see Figure 3). Again we shall choose an arbitrary domain for X, purely for illustrative purposes:

$$-4 \le X \le 4$$

We may now draw up a table of corresponding values of X and Y, as before in 6.2.1 above:
Function: $Y = X^2$
Domain: $-4 \le X \le 4$

   X     Y
  -4    16
  -3     9
  -2     4
  -1     1
   0     0
   1     1
   2     4
   3     9
   4    16

6.4.2 These coordinates have been plotted in Figure 3. The reader will see the striking difference between this quadratic function and the linear function of Figure 2. The curvilinear nature of the function $Y = X^2$ is typical of quadratic functions. The usefulness of non-linear functions will become clear to the reader in …, in the discussion of the fitting of curves to observed data. Population growth, for example, when plotted on a graph, often more closely resembles the right-hand portion of the curve in Figure 3 than it does a straight line.
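As with the linear function, the table can be produced by substitution. The sketch below (names are illustrative) evaluates the general quadratic Y = a + bX + cX^2 with a = 0, b = 0, c = 1, and also lists the change in Y for each unit step in X. Unlike the linear case, these changes are not constant, which is why the graph curves.

```python
def quadratic(x, a=0, b=0, c=1):
    """Return Y = a + bX + cX^2; the defaults give Y = X^2."""
    return a + b * x + c * x ** 2

xs = list(range(-4, 5))          # X = -4, -3, ..., 4
ys = [quadratic(x) for x in xs]  # the Y column of the table

# Change in Y for each unit increase in X.
changes = [y2 - y1 for y1, y2 in zip(ys, ys[1:])]
print(list(zip(xs, ys)))
print(changes)  # [-7, -5, -3, -1, 1, 3, 5, 7]
```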
6.4.3 A proper discussion of how to calculate the gradient of the graph in Figure 3 requires some knowledge of the calculus, a relatively advanced branch of mathematics which is not discussed in this introduction. In fact, the gradient of this function can be shown by the methods of calculus to be

$$2X$$

Thus, it is negative when X is negative, zero when X is zero, and positive when X is positive. Its absolute(1) size increases as a constant proportion of X. Thus, for example, when X is 1, the slope is 2; when X is 3.5, the slope is 7. The "rate of change of Y with respect to X" is always twice the value of X. It is therefore no longer, as in a linear function, constant and independent of X.
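The result quoted from the calculus can be checked approximately without calculus. The sketch below is an illustration, not the calculus method itself: it estimates the slope of Y = X^2 at a point by a central difference, (f(X + h) - f(X - h)) / 2h, with a small step h chosen arbitrarily. For a quadratic this estimate equals 2X exactly, apart from floating-point rounding.

```python
def f(x):
    return x ** 2  # the quadratic function Y = X^2

def slope_estimate(x, h=1e-4):
    """Central-difference estimate of the gradient of f at x (h is arbitrary)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2, 0, 1, 3.5):
    # Print the estimate next to 2X for comparison.
    print(x, round(slope_estimate(x), 6), 2 * x)
```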

(1) "Absolute" means here the numerical magnitude, ignoring the sign.

Exercise 12

The reader should now attempt Exercise 12, on page 71.

6.5 Plotting graphs of observed data

6.5.1 Let us now illustrate the value of plotting statistical data graphically. As an example, Table 1 below shows gross enrolment ratios at primary level (both sexes) for three selected countries for each year 1965-1973. The gross enrolment ratio in primary education is defined as the ratio of total enrolment in primary education, regardless of age, to the population belonging to the age group that, according to national regulations, should be enrolled at this level.
Table 1
Gross enrolment ratios, both sexes, for Ecuador, Iraq and Singapore, 1965-1973

Gross enrolment ratio (%)

Year    Ecuador    Iraq    Singapore
1965     79.9      73.3      80.2
1966     79.8      77.7      88.1
1967     80.2      79.0      89.1
1968     79.3      77.9      83.7
1969     78.7      79.7      84.1
1970     80.2      80.5      89.3
1971     79.5      81.6      91.7
1972     78.9      83.1      91.2
1973     77.7      84.4      92.5

6.5.2 Consider, for example, the columns headed "Year" and "Ecuador". These columns of data may be interpreted as corresponding respectively to the X and Y variables introduced above. Thus it can be seen that there are 9 pairs of observations (X, Y), i.e. (year, gross enrolment ratio), as follows:

(1965, 79.9)
(1966, 79.8)
(1967, 80.2)
(1968, 79.3)
(1969, 78.7)
(1970, 80.2)
(1971, 79.5)
(1972, 78.9)
(1973, 77.7)

These are the coordinates for plotting the Ecuador data on the graph of Figure 4. After plotting, they have been connected by straight lines, to aid the eye. They could, with equal justification, have been connected with a freehand curve. The purpose of connecting them at all is as a visual aid: it is one way of helping the analyst to make interpolations between observed data points. The other two lines have been constructed similarly; the reader should verify this.
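Pairing the two columns in this way is what Python's built-in `zip` does. The sketch below types in the Table 1 values by hand (variable names are illustrative) and adds a small, hypothetical `interpolate` helper that reads a value off the straight line joining two plotted points, treating the year as a continuous variable. That treatment is an assumption made purely for illustration.

```python
years = list(range(1965, 1974))  # X values: 1965, 1966, ..., 1973
ecuador = [79.9, 79.8, 80.2, 79.3, 78.7, 80.2, 79.5, 78.9, 77.7]  # Y values

# Pair the two columns: each (X, Y) tuple is one point to plot.
coordinates = list(zip(years, ecuador))
for point in coordinates:
    print(point)

def interpolate(x, p, q):
    """Read Y off the straight line joining plotted points p and q."""
    (x1, y1), (x2, y2) = p, q
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

# Hypothetical reading half-way between the 1969 and 1970 observations.
print(round(interpolate(1969.5, coordinates[4], coordinates[5]), 2))
```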

6.5.3 The graph is a valuable way of showing the general movement of the enrolment ratios. Simple inspection of the data, as arranged in tabular form in Table 1, does show clearly that there is a general upward trend in the ratios for both Iraq and Singapore, and a slight downward movement in Ecuador. But the table cannot show as clearly as the graph of Figure 4 does the details of the movements, or the relative gradients of the functional relationships between enrolment and time. The graph shows "at a glance" several features of the data, including:

- the similar rate of increase, taking the period as a whole, of gross enrolment ratios in both Singapore and Iraq
- the decline of the ratio in Ecuador, except for 1969-1970
- the greater variability of the ratio in Singapore than in both Iraq and Ecuador
- the periods in which all three ratios increased, or declined, together
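The features in this list can also be read off the numbers directly. The sketch below (names are illustrative) computes, for each country, the change over the whole period and the year-to-year changes, i.e. the gradients of the straight-line segments in Figure 4. The range of those yearly changes is used here as one simple, assumed measure of variability; the text itself does not define one.

```python
ratios = {
    "Ecuador":   [79.9, 79.8, 80.2, 79.3, 78.7, 80.2, 79.5, 78.9, 77.7],
    "Iraq":      [73.3, 77.7, 79.0, 77.9, 79.7, 80.5, 81.6, 83.1, 84.4],
    "Singapore": [80.2, 88.1, 89.1, 83.7, 84.1, 89.3, 91.7, 91.2, 92.5],
}

summary = {}
for country, values in ratios.items():
    # Change over the whole period 1965-1973.
    whole_period = round(values[-1] - values[0], 1)
    # Change from each year to the next: the slope of each segment of the graph.
    yearly = [round(b - a, 1) for a, b in zip(values, values[1:])]
    # Range of the yearly changes, used as a rough measure of variability.
    spread = round(max(yearly) - min(yearly), 1)
    summary[country] = (whole_period, yearly, spread)
    print(f"{country}: whole period {whole_period:+}, yearly {yearly}, spread {spread}")
```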
6.5.4 The reader will by now appreciate that these three lines may be seen as three particular examples of the function:

$$E = f(t)$$

where:

E is the symbol for the gross enrolment ratio
t is the symbol for time

Figure 4 shows that they are not linear functions: the slopes of the lines are not constant throughout the time period. Nevertheless, there are certain sub-periods in which they are very nearly constant; for example, in Iraq 1968-1973. As a matter of fact, real-world measurements of variables very rarely produce exact linear relationships. Human behaviour, and technical change, are not so reliable! But very often observed relationships are approximately linear, or may display a constant proportional rate of growth(1). It can often be assumed, especially for small changes in independent variables, that, for all practical purposes, they are in fact linear, and may remain so in at least the near future. But we are now approaching the discussion of the statistical methods which we can employ to "fit" functions to the often rather scattered data we observe. And that topic is reserved for …

(1) In which case a graph of the logarithm of the variable plotted against time would display linearity.
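The footnote's claim can be illustrated with a short sketch. Assume a hypothetical series that starts at 100 and grows at a constant 5 per cent a year (both numbers are invented purely for illustration). The year-to-year changes of the series itself grow, so its graph curves upward, but the year-to-year changes of its logarithm are all equal to log 1.05, so a graph of the logarithm against time is a straight line.

```python
import math

start, rate = 100.0, 0.05  # hypothetical starting value and growth rate
values = [start * (1 + rate) ** t for t in range(6)]
logs = [math.log10(v) for v in values]

raw_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(logs, logs[1:])]

print([round(s, 2) for s in raw_steps])  # increasing: the graph of Y curves
print([round(s, 5) for s in log_steps])  # constant: the graph of log Y is straight
```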

6.6 Ten practical hints on drawing graphs

1. Use proper sheets of graph paper if you can.

2. Use as much of the sheet as practicable. There is no point in using just half of it unless other considerations dictate this.

3. Look at the range of values of X and Y. Decide what scale would be best, taking into consideration the size of your sheet.

4. Remember that different choices of scale can make the graph appear to have very different slopes (though of course the mathematical properties of the function are not affected). There is no general rule to adopt here, except perhaps that your graph should be as informative as possible, and not in any way potentially misleading.

5. You don't have to make the axes intersect at (0, 0). To do so is in many cases simply to invite blank, empty spaces in your completed graph, helping nobody to understand the relationships between the variables (which, after all, is the fundamental purpose of the graph!).

6. Label the axes clearly.

7. Use a pencil before inking in. You are almost bound to make some errors.

8. Do not put too many functions on a graph. As a general rule, more than 3 or 4 tend to confuse the eye, and hence to hinder understanding.

9. Give your graph a clear, unambiguous title so that the final user is left in no doubt as to what your graph is showing.

10. Quote the exact source(s) of data on which your graph is based.

Exercise 13

The reader should now attempt Exercise 13, on page …

EXERCISES 1-13 (pp. 59-72)

ANSWERS 1-13 (pp. 73-92)

EXERCISE 1

This exercise is designed to familiarise the reader with basic concepts and operations of algebra. Read Sections 1.1 - 1.5 before completing the questions.

1.1 A number is represented by x. Double it, add 29 to the result, … Write down the expression for the result.

1.2 The product of two numbers is a and one of them is w. What is the other?

1.3 What number must be subtracted from z to get b?

1.4 If $A = \frac{P}{C}$, which of the following expressions are correct, and which incorrect:

i) $C = \frac{P}{A}$
ii) $(A + 100) = \frac{P + 100}{C}$
iii) $(A - C) = \frac{P - C^2}{C}$

1.5 Eliminate the brackets in the following expressions:

i) $(a + b)(a + c)$
ii) $(-x)(2x + y - 3)$
iii) $(x - y) - 3(z - x)$
iv) $(x + y - z)\left(-\frac{1}{x}\right)$

1.6 Find the factors of the following expressions, where possible:

i) $4a^2 + 2ab$
ii) $ax + ax^2 + ax^3$
iii) $ax + by$
iv) $x^3 - x^2$

1.7 When a = 2, b = 3, calculate the numerical values of the following:

i) $(a + b)(a^2 - b^2)$
ii) $\frac{a(a^3 - a^2)}{ab}$
iii) …

EXERCISE 2

This exercise provides the reader with practice in solving simple equations. Read Section 1.6 before completing the questions.

2.1 A man is four times the age of his son. In four years' time he will be three times his son's age. How old are they now?

2.2 Solve the following equations for the unknown x:

i) $3x + 10 = x + 20$
ii) $\frac{4x}{3} + 11 = \frac{5x}{6} + 69$
iii) $\frac{2x - 2}{3} + \frac{x + 1}{6} = 3(x + 2)$

EXERCISE 3

3.1 Round the following numbers to the indicated accuracy:

        Number          Round to nearest:
i)      89.3245         thousandth
ii)     89.3254         thousandth
iii)    7.299           hundredth
iv)     1.145           tenth
v)      27.6            unit
vi)     3.49            unit
vii)    150.001         hundred
viii)   326,000.0       hundred thousand
ix)     18,000,000      million
x)      18,500,000      ten million

3.2 Add the numbers 2.25, 6.95, 7.25, 2.15 and 4.55
(a) directly
(b) by rounding to the nearest tenth according to the "even integer" criterion
(c) by rounding so as to increase the digit before the 5

EXERCISE 4

4.1 Table 4.1 below shows, in thousands, enrolment in Africa by level of education, 1960 - 1975. Calculate, for each year, enrolment at each level as a percentage of total enrolment, rounded to 1 decimal place. Sum the percentages by level for each year.

Table 4.1
Enrolment by level of education, 1960 - 1975 (thousands), Africa

Year    First level    Second level    Third level     Total
1960      19,391          1,740            180         21,311
1965      26,534          3,058            306         29,898
1970      33,817          4,905            471         39,193
1974      41,843          7,411            779         50,033
1975      44,243          7,812            865         52,920

4.2 Calculate (rounding to one decimal place):

i) ½% of 72.3
ii) 10.6% of 10.6
iii) 110% of 110
iv) 50% of 100
v) 10% of 10%
vi) 0.02% of 64
vii) 36% of 7000 students
viii) In a certain school, 80% of all students in Grade 5 in a particular year were promoted from Grade 5 to Grade 6. Of these students, 10% dropped out of Grade 6. What percentage of all Grade 5 students eventually dropped out of Grade 6?
ix) Two out of 3 students passed an examination. What percentage failed?
x) By how many percentage points did second level enrolment as a percentage of total enrolment increase from 1965 to 1970? (see Table …)

EXERCISE 5

5.1 Table 5.1 below shows public expenditure on education per pupil (in U.S.$ at current market prices) in Latin America, 1960 - 1974. Calculate the percentage increases in expenditure per pupil, to one decimal place, between:

i) 1960 and 1965
ii) 1960 and 1970
iii) 1970 and 1974

Table 5.1
Public expenditure at current market prices on education per pupil, U.S.$, 1960 - 1974, Latin America

Year    Expenditure per pupil (U.S.$)
1960      57
1965      77
1970      97
1974     172

5.2 Table 5.2 shows the (estimated) male and female populations of Afghanistan in 1968 and 1975. Calculate, rounding to two decimal places:

i) the percentage growth over the period of the male population
ii) the percentage growth over the period of the female population
iii) the percentage growth of the total population

Table 5.2
Estimated male and female populations of Afghanistan, 1968 and 1975 (thousands)

            1968    1975
Male        7448    8666
Female      6765    7999

EXERCISE 6

6.1 What is the characteristic of the logarithm of each of the following numbers?

i) 62.3
ii) 101.9
iii) 100
iv) 10.12
v) 72.9
vi) 0.03
vii) 1.01
viii) 0.2
ix) 0.0004
x) 310,000

6.2 Are the following logarithms correct or incorrect? Correct where necessary:

i) log 113.3 = 2.0543
ii) log 612,000 = 4.7868
iii) log 0.0071 = $\bar{3}.8513$
iv) log 1.262 = 0.1100
v) log 1,001,000 = 6.0050

6.3 Are the following antilogarithms correct or incorrect? Correct where necessary:

i) antilog 1.5572 = 3.608
ii) antilog $\bar{2}.6672$ = 0.4647
iii) antilog 0.0010 = 1.200
iv) antilog 4.9999 = 9997.0
v) antilog $\bar{1}.6990$ = 0.5

6.4 Calculate each of the following, using logarithms:

i) $N = \frac{(121.4)(0.06)}{0.114}$
ii) $N = (2.721)(0.0071)(71)$
iii) $N = 21.6 \div 0.002$
iv) $N = (20.83)(0.0003)$
v) $N = \frac{37.2}{16}$
EXERCISE 7

7.1 Calculate each of the following, using logarithms:

i) $N = (0.039)^2$
ii) $N = (17.3)^5 (1.4)^3$
iii) $N = \frac{(0.21)^4}{(0.3)^6}$
iv) $N = (17.6)^2 (199.3)^3$
v) $N = (0.0006)^7$

7.2 Calculate each of the following, using logarithms:

i) $N = (0.072)^{1/2}$
ii) $N = \sqrt[3]{62.7}$
iii) N = the sixth root of 1524
iv) N = the fifth root of 0.73
v) $N = \sqrt[4]{(0.02)(0.13)}$

EXERCISE 8

8.1 Table 8.1 below shows the absolute numbers of male and female teachers in primary education in Afghanistan in the years 1969 and 1974. Calculate separately for:

i) males
ii) females
iii) total of males and females

the average annual percentage rates of growth over the period, to one decimal place.

Table 8.1
Teachers at primary level, 1969 - 1974, Afghanistan

             1969     1974
Males        9606    14377
Females      1468     3215

8.2 Table 8.2 shows, for Oceania, the absolute numbers (in thousands) of females enrolled at all levels of education in 1960, 1970 and 1975. Calculate, rounding to one decimal place:

i) the percentage growth of female enrolment over the period
ii) the average annual rate of growth of female enrolment over the periods:
   a) 1960 - 1975
   b) 1960 - 1970
   c) 1970 - 1975

Table 8.2
Female enrolment at all levels of education, 1960 - 1975, Oceania (thousands)

                     1960    1970    1975
Female enrolment     1465    1983    2184

EXERCISE 9

9.1 Female enrolment at first, second and third level of education in Africa increased from 7.66 million in 1960 to 21.18 million in 1975.

i) Calculate the average annual growth rate of enrolment.
ii) Assuming this rate were to continue unchanged after 1975, during what year would female enrolment become 100% greater than in 1975?

9.2 In a certain country, enrolment doubled in 10 years. What was the average annual rate of growth during the period, in percentage terms (to one decimal place)?

EXERCISE 10

10.1 Given the continuous linear function

$$Y = 1 + 4X, \quad -4 \le X \le 4$$

i) what is the range of values of Y?
ii) what is the value of the function when X = 2.4?
iii) what is the value of X when Y = 0?
iv) what is the value of the function when X = 6?
v) what is the value of the dependent variable when the independent variable = 1?

10.2 Given the continuous quadratic function

$$Y = 1 + 4X^2, \quad -4 \le X \le 4$$

i) what is the value of the function when X = -4?
ii) what is the value of the function when X = +4?
iii) what is the value of the function when X = 0?
iv) taking this function as a particular example of the general equation of a quadratic function:

$$Y = a + bX + cX^2$$

what are the values of a, b and c?
v) When Y = 0, what is the value of X?

EXERCISE 11

11.1 i) Plot the graph of the linear function $Y = 1 + 4X$, $-4 \le X \le 4$.
ii) Use the graph to find the values of Y when X = ±3.3.

11.2 Show on your graph of $Y = 1 + 4X$ the intercept "a" and the slope "b".

EXERCISE 12

12.1 i) Plot the graph of the quadratic function $Y = 1 + 4X^2$, $-3 \le X \le +3$.
ii) Use the graph to find the value of Y when X = 2.5.

12.2 Show on your graph the intercept a. Demonstrate how the slope changes with X. At which point is the slope at a minimum?
EXERCISE 13

13.1 Table 13.1 gives the total number of pupils enrolled in primary education in Cuba over the period 1968-1974, and their distribution between urban and rural schools.

Table 13.1
Enrolment in primary education, total, urban and rural, Cuba, 1968-1974

Year       Urban          Rural          Total
1968      811,966        585,745      1,397,711
1969      864,370        601,916      1,466,286
1970      926,240        631,905      1,558,145
1971      994,693        669,941      1,664,634
1972    1,053,549        705,618      1,759,167
1973    1,119,961        732,753      1,852,714
1974    1,150,884        748,382      1,899,266

i) Draw up a new table, with columns showing the absolute data rounded to the nearest 1000 pupils, and columns expressing urban and rural enrolment as percentages of the rounded totals in each year. Round the percentages to one decimal place.
ii) Using the data in your table, calculate the percentage growth in absolute urban, rural and total enrolment over the whole period, rounded to one decimal place.
iii) Calculate the average annual rate of growth of urban, rural and total enrolment over the whole period, to one decimal place.

13.2 i) Draw a graph of total, urban and rural enrolment over the whole period, using the data rounded to the nearest 1000 pupils which you have calculated in 13.1 i) above.
ii) If your table and graph were to be included in a report written by you, to which features of the data would you draw your readers' attention?

ANSWERS TO EXERCISE 1

1.1 $\frac{2x + 29}{4y}$

1.2 Let the number required be v. We know that $vw = a$. Hence

$$v = \frac{a}{w}$$

1.3 Let the number required be a. We know that $(z - a) = b$. Hence $a = (z - b)$.

1.4 i) Correct. Multiplying both sides of the formula $A = \frac{P}{C}$ by C, we obtain $AC = P$. Dividing both sides by A, we obtain $C = \frac{P}{A}$.

ii) Incorrect. $A + 100 = \frac{P}{C} + 100 = \frac{P + 100C}{C}$

iii) Correct. $A - C = \frac{P}{C} - C = \frac{P - C^2}{C}$

1.5 i) $(a + b)(a + c) = a(a + c) + b(a + c) = a^2 + ac + ba + bc$

Note that exactly the same result could be obtained by multiplying the first bracket by each term in the second:

$(a + b)(a + c) = a(a + b) + c(a + b) = a^2 + ab + ca + cb$

For ab = ba, ca = ac, and cb = bc. The order in which a number of factors are multiplied does not affect the product; the order in which numbers are added is also immaterial.

ii) $(-x)(2x + y - 3) = -2x^2 - xy + 3x$. Note carefully the signs.

iii) $(x - y) - 3(z - x) = x - y - 3z + 3x = 4x - y - 3z$

iv) $(x + y - z)\left(-\frac{1}{x}\right) = -1 - \frac{y}{x} + \frac{z}{x}$

1.6 i) $2a(2a + b)$
ii) $ax(1 + x + x^2)$
iii) No factors
iv) $x^2(x - 1)$

1.7 i) $(a + b)(a^2 - b^2) = (5)(-5) = -25$
ii) $\frac{a(a^3 - a^2)}{ab} = \frac{8}{6} = \frac{4}{3}$
iii) …

ANSWERS TO EXERCISE 2

2.1 Let the age of the father at present = x. His son's age is therefore now $\frac{x}{4}$. In four years' time, we are told that:

$$x + 4 = 3\left(\frac{x}{4} + 4\right)$$
$$x + 4 = \frac{3x}{4} + 12$$
$$4x + 16 = 3x + 48$$
$$x = 32$$

That is, the father's age at present is 32, and his son's age is 8.

2.2 i) $3x + 10 = x + 20$
$\therefore 2x = 10$
$\therefore x = 5$

ii) $\frac{4x}{3} + 11 = \frac{5x}{6} + 69$
Multiplying both sides by 6,
$8x + 66 = 5x + 414$
$\therefore 3x = 348$
$\therefore x = 116$

iii) $\frac{2x - 2}{3} + \frac{x + 1}{6} = 3(x + 2)$
Multiplying both sides by 6,
$2(2x - 2) + (x + 1) = 18(x + 2)$
$\therefore 4x - 4 + x + 1 = 18x + 36$
$\therefore -13x = 39$
$\therefore x = -3$
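Each solution can be checked by substituting it back into the original equation. A minimal sketch, using Python's `fractions.Fraction` so that the divisions by 3 and 6 stay exact (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction as F

# Each entry: (left-hand side, right-hand side, claimed solution) for 2.2.
equations = {
    "i":   (lambda x: 3 * x + 10, lambda x: x + 20, 5),
    "ii":  (lambda x: F(4 * x, 3) + 11, lambda x: F(5 * x, 6) + 69, 116),
    "iii": (lambda x: F(2 * x - 2, 3) + F(x + 1, 6), lambda x: 3 * (x + 2), -3),
}

for name, (lhs, rhs, x) in equations.items():
    print(name, x, lhs(x) == rhs(x))

# 2.1: the father is four times his son's age now, three times it in four years.
father, son = 32, 8
print(father == 4 * son and father + 4 == 3 * (son + 4))
```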

ANSWERS TO EXERCISE 3

3.1 Results of rounding:

i) 89.324
ii) 89.325
iii) 7.30
iv) 1.1
v) 28
vi) 3
vii) 200
viii) 300,000
ix) 18,000,000
x) 20,000,000

3.2
           (a)      (b)      (c)
          2.25      2.2      2.3
          6.95      7.0      7.0
          7.25      7.2      7.3
          2.15      2.2      2.2
          4.55      4.6      4.6
Total    23.15     23.2     23.4

Note that method (b) is superior to method (c): cumulative rounding errors are minimised by method (b).
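The two rounding rules in 3.2 correspond to the rounding modes `ROUND_HALF_EVEN` (the "even integer" criterion) and `ROUND_HALF_UP` (always increasing the digit before the 5) in Python's `decimal` module. `Decimal` is used rather than `float` because a value such as 2.25 is then held exactly, so a tie really is a tie. A short sketch using the figures of the answer table (which uses 7.25):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

numbers = [Decimal(s) for s in ("2.25", "6.95", "7.25", "2.15", "4.55")]
tenth = Decimal("0.1")

def rounded_sum(values, mode):
    """Round each value to the nearest tenth with the given mode, then add."""
    return sum(v.quantize(tenth, rounding=mode) for v in values)

print(sum(numbers))                           # (a) directly
print(rounded_sum(numbers, ROUND_HALF_EVEN))  # (b) even-integer criterion
print(rounded_sum(numbers, ROUND_HALF_UP))    # (c) always round the 5 up

# The same even-integer rule applied to 3.1 i) and ii):
print(Decimal("89.3245").quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN))
print(Decimal("89.3254").quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN))
```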

ANSWERS TO EXERCISE 4

4.1 Table 4.2
Enrolment by level of education, 1960 - 1975, percentage of total, Africa

Year    First level (%)    Second level (%)    Third level (%)    Total (%)
1960         91.0                8.2                 0.8             100.0
1965         88.7               10.2                 1.0              99.9
1970         86.3               12.5                 1.2             100.0
1974         83.6               14.8                 1.6             100.0
1975         83.6               14.8                 1.6             100.0

Note that the total for 1965 adds to 99.9%, due to errors in rounding. The figure should be displayed as correctly summed, i.e. as 99.9%. It should not be presented, wrongly, as 100.0%.
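The calculation behind Table 4.2, including the rounding effect noted for 1965, can be reproduced with a short sketch. The data are typed in from Table 4.1 and the names are illustrative. Each percentage is rounded to one decimal place before summing, as the exercise asks.

```python
# Enrolment in thousands from Table 4.1: (first, second, third, total).
enrolment = {
    1960: (19391, 1740, 180, 21311),
    1965: (26534, 3058, 306, 29898),
    1970: (33817, 4905, 471, 39193),
    1974: (41843, 7411, 779, 50033),
    1975: (44243, 7812, 865, 52920),
}

shares = {}
for year, (*levels, total) in enrolment.items():
    # Percentage of total enrolment at each level, rounded to 1 decimal place.
    pct = [round(100 * n / total, 1) for n in levels]
    # Sum of the rounded percentages (rounded again only to hide float noise).
    shares[year] = (pct, round(sum(pct), 1))
    print(year, pct, shares[year][1])
```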

4.2 i) 0.005 × 72.3 = 0.3615 ≈ 0.4
ii) 0.106 × 10.6 = 1.1236 ≈ 1.1
iii) 1.1 × 110 = 121.0
iv) 0.5 × 100 = 50.0
v) 0.1 × 10% = 1.0%
vi) 0.0002 × 64 = 0.0128 ≈ 0.0
vii) 0.36 × 7000 = 2520 students
viii) 0.1 × 80% = 8.0%
ix) ⅓ × 100% = 33.3%
x) Percentage point increase = (12.5 - 10.2) = 2.3 percentage points.

ANSWERS TO EXERCISE 5

5.1 i) $\frac{77 - 57}{57} \times 100\% = 35.1\%$
ii) $\frac{97 - 57}{57} \times 100\% = 70.2\%$
iii) $\frac{172 - 97}{97} \times 100\% = 77.3\%$

5.2 i) $\frac{8666 - 7448}{7448} \times 100\% = 16.35\%$
ii) $\frac{7999 - 6765}{6765} \times 100\% = 18.24\%$
iii) $\frac{16665 - 14213}{14213} \times 100\% = 17.25\%$
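All six answers apply the same formula: percentage growth = (final value - initial value) / initial value × 100%. A minimal sketch, with an illustrative function name. Note that in 5.2 iii) the growth of the total is computed from the summed populations; it is not the simple average of i) and ii).

```python
def percentage_growth(initial, final, places=1):
    """Percentage increase from initial to final, rounded to `places` decimals."""
    return round((final - initial) / initial * 100, places)

# 5.1: expenditure per pupil, U.S.$ (Table 5.1)
print(percentage_growth(57, 77), percentage_growth(57, 97), percentage_growth(97, 172))

# 5.2: populations in thousands (Table 5.2), to two decimal places
male = percentage_growth(7448, 8666, 2)
female = percentage_growth(6765, 7999, 2)
total = percentage_growth(7448 + 6765, 8666 + 7999, 2)
print(male, female, total)

# The simple average of the male and female figures is not the total's growth.
print(f"{(male + female) / 2:.3f}")
```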

ANSWERS TO EXERCISE 6
6.1

6.2

i) 1

ii) 2

iii) 2

iv) 1

v) 1

vi) 2

vii) 0

viii) 1

iv) 4

x) 5

The correct answers (using logarithms to the base 10) are:
i) log 113.3 = 2.0543

ii) log 612,000 = 5.7868
iii) log 0.0071 = 3.8513

iv) log 1.262 = 0.1011
v) log 1,001,000 = 6.0005

6.3

The correct answers are:

i) antilog 1.5572 = 36.08
ii) antilog 2.6672 = 0.04647
iii) antilog 0.0010 = 1.002
iv) antilog 4.9999 = 99970.0

v) antilog T.6990 = 0.5
6.4
i) $\log N = \log 121.4 + \log 0.06 - \log 0.114$
$\log 121.4 = 2.0842$
$(+)\ \log 0.06 = \bar{2}.7782$, giving $0.8624$
$(-)\ \log 0.114 = \bar{1}.0569$
$\log N = 1.8055$
$\therefore N = \text{antilog}\ 1.8055 = 63.9$

ii) $\log N = \log 2.721 + \log 0.0071 + \log 71 = 0.4348 + \bar{3}.8513 + 1.8513 = 0.1374$
$\therefore N = \text{antilog}\ 0.1374 = 1.372$

iii) $\log N = \log 21.6 - \log 0.002 = 1.3345 - \bar{3}.3010 = 4.0335$
$\therefore N = \text{antilog}\ 4.0335 = 10800.0$

iv) $\log N = \log 20.83 + \log 0.0003 = 1.3187 + \bar{4}.4771 = \bar{3}.7958$
$\therefore N = \text{antilog}\ \bar{3}.7958 = 0.006248$

v) $\log N = \log 37.2 - \log 16 = 1.5705 - 1.2041 = 0.3664$
$\therefore N = \text{antilog}\ 0.3664 = 2.325$
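The bar notation above records a logarithm as a negative whole-number characteristic plus a positive four-figure mantissa (for example $\bar{3}.8513 = -3 + 0.8513$). The Python sketch below, assuming only the standard `math` module, splits a base-10 logarithm in this way and repeats 6.4 i) with full-precision logarithms; the helper name `characteristic_and_mantissa` is hypothetical. Because it does not use four-figure tables, the last digit can occasionally differ from the printed answers (log 113.3, for instance, comes out nearer 2.0542).

```python
import math

def characteristic_and_mantissa(x):
    """Split log10(x) into an integer characteristic and a mantissa in [0, 1).

    A negative characteristic corresponds to the bar notation used above,
    e.g. log 0.0071 = -3 + 0.8513, written with a bar over the 3.
    """
    log_x = math.log10(x)
    characteristic = math.floor(log_x)
    mantissa = log_x - characteristic
    return characteristic, round(mantissa, 4)

for value in (113.3, 0.0071, 0.06, 0.114):
    print(value, characteristic_and_mantissa(value))

# Multiplication and division become addition and subtraction of logs,
# as in 6.4 i): N = 121.4 x 0.06 / 0.114
log_n = math.log10(121.4) + math.log10(0.06) - math.log10(0.114)
print(round(log_n, 4), round(10 ** log_n, 1))
```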
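ANSWERS TO EXERCISE 7

7.1
i) $\log N = 2 \log 0.039 = \bar{3}.1822$
$\therefore N = \text{antilog}\ \bar{3}.1822 = 0.001522$

ii) $\log N = 5 \log 17.3 + 3 \log 1.4 = 6.6283$
$\therefore N = \text{antilog}\ 6.6283 = 4249000.0$

iii) $\log N = 4 \log 0.21 - 6 \log 0.3 = 0.4262$
$\therefore N = \text{antilog}\ 0.4262 = 2.668$

iv) $\log N = 2 \log 17.6 + 3 \log 199.3 = 9.3895$
$\therefore N = \text{antilog}\ 9.3895 = 2.452 \times 10^{9}$

v) $\log N = 7 \log 0.0006 = \overline{23}.4474$
$\therefore N = \text{antilog}\ \overline{23}.4474 = 2.802 \times 10^{-23}$

7.2
i) $\log N = \tfrac{1}{2} \log 0.072 = \tfrac{1}{2}(\bar{2}.8573) = \bar{1}.4287$
$\therefore N = \text{antilog}\ \bar{1}.4287 = 0.2683$

ii) $\log N = \tfrac{1}{3} \log 62.7 = \tfrac{1}{3}(1.7973) = 0.5991$
$\therefore N = \text{antilog}\ 0.5991 = 3.973$

iii) $\log N = \tfrac{1}{6} \log 1524 = \tfrac{1}{6}(3.1829) = 0.5305$
$\therefore N = \text{antilog}\ 0.5305 = 3.392$

iv) $\log N = \tfrac{1}{5} \log 0.73 = \tfrac{1}{5}(\bar{1}.8633) = \tfrac{1}{5}(\bar{5} + 4.8633) = \bar{1}.9727$
$\therefore N = \text{antilog}\ \bar{1}.9727 = 0.9391$

v) $\log N = \tfrac{1}{4}(\log 0.02 + \log 0.13) = \tfrac{1}{4}(\bar{2}.3010 + \bar{1}.1139) = \tfrac{1}{4}(\bar{3}.4149) = \tfrac{1}{4}(\bar{4} + 1.4149) = \bar{1}.3537$
$\therefore N = \text{antilog}\ \bar{1}.3537 = 0.2258$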
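The answers to Exercise 7 compute powers and roots as $N = \text{antilog}(k \log x)$, where a root is simply a fractional power k. The sketch below (the helper `power_via_logs` is hypothetical) does the same with full-precision logarithms; expect agreement with the four-figure answers to within about one unit in the last digit. The fifth root of 0.73, for example, comes out at about 0.9390 rather than 0.9391.

```python
import math

def power_via_logs(x, k):
    """Return x**k computed as antilog(k * log10(x)), mirroring the log method."""
    log_n = k * math.log10(x)
    return 10 ** log_n

# Roots from 7.2, written as fractional powers.
print(round(power_via_logs(0.072, 1 / 2), 4))        # square root of 0.072
print(round(power_via_logs(62.7, 1 / 3), 3))         # cube root of 62.7
print(round(power_via_logs(1524, 1 / 6), 3))         # sixth root of 1524
print(round(power_via_logs(0.73, 1 / 5), 4))         # fifth root of 0.73
print(round(power_via_logs(0.02 * 0.13, 1 / 4), 4))  # fourth root of 0.02 x 0.13
```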

ANSWERS TO EXERCISE 8

8.1
Using the formula for the average annual rate of growth, with n = 5, Pn = 14377 and Po = 9606 (males), we have:
$$1 + r_m = \sqrt[5]{\frac{14377}{9606}} = 1.084$$
i) males: $r_m = 1.084 - 1 = 0.084 = 8.4\%$
ii) females: $r_f = 16.9\%$
iii) males and females: $r_t = 9.7\%$

Note that the average annual rate of growth of the total, $r_t$, does not equal the (unweighted) average, or arithmetic mean, of the male ($r_m$) and female ($r_f$) rates:
$$r_t \neq \frac{r_m + r_f}{2}$$
since the male and female populations comprise different proportions of the total.

8.2
i) $\frac{2184 - 1465}{1465} \times 100\% = 49.1\%$
ii) Using the formulae as in 8.1 above, by substitution of appropriate values of n, Pn and Po:
a) r = 2.7%
b) r = 3.0%
c) r = 2.0%
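Exercise 8 rests on the compound-growth relation $P_n = P_o(1 + r)^n$, so that $r = (P_n/P_o)^{1/n} - 1$; we assume this is the formula the answers refer to. A minimal Python sketch, with a function name of our own choosing:

```python
def average_annual_growth_rate(p_o, p_n, n):
    """Average annual compound growth rate r, where p_n = p_o * (1 + r) ** n."""
    return (p_n / p_o) ** (1 / n) - 1

# 8.1 i): males, n = 5, Pn = 14377, Po = 9606
r_m = average_annual_growth_rate(9606, 14377, 5)
print(f"r_m = {r_m:.1%}")

# 8.2 i): simple percentage increase over the whole period
print(f"{(2184 - 1465) / 1465:.1%}")
```

The same function reproduces 9.1 i) with n = 15, Pn = 21.18 and Po = 7.66.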

ANSWERS TO EXERCISE 9

9.1
i) Using the formula for the average annual rate of growth, where n = 15, Pn = 21.18 and Po = 7.66, we may calculate r = 7.0%.
ii) Using the formula
$$n = \frac{\log (P_n / P_o)}{\log (1 + 0.070)}$$
but $P_n / P_o = 2$ (i.e. enrolment is 100% greater in year n than in the base year 0), hence:
$$n = \frac{\log 2}{\log 1.070} = \frac{0.3010}{0.0294} = 10.24 \text{ years}$$
Thus female enrolment would become 100% greater than in 1975 during the year 1985.

9.2
Using the formula for the average annual rate of growth, where n = 10 and $P_n / P_o = 2$, we have:
$$1 + r = \sqrt[10]{2} \qquad \therefore r = 0.072, \text{ i.e. } r = 7.2\%$$
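Both parts of Exercise 9 solve $(1 + r)^n = P_n / P_o$, once for n and once for r. A small Python sketch of the two rearrangements (function names are ours), checked against 9.1 ii) and 9.2:

```python
import math

def years_to_reach(ratio, r):
    """Years n for which (1 + r) ** n == ratio, i.e. n = log(ratio) / log(1 + r)."""
    return math.log10(ratio) / math.log10(1 + r)

def rate_for_ratio(ratio, n):
    """Annual growth rate r for which (1 + r) ** n == ratio."""
    return ratio ** (1 / n) - 1

print(round(years_to_reach(2, 0.070), 2))  # 9.1 ii): doubling at 7.0% a year
print(f"{rate_for_ratio(2, 10):.1%}")      # 9.2: doubling in 10 years
```

Any logarithm base works in `years_to_reach`, because the base cancels in the ratio.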

ANSWERS TO EXERCISE 10

10.1
i) Y ranges in value from a maximum when X = 4, for then:
$Y = 1 + 4(4) = 17$
to a minimum when X = -4, for then:
$Y = 1 + 4(-4) = -15$
ii) $Y = 1 + 4(2.4) = 10.6$
iii) $0 = 1 + 4X \qquad \therefore X = -0.25$
iv) This is something of a trick question! The answer is that Y has no value, i.e. the function is undefined. Y only has values corresponding to the domain of X: $-4 \le X \le 4$.
v) The dependent variable is Y; when X = 1, $Y = 1 + 4(1) = 5$.

10.2
i) $Y = 1 + 4(-4)^2 = 65$
ii) $Y = 1 + 4(+4)^2 = 65$
iii) $Y = 1 + 4(0)^2 = 1$
iv) a = 1, b = 0, c = 4
v) $X = \sqrt{-\tfrac{1}{4}}$, which is not a "real" number, for we cannot find the square root of a negative number.
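Answer 10.1 iv) turns on the idea of a domain: outside $-4 \le X \le 4$ the function Y = 1 + 4X simply has no value. One way to mirror this in Python is sketched below; returning `None` for "undefined" is our own convention, not the text's, and the value X = 5 is chosen only for illustration.

```python
def y_linear(x):
    """Y = 1 + 4X on the domain -4 <= X <= 4; None (undefined) elsewhere."""
    if not -4 <= x <= 4:
        return None
    return 1 + 4 * x

print(y_linear(4), y_linear(-4))  # maximum and minimum of Y
print(y_linear(2.4))              # 10.1 ii)
print(y_linear(5))                # outside the domain: undefined
```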
ANSWERS TO EXERCISE 11

11.1
i) As the function is linear, it is necessary to plot only two points (coordinates). A line passing through the points will define the function Y = 1 + 4X. Choosing, arbitrarily, two points in the domain of X, we find that when X = 2, Y = 1 + 4(2) = 9; and when X = -3, Y = 1 + 4(-3) = -11. Hence, passing a line through the two points (2, 9) and (-3, -11), we have the required graph (see Fig. 5). Note that Y ranges from +17 (when X = 4) to -15 (when X = -4).
ii) If you have drawn your graph accurately, you should find that a line drawn vertically from the X axis through X = 3.3 will cut the line Y = 1 + 4X where Y = 14.2. Similarly, when X = -3.3, Y = -12.2.

11.2
On Figure 5 the intercept a may be seen to be the distance between the origin, (0, 0), and the point (0, 1). Hence a = 1. The slope b is the gradient of the line: it is the ratio of the increase in Y to a unit increase in X. In the triangle ABC, it is the ratio $\frac{BC}{AB} = +4$.
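Answer 11.2 reads the intercept and slope off the graph. The same two numbers follow algebraically from the two plotted points (2, 9) and (-3, -11): the slope is the change in Y divided by the change in X, and the intercept is the value of Y at X = 0. A short Python sketch (the function name is ours):

```python
def line_through(p1, p2):
    """Intercept a and slope b of the straight line Y = a + bX through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # increase in Y per unit increase in X
    a = y1 - b * x1            # value of Y where the line cuts the Y axis (X = 0)
    return a, b

a, b = line_through((2, 9), (-3, -11))
print(a, b)
print(a + b * 3.3, a + b * -3.3)  # the readings taken from the graph in 11.1 ii)
```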


ANSWERS TO EXERCISE 12

12.1
i) In Exercise 11.1 (i), only two points were necessary to plot the (linear) function. With a non-linear function, however, more than two points must be plotted in order to be able to draw in the curved line passing through them. We proceed by finding which values of Y correspond to a selection of values in the domain of X:

X    -3   -2   -1    0    1    2    3
Y    37   17    5    1    5   17   37

We now have 7 coordinates and may plot these against a pair of axes (see Figure 6). Note that as there are no negative values of Y, a negative Y axis is unnecessary. The curve (a "parabola") may now be drawn through the points.
ii) Lines drawn vertically through X = ±2.5 should cut the curve where $Y = 1 + 4(2.5)^2 = 26$.

12.2
The intercept a can be seen to be where the curve cuts the Y axis, at the point (0, 1); hence a = 1. The slope of the curve can be seen to change continuously. From the point (-3, 37) it is negative so long as X is negative, and declines in absolute terms until it reaches a minimum of zero when X = 0. It then becomes positive and increases with X to a maximum when X = 3.
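The table of coordinates in 12.1 can be generated mechanically by evaluating Y = a + bX + cX², with a = 1, b = 0 and c = 4 (the values found in 10.2 iv), at each whole number in the range plotted. A Python sketch, with a function name of our own:

```python
def quadratic(x, a=1, b=0, c=4):
    """Y = a + bX + cX**2; the defaults give Y = 1 + 4X**2."""
    return a + b * x + c * x ** 2

points = [(x, quadratic(x)) for x in range(-3, 4)]
print(points)                           # the seven coordinates to plot
print(quadratic(2.5), quadratic(-2.5))  # 12.1 ii)
```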


ANSWERS TO EXERCISE 13

13.1
i) After rounding to the nearest 1000 pupils and then calculating the urban and rural data as percentages of the rounded totals, your table should be as follows:

Table 13.2
Enrolment in primary education, total, urban and rural, in thousands and as percentages of total, Cuba, 1968-1974

          Urban            Rural             Total
Year   (000)    (%)     (000)    (%)     (000)     (%)
1968     812   58.1       586   41.9     1,398   100.0
1969     864   58.9       602   41.1     1,466   100.0
1970     926   59.4       632   40.6     1,558   100.0
1971     995   59.8       670   40.2     1,665   100.0
1972   1,054   59.9       706   40.1     1,759   100.0
1973   1,120   60.4       733   39.6     1,853   100.0
1974   1,151   60.6       748   39.4     1,899   100.0

Note that in 1972 the rounded urban and rural figures do not, because of rounding errors, sum to the rounded total of 1,759.
ii)
Percentage growth in urban enrolment, 1968-1974 $= \frac{(1151 - 812) \times 100}{812}\% = 41.7\%$
Percentage growth in rural enrolment, 1968-1974 $= \frac{(748 - 586) \times 100}{586}\% = 27.6\%$
Percentage growth in total enrolment, 1968-1974 $= \frac{(1899 - 1398) \times 100}{1398}\% = 35.8\%$

iii)
Average annual growth rate of urban enrolment, 1968-1974 = 6.0%
Average annual growth rate of rural enrolment, 1968-1974 = 4.2%
Average annual growth rate of total enrolment, 1968-1974 = 5.2%
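Answers 13.1 ii) and iii) apply the percentage-change and average-annual-growth formulae to the 1968 and 1974 figures of Table 13.2, over six annual intervals. A Python sketch that reproduces all six figures (the dictionary layout and function name are ours):

```python
# Rounded enrolment in thousands, 1968 and 1974, from Table 13.2.
enrolment = {
    "urban": (812, 1151),
    "rural": (586, 748),
    "total": (1398, 1899),
}
n_years = 1974 - 1968  # six annual intervals

def growth_summary(first, last, n):
    """Percentage growth over the whole period and average annual growth rate (%)."""
    pct_growth = (last - first) / first * 100
    annual_rate = ((last / first) ** (1 / n) - 1) * 100
    return pct_growth, annual_rate

for area, (first, last) in enrolment.items():
    pct, rate = growth_summary(first, last, n_years)
    print(f"{area}: {pct:.1f}% over the period, {rate:.1f}% a year")
```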

13.2
i) See Figure 7. Note the choice of scale. Read the hints on drawing graphs in 6.6.
ii) A report on the development of primary level enrolment in Cuba over the period 1968-1974, using the data given in Table 13.X, would bring out at least the following features:
- the fact that both urban and rural enrolment grew continuously with no period of decline (i.e. enrolment grew "monotonically"), with the consequence that total enrolment also grew continuously;
- the fact that, as the graph shows, there was a decline in the rate of growth of both urban and rural (and hence total) enrolment in the final period 1973-1974;
- the fact that urban enrolment grew more rapidly than rural enrolment. This may be illustrated by your calculations in answering 13.1 ii) and iii), and can be seen in the greater positive slope of the graph of urban enrolment over time than of rural enrolment.


[Table: four-figure logarithms (base 10) of numbers 10 to 99, with columns 0 to 9 and mean differences 1 to 9]

The reader is probably aware that more detailed logarithm tables exist. Had such tables been used in the calculations included in this chapter, slightly different answers might have been obtained.
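Four-figure logarithm tables can be regenerated from full-precision logarithms. In the row labelled n (10 to 99), column d holds the mantissa of log10 of the number with first figures n and d, multiplied by 10,000; row 12, column 5 is log10(1.25), for example. The Python sketch below (function name ours) produces the main body of a row only. It does not reproduce the mean-difference columns, and in rare rounding cases a printed table may differ by one unit in the last figure.

```python
import math

def log_table_row(row):
    """Four-figure mantissas (x 10000) for one row of a log table, 10 <= row <= 99.

    Column d holds log10 of the number whose first three figures are row, d,
    e.g. row 12, column 5 -> log10(1.25).
    """
    return [round(10000 * math.log10((10 * row + d) / 100)) for d in range(10)]

for row in (10, 20, 30):
    print(row, " ".join(f"{m:04d}" for m in log_table_row(row)))
```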

[Table: four-figure antilogarithms of mantissas .00 to .99, with columns 0 to 9 and mean differences 1 to 9]

UNIT II - BASIC STATISTICS

by Robin Shannon, Lecturer
Department of Economics,
University of Newcastle-Upon-Tyne
(United Kingdom)

CONTENTS

Introduction  iii

SECTION 1  USEFUL NOTATION  1
  1.1  The summation sign  1
  1.2  Subscripts and superscripts  4

SECTION 2  FIRST STEPS IN DATA ORGANISATION
  2.1  Raw data  5
  2.2  Data aggregation  6
  2.3  Tables  7
  2.4  Time-series tables  8
  2.5  Geographical tables  8
  2.6  Frequency tables  9
  2.7  Two-way tables  9
  2.8  Ten practical hints on designing tables  11

SECTION 3  FREQUENCY DISTRIBUTIONS  12
  3.1  Classifying data  12
  3.2  Hints for constructing frequency distributions  13
  3.3  Histograms  14
  3.4  Frequency polygons  17
  3.5  Relative frequency distributions  17
  3.6  Population pyramids  18
  3.7  Cumulative frequency distributions  21
  3.8  Frequency curves  23

SECTION 4  MEASURES OF CENTRAL TENDENCY  26
  4.1  Summary statistics  26
  4.2  The arithmetic mean  26
  4.3  The median  29
  4.4  The mode  30

SECTION 5  MEASURES OF DISPERSION  31
  5.1  The range  31
  5.2  The standard deviation  32
  5.3  The variance  33
  5.4  Comparison of standard deviations  35
  5.5  Calculation of standard deviation of grouped data

SECTION 6  RELATIONSHIPS BETWEEN VARIABLES  40
  6.1  Association and causation  40
  6.2  The importance of the theory of probability  40
  6.3  Association between variables  42
  6.4  Covariance  45
  6.5  The coefficient of linear correlation  47
  6.6  Spearman's coefficient of rank correlation  50
  6.7  Causal relationships: regression  52
  6.8  The method of least squares  56
  6.9  General formulae of the regression coefficients  56
  6.10 Interpretation of the estimated regression coefficients  59
  6.11 Extrapolation  61
  6.12 Interpolation  61
  6.13 Non-linear curve-fitting  63
  6.14 Regression: a summary

EXERCISES AND ANSWERS
Exercise 1  (Suggested to reader on p.3)   69-70, 80-81
Exercise 2  (Suggested to reader on p.14)  71, 82
Exercise 3  (Suggested to reader on p.17)  72, 83-84
Exercise 4  (Suggested to reader on p.23)  73, 85-86
Exercise 5  (Suggested to reader on p.29)  74, 87
Exercise 6  (Suggested to reader on p.31)  75, 88
Exercise 7  (Suggested to reader on p.38)  76, 89-90
Exercise 8  (Suggested to reader on p.52)  77, 91-92
Exercise 9  (Suggested to reader on p.59)  78, 93-94
Exercise 10 (Suggested to reader on p.67)  79, 95-96

Introduction
It has often been said that "you can prove anything with statistics". It has also been said that "there are lies, damned lies and statistics"! Such statements exemplify an attitude to statistics which is all too frequently held, but it is hoped that by the end of this Chapter the reader will be convinced that it is a mistaken attitude!

The word "statistics" may bring to mind a variety of impressions. Most people recognise that numerical data are collected, organised and presented by the statistician. This Chapter will have a good deal to say on this very important aspect of the statistician's work. But it is perhaps less widely recognised that statistical methods also exist for the analysis and interpretation of numerical information. Empirical evidence may, through statistical methods, be used to assess hypothesised relationships between variables. Naturally, such methods, which are sometimes complex and technical, may be misused, whether deliberately or not. But non-statistical methods, too, may be abused. Anybody concerned with using statistics in the educational field should be well aware of both the scope and the limitations of statistical techniques.
The nature of statistics
There exists a body of mathematical statistical theory of a highly formal nature, founded in various aspects of the theory of probability. This background theory cannot be explored here. Our concern is above all with an exposition of statistical methods for the practising educational administrator or planner, involved in such down-to-earth questions as: how many children may be enrolled in primary education in 5 years' time? Or: how may intake rates into the educational system be expected to develop over the coming decade?

To answer such basic questions, a number of mathematical and statistical techniques need to be mastered. In the previous Chapter, the basic mathematical techniques necessary for an understanding of the statistics presented here have been developed. The reader should not, therefore, commence this Chapter before having studied that Chapter. The reader is again urged to complete the exercises. Competence in statistics cannot be gained through reading alone. One hour of practical work is probably worth several hours' reading!

This Chapter will develop from an initial introduction to certain notational conventions to an examination of ways of arranging and presenting data. It then moves on to an exposition of a variety of summary statistics, after which the reader is ready to be introduced to the concepts of correlation and regression analysis.

1. USEFUL NOTATION

1.1 The summation sign

1.1.1 In the analysis of educational data, it is frequently necessary to add (sum) a series of numbers. For example, the reader will probably be familiar with the general idea of an "average" of a set of numbers. We shall formally introduce several average measures in Section 4 below. To illustrate the use of the summation notation, let us take the arithmetic mean, which is defined as the sum of a set of data divided by the number of figures being summed. Suppose we wish to find, for example, the average class size in a particular school. Let class size be denoted by the variable X. If there were, say, 5 classes, their sizes could be written symbolically as:
$$X_1,\ X_2,\ X_3,\ X_4,\ X_5$$
with each numbered subscript denoting a particular class.

1.1.2 How many students are there in all classes? Clearly, the sum of the 5 classes:
$$X_1 + X_2 + X_3 + X_4 + X_5 \tag{1}$$
The average number of students per class is therefore:
$$\frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \tag{2}$$
Generalising, if there were n classes the total number of students would be:
$$X_1 + X_2 + \dots + X_n \tag{3}$$
and the average number of students per class would be:
$$\frac{X_1 + X_2 + \dots + X_n}{n} \tag{4}$$

1.1.3 Clearly, this is a very cumbersome and unwieldy way of writing down what is, in essence, a very simple idea. It is here that the summation sign, written $\Sigma$ (pronounced "sigma"), proves so useful. To see how it is used, consider formula (3) above. Utilising the $\Sigma$ notation, this may be re-written:
$$\sum_{i=1}^{n} X_i \tag{5}$$
The figures below and above the $\Sigma$ tell us to sum from $i = 1$ (the first value of X) to $i = n$ (the last, nth, value of X). Thus, for example, the expression (1) above may be written in the much more convenient shorthand:
$$\sum_{i=1}^{5} X_i$$
and similarly formula (2) may be written:
$$\frac{\sum_{i=1}^{5} X_i}{5}$$
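In a programming language the $\Sigma$ of formula (5) corresponds to adding up the elements of a list, one index at a time. A Python sketch with five hypothetical class sizes (invented for illustration, not taken from the text):

```python
# Hypothetical sizes of five classes, X_1 ... X_5 (illustrative only).
X = [32, 28, 35, 30, 25]
n = len(X)

# Formula (5) with n = 5: add X_i for i = 1, ..., n.
total = 0
for i in range(n):   # Python counts from 0, the text from 1
    total += X[i]

mean = total / n     # formula (2): the average class size
print(total, mean)   # the built-in sum(X) gives the same total
```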
1.1.4 We have dealt with the summation of a variable, X. Note that when we sum a constant, a, n times we obtain:
$$\sum_{i=1}^{n} a = na$$
For example, if a = 2 and n = 4, then:
$$\sum_{i=1}^{4} 2 = 2 + 2 + 2 + 2 = 4 \times 2 = na$$

3.

1.1.5 When summing expressions involving two or more variables, a little careful thought is necessary in using the notation. For example:

$$\sum_{i=1}^{n} (x_i + y_i + a) = x_1 + x_2 + \dots + x_n + y_1 + y_2 + \dots + y_n + na = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + na$$

But this approach, which involves taking the Σ sign to each term, is only valid where addition or subtraction are concerned. It is not appropriate with multiplication or division. For example:

$$\sum_{i=1}^{n} (x_i y_i a) \qquad (6)$$

does NOT equal:

$$\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \sum_{i=1}^{n} a \qquad (7)$$

(The reader should satisfy himself or herself of the truth of this proposition.) The only valid manipulation of (6) is to bring the constant outside the bracket, expression (6) then becoming:

$$a \sum_{i=1}^{n} (x_i y_i)$$
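The rules of 1.1.4 and 1.1.5 can be checked numerically. The sketch below assumes small hypothetical values for the x_i, the y_i and the constant a. It confirms that the Σ sign may be taken to each term of a sum, and that it may not be taken to each factor of a product, as in (7):

```python
# Hypothetical paired values x_i, y_i and a constant a (illustrative only)
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]
a = 2
n = len(x)

# Addition: taking the sigma sign to each term is valid
lhs_add = sum(xi + yi + a for xi, yi in zip(x, y))
rhs_add = sum(x) + sum(y) + n * a

# Summing a constant n times gives n * a
const_sum = sum(a for _ in range(n))

# Multiplication: expression (6) versus the invalid expression (7)
lhs_mul = sum(xi * yi * a for xi, yi in zip(x, y))        # expression (6)
wrong = sum(x) * sum(y) * const_sum                        # expression (7)
right = a * sum(xi * yi for xi, yi in zip(x, y))           # constant taken outside

print(lhs_add, rhs_add)       # 44 44
print(const_sum, n * a)       # 8 8
print(lhs_mul, wrong, right)  # 140 2080 140
```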

EXERCISE 1

The reader should now do Exercise 1.

1.2 Subscripts and superscripts

1.2.1 The first sight of a term such as:

$$x^{t}_{ij}$$

can be very off-putting to the beginner in statistics: there seems to be an incomprehensible jumble of lettering which appears designed to confuse rather than clarify. In fact, as so often with techniques in mathematics and statistics, a little time invested in learning the method will pay handsome dividends in time saved later. Notational devices such as subscripts and superscripts are simply a very convenient form of shorthand.

1.2.2 For example, the reader may imagine that he or she is investigating how many students there had been in each class in all grades in a school over a number of past years. Let us define the variable x as the number of students in a class. Let:

i = 1, ..., n   where n = number of classes in each grade
j = 1, ..., k   where k = number of grades
t = year

Then, for example, we could write:

$x^{t}_{ij}$ = number of students in the ith class in the jth grade in year t

The reader may appreciate that $x^{t}_{ij}$ is a very convenient way of writing down what would otherwise have to be written, in non-symbolic terms, in a long sentence. For example:

$x^{1974}_{24}$ = number of students in the second class of the fourth grade in 1974
1.2.3 Recalling our summation notation, the expression which stands for the total of all students in all n classes in the jth grade in year t would be:

$$\sum_{i=1}^{n} x^{t}_{ij} = x^{t}_{1j} + x^{t}_{2j} + \dots + x^{t}_{nj}$$

If we wish to sum all the students across not only all classes but also all grades in the year t, we must employ a double Σ notation:

$$\sum_{j=1}^{k} \sum_{i=1}^{n} x^{t}_{ij} \qquad (8)$$

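This conveniently stands for the lengthy expression:

$$\begin{aligned}
&x^{t}_{11} + x^{t}_{21} + \dots + x^{t}_{i1} + \dots + x^{t}_{n1} \\
{}+{} &x^{t}_{12} + x^{t}_{22} + \dots + x^{t}_{i2} + \dots + x^{t}_{n2} \\
{}+{} &\dots \\
{}+{} &x^{t}_{1j} + x^{t}_{2j} + \dots + x^{t}_{ij} + \dots + x^{t}_{nj} \\
{}+{} &\dots \\
{}+{} &x^{t}_{1k} + x^{t}_{2k} + \dots + x^{t}_{ik} + \dots + x^{t}_{nk}
\end{aligned}$$

The value of subscripts, superscripts and the Σ notation is particularly clear in this case. Assuming there are n classes in each grade, (nk) separate terms have been condensed, thanks to these notational devices, into one simple expression, (8).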
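The double summation (8) maps naturally onto nested loops. In the sketch below the enrolment figures are hypothetical, with n = 3 classes per grade and k = 2 grades in one year. Each key (t, i, j) stands for the superscript and subscripts of x:

```python
# Hypothetical enrolment: x[(t, i, j)] = students in class i of grade j in year t
n, k = 3, 2        # n classes per grade, k grades (illustrative sizes)
t = 1974
x = {
    (t, 1, 1): 30, (t, 2, 1): 28, (t, 3, 1): 31,
    (t, 1, 2): 27, (t, 2, 2): 29, (t, 3, 2): 26,
}

def grade_total(x, t, j, n):
    """Sum over i = 1..n of x^t_ij: all classes in grade j, year t."""
    return sum(x[(t, i, j)] for i in range(1, n + 1))

def school_total(x, t, n, k):
    """Expression (8): sum over grades j = 1..k of the grade totals."""
    return sum(grade_total(x, t, j, n) for j in range(1, k + 1))

print(grade_total(x, t, 1, n))    # 89
print(school_total(x, t, n, k))   # 171
print(len(x) == n * k)            # True: (nk) terms condensed into one expression
```

The function names are hypothetical and serve only as an illustration.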


2.

FIRST STEPS IN DATA ORGANISATION

2.1

Raw data

2.1.1 Data which have been collected (perhaps by questionnaire) but not yet further organised in any way are often referred to as "raw" data. The educational statistician may have at his disposal a great deal of such data in his basic records, which might, for example, be in the form of completed annual questionnaires. The data in these records must be arranged, summarised and presented: the statistician will seek ordered patterns. Although the topic of good questionnaire design cannot be discussed here, it should be pointed out that a well-designed set of questionnaires and carefully maintained basic records are invaluable foundations for basic data organisation and analysis.

2.2 Data aggregation

2.2.1 In drawing up tables (the main types of analytical tables are discussed in 2.3-2.7 below) varying degrees of data aggregation may be performed. Presentation of completely disaggregated data would in general be an indigestible meal for the final user of the statistics (1). For example, consider data on pupil enrolment in educational institutions at all levels. The data will probably be gathered by means of an annual questionnaire, and will give a variety of information usually including the sex, age and grade of all pupils. For basic tables these data may be aggregated in a variety of ways. Pupils may be aggregated by sex, age and grade for each level. At a higher level of aggregation, national aggregates may be found.
2.2.2 The criteria for aggregation will always depend on the purposes of the analysis. Certain basic aggregates will almost always be made, such as those mentioned. But a number of other aggregates will be necessary for particular purposes. For example, pupils might be aggregated by language spoken, by geographical region, by type of school, or according to many other possible classifications. And at the highest level, national data may be further aggregated, for example by geographical group, by levels of economic development, or by still further criteria.

(1) For some purposes, of course, the original information, with no aggregation at all, may be essential.
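Aggregation of this kind amounts to counting records that share the same values of the chosen classifications. The sketch below uses a handful of hypothetical pupil records with made-up fields (sex, age, grade, region) and aggregates them at three levels: a basic table, a special-purpose table, and a single national total:

```python
from collections import Counter

# Hypothetical disaggregated records, one per pupil: (sex, age, grade, region)
pupils = [
    ("F", 6, 1, "North"), ("M", 6, 1, "North"), ("M", 7, 1, "South"),
    ("F", 7, 2, "South"), ("F", 8, 2, "North"), ("M", 8, 2, "South"),
    ("M", 9, 3, "North"),
]

# A basic aggregate: pupils by sex and grade
by_sex_grade = Counter((sex, grade) for sex, _age, grade, _region in pupils)

# An aggregate for a particular purpose: pupils by region
by_region = Counter(region for _sex, _age, _grade, region in pupils)

# The highest level of aggregation: a single national total
national_total = sum(by_sex_grade.values())

print(sorted(by_sex_grade.items()))
print(dict(by_region))
print(national_total)
```

Whatever the classification, every aggregate must add back to the same national total.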

2.3 Tables

2.3.1 In developing ordered patterns from the raw information, an early step is to construct well-designed tables. There is a great variety of different ways of drawing up tables. The common attribute of good tables is that they all seek to show patterns over space, size or time. Their common objective is to bring order to the chaos of raw data, and to inform the reader as unambiguously as possible about an aspect, or aspects, of the structure of the information collected.

2.3.2 Depending on the nature of the data and the purposes of the analysis, four common basic types of tables are generally used:
i) those showing a time series, i.e., comparing a variable or variables at one period of time with another period
ii) those showing a geographical distribution
iii) frequency tables
iv) two-way tables

2.4 Time-series tables

2.4.1 A time series may be defined as a set of ordered observations on a quantitative characteristic of an individual or collective phenomenon taken at different points in time (1). Numerous time series are presented in the Chapter on Basic Mathematics, and there are further examples below. Here we stress the main points to bear in mind when presenting time series in tabular form:
- always specify clearly the dates to which the data refer
- always specify clearly the definitions of the data
- where definitions change in the course of a time series, bring this fact to the table-reader's attention. This may be done either by a footnote or in the figures themselves, e.g. by using italics.

(1) See the Chapter on Basic Mathematics, para. 2.1.1.

2.5 Geographical tables

2.5.1 Raw data may be aggregated and presented in tables showing geographical (e.g. regional, or urban/rural) distribution. Such tables are often of great value in making inter-regional comparisons. If they are also time-series tables, regional developments over time may be analysed and compared.

2.6

Frequency tables

2.6.1

Examples of frequency tables are presented in Section 3 below. A frequency table shows how often a particular characteristic is present. Often the data are expressed in percentage form, giving an easily-grasped impression of any pattern which may be present.

2.7 Two-way tables

2.7.1 Two-way tables are very widely used by the educational statistician. In Example 1 below, data have been taken from a basic Unesco questionnaire referring to Cameroon in 1975-76. Ages are listed vertically, down the left-hand side of the table, and grades horizontally, across the top. At each intersection of a column and a row (a cell) is the number (absolute frequency) of pupils of a particular age in a particular grade. The data, which have been collected by the national authorities, are thus presented already aggregated by single year of age, sex and grade. Rows and columns have also been aggregated (see the extreme right-hand column and the bottom row respectively), and the grand totals (for males and females together, and for females only) are presented in the bottom right-hand corner cell.

Example 1: a two-way table presenting enrolment by age, sex and grade

Table 1: Enrolment at the primary level, by age, sex and grade, Cameroon 1975-76
[Rows: age of pupils in completed years (under 5 years, 5 years, 6 years, ..., 16 years, and TOTAL all ages), each shown for both sexes and for females. Columns: Grade 1 to Grade 7, and TOTAL (all grades).]

The table shows, for example, that in 1975-76 there were 1102 girls in the seventh grade aged [...], and a total of 11028 children in the fourth grade aged 8 years. A total of 1,122,900 children were enrolled at the primary level.
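A two-way table with row and column totals can be built by counting (age, grade) pairs. The sketch below uses a few hypothetical pupil records, not the Cameroon data. It lists ages down the side and grades across the top, as in Table 1, and checks that the row totals and the column totals both add up to the same grand total:

```python
from collections import Counter

# Hypothetical pupil records: (age in completed years, grade)
records = [(6, 1), (6, 1), (7, 1), (7, 2), (8, 2), (8, 2), (8, 3), (9, 3)]

cells = Counter(records)                     # absolute frequency in each (age, grade) cell
ages = sorted({age for age, _ in records})   # row labels
grades = sorted({g for _, g in records})     # column labels

# Row totals (right-hand column), column totals (bottom row), grand total (corner cell)
row_totals = {a: sum(cells[(a, g)] for g in grades) for a in ages}
col_totals = {g: sum(cells[(a, g)] for a in ages) for g in grades}
grand_total = sum(row_totals.values())

# Check: both sets of totals must add up to the same grand total
assert grand_total == sum(col_totals.values()) == len(records)

print("Age  " + "".join(f"G{g:<4}" for g in grades) + "Total")
for a in ages:
    print(f"{a:<5}" + "".join(f"{cells[(a, g)]:<5}" for g in grades) + str(row_totals[a]))
print("Total" + "".join(f"{col_totals[g]:<5}" for g in grades) + str(grand_total))
```

A separate table for females only would be built the same way from the female records alone.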

2.7.2 The four basic types of table discussed may of course appear in a mixed form. It would be possible, for example, to have a table which presented regional enrolment data over a period of time, classified two ways, in both frequency and percentage form. But there would be obvious dangers for clarity in presenting a table of such complexity! There are a number of simple principles which should be observed in designing any table. Bearing in mind that the fundamental purpose of a table is always to help the reader to better comprehend the patterns and meaning of quantities of numerical data, the main principles are listed in the following paragraph.

2.8 Ten practical hints on designing tables
- a table should be unambiguous and as simple as possible
- footnotes should be used to explain any omissions, necessary approximations, or changes in definitions which might occur over a period of time
- units of measurement should be made clear, particularly any changes in the units over time
- a table should not be too large, since an over-sized table makes an indigestible meal for the reader. The over-sized table should either be split into two or more tables, or have its data further aggregated
- if groups of data are to be compared they should be placed close together in the table
- summary statistics (see especially Sections 4 and 5 below) and percentages, etc., should be placed close to the data from which they are derived
- generally, a vertical, rather than a horizontal, arrangement of data is preferable, since the eye moves down columns of data more easily
- sections of a table may be separated off by horizontal or vertical lines, again to aid the eye
- totals should usually be given. These are not only valuable in themselves, but are often a useful check on the accuracy of their components
- sources should always be given, preferably immediately beneath the table

3.

FREQUENCY DISTRIBUTIONS

3.1

Classifying data

3.1.1 When bringing some initial order to large quantities of raw data it is very often useful to distribute the data into groups. These groups are known as classes or categories. The number of items of data falling into a class is known as the class frequency. If the data are arranged in a table by classes, and the corresponding class frequencies are also tabulated, the table is called a frequency distribution in tabular form.

Example 2: a frequency distribution

A survey of teachers was carried out in 1973 in Afghanistan. Table 2 below is a frequency distribution of the ages of all the teachers in the survey:

Table 2: Ages of a sample of teachers in Afghanistan, 1973

Age (years)      Frequency
< 20                   237
20 < 25               7284
25 < 30               8146
30 < 35               2166
35 < 40               1198
40 < 45                963
45 and over            959
Total                20953

3.1.2

The following important points should be noted about the so-called grouped data in Example 2:
- the variable under analysis, age, has been grouped into classes consisting of intervals of 5 years, except for the first (< 20) and the last (45 and over), which are called open class intervals
- the sign "<" has been used to avoid any ambiguity about the correct classes into which all individuals fall
- in the class intervals (20 < 25, 25 < 30, etc.) the end numbers are known as the class limits (or boundaries). For example, referring to the 20 < 25 class interval, 20 is the lower class limit and 25 the upper class limit
- the difference between the upper and lower class limits of a class interval is the class width: in this example (apart from the first and last classes), the class width is a constant 5 years. Equal class width is convenient, but not essential
- the midpoint of the class interval is known as the class midpoint. It is obtained by summing the lower and upper class limits and dividing by two. The class midpoint of the (30 < 35) class is therefore (30 + 35)/2 = 32.5 years

3.2 Hints for constructing frequency distributions
- find the largest and smallest values of the variable. The difference is called the range of the data
- split the range into class intervals. Usually a minimum of about 5, and a maximum of about 20, classes will be found best
- find the class frequencies by counting the numbers of observations belonging to each class interval
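The three hints above can be followed step by step. The sketch below assumes a small list of hypothetical ages, a first lower class limit of 15 and a class width of 5. Each class is treated as half-open (lower limit up to, but not including, the upper limit), matching the "20 < 25" convention of Table 2:

```python
# Hypothetical raw ages (in completed years) of a small group of teachers
ages = [19, 21, 22, 24, 25, 26, 27, 27, 29, 31, 33, 38, 44]

# Step 1: the range of the data
smallest, largest = min(ages), max(ages)
data_range = largest - smallest

# Step 2: split the range into class intervals of width 5, starting at 15
lower_start, width = 15, 5
n_classes = (largest - lower_start) // width + 1
classes = [(lower_start + k * width, lower_start + (k + 1) * width)
           for k in range(n_classes)]

# Step 3: count the observations falling in each class (lower <= age < upper)
frequencies = [sum(1 for a in ages if lo <= a < hi) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    midpoint = (lo + hi) / 2          # class midpoint, e.g. (30 + 35) / 2 = 32.5
    print(f"{lo} < {hi}   midpoint {midpoint}   frequency {f}")
```

The starting limit and width are choices, not rules. Here they give 6 classes, which falls within the suggested range of about 5 to 20.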

EXERCISE 2

The reader should now do Exercise 2 on page 71.

3.3 Histograms

3.3.1 We have seen that a properly-constructed table is a first step in the informative presentation of data. One useful graphical representation is known as a histogram. A histogram is a set of rectangles with the following characteristics:
- the area of each rectangle is proportional to its class frequency
- each rectangle has as its base (along the horizontal X axis) a class interval centred at the class midpoint.

If the class intervals are all of the same width, then the heights of the rectangles may be made equal to the class frequencies. If, however, class intervals are not of equal width, the heights must be adjusted in accordance with the principle that areas remain proportional to frequencies.

Example 3: construction of a histogram

Figure 1 presents the data from Example 2 in the form of a histogram. Note that frequencies are scaled on the vertical (y) axis, and the variable, in this case age, is scaled on the horizontal (x) axis. Rectangles are of width 5 years, centred on class midpoints 22.5, 27.5, ..., 42.5. How have the first and last rectangles been constructed? In order to do this, it has been necessary to make an assumption about the lower class limit of the "< 20" open class interval, and about the upper class limit of the "45 and over" open class interval. For the purposes of this Example, these have been arbitrarily assumed to be 15 and 60 respectively. (Ideally the statistician would like to have further detailed information to determine the true lower and upper class limits.) Thus the first class midpoint is (15 + 20)/2 = 17.5 years; the last is (45 + 60)/2 = 52.5 years. The first class has a width of 5 years, as have all classes other than the last, which has a width of 15 years. In order to preserve the rule that the area of each rectangle be proportional to its class frequency, the height of the final rectangle must therefore be divided by 3 (i.e., 15/5) before plotting on the graph (remembering that area = width × height). Thus the final rectangle is plotted at height 959/3 = 319.7.

Figure 1 shows in a strikingly clear fashion how the teachers were distributed by age, the great majority of them being between 20 and 30 years old.
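The height adjustment in Example 3 can be reproduced directly from Table 2. Like the text, this sketch assumes that the open classes are bounded at 15 and 60. It scales each height so that height × width stays proportional to the class frequency:

```python
# Table 2 class limits, using the text's arbitrary assumptions that the
# open classes run from 15 to 20 and from 45 to 60
limits = [15, 20, 25, 30, 35, 40, 45, 60]
frequencies = [237, 7284, 8146, 2166, 1198, 963, 959]
standard_width = 5  # width of all classes except the last

widths = [hi - lo for lo, hi in zip(limits, limits[1:])]
midpoints = [(lo + hi) / 2 for lo, hi in zip(limits, limits[1:])]

# Keep area proportional to frequency: a class 3 times the standard width
# gets a rectangle one third as high (area = width x height)
heights = [f / (w / standard_width) for f, w in zip(frequencies, widths)]

for lo, hi, m, h in zip(limits, limits[1:], midpoints, heights):
    print(f"{lo} < {hi}: midpoint {m}, height {h:.1f}")
```

The final line printed is `45 < 60: midpoint 52.5, height 319.7`, which agrees with the calculation in the text.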

Figure 1: Histogram and frequency polygon showing frequency distribution of ages of a sample of teachers in Afghanistan, 1973 [vertical axis: number of teachers (frequency); horizontal axis: age (years)]. Source: see Table 2.

3.4

Frequency polygons

3.4.1

The broken line joining the midpoints of the tops of the rectangles in the histogram in Figure 1 defines a frequency polygon. It can be seen that in the case of the first and last class intervals the line has been extended beyond the original range (15-60 years) of the variable. This is necessary to ensure that the area under the frequency polygon is exactly the same as the area of the rectangles in the histogram.
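The effect of extending the polygon can be checked numerically. The sketch below assumes hypothetical frequencies in classes of equal width. It brings the polygon down to zero at the midpoint of an empty class just beyond each end of the range, then compares the area under the polygon (trapezium rule) with the total area of the histogram rectangles:

```python
# Hypothetical frequencies for equal-width classes (width 5, starting at 20)
lower_start, width = 20, 5
frequencies = [4, 10, 7, 3]

midpoints = [lower_start + width * (k + 0.5) for k in range(len(frequencies))]

# Polygon vertices: the midpoints of the rectangle tops, plus a zero-height
# point at the midpoint of an empty class just beyond each end of the range
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Area under the polygon by the trapezium rule
polygon_area = sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
                   for k in range(len(xs) - 1))

# Area of the histogram: sum of width x height over all rectangles
histogram_area = sum(f * width for f in frequencies)

print(polygon_area, histogram_area)   # 120.0 120
```

Without the two end segments, the polygon would enclose less area than the histogram.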

EXERCISE 3

The reader should now do Exercise 3 on page 72.

3.5 Relative frequency distributions

3.5.1 The relative frequency of a class is the frequency of the class divided by the total frequency of all classes. Usually it is expressed as a percentage.

Example 4: a percentage relative frequency distribution

Using the frequency distribution in Table 2, Example 2, a relative frequency distribution can easily be constructed:
Table 3: Percentage distribution of ages of a sample of teachers in Afghanistan, 1973

Age (years)      Relative frequency (%)
< 20                   1.1
20 < 25               34.8
25 < 30               38.9
30 < 35               10.3
35 < 40                5.7
40 < 45                4.6
45 and over            4.6
Total                100.0

The sum of the percentage frequencies should obviously be 100%. However, rounding errors may sometimes cause the total to be, e.g., 99.9%.
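Table 3 can be derived from Table 2 in a few lines. This sketch divides each class frequency by the total, multiplies by 100 and rounds to one decimal place. Summing the rounded figures shows how rounding can affect the total:

```python
# Table 2: ages of a sample of teachers in Afghanistan, 1973
frequencies = {
    "< 20": 237, "20 < 25": 7284, "25 < 30": 8146, "30 < 35": 2166,
    "35 < 40": 1198, "40 < 45": 963, "45 and over": 959,
}
total = sum(frequencies.values())   # 20953

# Relative frequency = class frequency / total, expressed as a percentage
relative = {cls: round(100 * f / total, 1) for cls, f in frequencies.items()}

for cls, pct in relative.items():
    print(f"{cls:<14}{pct:>6}")
print(f"{'Total':<14}{round(sum(relative.values()), 1):>6}")
```

For these data the rounded percentages happen to add to exactly 100.0. With other data the printed total could come to 99.9 or 100.1.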


3.5.2 A graphical representation of relative frequency data can be obtained simply by changing the scale on the vertical axis from absolute to relative frequency. The shape of the diagram remains exactly the same. Such graphs are known as relative frequency or percentage histograms, and relative frequency or percentage polygons.

3.6 Population pyramids

3.6.1 One very widely-used form of a frequency distribution is the population pyramid, a graphical representation of the population classified according to sex and age group. In effect, this is a double histogram. However, the rectangles now have their bases on the vertical axis (age), frequencies (absolute or relative) being measured along the horizontal axis. This axis is scaled positively in both directions from the origin: one side for males, the other for females.

3.6.2 In Example 5 below we present a population pyramid. The pyramid technique may of course be applied to data other than population. For example, enrolment by sex and single year of age, or by grade, or by level of education (assuming certain age limits to levels) could be shown. In many less developed countries the enrolment pyramid shape is distorted at the lower ages, because children are late in enrolling. This distortion can in fact be a useful indicator of how far the educational system has developed. At a later stage of development (assuming no significant decline in birth rates) the classic pyramid shape is likely to emerge.

Example 5: construction of a population pyramid

Table 4 presents the population of Indonesia in 1971 by 5-year age group and sex, as percentages of total population. Figure 2 presents the graphical representation of these data.

Table 4: Population of Indonesia by age and sex, as a percentage of total population, 1971

Age (years)      Males (% of total population)   Females (% of total population)
0 < 5                  8.9                              8.7
5 < 10                 7.5                              7.3
10 < 15                6.2                              6.0
15 < 20                4.5                              4.7
20 < 25                3.6                              4.2
25 < 30                3.1                              3.8
30 < 35                3.2                              3.7
35 < 40                2.9                              3.1
40 < 45                2.6                              2.6
45 < 50                2.0                              2.0
50 < 55                1.5                              1.5
55 < 60                1.1                              1.2
60 < 65                0.7                              0.8
65 < 70                0.6                              0.6
70 < 75                0.4                              0.4
75 and over            0.2                              0.2
not known              0.1                              0.1
Total                 49.1                             50.9

3.6.3 The population pyramid of Figure 2 has arbitrarily assumed a maximum age of 100. Pyramids offer a clear visual appreciation of the difference in age and sex structure of different populations. The demographic implications of this figure cannot be discussed here, but it might be noted in passing that this Indonesian pyramid is typical of a less developed country with relatively high rates of mortality and fertility, giving the pyramid a characteristic broad base and narrow top.

Figure 2: Population of Indonesia by age and sex, 1971 [horizontal axis: percent of total population, males on one side and females on the other; vertical axis: age (years)]. Source: see Table 4.
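A rough text version of Figure 2 can be drawn from Table 4. The sketch below rests on a few choices of its own: two characters of bar per percentage point, males on the left, and the "not known" group left out of the drawing but kept in the totals. The youngest group is printed at the base:

```python
# Table 4: Indonesia, 1971, % of total population in each 5-year age group
labels = [f"{5 * k} < {5 * k + 5}" for k in range(15)] + ["75 and over"]
males = [8.9, 7.5, 6.2, 4.5, 3.6, 3.1, 3.2, 2.9, 2.6, 2.0, 1.5, 1.1, 0.7, 0.6, 0.4, 0.2]
females = [8.7, 7.3, 6.0, 4.7, 4.2, 3.8, 3.7, 3.1, 2.6, 2.0, 1.5, 1.2, 0.8, 0.6, 0.4, 0.2]
not_known = {"males": 0.1, "females": 0.1}  # age not stated; excluded from the drawing

def bar(percent, chars_per_percent=2):
    """Bar length proportional to the percentage (the rectangle's length)."""
    return "#" * round(percent * chars_per_percent)

# Oldest group at the top, youngest at the base; males left, females right
for label, m, f in reversed(list(zip(labels, males, females))):
    print(f"{bar(m):>20} {label:^13} {bar(f)}")

male_total = round(sum(males) + not_known["males"], 1)
female_total = round(sum(females) + not_known["females"], 1)
print(male_total, female_total)   # 49.1 50.9
```

The broad base and narrow top described in 3.6.3 show up clearly even in this crude form.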

3.7

Cumulative frequency distributions

3.7.1 The total frequency of all values less than the upper class limit of a given class interval is called the cumulative frequency up to and including that interval. For example, in Example 2, the cumulative frequency up to and including the class interval 35 < 40 is 19031. That is, 19031 teachers were aged less than 40 years. Similarly, considering Table 3 in Example 4, in relative frequency terms, 90.8% of teachers were aged less than 40.

Example 6: a cumulative percentage frequency distribution

Table 5 presents the cumulative age distribution of teachers as calculated from the data in Tables 2 and 3 (Examples 2 and 4), assuming arbitrary lower and upper limits of 15 and 60 years.

Table 5: Cumulative distribution of ages of a sample of teachers in Afghanistan, 1973

                 Cumulative frequency
Age (years)      Absolute        %
15 < 20               237       1.1
20 < 25              7521      35.9
25 < 30             15667      74.8
30 < 35             17833      85.1
35 < 40             19031      90.8
40 < 45             19994      95.4
45 < 60             20953     100.0

3.7.2

The percentage cumulative frequencies have been plotted on a graph in Figure 3. Percentage cumulative frequencies less than each upper class limit are plotted against that upper class limit. The resulting graph is called a (percentage) cumulative frequency polygon or a (percentage) ogive. Figure 3 is an example of a "less than" ogive. A "more than" ogive can easily be derived from a "less than" ogive. In the context of Example 6 a "more than" ogive would be constructed by considering the frequencies of teachers 15 or more years old, 20 or more, and so on.

Figure 3: Percentage cumulative frequency distribution of ages of a sample of teachers in Afghanistan, 1973 [vertical axis: cumulative frequency (%); horizontal axis: age (years); median age marked]. Source: see Table 5.
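Both kinds of ogive can be computed from the Table 2 frequencies. Like Example 6, this sketch assumes class limits of 15 and 60. The "less than" points are running totals, and each "more than" point is the grand total minus the running total of the classes below it:

```python
from itertools import accumulate

# Table 2 frequencies, with the arbitrary limits 15 and 60 assumed in the text
lower_limits = [15, 20, 25, 30, 35, 40, 45]
upper_limits = [20, 25, 30, 35, 40, 45, 60]
frequencies = [237, 7284, 8146, 2166, 1198, 963, 959]
total = sum(frequencies)

# "Less than" ogive: cumulative frequency below each upper class limit
less_than = list(accumulate(frequencies))
less_than_pct = [round(100 * c / total, 1) for c in less_than]

# "More than" ogive: teachers at or above each lower class limit
more_than = [total - c for c in [0] + less_than[:-1]]

for u, c, p in zip(upper_limits, less_than, less_than_pct):
    print(f"less than {u}: {c} ({p}%)")
for lo, c in zip(lower_limits, more_than):
    print(f"{lo} or more: {c}")
```

The "less than" figures reproduce both columns of Table 5.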


EXERCISE 4

The reader should now do Exercise 4 on page 73.

3.8

Frequency curves

3.8.1 In Figures 1 and 3 the frequency polygon and the ogive have been constructed by connecting the appropriate points by straight lines. In a situation in which the rectangles under the frequency polygon become progressively narrower (i.e. the class intervals become ever shorter), the lines joining the midpoints of the tops of the rectangles come ever more closely to approximate a smooth curve. Such a curve is called a frequency curve. Similarly, smooth ogives may be obtained in principle.

3.8.2 Theoretically, we can only obtain a perfectly smooth frequency curve when dealing with a continuous variable (such as height, weight, time, intelligence, etc.). Whenever a variable is discrete (such as class size, grade enrolment, etc.), a perfectly smooth frequency curve can never be obtained. Nevertheless, in practice, discrete variables may very often be treated as if they were continuous, with very little loss of accuracy. Such treatment is clearly best justified where class intervals are narrow. Further, where the data are sample data, the larger the sample size, the more likely it is that chance fluctuations, causing large differences between adjacent classes, will be diminished in size.

3.8.3 There is a variety of characteristically-shaped frequency curves. Figure 4 shows forms commonly met in the educational field. Perhaps the most important, both in theoretical and practical statistics, is the second curve shown, which is symmetrical and bell-shaped. If the curve is perfectly symmetrical, observations an equal distance on either side of the central maximum have the same frequency. For example, intelligence, as measured by standard tests, is approximately symmetrically distributed. In practical, descriptive statistical work, however, any of these types of curve may occur. We have already seen in Figure 1 an instance of the first curve in Figure 4, a frequency polygon skewed to the right (defined as positively skewed). A U-shaped distribution, as shown in the fifth curve, could result from graphing the frequency of deaths (on the vertical axis) against age at death, since relatively high mortality tends to occur among both the very young and the old. The seventh curve, S-shaped, might be encountered in drawing a cumulative frequency curve of pupil age distribution. We shall meet a curve of this shape again in the discussion of non-linear curve-fitting in Section 6 below. Particularly useful in time-series analysis, it is known as the logistic curve. As a mental exercise, the reader should think of practical examples of the third, fourth and sixth frequency curves in Figure 4.


4

MEASURES OF CENTRAL TENDENCY

4.1 Summary statistics

4.1.1 The reader has now been introduced to some of the basic techniques involved in the organisation and presentation of raw data. The concept of a frequency distribution has been shown to be particularly useful. But now the reader should pause and ask himself or herself: what analytical help do these visual presentations offer us? The answer is: whilst they may well suggest further analysis, alone they can tell us little of either a summary nature (e.g. what is the average age of teachers in Table 2?) or of the relationships between variables (e.g. how is the average age of teachers in a particular country changing over time?). In this and the next Section we therefore examine the first of these issues: how frequency distributions may be usefully summarised. In Section 6 we shall examine how we may quantify "correlation" and "regression" relationships between two variables.

4.2 The arithmetic mean

4.2.1 An average is a measure of the central tendency of a set of data. We wish to define a statistic giving, in one simple number, a measure of the central tendency of a whole set of numbers. The "average" of everyday speech is known by the statistician as the arithmetic mean. In the notation introduced above, the arithmetic mean is defined as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (9)$$

that is, it is the summation of the n observed values x_i of the variable x, divided by the number of values, n. It is frequently written x̄, pronounced "x bar".
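Formula (9) translates directly into a few lines of code. The following is a minimal Python sketch; the function name `arithmetic_mean` is our own illustrative choice, and the data are the five Togo new-entrant figures (in thousands) used later in this unit.

```python
def arithmetic_mean(values):
    """Formula (9): the sum of the n observed values divided by n."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

# New entrants to primary education, Togo 1970-1974 (thousands)
entrants = [45, 53, 56, 53, 56]
print(round(arithmetic_mean(entrants), 1))  # 52.6
```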

Example 7: calculation of the arithmetic mean from grouped data

Where the data have been grouped in a frequency distribution, such as is shown in Table 6, modifications to formula (9) are necessary.

Table 6: Frequency distribution of ages of 90 school children

    Age (years)    Frequency
    6 < 8              30
    8 < 10             23
    10 < 12            22
    12 < 14            12
    14 < 16             3
    Total              90

If the reader tried to apply formula (9) to these data, he or she would immediately wonder: what are the appropriate values of x (age) to use? To solve this problem, a numerical value representative of each interval must be chosen. The assumption is made that observed ages in each interval are evenly spread throughout the interval. So (in the absence of information to the contrary), we take the class mid-point as the representative value. Hence Table 6 is re-written as follows:

Table 7

    Age (years)    Frequency f_j    Class mid-point x_j    f_j x_j
    6 < 8              30                  7                 210
    8 < 10             23                  9                 207
    10 < 12            22                 11                 242
    12 < 14            12                 13                 156
    14 < 16             3                 15                  45
    Total              90                                    860

The arithmetic mean, x̄, is now calculated from the following formula:

$$\bar{x} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} \qquad (10)$$

where:

$$\sum_{j=1}^{k} f_j = n, \qquad j = 1, \ldots, k, \qquad k = \text{number of classes}$$

Substituting in formula (10) the sums from Table 7:

$$\bar{x} = \frac{860}{90} \approx 9.6 \text{ years}$$
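As a check on the arithmetic, formula (10) can be sketched in Python using the figures of Table 7. The helper name `grouped_mean` is hypothetical; as in the text, each class is assumed to be represented by its mid-point.

```python
def grouped_mean(frequencies, midpoints):
    """Formula (10): sum of f_j * x_j divided by the sum of f_j (which is n)."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Table 7: ages of 90 school children, classes 6<8, 8<10, ..., 14<16
freqs = [30, 23, 22, 12, 3]
mids = [7, 9, 11, 13, 15]   # class mid-points (values assumed evenly spread)

print(sum(f * x for f, x in zip(freqs, mids)))  # 860
print(round(grouped_mean(freqs, mids), 1))      # 9.6
```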

4.2.2 In the event that one, or both, of the lowest and highest class intervals are open-ended (see Examples 2 and 3 and para. 3.1.2 above for examples of open-ended class intervals), the statistician must make assumptions (in the absence of further information) about the respective lower and upper class limits. Having made these explicit assumptions, the necessary class mid-points may then be calculated.

4.2.3 The arithmetic mean is one, very important, measure of central tendency. Its usefulness is underlined by the following properties:

- it has a clear intuitive meaning
- it is easy to calculate
- it uses all the observations
- it is widely used in comparing the central tendencies of frequency distributions.

The reader will have come across many such examples. Average educational attainment scores, average ages, average class size and average time spent in school are but four examples of comparative summary statistics used both to describe situations and to investigate possible relationships.

Other measures of central tendency are, however, also available for particular purposes. They include the median and the mode.

EXERCISE 5

The reader should now do Exercise 5.

4.3 The median

4.3.1 The median is the middle value of a set of values of a variable when the values are placed in order of magnitude. Thus, for example, if eleven individuals were ordered by their age:

2, 3, 7, 8, 9, 12, 14, 15, 16, 19, 20

the median age is 12, for there are 5 younger and 5 older. This example refers to an odd number of values. If there were an even number of values, say:

6, 8, 14, 17

the median is calculated by taking the arithmetic mean of the middle two values. In this example, the median equals:

$$\frac{8 + 14}{2} = 11 \text{ years}$$
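The rule for odd and even numbers of values can be sketched as a small Python function (the name `median` here is our own; Python's standard `statistics.median` follows the same convention of averaging the two middle values).

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 3, 7, 8, 9, 12, 14, 15, 16, 19, 20]))  # 12
print(median([6, 8, 14, 17]))                            # 11.0
```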

4.3.2 When dealing with frequency distributions, the median is that value above and below which lie 50% of all the values. Referring back to Figure 3, we can see that one way of finding the median of a frequency distribution is to use the percentage ogive. A horizontal line has been drawn from the 50% point on the cumulative percentage frequency axis. A perpendicular drawn from the point at which the line cuts the ogive shows the median age to be 26.8 years.
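The data behind Figure 3 are not reproduced here, so the sketch below applies the same idea to Table 6 instead. It assumes the ogive is drawn as straight lines between the cumulative frequencies at the class boundaries, which amounts to linear interpolation inside the class containing the n/2-th value; the helper name `grouped_median` is hypothetical.

```python
def grouped_median(lower_limits, widths, frequencies):
    """Median of grouped data by linear interpolation in the class containing n/2.

    Equivalent to reading the 50% point off an ogive drawn with straight
    lines between cumulative frequencies at the class boundaries.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    half = n / 2
    cumulative = 0
    for lower, width, f in zip(lower_limits, widths, frequencies):
        if cumulative + f >= half:
            # fraction of this class needed to reach the 50% point
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("inconsistent class data")

# Table 6: ages of 90 school children
lowers = [6, 8, 10, 12, 14]
widths = [2, 2, 2, 2, 2]
freqs = [30, 23, 22, 12, 3]
print(round(grouped_median(lowers, widths, freqs), 2))  # 9.3
```

Here 30 children lie below 8 years and the 45th child falls in the 8 < 10 class, so the sketch returns 8 + (45 - 30)/23 × 2, roughly 9.3 years.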

4.3.3 The advantage of the median as an averaging measure is that it overcomes a problem which the arithmetic mean cannot avoid: relatively very large or very small figures in a set of values can "distort" an arithmetic mean, preventing it from being a "representative" statistic.
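The distortion can be seen with the eleven ages of para. 4.3.1. In the sketch below the oldest age is replaced, purely hypothetically, by 90; the mean jumps while the median is unchanged. It uses the standard `statistics` module.

```python
from statistics import mean, median

ages = [2, 3, 7, 8, 9, 12, 14, 15, 16, 19, 20]
# Hypothetical: the oldest value recorded as 90 instead of 20
ages_with_extreme = ages[:-1] + [90]

print(round(mean(ages), 2), median(ages))                            # 11.36 12
print(round(mean(ages_with_extreme), 2), median(ages_with_extreme))  # 17.73 12
```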
4.4 The mode

4.4.1 The final measure of central tendency examined here is the mode. It is, simply, the most frequently occurring value in a set of values. Thus, for example, the mode in a series such as the following:

2, 4, 12, 4, 17, 7, 4

is 4, since it occurs most frequently. In graphical frequency distributions the mode is easily recognisable. Referring to Figure 4, the modes are the values of the variable at which the frequency curves are at their peaks. By definition, a set of data may have only one mode. Frequency distributions with more than one peak are, however, loosely referred to as bimodal and multi-modal respectively: in the former there are two distinct peaks; in the latter, several.
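Counting occurrences is all that finding the mode requires. This Python sketch uses `collections.Counter`; the helper name `modes` is our own, and it deliberately returns a list so that ties (the bimodal or multi-modal cases above) are visible rather than silently broken. The standard library's `statistics.multimode` behaves similarly.

```python
from collections import Counter

def modes(values):
    """Return every value occurring with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 12, 4, 17, 7, 4]))  # [4]
print(modes([1, 1, 2, 2, 3]))          # [1, 2]  (two peaks)
```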

4.4.2 The mode is less widely used than the arithmetic mean or median. It shares the median's advantage of not being influenced by very high or low values. But it is not very useful for further mathematical treatment; it has no clear mathematical formula to define it.

EXERCISE 6

The reader should now do Exercise 6.

The measures of central tendency discussed, particularly the arithmetic mean, are single numbers which summarise one important property of a frequency distribution. Another property of a set of data is its spread or dispersion. Figure 5 shows the importance of defining a measure of dispersion in order to distinguish and compare frequency distributions. The upper of the two diagrams shows two frequency distributions having the same arithmetic mean x̄, but nevertheless having very different appearances. They have markedly differing dispersions, denoted here by the symbols s_1 and s_2. The lower of the two diagrams shows, conversely, two frequency distributions having different means, x̄_1 and x̄_2, but an identical dispersion, s.

5.1.2 How can a measure of the dispersion, s, of a set of data be developed? Perhaps the most obvious measure of dispersion is the range. The range is defined as the difference between the largest and smallest figures of a set of data. But, whilst useful for some purposes, the range is determined by only these two, extreme, values. It is not affected by any of the values lying in between. A good measure of dispersion would take account of all the observed values of the variable.

Figure 5
[Upper diagram: two symmetrical frequency distributions having an identical mean x̄ but two different dispersions, s_1 and s_2. Lower diagram: two symmetrical frequency distributions having two different means, x̄_1 and x̄_2, but the same dispersion s.]

5.2 The standard deviation

The most commonly used measure of dispersion of a set of data is the standard deviation. It is usually referred to by the letter "s". The formula for this is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \qquad (11)$$

Why, the reader may well ask, this unpleasant-looking formula? Let us say it in words: it is the square root of the sum of the squared deviations of all the values from their arithmetic mean, averaged by dividing by the number of values, n. The fundamental idea, therefore, is to measure dispersion by measuring the deviations of all the values of the variable from their average value. The wider the spread, the greater we would wish our summary measure of dispersion to be.

But if the actual, signed, deviations from the mean were simply summed, the total would be found to be zero. This is simply a property of the arithmetic mean. Take, for example, the following 5 values (numbers of new entrants, in thousands, to the primary level of education in Togo, 1970-1974):

45, 53, 56, 53, 56

The mean of these numbers is 52.6. If the deviations of these numbers from their mean are listed, they are:

(45 - 52.6), (53 - 52.6), (56 - 52.6), (53 - 52.6), (56 - 52.6)

which is:

-7.6, 0.4, 3.4, 0.4, 3.4

Adding the deviations, we see that they sum to zero. The sum of the (signed) deviations from the arithmetic mean always equals zero, since by formula (9) n x̄ is simply the sum of the values:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$$
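The cancellation can be checked numerically with the Togo figures. In this Python sketch the sum is compared with zero using a small tolerance, because floating-point arithmetic does not represent 52.6 exactly.

```python
# New entrants to primary education, Togo 1970-1974 (thousands)
entrants = [45, 53, 56, 53, 56]
x_bar = sum(entrants) / len(entrants)      # 52.6
deviations = [x - x_bar for x in entrants]

print([round(d, 1) for d in deviations])   # [-7.6, 0.4, 3.4, 0.4, 3.4]
print(abs(sum(deviations)) < 1e-9)         # True: zero up to rounding error
```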

5.2.4 If we simply ignored the sign of the deviations, we could consider the sum of the absolute deviations:

(7.6 + 0.4 + 3.4 + 0.4 + 3.4) = 15.2

and use this, divided by n = 5, as our measure of dispersion:

15.2 / 5 = 3.04 thousands

This is known as the mean absolute deviation of a set of data. Where |x_i - x̄| (all i) denotes the absolute deviations, the mean absolute deviation is:

$$\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \qquad (12)$$

Note that the "absolute value" of any number a is written |a|. It means, in effect, that we always regard it as positive; any negative sign is ignored.
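Formula (12) can be sketched in Python with the built-in `abs`; the function name `mean_absolute_deviation` is our own.

```python
def mean_absolute_deviation(values):
    """Formula (12): average of |x_i - x_bar| over all n values."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n

entrants = [45, 53, 56, 53, 56]   # Togo new entrants, thousands
print(round(mean_absolute_deviation(entrants), 2))  # 3.04
```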

5.3 The variance

5.3.1 This measure has mathematical drawbacks, however, which arise mainly from our having, somewhat arbitrarily, ignored the signs of the deviations. If instead we eliminate the negative signs by summing the squared deviations and averaging them, we obtain a measure of dispersion called the variance:

$$s^2 = \frac{\sum_{i=1}^{n} d_i^2}{n} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \qquad (13)$$

where d_i = x_i - x̄.

Example 8: calculation of the variance

Let us calculate the variance of the figures given above, referring to new primary school entrants in Togo, 1970-1974. We set up a working table as below:

Table 8

    x_i     (x_i - x̄)     (x_i - x̄)^2
    45        -7.6           57.76
    53        +0.4            0.16
    56        +3.4           11.56
    53        +0.4            0.16
    56        +3.4           11.56

    Σx_i = 263,   x̄ = 52.6,   Σ(x_i - x̄)^2 = 81.20

The variance is, from (13) above:

$$s^2 = \frac{81.20}{5} = 16.24 \text{ (thousand pupils)}^2$$
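The working table of Example 8 can be reproduced with a short Python sketch. Note that formula (13) divides by n; Python's `statistics.pvariance` uses the same divisor, whereas `statistics.variance` divides by n - 1 and would give a different figure.

```python
import statistics

# New primary entrants, Togo 1970-1974 (thousands), as in Example 8
entrants = [45, 53, 56, 53, 56]
n = len(entrants)
x_bar = sum(entrants) / n

# Working table: x_i, (x_i - x_bar), (x_i - x_bar)^2
rows = [(x, x - x_bar, (x - x_bar) ** 2) for x in entrants]
for x, dev, dev_sq in rows:
    print(f"{x:>3} {dev:+6.1f} {dev_sq:7.2f}")

sum_sq = sum(dev_sq for _, _, dev_sq in rows)
variance = sum_sq / n          # formula (13): divide by n, not n - 1
print(round(sum_sq, 2), round(variance, 2))   # 81.2 16.24

# The standard library's population variance uses the same divisor n
print(statistics.pvariance(entrants))
```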

This measure of dispersion (compare it with the range, 11 thousand, and the mean absolute deviation, 3.04 thousand) is, however, measured in the rather strange units of (thousands of pupils)²: a peculiar and hard-to-comprehend measure! In order to reduce the variance to the original unit of measurement (thousands of pupils) we therefore take the square root and obtain:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \qquad (11)$$

which is our definition of the standard deviation as presented above. Using the result from Example 8 above:

$$s = \sqrt{16.24} \approx 4.03 \text{ thousand pupils}$$

We have, to summarise, four measures of the dispersion of the set of 5 figures presented:

    range                       11 thousand
    mean absolute deviation     3.04 thousand
    variance                    16.24 (thousand)²
    standard deviation          4.03 thousand
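The four summary figures above can be recomputed together in one short Python sketch, which also makes plain that the standard deviation is just the square root of the variance.

```python
import math

entrants = [45, 53, 56, 53, 56]   # Togo new entrants, thousands
n = len(entrants)
x_bar = sum(entrants) / n

value_range = max(entrants) - min(entrants)               # 56 - 45
mad = sum(abs(x - x_bar) for x in entrants) / n            # formula (12)
variance = sum((x - x_bar) ** 2 for x in entrants) / n     # formula (13)
std_dev = math.sqrt(variance)                              # formula (11)

print(value_range, round(mad, 2), round(variance, 2), round(std_dev, 2))
# 11 3.04 16.24 4.03
```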

5.3.4 Of these alternative measures of dispersion, the standard deviation is, for the majority of purposes, the most useful. It has the following important properties:

- it is straightforward to calculate
- it uses all the observations
- it measures dispersion from the arithmetic mean; this gives a smaller measure than dispersion measured from any other average, such as the median
- in more advanced analysis of frequency distributions, the standard deviation plays an important mathematical role
- it allows comparisons of dispersion to be made between different frequency distributions (or within a distribution as it changes over time).
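The third property can be illustrated with the Togo figures: measuring the root-mean-square deviation about the median (53) instead of about the mean (52.6) gives a larger figure. The helper `rms_deviation` is a hypothetical name; it is formula (11) with an arbitrary centre in place of x̄.

```python
import math

def rms_deviation(values, centre):
    """Square root of the average squared deviation from an arbitrary centre."""
    return math.sqrt(sum((x - centre) ** 2 for x in values) / len(values))

entrants = [45, 53, 56, 53, 56]
x_bar = sum(entrants) / len(entrants)          # 52.6
med = sorted(entrants)[len(entrants) // 2]     # 53 (n is odd)

print(round(rms_deviation(entrants, x_bar), 3))  # 4.03
print(round(rms_deviation(entrants, med), 3))    # 4.05
```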

5.4.1 The last point in the list above raises the question of the comparison of dispersions. Say, for example, the earnings received by male and female teachers were being compared. The dispersion of male and female earnings could be an issue of some interest to various parties. It would, however, be potentially misleading simply to compare the standard deviation of male teachers' earnings, s_m, with that of female teachers' earnings, s_f. The difficulty arises because absolute dispersions would be under comparison: the mean salary of male teachers (x̄_m) could be considerably higher than that of female teachers (x̄_f). If, for example,

x̄_m > x̄_f   and   s_m = s_f

the equality of the absolute measures of dispersion would be misleading. What is needed is a measure of relative dispersion. This is provided by the coefficient of variation, V, which is defined as the standard deviation divided by the mean:

$$V = \frac{s}{\bar{x}} \qquad (14)$$

Thus, in the case under discussion:

$$V_m = \frac{s_m}{\bar{x}_m} \quad \text{and} \quad V_f = \frac{s_f}{\bar{x}_f}$$

but, since x̄_m > x̄_f and s_m = s_f, it follows that V_f > V_m, which is a more meaningful comparison. Generally, the coefficient of variation is expressed in percentage terms.

Example 9: calculation of the coefficient of variation

In a hypothetical country, male teachers earn, on average, $95.00 per week, with a variance of 100.00 (dollars squared). The figures for females are $75.00 and 64.00 (dollars squared) respectively. Let us compare the dispersion of male and female earnings. In absolute terms:

s_m = $10.00,   s_f = $8.00,   so s_f < s_m

However, in relative terms, using the coefficient of variation, V:

$$V_m = \frac{s_m}{\bar{x}_m} \times 100\% = \frac{10}{95} \times 100\% = 10.5\%$$

$$V_f = \frac{s_f}{\bar{x}_f} \times 100\% = \frac{8}{75} \times 100\% = 10.7\%$$

so that V_f > V_m.
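Example 9 can be checked with a brief Python sketch of formula (14) expressed as a percentage; the function name `coefficient_of_variation` is our own.

```python
import math

def coefficient_of_variation(std_dev, mean):
    """Formula (14), as a percentage: V = s / x_bar * 100."""
    return std_dev / mean * 100

# Example 9 (hypothetical country): weekly earnings in dollars
s_m = math.sqrt(100.00)   # male variance 100 -> s_m = 10
s_f = math.sqrt(64.00)    # female variance 64 -> s_f = 8
v_m = coefficient_of_variation(s_m, 95.00)
v_f = coefficient_of_variation(s_f, 75.00)

print(round(v_m, 1), round(v_f, 1))   # 10.5 10.7
```

Although s_f is smaller than s_m in absolute terms, the relative dispersion of female earnings turns out slightly larger.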

5.5 Calculation of standard deviation of grouped data

5.5.1 Where data are in grouped form, adjustment to formula (11) is necessary in calculating s. As before in calculating the arithmetic mean from grouped data, the class mid-points must be found. Denoting class mid-points by x_j and class frequencies by f_j, the formula for s now becomes:

$$s = \sqrt{\frac{\sum_{j=1}^{k} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{k} f_j}} \qquad (15)$$

It can be shown that this equals:

$$s = \sqrt{\frac{\sum_{j=1}^{k} f_j x_j^2}{n} - \left(\frac{\sum_{j=1}^{k} f_j x_j}{n}\right)^2} \qquad (16)$$

Finally, if

$$d_j = (x_j - A)$$

where the d_j are the deviations of the x_j from an arbitrary constant A, it can be shown that (16) may be written:

$$s = \sqrt{\frac{\sum_{j=1}^{k} f_j d_j^2}{n} - \left(\frac{\sum_{j=1}^{k} f_j d_j}{n}\right)^2} \qquad (17)$$

This last "short-cut" formula (17) should be used in calculations; it saves a considerable amount of time.
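The claim that (15), (16) and (17) give the same result can be checked numerically. This Python sketch applies all three to the ages of Table 7, choosing A = 11 arbitrarily; the three function names are hypothetical. Shifting by A also tends to reduce floating-point rounding error when the values are large compared with their spread.

```python
import math

def sd_grouped_direct(f, x):
    """Formula (15): squared deviations of class mid-points from the grouped mean."""
    n = sum(f)
    x_bar = sum(fj * xj for fj, xj in zip(f, x)) / n
    return math.sqrt(sum(fj * (xj - x_bar) ** 2 for fj, xj in zip(f, x)) / n)

def sd_grouped_raw(f, x):
    """Formula (16): mean of f*x^2 minus the square of the mean."""
    n = sum(f)
    mean_sq = sum(fj * xj ** 2 for fj, xj in zip(f, x)) / n
    mean = sum(fj * xj for fj, xj in zip(f, x)) / n
    return math.sqrt(max(0.0, mean_sq - mean ** 2))  # guard tiny negative rounding

def sd_grouped_shortcut(f, x, a):
    """Formula (17): formula (16) applied to d_j = x_j - A."""
    d = [xj - a for xj in x]
    return sd_grouped_raw(f, d)

# Table 7: ages of 90 school children
freqs = [30, 23, 22, 12, 3]
mids = [7, 9, 11, 13, 15]
for s in (sd_grouped_direct(freqs, mids),
          sd_grouped_raw(freqs, mids),
          sd_grouped_shortcut(freqs, mids, a=11)):
    print(round(s, 4))
```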


EXERCISE 7

The reader should now do Exercise 7 on page 76.

Example 10: calculation of s with grouped data

In Table 9 below, years of service of a sample of teachers in Afghanistan in 1973 are given. Column (1) groups the years of service; column (2) shows the class frequencies. Columns (3) to (6) have been calculated from these original data. Class mid-points, x_j, have been calculated (column (3)). An arbitrary constant, A = 15, has then been chosen to ease calculation, and column (4), d_j = (x_j - A), calculated. Next, columns (5) and (6) have been calculated: column (5) is (column (2) × column (4)); column (6) is (column (5) × column (4)). Finally the necessary sums have been calculated.

Table 9: Years of service of a sample of teachers, Afghanistan, 1973

    (1)          (2)          (3)              (4)          (5)          (6)
    Years of     Frequency    Class            d_j =        f_j d_j      f_j d_j²
    service      f_j          mid-point x_j    (x_j - A)
    0 < 6        10990         3               -12          -131880      1582560
    6 < 12        6828         9                -6           -40968       245808
    12 < 18       1402        15                 0                0            0
    18 < 24        880        21                 6             5280        31680
    24 < 30        668        27                12             8016        96192
    30 < 40        185        35                20             3700        74000
    Totals       20953                                      -155852      2030240

Hence, from (17):

$$s = \sqrt{\frac{2030240}{20953} - \left(\frac{-155852}{20953}\right)^2}$$

The standard deviation is found to be:

s ≈ 6.45 years of service.
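The columns and totals of Table 9 can be rebuilt with a short Python sketch that follows the column-by-column procedure described above and then applies formula (17).

```python
import math

# Table 9: years of service of a sample of teachers, Afghanistan 1973
freqs = [10990, 6828, 1402, 880, 668, 185]   # column (2)
mids = [3, 9, 15, 21, 27, 35]                # column (3)
A = 15                                       # arbitrary constant chosen in the text

d = [x - A for x in mids]                            # column (4)
fd = [f * dj for f, dj in zip(freqs, d)]             # column (5)
fd2 = [f_d * dj for f_d, dj in zip(fd, d)]           # column (6)

n = sum(freqs)
s = math.sqrt(sum(fd2) / n - (sum(fd) / n) ** 2)     # formula (17)

print(n, sum(fd), sum(fd2))   # 20953 -155852 2030240
print(round(s, 2))            # 6.45
```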

6

RELATIONSHIPS BETWEEN VARIABLES

6.1 Introduction

6.1.1 Up to now, we have considered techniques for the description and summarisation of distributions of a single variable. As indicated earlier, however, the practical statistician will also seek to analyse relationships between variables. In considering how variables relate to one another, it is valuable to make an early distinction between relationships involving association and those involving causation. Below we shall consider important statistics (1) which measure association: covariance and correlation. These measures are designed to quantify how two variables vary, that is change, together (if at all). No assumptions will be made or inferred about any causation involved between two variables.

(1) See para. 6.2.2 below.

6.1.2 However, in circumstances in which causal relationships are hypothesised, the very important statistical technique of regression analysis is available for quantifying such relationships. The methods of regression are discussed in 6.7.

6.2 The importance of the theory of probability

6.2.1 It was pointed out in the Introduction that there exists a body of statistical theory of a relatively formal nature founded in the mathematical theory of probability. That theory is too advanced to be discussed here. Nevertheless, it should be pointed out at this stage that any further development of statistical association and regression analysis than is possible here would be founded in an explicit probability framework. Such a framework enables the statistician formally to incorporate into his or her analyses the unavoidable existence of errors from various sources. Sources of error include human errors made in the process of actual collection, and errors in data-processing. Unavoidable errors also occur in sampling. Much of the formal theory of probability has been developed in order to enable the statistician to quantify the probable size of errors resulting from random (1) sampling from populations. By sampling we mean collecting data on only a part of a population, "population" not necessarily referring to people, but to any totality under investigation.

(1) A (simple) random sample is one in which each member of the population has an equal, non-zero chance of selection.

6.2.2 The statistician will take a sample or samples from a population, perhaps in the course of a survey, in order to draw probability inferences from his or her samples about unknown population quantities in which he or she is interested (such as means, variances, etc.). The unknown characteristics of a population are referred to as the population parameters. The known corresponding sample quantities are called sample statistics (or, briefly, statistics). Such sample statistics are used in making probability estimates of the unknown population parameters. If the sample is taken in a random fashion, the statistician will be able to make estimates of the population parameters which incorporate quantified statements about the probability of their accuracy.

6.2.3 Of course, the statistician does not rely only on random sample surveys for data. Nevertheless, in assessing the relationships between variables (as we discuss below), he or she may well make explicit assumptions about the probabilistic behaviour of the errors in the data which are to be analysed. By doing so, it is possible to improve on simply making point estimates (i.e. estimates composed of one, particular figure) of parameters, which give no indication of their likely accuracy. The statistician, utilising formal probability theory together with certain explicit assumptions about the probability distributions of the errors involved, can make interval estimates. That is, it is possible to make quantified statements about the probability that the population parameter under investigation lies in a specified range around the known sample statistic.

6.2.4 In the remainder of Section 6, we cannot treat the analysis of correlation and regression within an explicit framework of probability theory. A more advanced treatment of these topics, however, would. The reader should nevertheless bear in mind that the statistics developed below, such as the coefficient of linear correlation and the coefficients of linear regression, when applied to actual data are point estimates of the unknown population parameters (1). In our introductory treatment which follows, our estimates appear exact. In a more advanced statistical treatment, they would have attached to them quantified statements of probability about their accuracy.

(1) Unless, of course, the data are from a complete enumeration (or census) of a population. (Even then, human error can creep in, though sampling error is absent.)

6.3 Association between variables

6.3.1 What is meant by a statistical "association" between two variables? The underlying idea is very simple. Assume that we have n pairs of values of two variables, x and y:

(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_n, y_n)

As discussed in the Chapter on Basic Mathematics, Section 6 on graphs, these pairs may be plotted on a graph. The pairs of values may be obtained by observation over a period of time (time series analysis), or they may be observed at one point in time (cross-sectional analysis).
6.3.2 For example, in a time series analysis of the association between two variables, at each time period t (t = 1, ..., n, where there are n time periods), x_t and y_t might be observed, and a pair of values (x_t, y_t) obtained. These n pairs could then be plotted on a graph in a so-called scatter diagram. A specific example could be to observe, over a period of time, n pairs of values of the crude birth rate (x) and income per head (y), and plot them on a scatter diagram.

6.3.3 In a cross-sectional analysis, n pairs of values of the two variables are observed at one point in time. For example, the crude birth rate (x) and income per head (y) could be observed in n different regions of a country. n pairs of values would again result, and could be plotted on a scatter diagram.

6.3.4 Figure 6 presents the 5 general patterns which may result from plotting pairs of observations (x_i, y_i). The following points should be noted:

- scatter 1 shows a positive association; as x increases, so (generally) does y
- scatter 2 shows a negative association; as x increases, y (generally) decreases
- scatter 3 shows a perfect positive linear association
- scatter 4 shows a perfect negative linear association
- scatter 5 shows no, or very little, linear association.

[Figure 6: scatter diagrams 1 to 5 (not reproduced)]

6.3.5 The underlying meaning of association is simply whether x and y vary around their respective means together. If, where x_i is greater or less than x̄, the other member of the pair, y_i, also tends to be respectively greater or less than ȳ, we define a positive association. For example, in scatter 1, where y_i exceeds ȳ, corresponding values of x_i also generally exceed x̄. Similarly, when y_i is less than ȳ, x_i is generally less than x̄. We have a positive association. Scatter 2 shows a negative association, because when x_i is above (or below) its mean, y_i tends to be below (or above) its mean. Scatters 3 and 4 show perfect positive and negative linear associations respectively, for a given change in x is always associated with the same change in y (and vice versa). Scatter 5 shows no, or very little, evidence of any linear association at all between x and y.

6.4 Covariance

6.4.1 How may we measure this fundamental idea of strength of association? The simplest useful statistic is the covariance. This is defined as follows:

$$\text{covariance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \qquad (18)$$
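A minimal Python sketch of the covariance formula of para. 6.4.1 follows. The three sets of y values are hypothetical, chosen to mimic a perfect positive, a perfect negative and no linear association; the function name `covariance` is our own.

```python
def covariance(xs, ys):
    """Average product of paired deviations from the two means (divisor n)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]
print(covariance(x, [2, 4, 6, 8, 10]))   # 4.0   y rises with x
print(covariance(x, [10, 8, 6, 4, 2]))   # -4.0  y falls as x rises
print(covariance(x, [2, 5, 1, 5, 2]))    # 0.0   no linear association
```

In the third case the positive and negative products cancel exactly, which anticipates the discussion of Figure 7.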

6.4.2

Formula ( If ) tells us to multiplyand then average

the deviations of the x and y values from their respective
means.

Consider Figure 7.

If we apply formula ( /£ ), we will

'O

swaBaaff;
miWSBW
i;E- ••hi: gr
iu: >1H

W iiii

Li
TL Ll

i

M

IBiil^
WtlEsB
iph
1ft
Bl
pfeftSi
MBi
■«|.■|IB&
u lit
I
1
B ®!Mg fe ll felfe ifti |fe 11 g ft1 ft J ft ft ilillililiww
fen
ft
I
I ihffi
Bpfffe
f#ilU
M Ofc I;
Btetosi
1 fes felipipi
ifiiilii
ftiliii
HIS
1
HIBHIS!
liiiiiM «B«S
»BiiB 0
0zfflB 14
piSSil
W
■■'BE'
JI
BBSS"ESI
•aih 11 «fili fellg® wijfc
W
1
B
r
....
n
.
®BHEp5«|WIip
iiife
8i
111!
:
....
g|l
Matt
iiBiiL
WHO
H11WIPL,
ft ft ft® 1 MjSHl111 iSI
1 Fn B f
1
rt fefePl

■

Pl

ill
iii

r

i !th'Pl
!! Pl

li 1 Fl

Mp- Mfenw
iih h nMphMMr
PE

p-‘ p

l-i

-"

Kj Sr p
K ip pglBgl
RSwl

ffimP
mm
msi

TRF K*i
BI

ffi MS

..hi

:it

EH

^LipiL. Bl Ki

PP

-EH
• iffi Hi EE-ii
•

tp Pj; B y| ipiipEHE
;•! itr p f
PlSll
- J*--

hw

gEtLh'ili_
LHii

:■ 11 ...

Rt! L.H hr Ik MpPE

RH
iiiipfeTH

li-U

:• ir

t

KU
!! ! ru

mm
,■ 1
pm LiP

• I •

BE

HH

in! fe M !
il!fe‘u
Rph
?-r4“
t-rri-tp
.:! • •

i J il

r "

Pr

Bl IKfIB

p IrH
! H-i’ tri i- i-t-f-i

J L

--- 't'lf*--------

PRii gi&fenl.. .J.,. .......... _H..

i-*----

.11
;il+ iRplTPImn

.,:.PP

Si
J*!sSs
WP' SB
ft Sift!
HlrHilllS • Ill
ws®
..fe
Ip
3 tfB h Kii s-PM
ip Lil ImP pw
f
ijjjj IB Hip
:*.a
.; t

1

ippllB

iiii!
1 T. ft ft fti ■h■ hi

[Figure 7]

get positive products in sectors 2 and 4, and negative products in sectors 1 and 3.

If these products are summed and averaged according to formula (18):

- when all or most of the points are in sectors 2 and 4, as in scatter 1 of Figure 6, a high positive sum is obtained;
- when all or most of the points are in sectors 1 and 3, a high negative sum is obtained;
- when some of the points are in sectors 1 and 3, and some in sectors 2 and 4, a small positive or negative sum results, since the positive and negative items will tend to cancel out.
6.4.3 If we were to calculate only the covariance, a problem would be that the size of the absolute number which would result would depend both on the number of observations, n, and the specific location of the data with respect to the means. Ideally, we would like to transform the absolute answer into a relative form in order to allow comparisons of linear association in differing situations.
6.5 The coefficient of linear correlation r

6.5.1 It can be shown that if we divide the covariance of x and y by the product of the standard deviations of the x and y distributions, we obtain the coefficient of linear correlation (1), r:

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (19) $$

(1) Sometimes called the Pearsonian, or product-moment, coefficient of linear correlation.
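As an illustration only (this is not part of the original manual), formula (19) can be computed directly from paired observations. The function name `pearson_r` and the sample data below are hypothetical; only the Python standard library is used.

```python
import math


def pearson_r(x, y):
    """Coefficient of linear correlation r, computed as in formula (19)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length lists of at least 2 values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of the products of deviations from the two means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of (sum of squared x-deviations) times (sum of squared y-deviations)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustrative data
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```

Because the units of x and y cancel between numerator and denominator, multiplying either variable by a positive constant (for example, changing currency units to thousands of units) leaves r unchanged.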

This expression has the extremely important property that:

$$ -1 \le r \le +1 $$

r is a dimensionless number, i.e. it is a pure number with no associated units.

6.5.2 In terms of the scatter diagrams presented in Figure 6:

- scatter 1 indicates positive linear correlation: 0 < r < 1
- scatter 2 indicates negative linear correlation: -1 < r < 0
- scatter 3 indicates perfect positive linear correlation: r = +1
- scatter 4 indicates perfect negative linear correlation: r = -1
- scatter 5 indicates zero (or approximately zero) linear correlation: r ≈ 0
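6.5.3 It can be shown that, for more rapid computational purposes, formula (19) reduces to:

$$ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \qquad (20) $$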

6.5.4 Two important points about r should be remembered. First, it is simply a measure of association. It is not a measure of a causal relationship. Two variables may have a very high correlation, but may have no causal connection whatever. For example, the number of pupils enrolled in primary school in Brazil and the price of coffee have shown a positive correlation over recent years. But this is no evidence of a causal relationship.

6.5.5 Secondly, r is a measure of linear association. Two variables may be highly associated in a non-linear way, but formula (19) will not necessarily indicate any association.
Example 11: calculation of correlation coefficient, r

Assume that we wish to calculate the coefficient of correlation between real income per capita and the gross primary level enrolment ratio in a country over a period of 5 years. Table 10 shows data for a hypothetical country.

Table 10: real income per capita and gross primary level enrolment ratio over a 5-year period

| Year | Real income per capita (thousands of units of currency), $x_i$ | Gross primary level enrolment ratio (%), $y_i$ |
|---|---|---|
| 1 | 10 | 61 |
| 2 | 11 | 62 |
| 3 | 14 | 66 |
| 4 | 13 | 66 |
| 5 | 17 | 70 |

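Looking at formula (20) for r, we see that we need to set out a working table as follows:

Table 11

| (1) $x_i$ | (2) $y_i$ | (3) $x_i y_i$ | (4) $x_i^2$ | (5) $y_i^2$ |
|---|---|---|---|---|
| 10 | 61 | 610 | 100 | 3721 |
| 11 | 62 | 682 | 121 | 3844 |
| 14 | 66 | 924 | 196 | 4356 |
| 13 | 66 | 858 | 169 | 4356 |
| 17 | 70 | 1190 | 289 | 4900 |
| $\sum x_i = 65$ | $\sum y_i = 325$ | $\sum x_i y_i = 4264$ | $\sum x_i^2 = 875$ | $\sum y_i^2 = 21177$ |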
Hence, from (20):

$$ r = \frac{5(4264) - 65(325)}{\sqrt{\left[5(875) - (65)^2\right]\left[5(21177) - (325)^2\right]}} = \frac{195}{\sqrt{150 \times 260}} = 0.99 \text{ (to two decimal places)} $$

The example shows that there is a positive, and very high, linear correlation between real per capita income (x) and the gross primary level enrolment ratio (y). It does not of itself demonstrate that x causes y, or that y causes x (or indeed that they are both caused by some third factor).
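The working table can be checked mechanically. This Python sketch (variable names are illustrative) recomputes the column totals of Table 11 from the data of Table 10 and then applies formula (20).

```python
import math

# Table 10: x = real income per capita (thousands of units of currency),
#           y = gross primary level enrolment ratio (%)
x = [10, 11, 14, 13, 17]
y = [61, 62, 66, 66, 70]
n = len(x)

# Column totals of the working table (Table 11)
sum_x = sum(x)                                  # column (1)
sum_y = sum(y)                                  # column (2)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # column (3)
sum_x2 = sum(xi * xi for xi in x)               # column (4)
sum_y2 = sum(yi * yi for yi in y)               # column (5)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)     # 65 325 4264 875 21177

# Formula (20)
numerator = n * sum_xy - sum_x * sum_y          # 21320 - 21125 = 195
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator                     # 195 / sqrt(150 * 260)
print(round(r, 2))                              # 0.99
```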

6.6 Spearman's coefficient of rank correlation

6.6.1 In some circumstances it may not be possible to obtain the precise values of variables, or for other reasons it may only be possible to rank (i.e. list in order) the variables in terms of size, importance, placing or some other attribute. The data may be ranked using the numbers 1, 2, ..., n. If two variables x and y are ranked in such a way, the coefficient of rank correlation between x and y is given by:

$$ r_{\text{rank}} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \qquad (21) $$

where:

$d_i$ = difference between the ranks of corresponding values of x and y
n = number of pairs $(x_i, y_i)$

The coefficient calculated from formula (21) is known as Spearman's rank correlation coefficient. As with the coefficient of linear correlation, $r_{\text{rank}}$ lies between -1 and +1.

Example 12: rank correlation

Assume we wished to calculate the rank correlation between the proportion of total population living in urban areas (x) and the pupil-teacher ratio at the elementary level (y), in seven countries in 1975. Table 12 below shows the ranking of the seven countries for each variable and two further columns giving the difference d and the squared difference d² between rankings.
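Table 12: rankings of seven countries by percentage urban population (x) and elementary level pupil-teacher ratio (y), 1975

| Country | Rank $x_i$ | Rank $y_i$ | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|
| Afghanistan | 6 | 4 | 2 | 4 |
| Congo | 1 | 1 | 0 | 0 |
| India | 3 | 2 | 1 | 1 |
| Indonesia | 4 | 5 | -1 | 1 |
| Nepal | 7 | 7 | 0 | 0 |
| Philippines | 2 | 6 | -4 | 16 |
| Sudan | 5 | 3 | 2 | 4 |
| Total | | | | $\sum d_i^2 = 26$ |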
Applying formula (21):

$$ r_{\text{rank}} = 1 - \frac{6(26)}{7(49-1)} = 1 - \frac{156}{336} = 0.54 $$
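Spearman's coefficient can also be sketched in Python. The function `spearman_r` (a hypothetical name) applies formula (21) to two lists of ranks and is checked here against the rankings of Table 12. A helper, `average_ranks` (also hypothetical), turns raw values into ranks, giving tied values the average of the ranks they jointly occupy; note that formula (21) is exact only when there are no ties and is an approximation otherwise.

```python
def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of
    the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block over any values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1   # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_r(rank_x, rank_y):
    """Spearman's rank correlation coefficient, formula (21)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Table 12 rankings: Afghanistan, Congo, India, Indonesia, Nepal, Philippines, Sudan
rank_x = [6, 1, 3, 4, 7, 2, 5]   # percentage urban population
rank_y = [4, 1, 2, 5, 7, 6, 3]   # elementary level pupil-teacher ratio
print(round(spearman_r(rank_x, rank_y), 2))   # 0.54
print(average_ranks([10, 30, 20, 30]))        # [1.0, 3.5, 2.0, 3.5]
```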

EXERCISE 8
The reader should now do Exercise 8.

6.7 Causal relationships: regression analysis

6.7.1 Regression analysis is one of the most powerful tools at the statistician's disposal for quantifying causal relationships. The reader will recall the discussion, in Section 5 of the Basic Mathematics Chapter, of mathematical functions involving two or more variables. The general equation of a linear function was introduced:

$$ y = a + b x \qquad (22) $$

In this expression, an example of which was plotted as a graph:

- y is the "dependent" variable
- x is the "independent" variable
- a and b are the "coefficients"

6.7.2 The primary function of regression analysis is to estimate the line which in some sense (defined below) best fits the scatter of data that has been observed. In the real world, observed data, if plotted on a scatter diagram, rarely lie conveniently on a straight line. But regression analysis offers a method of calculating the coefficients, a and b, which are estimates of the parameters of the underlying functional relationship hypothesised to have given rise to the observed data points. The reader will recall, from the discussion of the general linear function (22), that once a and b have been quantified, the equation represents a unique straight line which may be drawn as a graph.

6.7.3 Let us proceed by example to explain the fundamental ideas of regression. Imagine that you were investigating the relationship between two variables:

t = time
y = intake rate of pupils into the first grade of primary level

Assume you had the following table of data, collected over the 7-year period 1968-1974:

Table 13: Intake rate, 1968-1974

| Year | $t_i$ | Intake rate (%), $y_i$ |
|---|---|---|
| 1968 | 1 | 26.49 |
| 1969 | 2 | 26.46 |
| 1970 | 3 | 27.67 |
| 1971 | 4 | 29.32 |
| 1972 | 5 | 30.97 |
| 1973 | 6 | 30.48 |
| 1974 | 7 | 31.74 |

6.7.4 The first step in investigating the relationship between t and y is to draw a "scatter diagram". This is presented in Figure 8. Two things are immediately evident:

- there is a positive correlation between t and y;
- the data points appear to be reasonably well approximated by a straight line.

We make the hypothesis that a linear functional relationship exists between the two variables. It is important to note that, although we hypothesise an underlying linear relationship between y and t, the observed data do not lie exactly on a straight line. This is because, in reality, other variables in addition to t have their influence on y. However, we make the assumption that their separate influences are small and tend to cancel out. Their net effect may therefore be regarded as purely a chance variable, causing the random fluctuations of the observed data points around the true linear relationship. The regression line is our estimate of the true, underlying linear relationship.

[Figure 8: Scatter diagram of intake rate and time (vertical axis: intake rate y; horizontal axis: t = time). Source: see Table 13.]

6.7.5 The question now arises: how do we actually fit a straight line to this scatter of points? Which is the "best" straight line to fit? These questions, the reader should by now appreciate, are equivalent to asking: what are the estimated values of the coefficients a and b which will give us the best-fitting straight line?

6.7.6 In other words, we wish to estimate the following equation:

$$ y = a + b t \qquad (23) $$

Notice that the equation is not:

$$ t = a + b y \qquad (24) $$

In regression analysis, the dependent variable is on the left-hand side. The "explanatory" (independent) variable is on the right-hand side. We do not believe that time is dependent on the intake rate!

6.7.7 How, then, are a and b to be calculated? One unsatisfactory way would be to draw, by eye, the straight line through the scatter of points which seemed, subjectively, to be the best-fitting line. This has the disadvantage that different persons (and even the same person on different occasions) will obtain different results.

6.8 The method of least squares

6.8.1 A much more satisfactory, and widely used, method is that of estimating the so-called regression line by the technique of ordinary least squares. The method of ordinary least squares gives the straight line which, when drawn through the scatter of points, minimises the sum of the squares of the (vertical) deviations of the points from the line.

6.8.2 It can be demonstrated, by mathematical techniques too advanced to be used here, that the values of b and a which give the line having this "best-fitting" property are defined by the following formulae:

$$ b = \frac{\sum_{i=1}^{n}(t_i-\bar{t})(y_i-\bar{y})}{\sum_{i=1}^{n}(t_i-\bar{t})^2} \qquad (25) $$

$$ a = \bar{y} - b\,\bar{t} \qquad (26) $$

where i = 1, 2, ..., n and n = number of data points.
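The least squares property of these formulae can be checked numerically: the sum of squared vertical deviations is smaller for the fitted line than for any nearby line. In this sketch the names `fit_line` and `sum_sq_dev` are hypothetical and the data are illustrative only.

```python
def fit_line(t, y):
    """Least squares a and b for y = a + b t, following formulae (25) and (26)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    b = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
         / sum((ti - t_bar) ** 2 for ti in t))
    a = y_bar - b * t_bar
    return a, b


def sum_sq_dev(a, b, t, y):
    """Sum of squared vertical deviations of the points from y = a + b t."""
    return sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))


t = [1, 2, 3, 4, 5]   # hypothetical illustrative data
y = [2, 4, 5, 4, 5]
a, b = fit_line(t, y)
best = sum_sq_dev(a, b, t, y)
print(round(a, 2), round(b, 2), round(best, 2))   # 2.2 0.6 2.4

# Nudging either coefficient away from the least squares values increases the sum
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05), (0.1, -0.05)]:
    assert sum_sq_dev(a + da, b + db, t, y) > best
```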

6.9 General formulae of the regression coefficients

6.9.1 When estimating the linear regression of y on x, that is, the regression line:

$$ y = a + b x \qquad (22) $$

the ordinary least squares formulae for a and b are:

$$ b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \qquad (27) $$

$$ a = \bar{y} - b\,\bar{x} \qquad (28) $$

6.9.2 For more rapid calculation, the following equivalent formula for b may be used:

$$ b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad (29) $$

It may be noted that the formula for the regression coefficient b is equivalent to:

$$ b = \frac{\text{covariance of } x \text{ and } y}{\text{variance of } x} $$

that is, formula (18) divided by formula (17).

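Example 13: Estimation of regression line

Let us calculate a and b, given the data in Table 13 above. We shall first estimate b, using (25) above. The following working table is drawn up:

Table 14

| (1) $t_i$ | (2) $y_i$ | (3) $t_i - \bar{t}$ | (4) $y_i - \bar{y}$ | (5) $(t_i - \bar{t})(y_i - \bar{y})$ | (6) $(t_i - \bar{t})^2$ |
|---|---|---|---|---|---|
| 1 | 26.49 | -3 | -2.53 | 7.59 | 9 |
| 2 | 26.46 | -2 | -2.56 | 5.12 | 4 |
| 3 | 27.67 | -1 | -1.35 | 1.35 | 1 |
| 4 | 29.32 | 0 | 0.30 | 0 | 0 |
| 5 | 30.97 | 1 | 1.95 | 1.95 | 1 |
| 6 | 30.48 | 2 | 1.46 | 2.92 | 4 |
| 7 | 31.74 | 3 | 2.72 | 8.16 | 9 |
| $\sum t_i = 28$ | $\sum y_i = 203.13$ | | | $\sum = 27.09$ | $\sum = 28$ |

$\bar{t} = 4$, $\bar{y} = 29.02$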

Hence, from formula (25):

$$ b = \frac{27.09}{28.0} = 0.968 $$

and from formula (26):

$$ a = 29.02 - 0.968(4) = 25.15 $$

Hence the estimated regression line is:

$$ y = 25.15 + 0.968\,t \qquad (30) $$
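To reproduce Example 13 by machine, the following sketch recomputes the working columns (3) to (6) of Table 14 from Table 13 and applies formulae (25) and (26). Variable names are illustrative. Unlike the hand calculation, it keeps full precision for the mean of y and for b (about 29.0186 and 0.9675); the resulting line agrees with y = 25.15 + 0.968t to the precision shown in the text.

```python
# Table 13: t = time (1968 = 1), y = intake rate (%)
t = [1, 2, 3, 4, 5, 6, 7]
y = [26.49, 26.46, 27.67, 29.32, 30.97, 30.48, 31.74]
n = len(t)

t_bar = sum(t) / n   # 4.0
y_bar = sum(y) / n   # about 29.02

dev_t = [ti - t_bar for ti in t]                       # column (3)
dev_y = [yi - y_bar for yi in y]                       # column (4)
products = [dt * dy for dt, dy in zip(dev_t, dev_y)]   # column (5)
squares = [dt ** 2 for dt in dev_t]                    # column (6)

b = sum(products) / sum(squares)   # formula (25): about 27.09 / 28
a = y_bar - b * t_bar              # formula (26)
print(f"sum (5) = {sum(products):.2f}, sum (6) = {sum(squares):.0f}")  # 27.09, 28
print(f"b = {b:.4f}, a = {a:.2f}")                                    # b = 0.9675, a = 25.15

# The fitted line passes through the point of means (t_bar, y_bar)
print(abs((a + b * t_bar) - y_bar) < 1e-9)   # True
```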

6.10 Interpretation of the estimated regression coefficients

6.10.1 The straight line, y = 25.15 + 0.968t, may now be drawn on a graph. This has been done in Figure 9. With a straight line, only 2 points are necessary. Hence the line has been drawn from the point at which it intercepts the vertical axis, (0, 25.15), through the point (4, 29.02). The latter is the point of means $(\bar{t}, \bar{y})$. All regression lines pass through the point of the means of the variables; this follows from formula (26), since $a + b\bar{t} = \bar{y}$. The line has the property of all least squares regression lines: it minimises the sum of squares of the vertical distances from the observed points to the line. That is, it is the line which minimises

$$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

where $\hat{y}_i = a + b\,t_i$ is the value given by the line at $t_i$. It is in this sense the best-fitting line.

6.10.2 In the Basic Mathematics chapter we discussed the interpretation of the coefficients of a linear equation, and if necessary the reader should refer back to this.

6.10.3 a is the point on the vertical axis at which the line intercepts the axis. The line has been extrapolated back in time from the period during which data were observed (t = 1, 2, ..., 7) to the point (t = 0, y = 25.15).

6.10.4 b is the slope, or gradient, of the regression line. Note that in this case it is positive and equals 0.968. This is to be interpreted as saying: for a unit change in the variable t, there is an average change of +0.968 units in y. Thus, for example, as time changes by one year, the regression line tells us to expect an average increase of 0.968 percentage points in the intake rate. In 3 years, we can expect a change of (3 × 0.968) = 2.904 percentage points in the intake rate; and so on.

EXERCISE 9
The reader should now do Exercise 9.

[Figure 9: Regression of intake rate on time, showing the period of observed data, the period of extrapolation and the regression line y = 25.15 + 0.968t, with the point (10, 34.83) marked; t = time. Source: see Table 13.]

6.11 Extrapolation

6.11.1 Figure 9 shows one of the most important uses of the regression method. This is its value in forecasting. For example, given the equation of the regression line of intake rate on time calculated in Example 13:

$$ y = 25.15 + 0.968\,t \qquad (30) $$

we can substitute values of t beyond those observed (i.e. beyond the period t = 1, 2, ..., 7). For example, the regression line, if extrapolated beyond its period of observation, passes through the point (10, 34.83), as is shown in Figure 9. For, when t = 10 (in 1977):

$$ y = 25.15 + 10(0.968) = 34.83\% $$

This is our forecast of the intake rate in 1977.

6.11.2 The forecast is based on two very important assumptions: firstly, that the observed trend is linear, and secondly, that it can be extrapolated safely in a linear fashion. Section 6.13 below, on non-linear curve-fitting, discusses the first of these assumptions. As regards the confidence that can be placed in simple linear extrapolation, it must be said that this should only be done with care and thought. All forecasting, however done, is subject to error and uncertainty. Beyond a few years into the future the uncertainties become so considerable that relatively sophisticated statistical techniques tend to lose their advantages over informed guesswork.
6.12 Interpolation

6.12.1 The regression line allows us not only to extrapolate but also to interpolate. That is, if we wished to have an estimate of the intake rate at some time within the observation period other than at those times at which it was actually observed, by substitution of the appropriate value of t we obtain a value of y. For example, when t = 3.5:

$$ y = 25.15 + 0.968(3.5) = 28.538\% $$

This has also been shown on Figure 9.
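Extrapolation (6.11) and interpolation (6.12) are both simply substitution into the estimated line. This sketch uses the rounded coefficients of equation (30); the function name `predict_intake` and the labelling of each estimate according to whether t lies inside the observed range t = 1 to 7 are illustrative additions, echoing the caution of 6.11.2.

```python
A_HAT, B_HAT = 25.15, 0.968   # estimated coefficients from equation (30)
T_MIN, T_MAX = 1, 7           # observed period, 1968 to 1974


def predict_intake(t):
    """Return (estimated intake rate %, kind) for time t on y = 25.15 + 0.968 t."""
    kind = "interpolation" if T_MIN <= t <= T_MAX else "extrapolation"
    return A_HAT + B_HAT * t, kind


y_1977, kind_1977 = predict_intake(10)    # t = 10 corresponds to 1977
y_mid, kind_mid = predict_intake(3.5)
print(f"{y_1977:.2f} ({kind_1977})")      # 34.83 (extrapolation)
print(f"{y_mid:.3f} ({kind_mid})")        # 28.538 (interpolation)
```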

6.12.2 The reader is reminded that linear regression of a dependent variable on time as the independent variable is a very important, but not the only, use of the least squares method. Regression analysis deals not only with the estimation of trends in time-series data, but more generally with the estimation of causal relationships between any quantifiable variables. Further, non-linear functions may be estimated. And regression analysis is not limited to functions involving only one independent variable. However, we can deal here only with bivariate regression, in which there is one dependent and one independent variable. Nevertheless, the more advanced techniques of multiple regression allow the statistician to estimate functions of (in principle) any number of variables:

$$ y = f(x_1, x_2, \ldots, x_n) \qquad (31) $$

6.12.3 In fact, practical considerations will usually limit the analysis to only a few independent variables. In particular, lack of sufficient data is almost always the constraint facing the practising statistician. Other things being equal, the more data (observations) available the better. More confidence can be placed in statistical estimates the larger the number of observations, n.

6.12.4 This is intuitively reasonable, but can only be rigorously demonstrated by mathematical techniques in the theory of probability which are too advanced to be presented here. As a rule of thumb, in estimating bivariate linear functions, though it is possible with as few as 3 observations, little confidence can be attached to the results unless at least 5 observations are available. More are always to be welcomed, particularly if the data are well scattered.

6.13 Non-linear curve-fitting

6.13.1 When analysing the relationship between two variables, the hypothesis that the data, when plotted on a scatter diagram, may be approximated by a straight line may become obviously unreasonable. A non-linear function may be called for. In the Chapter on Basic Mathematics the idea of a non-linear function was introduced in 5.3. The simplest non-linear function is

$$ y = a + b x^2 \qquad (32) $$

the graph of which describes a parabola (1). This function may easily be fitted to data if the scatter seems to have a parabolic form. By a simple transformation of the variable x², we obtain the linear function (22), and can apply the least squares formulae (28) and (27) for a and b above. We transform x² by simply renaming it by a symbol in the first power, such as w; hence (32) becomes:

$$ y = a + b w \qquad (33) $$

which has exactly the linear form of (22).

(1) See Figure 3 in the Chapter on Basic Mathematics for the graph of y = x².
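A sketch of the transformation just described: square each x to obtain w, then apply the ordinary linear least squares formulae to the pairs (w, y). The helper name `fit_line` is hypothetical, and the data are generated from a known parabola, y = 1 + 0.5x², so that the recovered coefficients can be checked.

```python
def fit_line(x, y):
    """Least squares a and b for y = a + b x (formulae (27) and (28))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


x = [0, 1, 2, 3, 4]
y = [1 + 0.5 * xi ** 2 for xi in x]   # hypothetical data lying on y = 1 + 0.5 x^2

w = [xi ** 2 for xi in x]             # the transformation w = x^2 of (33)
a, b = fit_line(w, y)                 # ordinary least squares on (w, y)
print(a, b)                           # 1.0 0.5

# Transform back: the fitted curve is y = a + b x^2
fitted = [a + b * xi ** 2 for xi in x]
print(fitted == y)                    # True
```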

6.13.2 Application of the least squares formulae for b and a respectively gives:

$$ b = \frac{\sum_{i=1}^{n}(w_i-\bar{w})(y_i-\bar{y})}{\sum_{i=1}^{n}(w_i-\bar{w})^2} $$

$$ a = \bar{y} - b\,\bar{w} $$

Having now estimated a and b, we may transform w back to x² and plot the estimated, curved, regression line. We may say we have "fitted" a parabola to the data by least squares.
6.13.3 Variables which are growing exponentially over time will, when plotted on a graph, show a curve with an ever-increasing positive slope. Such a variable is growing over time at a constant proportional rate of growth (1) and has an equation:

$$ y = a\,b^{x} \qquad (34) $$

Again, by using a simple transformation to obtain a linear form, the method of ordinary least squares regression may be applied. In this case we transform (34) by taking logarithms:

$$ \log y = \log a + (\log b)\,x \qquad (35) $$

6.13.4 If we now write log y = Y, log a = A, log b = B and x = X, equation (35) may be written:

$$ Y = A + B X \qquad (36) $$

which may be seen to be a linear equation. A and B may now be estimated by application of the least squares formulae. First we may estimate B, and then transform back (by taking its antilogarithm) to find b. Similarly we find a. We have therefore calculated the least squares regression line (least squares, that is, in terms of log y):

$$ y = a\,b^{x} \qquad (37) $$

That is, we have fitted an exponential function to the data. The fitted curve may be used for interpolation and prediction, as described above.

(1) See Chapter on Basic Mathematics, Section 4.
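A sketch of 6.13.3 and 6.13.4 in Python: take logarithms (base 10 here, though any base works), fit the straight line Y = A + BX by least squares, then take antilogarithms to recover a and b. The helper names are hypothetical, the data are generated from y = 2(1.5)^x so that the result can be checked, and all y values must be positive for the logarithms to exist.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for Y = A + B X (formulae (27) and (28))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def fit_exponential(x, y):
    """Fit y = a * b**x by least squares on log y, as in equations (35) to (37)."""
    if any(yi <= 0 for yi in y):
        raise ValueError("all y values must be positive to take logarithms")
    big_y = [math.log10(yi) for yi in y]   # Y = log y
    big_a, big_b = fit_line(x, big_y)      # Y = A + B X, with X = x
    return 10 ** big_a, 10 ** big_b        # antilogarithms give a and b


x = [0, 1, 2, 3, 4, 5]
y = [2 * 1.5 ** xi for xi in x]           # hypothetical data on y = 2(1.5)^x
a, b = fit_exponential(x, y)
print(round(a, 6), round(b, 6))           # 2.0 1.5
```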

6.13.5 The final non-linear function discussed here is the logistic function. The general form of this function may be seen in the seventh frequency curve of Figure 4 above. This is a useful functional form to fit to certain data. It can be seen that the curve begins at a low value of y with a slope close to zero. The slope increases steadily, then begins to decrease again towards zero at the "saturation" level of y. Certain data may be expected to behave like this over time. For example, consider the net enrolment ratio for primary education. This is defined as the ratio between the number of pupils at this level who belong to the official age-group and the total number of children in this age-group. In percentage terms, it therefore has a theoretical minimum of 0% and a maximum of 100%; that is, it cannot grow without limit. Over a long period of time, it may be expected to develop approximately according to the curve shown in Figure 4. Eventually the saturation level is reached, beyond which the ratio cannot go. One general formulation of the logistic function is:

[formula (38), involving the constant k, is not legible in the source]

6.13.6 After the necessary logarithmic transformation, and after an assumption has been made about the value of k, this function may again be estimated by the method of least squares. However, readers who wish to utilise the logistic function in their work are recommended to consult a more advanced text.
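The sketch below assumes one common form of the logistic function, y = k / (1 + e^-(A + Bt)), in which k is the saturation level; this is an assumption and may differ from the manual's own formulation. With k fixed in advance, the logarithmic transformation ln(y / (k - y)) = A + Bt is linear in t and can be fitted by ordinary least squares. The function name `fit_logistic` is hypothetical, the data are generated with k = 100 so the result can be checked, and every y must lie strictly between 0 and k.

```python
import math


def fit_logistic(t, y, k):
    """Fit the assumed form y = k / (1 + exp(-(A + B t))) for a given saturation level k."""
    if any(not 0 < yi < k for yi in y):
        raise ValueError("each y must lie strictly between 0 and k")
    z = [math.log(yi / (k - yi)) for yi in y]   # linearising transformation: z = A + B t
    n = len(t)
    t_bar, z_bar = sum(t) / n, sum(z) / n
    big_b = (sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z))
             / sum((ti - t_bar) ** 2 for ti in t))
    big_a = z_bar - big_b * t_bar
    return big_a, big_b


k = 100.0   # assumed saturation level, e.g. a net enrolment ratio of 100%
t = list(range(1, 9))
y = [k / (1 + math.exp(-(-2.0 + 0.5 * ti))) for ti in t]   # hypothetical data
big_a, big_b = fit_logistic(t, y, k)
print(round(big_a, 6), round(big_b, 6))   # -2.0 0.5
```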

6.14 Regression: a summary

6.14.1 Regression analysis is an extremely valuable tool in the statistician's kitbag. However, like any tool, it can be misused. The most obvious danger is that it is applied in circumstances where there is no underlying theoretical justification for the causal relationship which is estimated. It is always possible to regress any quantified variable on any other such variable. But there can be no useful interpretation of the estimated coefficients unless the causal relationship under investigation has been carefully set out and justified from the beginning.

6.14.2 The reader should always remember that, however seemingly sophisticated the mathematical and statistical techniques are, the results depend on good original data. Poor, inaccurate and insufficient data cannot produce results in which the educational planner can have any confidence. The development and maintenance of a good data-base are the first and last tasks of the statistician.

6.14.3 Despite these warnings, the regression method is the best available for quantifying causal relationships. It is reasonably easily understood. The method of least squares is an intuitively attractive method of fitting a line to observed data. It lends itself to simple methods of interpolation and extrapolation which are important at a number of stages in the production of educational projections.

6.14.4 Regression analysis, like all statistical techniques, can be misapplied and misinterpreted. There are choices which face the statistician in using the technique, to which there are no straightforward, technical answers. For example, is the data-base satisfactory? Should some data be rejected, or further information sought? In deciding on the form of functional relationships, which variables should be included, which excluded? Should a linear or non-linear function be fitted to the data? In making these and other important analytical decisions, the statistician will always be guided by experience and intuition as well as by understanding of purely technical methods.

EXERCISE 10
The reader should now do Exercise 10.

EXERCISES 1-10

ANSWERS TO EXERCISES 1-10

EXERCISE 1

1.1 Write each of the following as summations:

i) $x_1 + x_2 + x_3 + x_4 + x_5$
ii) $(z_1 - 4) + (z_2 - 4) + (z_3 - 4)$
iii) $x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2$
iv) $x_1^2 f_1 + x_2^2 f_2 + x_3^2 f_3 + x_4^2 f_4 + x_5^2 f_5 + \dots + x_8^2 f_8$
v) $y_4 + y_5 + y_6 + y_7 + y_8$
vi) $3b_1^2 + 3b_2^2 + 3b_3^2 + 3b_4^2$
vii) $(x_3 - y_3) + (x_4 - y_4) + (x_5 - y_5)$
viii) $4(x_1 + x_2 + \dots + x_9)$
ix) $3((x_3 - y_3) + (x_4 - y_4))$
x) $x_5 f_5 + x_6 f_6 + x_7 f_7 + x_8 f_8$

1.2 Write each of the following without the summation signs:

i) $\sum_{i=3}^{4} x_i$
ii) $\sum_{i=1}^{5} (x_i + y_i)$
iii) $\sum_{i=1}^{3} 4^i$
iv) $\sum_{i=9}^{12} f_i x_i$
v) $\sum_{i=2}^{5} x_i y_i$

1.3 Given $x_1 = 3$, $x_2 = 2$, $x_3 = 4$ and $y_1 = 6$, $y_2 = 4$, $y_3 = 7$, calculate:

i) $\sum_{i=1}^{3} x_i$
ii) $\sum_{i=2}^{3} x_i y_i$
iii) $\sum_{i=1}^{3} (x_i + y_i)$
iv) $\sum_{i=1}^{3} 2y_i$
v) $\sum_{i=1}^{3} x_i^2 y_i$

EXERCISE 2

2.1 50 students took an examination and obtained the following scores:

9 21 43 39 44 31 41 27 71 78
61 63 89 76 54 37 57 69 42 66
62 40 0 51 57 57 65 56 88 80
44 64 51 59 48 63 57 44 18 69
52 41 50 58 49 84 79 57 99 57

Divide the scores into 10 equal class intervals and write down the frequency distribution. Write down a column of the class mid-points.
EXERCISE 3

3.1 Using the table of frequencies you have calculated in Exercise 2, construct a histogram after having aggregated the first 3 classes. Draw in the frequency polygon.

EXERCISE 4

4.1 Table 4.1 below gives the years of service of a sample of teachers in Afghanistan in 1973.

Table 4.1

| Years of service | Frequency |
|---|---|
| 0 < 6 | 10990 |
| 6 < 12 | 6828 |
| 12 < 18 | 1402 |
| 18 < 24 | 880 |
| 24 < 30 | 668 |
| 30 < 40 | 185 |
| Total | 20953 |

i) calculate the percentage frequencies in each class (to one decimal place)
ii) calculate the percentage cumulative frequencies (commencing with the 0 < 6 years of service class frequency)
iii) draw on a graph the percentage cumulative frequency polygon (ogive)
iv) from your graph, what percentage of teachers do you estimate have given less than 15 years of service? What percentage have given 5 years of service or more?
EXERCISE 5

5.1 Using the data presented in Exercise 4, calculate, to one decimal place, the average number of years of service given by teachers.

EXERCISE 6

6.1 Use the percentage cumulative frequency polygon you constructed in Exercise 4.1(iii) to calculate the median number of years of service of teachers.

6.2 What is the class exhibiting the modal frequency in Table 4.1, Exercise 4?

6.3 What is the value of the mode in the set of data presented in Exercise 2.1?
EXERCISE 7

7.1 Using formula (17), calculate the standard deviation, s, of the set of 50 examination scores given in Exercise 2.1, grouped into 10 classes.

7.2 Calculate the arithmetic mean score, $\bar{x}$.

7.3 Calculate the coefficient of variation.

EXERCISE 8

8.1 Using the data in Table 8.1 below, calculate the coefficient of linear correlation, r, between x and y, where:

$x_i$ = distance of home of pupil i from school (km)
$y_i$ = number of days absent by pupil i from school in a year

Table 8.1

| $x_i$ | $y_i$ |
|---|---|
| 1 | 5 |
| 3 | 15 |
| 2 | 4 |
| 1 | 7 |
| 4 | 20 |
| 7 | 23 |
| 8 | 17 |
| 3 | 4 |

8.2 Rank the data in Table 8.1 in ascending order. (Where there are ties, assign to each of the tied observations the average of the ranks which they jointly occupy.) Calculate Spearman's coefficient of rank correlation, $r_{\text{rank}}$.

EXERCISE 9

9.1 Over the period 1970-1976, total primary level enrolment in thousands in Gabon was as follows:

Table 9.1

| Year | t | Total primary level enrolment (thousands), $E_t$ |
|---|---|---|
| 1970 | 1 | 94 |
| 1971 | 2 | 101 |
| 1972 | 3 | 106 |
| 1973 | 4 | 110 |
| 1974 | 5 | 114 |
| 1975 | 6 | 121 |
| 1976 | 7 | 129 |

Calculate, using formula (29) for ease of computation, the linear regression of primary level enrolment, $E_t$, on time, t.

9.2 Interpret carefully the meaning of the estimated coefficients, a and b.

EXERCISE 10

10.1 Using the equation of the regression line calculated in Exercise 9, draw it on a graph.

10.2 By linear extrapolation of the regression line, use the graph to predict $E_t$ in 1977 and 1980. Check the accuracy of your answers by substitution of the appropriate values of t into the estimated regression equation.

ANSWERS TO EXERCISE 1

1.1

i) $\sum_{i=1}^{5} x_i$
ii) $\sum_{i=1}^{3} z_i - 3(4)$
iii) $\sum_{i=1}^{5} x_i^2$
iv) $\sum_{i=1}^{8} x_i^2 f_i$
v) $\sum_{i=4}^{8} y_i$
vi) $3\sum_{i=1}^{4} b_i^2$
vii) $\sum_{i=3}^{5} x_i - \sum_{i=3}^{5} y_i$
viii) $4\sum_{i=1}^{9} x_i$
ix) $3\left(\sum_{i=3}^{4} x_i - \sum_{i=3}^{4} y_i\right)$
x) $\sum_{i=5}^{8} x_i f_i$

1.2

i) $x_3 + x_4$
ii) $x_1 + y_1 + x_2 + y_2 + x_3 + y_3 + x_4 + y_4 + x_5 + y_5$
iii) $4 + 4^2 + 4^3 = 84$
iv) $f_9 x_9 + f_{10} x_{10} + f_{11} x_{11} + f_{12} x_{12}$
v) $x_2 y_2 + x_3 y_3 + x_4 y_4 + x_5 y_5$

1.3

i) $x_1 + x_2 + x_3 = 3 + 2 + 4 = 9$
ii) $x_2 y_2 + x_3 y_3 = 2(4) + 4(7) = 36$
iii) $x_1 + y_1 + x_2 + y_2 + x_3 + y_3 = 3 + 6 + 2 + 4 + 4 + 7 = 26$
iv) $2(y_1 + y_2 + y_3) = 2(6 + 4 + 7) = 34$
v) $x_1^2 y_1 + x_2^2 y_2 + x_3^2 y_3 = 9(6) + 4(4) + 16(7) = 182$

ANSWER TO EXERCISE 2

2.1 Frequency distribution of scores of 50 students:

| Scores | Frequency | Class mid-points |
|---|---|---|
| 0 < 10 | 2 | 5 |
| 10 < 20 | 1 | 15 |
| 20 < 30 | 2 | 25 |
| 30 < 40 | 3 | 35 |
| 40 < 50 | 10 | 45 |
| 50 < 60 | 14 | 55 |
| 60 < 70 | 9 | 65 |
| 70 < 80 | 4 | 75 |
| 80 < 90 | 4 | 85 |
| 90 < 100 | 1 | 95 |
| Total | 50 | |
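The frequency distribution can be checked with a short Python sketch (variable names are illustrative): each score is assigned to its class of width 10 by integer division, so class index c covers the scores from 10c up to, but not including, 10(c + 1).

```python
from collections import Counter

# The 50 examination scores of Exercise 2.1
scores = [9, 21, 43, 39, 44, 31, 41, 27, 71, 78,
          61, 63, 89, 76, 54, 37, 57, 69, 42, 66,
          62, 40, 0, 51, 57, 57, 65, 56, 88, 80,
          44, 64, 51, 59, 48, 63, 57, 44, 18, 69,
          52, 41, 50, 58, 49, 84, 79, 57, 99, 57]

counts = Counter(score // 10 for score in scores)   # class index 0..9
for c in range(10):
    lower, upper = 10 * c, 10 * (c + 1)
    print(f"{lower:2d} < {upper:3d}  frequency {counts[c]:2d}  mid-point {lower + 5}")
```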

ANSWER TO EXERCISE 3

3.1 See Figure 10. Note that the rectangles of the histogram are centred on the class mid-points (5, 15, ..., 95). The frequency polygon, joining the mid-points of the tops of the rectangles, is continued at the ends of the distribution so that the area under the frequency polygon is exactly equal to the area of the histogram. The height of the first rectangle is 5/3, i.e. about 1.67, to obey the rule that area is proportional to frequency.

[Figure 10]

ANSWERS TO EXERCISE 4

4.1 i) and ii): percentage frequencies and percentage cumulative frequencies are given in Table 4.2 below:

Table 4.2

| Years of service | Percentage frequency | Percentage cumulative frequency |
|---|---|---|
| 0 < 6 | 52.4 | 52.4 |
| 6 < 12 | 32.6 | 85.0 |
| 12 < 18 | 6.7 | 91.7 |
| 18 < 24 | 4.2 | 95.9 |
| 24 < 30 | 3.2 | 99.1 |
| 30 < 40 | 0.9 | 100.0 |
| Total | 100.0 | |

iii) See Figure 11. Note that the following points have been plotted: (6, 52.4), (12, 85.0), ..., (40, 100.0).

iv) See Figure 11. The graph indicates that approximately 89% have given less than 15 years of service, and approximately 56% have given 5 years of service or more (i.e. (100 - 44)%).

[Figure 11]

ANSWER TO EXERCISE 5

5.1 The reader should construct the following table:

Table 5.1

Years of     frequency    class         f_j x_j
service      f_j          mid-points
                          x_j
0 < 6          10990          3           32970
6 < 12          6828          9           61452
12 < 18         1402         15           21030
18 < 24          880         21           18480
24 < 30          668         27           18036
30 < 40          185         35            6475
Totals         20953                     158443

The arithmetic mean is

$$\bar{x} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \frac{158443}{20953} \approx 7.6 \text{ years}$$
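The same calculation can be written as a minimal Python sketch, using the frequencies and class mid-points of Table 5.1. The function name is illustrative.

```python
def grouped_mean(freqs, midpoints):
    """Arithmetic mean of grouped data: sum(f_j * x_j) / sum(f_j)."""
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs), total_fx


freqs = [10990, 6828, 1402, 880, 668, 185]   # f_j
midpoints = [3, 9, 15, 21, 27, 35]           # x_j
mean, total_fx = grouped_mean(freqs, midpoints)
print(total_fx, round(mean, 1))  # 158443 7.6
```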

ANSWERS TO EXERCISE 6

6.1

By dropping a perpendicular from the point at which a horizontal line drawn from the 50% value on the vertical axis intersects the polygon (see Figure 11, in Answers to Exercise 4), we see that the median is approximately 5.7 years.
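Because the cumulative polygon is made of straight segments, reading the median off it is equivalent to interpolating linearly within the median class. A minimal sketch, assuming the classes and frequencies of Table 5.1 and the usual interpolation M = L + ((n/2 - F)/f)h, where L is the lower boundary of the median class, F the cumulative frequency below it, f its frequency and h its width (symbols introduced here for illustration only):

```python
def grouped_median(classes, freqs):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("need at least one observation")
    half = n / 2
    cum_below = 0  # cumulative frequency below the current class
    for (lower, upper), f in zip(classes, freqs):
        if cum_below + f >= half:
            return lower + (half - cum_below) / f * (upper - lower)
        cum_below += f


classes = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30), (30, 40)]
freqs = [10990, 6828, 1402, 880, 668, 185]
median = grouped_median(classes, freqs)
print(round(median, 2))  # 5.72
```

This gives 5.72 years, in agreement with the graphical value of about 5.7 years.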

6.2

The first class, 0 < 6 years of service, has the highest frequency, 10990, and is therefore the modal class.

6.3

The modal score is 57; it occurs 6 times, which is more often than any other value.

ANSWERS TO EXERCISE 7

7.1 The reader should set out a working table similar to Table 9 in Example 10. An arbitrary origin, A = 55, has been chosen.

Table 7.1

Examination scores of 50 students

Scores      frequency   class        (x_j - A)    f_j d_j    f_j d_j^2
            f_j         mid-points   = d_j
                        x_j
0 < 10          2           5           -50          -100        5000
10 < 20         1          15           -40           -40        1600
20 < 30         2          25           -30           -60        1800
30 < 40         3          35           -20           -60        1200
40 < 50        10          45           -10          -100        1000
50 < 60        14          55             0             0           0
60 < 70         9          65            10            90         900
70 < 80         4          75            20            80        1600
80 < 90         4          85            30           120        3600
90 < 100        1          95            40            40        1600
Totals         50                                     -30       18300

From the formula for the standard deviation using an arbitrary origin,

$$s = \sqrt{\frac{\sum_{j=1}^{k} f_j d_j^2}{n} - \left(\frac{\sum_{j=1}^{k} f_j d_j}{n}\right)^2} = \sqrt{\frac{18300}{50} - \left(\frac{-30}{50}\right)^2} = \sqrt{365.64} \approx 19.1$$

ANSWERS TO EXERCISE 7 (continued)

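7.2 Using the formula

$$\bar{x} = A + \frac{\sum_{j=1}^{k} f_j d_j}{n} = 55 + \frac{-30}{50} = 54.4$$

7.3 The coefficient of variation, V, is

$$V = \left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{19.1}{54.4} \times 100\right)\% \approx 35.1\%$$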
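The three answers to Exercise 7 can be reproduced together in a minimal Python sketch. It assumes the divisor n (not n - 1) in the standard deviation, which is what the value s ≈ 19.1 implies; the function name is illustrative.

```python
import math


def grouped_stats_arbitrary_origin(freqs, midpoints, origin):
    """Mean and standard deviation of grouped data via d_j = x_j - A.

    Uses s = sqrt(sum(f d^2)/n - (sum(f d)/n)^2), i.e. divisor n.
    """
    n = sum(freqs)
    d = [x - origin for x in midpoints]
    sum_fd = sum(f * dj for f, dj in zip(freqs, d))
    sum_fd2 = sum(f * dj * dj for f, dj in zip(freqs, d))
    mean = origin + sum_fd / n
    s = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, s, sum_fd, sum_fd2


freqs = [2, 1, 2, 3, 10, 14, 9, 4, 4, 1]
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
mean, s, sum_fd, sum_fd2 = grouped_stats_arbitrary_origin(freqs, midpoints, origin=55)
cv = s / mean * 100  # coefficient of variation, in per cent

print(sum_fd, sum_fd2)               # -30 18300
print(round(mean, 1), round(s, 1))   # 54.4 19.1
print(round(cv, 2))                  # 35.15

# The choice of origin only simplifies hand arithmetic; A = 0 gives the same results.
mean0, s0, _, _ = grouped_stats_arbitrary_origin(freqs, midpoints, origin=0)
```

With the unrounded s (about 19.12), V comes to about 35.15%; the 35.1% above results from rounding s to 19.1 before dividing.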


ANSWERS TO EXERCISE 8

8.1 The reader should set out a working table as below in Table 8.2, similar to Table 11 in Example 11.

Table 8.2

x_i     y_i     x_i y_i     x_i^2     y_i^2
  1       5          5          1        25
  3      15         45          9       225
  2       4          8          4        16
  1       7          7          1        49
  4      20         80         16       400
  7      23        161         49       529
  8      17        136         64       289
  3       4         12          9        16
Σ: 29    95        454        153      1549

From formula (20):

$$r = \frac{8(454) - 29(95)}{\sqrt{\left[8(153) - (29)^2\right]\left[8(1549) - (95)^2\right]}} = \frac{877}{\sqrt{383 \times 3367}} \approx +0.77$$

There is a moderate positive linear correlation between the distance of pupils' homes from school and the number of days absent per year.
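Formula (20) can be evaluated directly from the raw columns. A minimal Python sketch using the two data columns of Table 8.2 (the function name is illustrative):

```python
import math


def pearson_r(x, y):
    """Product-moment correlation coefficient via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den


x = [1, 3, 2, 1, 4, 7, 8, 3]       # x_i column of Table 8.2
y = [5, 15, 4, 7, 20, 23, 17, 4]   # y_i column of Table 8.2
print(round(pearson_r(x, y), 2))   # 0.77
```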

8.2 A working table as below should be set out, similar to Table 12 in Example 12.

Table 8.3

Rank
x_i     y_i     d_i = (x_i - y_i)     d_i^2

4
3
1
6
7
8
4

3
5
1
4
7
8
6
1

-2
-1
2
-3
-1
-1
2
3

4
1
4
9
1
1
4
9

Σ d_i^2 = 33

Applying formula (21):

$$r_{\text{rank}} = 1 - \frac{6(33)}{8(64 - 1)} = 1 - \frac{198}{504} \approx 0.61$$
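The arithmetic in formula (21) follows the usual rank correlation formula, r_rank = 1 - 6Σd_i^2 / (n(n^2 - 1)), which assumes there are no tied ranks. A minimal Python sketch starting from the d_i column of Table 8.3 (the function name is illustrative):

```python
def spearman_from_differences(d):
    """Rank correlation 1 - 6*sum(d^2) / (n(n^2 - 1)); assumes no tied ranks."""
    n = len(d)
    sum_d2 = sum(di * di for di in d)
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


d = [-2, -1, 2, -3, -1, -1, 2, 3]  # d_i column of Table 8.3
r_rank, sum_d2 = spearman_from_differences(d)
print(sum_d2, round(r_rank, 2))  # 33 0.61
```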


ANSWERS TO EXERCISE 9

9.1 In order to use formula (29), the reader should set out a working table as below:

Table 9.2

t      E_t      t^2      tE_t
1       94        1        94
2      101        4       202
3      106        9       318
4      110       16       440
5      114       25       570
6      121       36       726
7      129       49       903
Σ: 28  775      140      3253

$\bar{t} = 4$, $\bar{E} = 110.71$

From formula (29) (substituting t for x_i and E_t for y_i), we obtain:

$$b = \frac{7(3253) - 28(775)}{7(140) - (28)^2} = \frac{1071}{196} \approx 5.46$$

Having calculated b, we may now substitute the values of b, $\bar{t}$ and $\bar{E}$ into formula (28) to obtain a:

$$a = \bar{E} - b\bar{t} = 110.71 - 5.46(4) = 88.87$$

Hence the equation of the estimated regression line is:

$$\hat{E}_t = 88.87 + 5.46t$$
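A minimal Python sketch of the same least-squares fit, assuming formulas (28) and (29) are the standard ones used above (b from the column sums, then a = Ē - b t̄). The function name is illustrative.

```python
def least_squares(t, e):
    """Fit E = a + b*t by least squares, using the computational formulas."""
    n = len(t)
    st, se = sum(t), sum(e)
    ste = sum(ti * ei for ti, ei in zip(t, e))
    stt = sum(ti * ti for ti in t)
    b = (n * ste - st * se) / (n * stt - st ** 2)
    a = se / n - b * st / n  # a = mean(E) - b * mean(t)
    return a, b


t = [1, 2, 3, 4, 5, 6, 7]
enrol = [94, 101, 106, 110, 114, 121, 129]  # E_t in thousands (Table 9.2)
a, b = least_squares(t, enrol)
print(round(a, 2), round(b, 2))  # 88.86 5.46
```

Carried out without intermediate rounding, a comes to about 88.86; the 88.87 above results from substituting the rounded values 110.71 and 5.46.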

9.2

Interpretation of a

Note that when t = 0, substitution into the equation of the estimated regression line gives $\hat{E}_0 = 88.87$. That is, we have found a, the intercept of the regression line on the vertical ($E_t$) axis. It is our estimate of what enrolment was (in thousands) at time t = 0 (1969), assuming a backward linear projection of the regression line.

Interpretation of b

b is the slope of the estimated regression line. The estimate b = 5.46 implies that a unit increase in time t (one year) gives rise, on average, to an increase of 5.46 units (thousands) in primary-level enrolment, $E_t$, in Gabon over the period of observation, 1970-76.

ANSWERS TO EXERCISE 10
10.1 See Figure 12. The estimated regression line may be drawn by passing it through the points (0, a) and ($\bar{t}$, $\bar{E}$), i.e. (0, 88.87) and (4, 110.71).

10.2 In 1977, t = 8. The graph shows $\hat{E}_8$ to be approximately 132. In 1980, t = 11. The graph shows $\hat{E}_{11}$ to be approximately 149.

Graphical methods are inevitably limited in their accuracy. Predicted values are best calculated by direct substitution of the values of t into the regression equation. Thus, when t = 8 (1977):

$$\hat{E}_8 = 88.87 + 5.46(8) = 132.55 \text{ (thousands)}$$

and when t = 11 (1980):

$$\hat{E}_{11} = 88.87 + 5.46(11) = 148.93 \text{ (thousands)}$$
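Direct substitution is a one-line function. The sketch below uses the rounded coefficients of the text, so it reproduces the two predictions; the name `predict` is illustrative.

```python
def predict(t, a=88.87, b=5.46):
    """Estimated primary enrolment (thousands) from E_hat = a + b*t, t = 1 in 1970."""
    return a + b * t


for year, t in [(1977, 8), (1980, 11)]:
    print(year, round(predict(t), 2))
# 1977 132.55
# 1980 148.93
```

With unrounded coefficients (a ≈ 88.857, b ≈ 5.464) the predictions would be about 132.57 and 148.96 thousand, a small difference due only to rounding.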
