CRHP
Distribution : limited
Paris, September 1978
Original : English
UNITED NATIONS EDUCATIONAL
SCIENTIFIC AND CULTURAL ORGANIZATION
TRAINING SEMINARS ON STATISTICAL METHODS
FOR PROJECTING SCHOOL ENROLMENT
Basic Background Material
Book 1
Unit I   Basic Mathematics
Unit II  Basic Statistics
(Provisional version)
Education Projections Unit
Division of Statistics on Education
Office of Statistics
TRAINING SEMINARS ON STATISTICAL METHODS
FOR PROJECTING SCHOOL ENROLMENT
Basic Background Material 1)
Book 1
Unit I   Basic Mathematics, by Robin Shannon
Unit II  Basic Statistics, by Robin Shannon
(Provisional version)
1) This work has been prepared within the framework of the project
National Training on Methods with Special Reference to Projecting
School Enrolment (INT/76/P22), being carried out by the Unesco Office of
Statistics, Paris, with financial support from the United Nations Fund for
Population Activities (UNFPA).
PREFACE
One of the major problems which Unesco statisticians have faced in
conducting national training seminars on statistical methods for projecting
school enrolment (nine of which have been organized in developing countries
over the last three years) has been the incomplete knowledge of many
participants as regards basic concepts in subjects that are pre-requisites for
drawing full benefit from the seminar’s programme. This has been mainly the
case in areas such as mathematics, statistics, demography and data collection
methodology.
It should be recognized that it is often difficult, and expensive, to
find suitable textbooks covering the minimum ground in the above-mentioned
areas, and which also make particular reference to problems in education and
population. As has been indicated, such reference materials are
indispensable for a sound approach to quantification in education and population.
For these reasons, the Unesco Office of Statistics has undertaken to
make available a set of relatively short, clear and practically oriented
background units with the major objective of giving participants the
possibility of revising their knowledge on selected items before as well as after
the seminars, thus enabling them to maximize their involvement in this training
programme.
The present volume (Book 1) contains two units, i.e. Unit I Basic
Mathematics, and Unit II Basic Statistics. It is issued in a provisional version
with a view to incorporating possible improvements in the final version to be
issued at a later stage. Readers may kindly address any comments to the Division
of Statistics on Education, Office of Statistics, Unesco, 75700 Paris (France).
This background material is therefore a general complement to serve as
an introduction to the basic paper which is normally prepared for each seminar,
demonstrating how to analyze and project available population and education
data for the country concerned by means of simple statistical techniques. As
such, each Unit is conceived as a self-contained document. Attention should be
drawn to the fact that the various subjects treated in this background series
are directed towards the officials involved in practical work, and therefore
the theoretical foundations (already extensively covered in authoritative
textbooks) of the areas are in no case given priority.
Similar units are under preparation covering such topics as Basic
Demography, Educational Statistics, Education Projection methods and Statistics
on educational finance.
We hope in this way to facilitate the work of educational statisticians
and planners in developing countries.
Division of Statistics on Education
Unesco Office of Statistics
Paris, September 1978
UNIT I - BASIC MATHEMATICS
by Robin Shannon, Lecturer
Department of Economics,
University of Newcastle-Upon-Tyne
(United Kingdom)
CONTENTS
                                                              Page
Introduction                                                   iii

SECTION 1   SOME FUNDAMENTAL CONCEPTS AND OPERATIONS             1
   1.1   Symbols                                                 1
   1.2   Brackets                                                3
   1.3   Positive and negative numbers                           4
   1.4   Factors                                                 6
   1.5   Powers of numbers                                       7
   1.6   Simple equations                                        9

SECTION 2   FIRST STEPS IN DATA ANALYSIS                        12
   2.1   Absolute numbers                                       12
   2.2   Rates and ratios                                       13
   2.3   Significant figures and rounding                       15
   2.4   Proportions                                            18
   2.5   Percentages                                            19
   2.6   Changes in variables over time                         19
   2.7   Rates of growth over time

SECTION 3   LOGARITHMS                                          25
   3.1   The idea of logarithms                                 25
   3.2   Tables of logarithms                                   27
   3.3   Tables of antilogarithms                               29
   3.4   Use of logarithms in multiplication and division       29
   3.5   Use of logarithms in finding the powers and
         roots of numbers                                       32

SECTION 4   THE AVERAGE ANNUAL GROWTH RATE                      35
   4.1   Calculation (continued)                                35
   4.2   Use of average annual growth rates                     35
   4.3   Time taken for a variable to increase by a given
         magnitude or proportion                                36

SECTION 5   MATHEMATICAL FUNCTIONS                              37
   5.1   Introduction                                           37
   5.2   Linear functions                                       40
   5.3   Non-linear functions                                   42

SECTION 6   GRAPHS                                              42
   6.1   Rectangular co-ordinates                               42
   6.2   Plotting the graph of a linear function                43
   6.3   Interpretation of coefficients                         47
   6.4   Plotting the graph of a quadratic function             50
   6.5   Plotting graphs of observed data                       51
   6.6   Ten practical hints on drawing graphs                  56

EXERCISES AND ANSWERS
                 Suggested to       Exercise       Answers
                 reader on page     on page        on page
   Exercise 1          9             59-60          73-4
   Exercise 2         12             61             75
   Exercise 3         18             62             76
   Exercise 4         19             63             77
   Exercise 5         21             64             78
   Exercise 6         32             65             79-80
   Exercise 7         34             66             81-2
   Exercise 8         36             67             83
   Exercise 9         37             68             84
   Exercise 10        42             69             85
   Exercise 11        48             70             86-7
   Exercise 12        51             71             88-9
   Exercise 13        57             72             90-2

ANNEXE
   Table of common logarithms                                   93
   Table of common antilogarithms                               94
Introduction
Many people find mathematics a forbidding - even frightening - subject.
Strange and incomprehensible words and notation seem to abound.
Numbers seem sometimes to appear from nowhere - only to disappear
again for no apparent reason. Complicated relationships may be
presented which seem designed to mystify the reader rather than to
enlighten him. In the face of this the puzzled reader may decide
simply to abandon his efforts to understand. This is a sad irony,
since the essential reason for using mathematics is to further
understanding.
The purpose of this chapter is to demonstrate to the reader
that the techniques of basic mathematics presented here, assuming
careful study of the text and diligent completion of the exercises,
are well within his or her understanding. However, it must
immediately be stressed that this chapter should not be regarded
as a complete substitute for a general introductory mathematical
textbook. It is rather a presentation of the basic mathematical
concepts and techniques which an educational planner or
administrator - indeed anyone concerned with the development and
monitoring of educational systems - will almost certainly encounter
in his or her professional work. This means that a number of
mathematical methods covered by many general textbooks will not be
found here. For example, a general introductory textbook would
almost certainly introduce the reader to the solution of quadratic
and simultaneous equations, the ideas behind irrational and complex
numbers, linear algebra, concepts of limits and continuity, the
calculus, and other subjects. Certainly, some of these more advanced
mathematical techniques are utilised in relatively sophisticated
analyses of educational data. Use of all, however, assumes a mastery
of more fundamental methods, many of which are presented in this
text.
Thus, this is a presentation designed essentially for
practical professional men and women in the field of educational
planning and administration.
It is not designed to make the
reader a trained mathematician!
It is assumed that the only previous training in mathematics
which the reader will bring to this text is a knowledge of the
basic operations of addition, subtraction, multiplication and
division. A number of readers, no doubt, will find sections
of this chapter already familiar. Such readers may safely omit
these sections, and spend their time on any unfamiliar, or only
half-understood, concepts and methods. But all readers are
urged: do the exercises! Practice is essential in developing
a proper understanding. There is no substitute for it.
The nature of mathematics
A considerable part of educational planning, monitoring
and control is concerned with various types of relationships
between quantities. Some relationships may be purely definitional:
for example, a "pupil-teacher ratio" is a number stemming directly
from its definition. Other relationships may be behavioural: for
example, the number of students applying to follow a particular
course of higher education will depend, in ways which may be
specified, on the behaviour of individuals with respect to various
factors. And some relationships may be purely technical: for
example, the maximum number of schools that could possibly be
constructed in a given time period will depend on the limited
resources of labour, capital and finance available.
The role of mathematics is essentially to analyse the
structure and logical consequences of such relationships. These
relationships may be considered singly, or they may need to be
considered together. For instance, in attempting to project
the number of pupils who will be enrolled at a particular level
of education at some time in the future, a considerable number
of relationships between quantities will have to be considered
simultaneously. For example, how many children of a particular
age-group do we expect there to be at a specified time? Of
these, how many do we expect to have entered the system? How will
these children be distributed across grades? What number of
pupils will be promoted from grade to grade, how many will repeat
their grades, and how many will leave the school system entirely
(dropout or graduate)?
Mathematics can help us to uncover the logical consequences
of such relationships. If our initial observations, or assumptions,
are wrong, mathematics alone cannot help us. For it is above all
a tool of analysis; an indispensable tool in planning and
interpretation. It cannot, any more than any tool, conceptual
or physical, tell us what we should do. But it does help us to
discover what we have done; and what we can do.
1.  SOME FUNDAMENTAL CONCEPTS AND OPERATIONS

1.1  Symbols

1.1.1  In many instances in this text we shall use symbols rather
than definite numbers. The use of symbols will allow us to deal
with general expressions and general results. We shall, in fact,
be utilising the basic concepts of algebra. Algebra is best
understood as a generalisation and extension of arithmetic. As
pointed out in the Introduction, it is assumed that the reader
is familiar with the fundamental operations of arithmetic.
These are the operations of addition, subtraction, multiplication
and division. All these operations are used in algebra in
essentially the same manner as in arithmetic. However, in
algebra, as new processes are developed, new symbols are
introduced to help the operations.
1.1.2  To clarify the use of symbols, we consider the following
extremely simple example. Assume that in a certain primary
school there were 559 pupils distributed across 13 classes.
What is the average class size? In this particular school, it
is clearly:

    559 / 13 = 43 pupils

In order to generalise this simple example of the calculation
of average class size to any school, let us assign symbols
to the three different numbers.

    Let the letter "P" represent "the total number of pupils"
    Let the letter "C" represent "the total number of classes"
    Let the letter "A" represent "the average class size"

Thus we may write:

    A = P / C                                                (1)

This is our general formula for defining average class size.
In our particular example above,

    A = 43
    P = 559
    C = 13

The general, algebraic formulation (1) applies to any school.
Average class size A may always be calculated, given values for
P and C.
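
As an illustration added here, and not part of the original text, formula (1) can be sketched in Python. The function name average_class_size is an assumption made for this sketch only.

```python
def average_class_size(pupils: int, classes: int) -> float:
    """Formula (1): A = P / C."""
    if classes <= 0:
        raise ValueError("the number of classes C must be positive")
    return pupils / classes


# The school of paragraph 1.1.2: P = 559 pupils in C = 13 classes.
A = average_class_size(559, 13)
print(A)  # 43.0
```

The same function applies to any school once values for P and C are known, which is the point of writing the formula in symbols.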
1.1.3  The reader may verify for himself or herself that the
formula (1) obeys the four above-mentioned elementary operations
of arithmetic. For example, let us multiply the formula by 2
on both sides of the equality sign:

    2A = 2 (P / C)

    2A = 2P / C                                              (2)

Notice that the usual multiplication sign, "x", is often
conveniently omitted in algebra. Sometimes it is replaced
by a dot, but in this text the convention is adopted
that the absence of any sign between two adjacent symbols
implies that they are multiplied together.
1.1.4  Continuing with our example, note that the term 2A
above denotes a multiple of A and the number 2. This
number 2 is known as the coefficient of A. A coefficient
may be a definite number, as here, or may itself be written
as a letter representing the number. Thus we may write:

    bA = bP / C

where b, the coefficient, equals 2 in our particular example.
1.2  Brackets

1.2.1  It is often the case that an algebraic expression, or
part of an expression, has some operation (e.g. multiplication)
to be performed on it as a whole. For example, we might wish
to write, in algebraic symbols, "Three times the sum of w and y",
w and y symbolising particular variables.
Note that a variable is a symbol which can take on any of a
set of prescribed values.
1.2.2  If we wrote the following expression:

    3 x w + y                                                (3)

it would not be clear whether 3 should multiply w alone, or both
w and y. Thus we use "brackets" to enclose the part which is
to be operated on as a whole. We write:

    3(w + y)                                                 (4)

Brackets tell us of the order in which operations are to be
performed. For example:

    2(x + y) - z

means that first we calculate the sum of x and y, then multiply
by 2, before finally subtracting z from the total.
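
The effect of brackets can be tried out numerically. The short Python sketch below is an added illustration; the values chosen for w, x, y and z are arbitrary and do not come from the text. Note that Python, like most programming languages, settles the ambiguity of expression (3) by multiplying before adding.

```python
# Illustrative values only (not taken from the text).
w, y = 4, 5

without_brackets = 3 * w + y      # 3 multiplies w alone
with_brackets = 3 * (w + y)       # 3 multiplies the sum of w and y, as in (4)
print(without_brackets, with_brackets)  # 17 27

x, z = 2, 1
# First the sum of x and y, then multiply by 2, finally subtract z.
result = 2 * (x + y) - z
print(result)  # 13
```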
1.2.3  Brackets are essentially a convenient way of grouping
symbols or numbers in order to perform operations upon them.
In transforming expressions with brackets into expressions
without, certain rules must be observed carefully. Thus, in
removing the brackets from the following expression:

    x(y + z)                                                 (5)

each term within the brackets must be multiplied by x:

    x(y + z) = xy + xz                                       (6)

Similarly,

    (w + x) (y + z)                                          (7)

is equivalent to:

    w(y + z) + x(y + z)
    = wy + wz + xy + xz                                      (8)
Similarly, when division is performed, each term within the
bracket must be divided by the number outside the bracket, thus:

    (y + z) / x = y/x + z/x                                  (9)

Note that the division of (y + z) by x is precisely the same as
the multiplication of (y + z) by 1/x.

1.2.4  In performing the operations of addition and subtraction
in conjunction with the removal of brackets, two important rules
must be observed. Firstly, when a positive sign goes in front
of the brackets, the signs of the terms within the brackets
remain the same. Secondly, when a negative sign goes in front of
the brackets, the signs of the terms within the brackets change.
Thus, for example

    x + (y + z) = x + y + z
    x + (y - z) = x + y - z
    x - (y + z) = x - y - z
    x - (y - z) = x - y + z

1.3  Positive and negative numbers

1.3.1  Corresponding to every positive number (signed +) there
is a negative number (signed -). In effect, a negative number
is a number which in its meaning and effect is opposite to a
positive number. All that we need to know for our purposes
are the fundamental rules for the operations of addition,
subtraction, multiplication and division.
1.3.2  In the addition and subtraction of positive and negative
numbers, where x represents any number,

    + (+ x) = +x
    + (- x) = -x
    - (+ x) = -x
    - (- x) = +x

It can be readily remembered that like signs give a positive
result, unlike signs a negative result. Thus as examples of
the above general rules:

    (+4) + (+3) = +4 +3 = +7
    (+4) + (-3) = +4 -3 = +1
    (+4) - (+3) = +4 -3 = +1
    (+4) - (-3) = +4 +3 = +7
1.3.3  In the multiplication and division of positive and negative
numbers, where x and y represent any pair of numbers, if two
numbers have the same sign the result is a positive number. If
the signs differ, the result is a negative number. Thus in
multiplication:

    (+x) x (+y) = + xy
    (+x) x (-y) = - xy
    (-x) x (+y) = - xy
    (-x) x (-y) = + xy

and in division:

    (+x) ÷ (+y) = + x/y
    (+x) ÷ (-y) = - x/y
    (-x) ÷ (+y) = - x/y
    (-x) ÷ (-y) = + x/y

Hence, for example,

    (-4) x (-3) = +12

and

    (-4) ÷ (+2) = -2
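
The sign rules of 1.3.2 and 1.3.3 can be checked with a few lines of Python. This is an added sketch, not part of the original text; the helper name same_sign is an assumption made for the illustration.

```python
def same_sign(a: float, b: float) -> bool:
    """True when two non-zero numbers are both positive or both negative."""
    return (a > 0) == (b > 0)


# Addition and subtraction examples of 1.3.2.
sums = [(+4) + (+3), (+4) + (-3), (+4) - (+3), (+4) - (-3)]
print(sums)  # [7, 1, 1, 7]

# Multiplication and division rule of 1.3.3: like signs give a
# positive result, unlike signs a negative result.
for x, y in [(4, 3), (4, -3), (-4, 3), (-4, -3)]:
    assert (x * y > 0) == same_sign(x, y)
    assert (x / y > 0) == same_sign(x, y)

print((-4) * (-3), (-4) / (+2))  # 12 -2.0
```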
1.4  Factors

1.4.1  We have seen the value of symbols, their grouping into
brackets, and the rules which need to be applied in operations
on positive and negative numbers. We may now consider an
important operation in algebra: finding the factors of
algebraic expressions. To understand what is meant by this,
consider again the expression (5) in 1.2.3 above. It is
the multiplication of two factors, x and (y + z):

    x (y + z)                                                (5)

This equals:

    xy + xz                                                  (6)

Now we ask the question: if we started with xy + xz, how
do we factorise it? To do this, we note that x is a factor
in each term. We therefore say that it is a factor of the
whole expression. To find the other factor we divide each
term by x and add the quotients, y + z. Hence,

    xy + xz = x (y + z)                                      (10)

We have thus factorised expression (6) into its two factors,
x and (y + z).
1.4.2  Similarly the reader should consider expression (8) in
1.2.3 above:

    wy + wz + xy + xz                                        (8)

How might this expression be factorised? Note that the first
two terms have a common factor, w. The second two have a
common factor, x. Dividing the first two terms in (8) by w,
we obtain the factor (y + z). Thus the factorisation of the
first two terms gives w (y + z). Similarly we see that the
factorisation of the second two terms in (8) gives x (y + z).
We now have:

    w (y + z) + x (y + z)                                    (11)

We note that there is a common factor, (y + z), in both
terms of (11). Taking this common factor out, we obtain the
two factors of expression (8), and get:

    (w + x) (y + z)                                          (7)
1.4.3  Not all expressions can be factorised. For example:

    xy + vw

is incapable of factorisation. Further, other expressions
require more advanced methods for their factorisation than
are necessary for our purposes.
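
A numerical spot check, added here as an illustration, shows that the factorised and expanded forms in (5) to (11) always give the same value. The sketch below tries every combination of whole numbers from -3 to 3; this range is an arbitrary choice, and such a check is of course not a proof.

```python
from itertools import product

values = range(-3, 4)  # arbitrary test values: -3, -2, ..., 3

for w, x, y, z in product(values, repeat=4):
    # Expressions (5) and (6), i.e. the factorisation (10).
    assert x * (y + z) == x * y + x * z
    # Expression (11), the intermediate step.
    assert w * (y + z) + x * (y + z) == w * y + w * z + x * y + x * z
    # Expressions (7) and (8).
    assert (w + x) * (y + z) == w * y + w * z + x * y + x * z

checked = len(values) ** 4
print(checked)  # 2401 combinations checked
```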
1.5  Powers of numbers

1.5.1  The product of equal numbers is called a power. Hence:

    (2 x 2) is called the second power of 2, or the square of 2
    (2 x 2 x 2) is called the third power of 2, or the cube of
    2; and so on.

Generalising, where y is any number,

    (y x y) is the second power of y, or the square of y, and
    may be written y^2
    (y x y x y) is the third power of y, or the cube of y,
    and may be written y^3

If n factors each equal to y are multiplied together, where n is
any positive whole number, we obtain the nth power of y, which
may be written y^n. The symbol n is known as the index or exponent.
1.5.2  In multiplying two powers of a number, the rule to follow
is that the index of the product is the sum of the indices.
Thus, for example:

    y^2 x y^3 = y^(2+3) = y^5

This may be seen because:

    (y x y) x (y x y x y) = (y x y x y x y x y) = y^5

1.5.3  In dividing two powers of a number, the rule to follow is
to subtract the index of the divisor from the index of the
dividend. Thus:

    y^5 ÷ y^3 = y^(5-3) = y^2

This may be seen because:

    (y x y x y x y x y) / (y x y x y) = (y x y) = y^2

Note that, for example,

    y^3 ÷ y^5 = y^(3-5) = y^(-2)

where -2 is a negative index. This is simply the reciprocal
of the positive power of the number; thus

    y^(-2) = 1 / y^2

or generally,

    y^(-n) = 1 / y^n
1.5.4  The square of any non-zero number is positive, whether the
number is positive or negative. If the operation is reversed, and
the square root of a number is required, it follows that the square
root may be either positive or negative. To understand this, note
that

    (+ y) x (+ y) = y^2
and
    (- y) x (- y) = y^2

If we require the square root of y^2 we use the sign "plus
or minus", thus:

    √(y^2) = ± y

Sometimes rather than the square root sign, √, the index
1/2 is used. Thus:

    √x = x^(1/2)

In general, the notation for the nth root of a number is written:

    ⁿ√x

or equivalently:

    x^(1/n)
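
The rules for indices in 1.5.2 to 1.5.4 can be tried on a definite number. The following Python sketch is an added illustration; the value y = 3 is arbitrary, and math.isclose is used because computers store most fractions only approximately.

```python
import math

y = 3.0  # arbitrary illustrative value

# 1.5.2: multiplying powers adds the indices.
assert math.isclose(y**2 * y**3, y**5)
# 1.5.3: dividing powers subtracts the indices.
assert math.isclose(y**5 / y**3, y**2)
# A negative index gives the reciprocal of the positive power.
assert math.isclose(y**3 / y**5, y**-2)
assert math.isclose(y**-2, 1 / y**2)

# 1.5.4: both +4 and -4 are square roots of 16.
assert (+4) ** 2 == 16 and (-4) ** 2 == 16
positive_root = math.sqrt(16)   # math.sqrt returns only the positive root
cube_root = 27 ** (1 / 3)       # the index 1/n gives the nth root
print(positive_root)  # 4.0
```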
Exercise 1
The reader should now do Exercise 1, on page 59.
1.6  Simple equations

1.6.1  In 1.1.2 above we introduced a simple formula (1), A = P / C.
This formula is in fact an equation: a statement of equality
of the left hand and right hand sides. The concept of an equation
is a central one in mathematics, and it is important that the
reader should be fully familiar with the various operations
which may be made on an equation in order to solve it. By
"solving" an equation, or finding the "solution", is meant
the process of finding the value of an unknown which satisfies
the equation (maintains the equality of both sides).
1.6.2  Let us take a simple example. Suppose you were informed
that four times the salary of a newly-qualified teacher was
paid to a headmaster, whose salary was #500 per month. What
is the newly-qualified teacher's salary?
Let the salary (the unknown number) be symbolised as S. We
can now formulate a simple equation:

    4S = 500                                                 (12)

The solution of this equation requires simply that we divide
both sides of the equation by the coefficient of S, and obtain:

    S = 500 / 4 = 125                                        (13)

The solution to the equation is that the newly-qualified
teacher's salary is #125 per month.
1.6.3  Generally equations are not so very simple as this one!
Equations may consist of complicated expressions on both sides
of the equality sign. However, correct use of various
operations will enable us to find the value of the unknown
symbol. The reader should learn how to apply two basic rules
in the manipulation of equations:

    i)  if the same number is either added to, or subtracted
        from, both sides of an equation, the two sides remain
        equal

    ii) if both sides of an equation are multiplied or
        divided by the same number, the two sides of the
        new equation will remain equal. If the multiplier or
        divisor is negative, both sides change signs.
1.6.4  Let us use these simple rules to solve an equation.
Assume we wish to solve the following equation for the unknown,
x:

    3x + 7 = 5x - 5                                          (14)

The basic method used is to collect terms involving the unknown
on the left hand side, and other terms on the right. Thus by
using the first rule in 1.6.3 above, we may subtract 5x from
both sides and also subtract 7 from both sides of (14), obtaining:

    3x - 5x = -5 -7
    -2x = -12                                                (15)

Now dividing each side of (15) by -2, we obtain the solution
to equation (14):

    x = 6

To verify that this is indeed the correct solution, we may
substitute x = 6 into the original equation (14):

    3(6) + 7 = 5(6) - 5

Both sides equal 25, confirming that the value x = 6 does
satisfy the equation (14).

1.6.5  We have seen therefore that transferring terms from one
side of equation (14) to the other changed their signs in the
process (the first rule in 1.6.3). Division of both sides by
the negative number, -2, changed the signs on both sides of
the equation (15) (the second rule in 1.6.3).
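
The two rules of 1.6.3 can be written out as a short Python sketch for any equation of the form ax + b = cx + d. This is an added illustration; the function name solve_linear and the letters a, b, c, d are assumptions made for the sketch and do not appear in the text.

```python
from fractions import Fraction


def solve_linear(a: int, b: int, c: int, d: int) -> Fraction:
    """Solve a*x + b = c*x + d for x using the two rules of 1.6.3."""
    if a == c:
        raise ValueError("no unique solution: x has the same coefficient on both sides")
    # Rule (i): subtract c*x and b from both sides, giving (a - c)x = d - b.
    # Rule (ii): divide both sides by (a - c).
    return Fraction(d - b, a - c)


x = solve_linear(3, 7, 5, -5)   # equation (14): 3x + 7 = 5x - 5
print(x)  # 6

# Verification by substitution, as in 1.6.4.
assert 3 * x + 7 == 5 * x - 5

S = solve_linear(4, 0, 0, 500)  # equation (12): 4S = 500
print(S)  # 125
```

Exact fractions are used so that solutions which are not whole numbers are not distorted by rounding.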
Exercise 2
The reader should now do Exercise 2, on page 61.
2.  FIRST STEPS IN DATA ANALYSIS

2.1  Absolute numbers

2.1.1  The data that emerge from the data-gathering processes
(censuses, surveys, etc.) are very often expressed in actual, or
absolute, numbers. Simple presentation of the absolute
data is of course sufficient for some purposes. For example,
we may wish to know how many births occurred in a particular
country, or how many pupils entered the school system,
in a particular year. Many such questions might be envisaged.
A basic method in the preliminary analysis of data is presentation
in the form of a time-series. A time-series may be defined as
a set of ordered observations on a quantitative characteristic
of an individual or collective phenomenon taken at different
points of time. Although it is not essential, it is common,
and helps interpretation, for these points to be equidistant
in time. For example, much educational data are published
on an annual basis. Example 1 presents an annual time-series
of the absolute numbers of pupils enrolled at the primary
level in Niger over the period 1972 - 1976.
Example 1:  An annual time-series of absolute numbers

Absolute numbers of pupils enrolled at the primary
level, Niger, 1972-1976

    Year          1972      1973      1974      1975      1976
    Enrolment   94,500   100,892   110,437   120,984   142,182
2.1.2  It can be seen immediately from the ordered time-series
in Example 1 that primary enrolment was increasing in Niger
over the period 1972-1976. However, these absolute numbers
alone cannot inform us how other variables (1) relate to, or
explain, this increase. In data analysis we often need to
consider how a particular absolute number - or set of numbers -
relates to other numbers. This introduces the general concept
of rates.
2.2  Rates

2.2.1  There is a great variety of different types of rates.
Most readers will have heard, for example, of currency exchange
rates, rates of growth, rates of tax, rates of interest, and
(if involved in making educational projections), rates of
promotion, repetition and dropout. The reader can doubtless
think of many other examples.

(1)  For a definition of a variable, see above, para. 1.2.1.
2.2.2  What do all these apparently very different specific uses
of the word "rate" have in common? The answer is: the idea of a
ratio. A ratio is a quotient which indicates the relative size of
one number to another.
Example 2:  Ratios

Ratio of enrolment at all levels of education to population aged
7-24, Indonesia.

                                      1971         1973         1975
    1. Enrolment at all levels  18,411,827   19,869,957   21,872,075
    2. Population aged 7-24     45,530,688   48,832,242   52,282,907
    3. Ratio: line 1 / line 2       0.4044       0.4069       0.4183
2.2.3  Example 2 shows how, over the period 1971-1975, the so-called
"overall enrolment ratio" (1) developed. It was 0.4044 in 1971 and
had risen to 0.4183 by 1975. It can be seen that this particular
type of ratio, like all ratios, is obtained by division: here, of
line 1 by line 2. Thus in 1971 the ratio of 0.4044 was obtained by
the following quotient:

    enrolment at all levels in 1971     18,411,827
    -------------------------------  =  ----------  =  0.4044
    population aged 7-24 in 1971        45,530,688

The ratios for 1973 and 1975 were similarly obtained (the reader should
check the answers).
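
For readers with access to a computer, the check suggested above can be sketched in Python. This is an added illustration using the figures of Example 2; the variable names are assumptions made for the sketch.

```python
# Figures from Example 2 (Indonesia).
enrolment = {1971: 18_411_827, 1973: 19_869_957, 1975: 21_872_075}
population_7_24 = {1971: 45_530_688, 1973: 48_832_242, 1975: 52_282_907}

# Ratio = line 1 divided by line 2, rounded to four decimal places.
ratios = {
    year: round(enrolment[year] / population_7_24[year], 4)
    for year in enrolment
}
print(ratios)  # {1971: 0.4044, 1973: 0.4069, 1975: 0.4183}
```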
2.2.4  If in fact the reader calculates the ratio
18,411,827 / 45,530,688, he or she will find that the calculated
ratio equals 0.4043827, when taken to 7 "decimal places".
Note that the number of "decimal places" refers to the number of
figures after the decimal point. The figure given in this example,
0.4044, has been reduced from seven to four decimal places:
we say that it has been "rounded" to four decimal places.

(1)  Enrolment ratios are usually expressed in the form of
     percentages, a concept explained in 2.5 below.
2.3  Significant figures and rounding

2.3.1  In principle much of the data which we use in
educational planning could be made perfectly accurate: in
practice, errors from various sources enter into all the
stages of data collection and processing. Perfect accuracy
would be extremely costly to attain, and the high costs
would not, in general, be justifiable. And even if we could
overcome all the practical problems of accurate measurement,
we would still frequently prefer to approximate our data.
For example, instead of saying that the population aged 7-24
in Indonesia in 1975 was 52,282,907 (see Example 2 above),
we may say that it was approximately 52 million. In making
such an approximation, we should define the degree of
approximation. Thus we could express the above result as:

    52,000,000 ± 500,000

or

    52 million to the second "significant figure";

both expressions meaning the same thing. Where a decimal
point is involved, the zeros needed to locate the decimal
point are not counted as significant figures.
2.3.2  Numbers which result from accurate countings are
exact and so have an unlimited number of significant
figures. Given the existence of errors in data collection
however, a number such as 52,282,907 may have an uncertain
number of significant figures. There can be no absolute
rule, therefore, for deciding on the "correct" number of
significant figures. The appropriate number to work with will
often depend on the particular circumstances: your knowledge
concerning the accuracy of the sources of the data, and the
uses to which they will subsequently be put.
2.3.3  The main danger in using approximate figures lies
in giving the impression of a greater degree of accuracy
than is actually justified. If, for example, we were to
add the following numbers, representing, perhaps, populations:

        762    (accurate)
      1,900    (to the second significant figure)
    123,000    (to the third significant figure)
    -------
    125,662

the answer, 125,662, is misleading. For the second figure, 1,900,
could have been anywhere between 1,850 and 1,950; and the third,
123,000, anywhere between 122,500 and 123,500. So, rather
than exactly 125,662, the answer could have been between a
minimum and a maximum:

        762                762
      1,850              1,950
    122,500            123,500
    -------            -------
    125,112    and     126,212

The difference between the two extreme possibilities is 1,100.
The original answer should have been better expressed:

    125,662 ± 550

2.3.4  Whether we wish to approximate our data because of
known inaccuracies or because we simply wish large numbers to
be more readily digestible, we must decide on a method for the
process of "rounding". Assume, for example, that we wish
to round a number such as 3.67 to one decimal place. The result
of rounding is 3.7, since 3.67 is nearer to 3.7 than to 3.6.
Similarly, 103.8135, after rounding to the nearest hundredth,
that is to two decimal places, would become 103.81, since
103.8135 is closer to 103.81 than to 103.82. If however we
were faced with the number 103.815 and wished to round it to
2 decimal places, we would be in something of a dilemma:
for 103.815 is just as close to 103.82 as it is to 103.81.
It has become a useful convention to round in such cases
so that the digit preceding the 5 becomes even. This practice is
useful in reducing cumulative rounding errors when a large
number of operations is involved. Thus 103.815 is rounded
to 103.82; 103.825 is also rounded to 103.82, and 103.835
is rounded to 103.84. Rounding to the nearest million,
16,500,000 would be 16,000,000; 17,500,000 would be
18,000,000.
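
The rounding convention of 2.3.4 can be sketched with Python's decimal module, which offers this rule under the name ROUND_HALF_EVEN. The sketch is an added illustration; the function name round_half_even is an assumption. Numbers are passed as text so that a value such as 103.815 is held exactly, which ordinary binary floating-point numbers cannot guarantee.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(number: str, places: int) -> Decimal:
    """Round to `places` decimal places; a negative value of `places`
    rounds to tens, hundreds, ... (e.g. -6 for the nearest million)."""
    quantum = Decimal(1).scaleb(-places)   # places=2 gives 0.01
    return Decimal(number).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(round_half_even("3.67", 1))            # 3.7
print(round_half_even("103.8135", 2))        # 103.81
print(round_half_even("103.815", 2))         # 103.82
print(round_half_even("103.825", 2))         # 103.82
print(round_half_even("103.835", 2))         # 103.84
print(int(round_half_even("16500000", -6)))  # 16000000
print(int(round_half_even("17500000", -6)))  # 18000000
```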
2.3.5  Adding a set of rounded numbers, as we saw in 2.3.3,
inevitably involves a degree of error in the final result.
This cannot be avoided, but should never be entirely
overlooked. Percentage figures are very often rounded to
one or two decimal places. The (percentage) figures in
the first column below have been rounded to one decimal
place in the second column:

         %          %
       40.55      40.6
       30.35      30.4
       29.10      29.1
      ------     -----
      100.00     100.1

We see that the original correct total of 100% has become 100.1%
in the second column, due to the rounding process. This
cumulative rounding error inevitably occurs fairly frequently.
The second column total should not be written, incorrectly,
as 100.0, but a footnote or comment placed in the table
mentioning the occurrence of rounding error.
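
The cumulative rounding error in the table above can be reproduced with a short Python sketch. This is an added illustration; it again relies on the decimal module and its ROUND_HALF_EVEN rule, and the variable names are assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# First column of the table in 2.3.5.
shares = [Decimal("40.55"), Decimal("30.35"), Decimal("29.10")]

# Round each percentage to one decimal place (second column).
rounded = [s.quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN) for s in shares]

total_exact = sum(shares)
total_rounded = sum(rounded)
print(total_exact)    # 100.00
print(total_rounded)  # 100.1
```

The rounded total is reported as it stands (100.1) rather than forced to 100.0, in line with the advice given in the text.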
Example 3:  Rounding numbers

Each of the following numbers has been rounded to the
indicated accuracy. (Remember the convention of
rounding to the even integer preceding a 5).

    Original number     Rounded to             Result of rounding
           7.5001       nearest unit                      8
          48.6          nearest unit                     49
          -3.674        nearest hundredth                -3.67
           7.9283       nearest thousandth                7.928
       9,499            nearest thousand              9,000
       9,500            nearest thousand             10,000
     -10,500            nearest thousand            -10,000
  16,500,001            nearest million          17,000,000
Exercise 3
The reader should now do Exercise 3, on page 62.
2.4  Proportions

2.4.1  A proportion is a ratio relating the magnitude of a part to
its whole. Hence a proportion, P, must lie between zero and unity:

    0 ≤ P ≤ 1

For example, a part of an enrolment total might be expressed as a
proportion of its whole. The example below shows for each year
the proportions of all children enrolled at the primary level in
Gabon over the period 1972-1976 who were female. The proportions
have been rounded to four decimal places.
Example 4:  Proportions

Female proportions of total primary level
enrolment, Gabon

                                      1972      1973      1974      1975      1976
    1. Female enrolment             50,505    53,401    55,354    58,995    62,736
    2. Total enrolment             105,601   110,466   114,172   121,407   128,552
    3. Proportions: line 1/line 2   0.4783    0.4834    0.4848    0.4859    0.4880
2.5  Percentages

2.5.1  It is common practice to express proportions in percentage
form. A percentage is a proportion in a hundred. Thus to convert
proportions to percentages, the proportion is multiplied by 100.
Example 5:
Percentages
The proportion comprised by girls in primary level enrolment in
Example 4 is given in 1976 as 0.4880.
By multiplying this
proportion by 100, this may be expressed in percentage form:
0.4880 x 100
48.80 percent
often written
48.80%
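For readers with access to a computer, the proportions of Example 4 and the percentage of Example 5 can be reproduced with a few lines of Python. This is an illustrative sketch only; the variable names are assumptions introduced here, and the enrolment figures are those given in Example 4.

```python
# Enrolment figures for Gabon, copied from Example 4.
years = [1972, 1973, 1974, 1975, 1976]
female = [50_505, 53_401, 55_354, 58_995, 62_736]
total = [105_601, 110_466, 114_172, 121_407, 128_552]

# A proportion is the part divided by the whole (rounded to 4 places).
proportions = [round(f / t, 4) for f, t in zip(female, total)]

# A percentage is the proportion multiplied by 100.
percentages = [round(100 * p, 2) for p in proportions]

for year, p, pct in zip(years, proportions, percentages):
    print(f"{year}: proportion {p:.4f} = {pct:.2f}%")
```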
Exercise 4
The reader should now do Exercise 4, on page 63.
2.6 [...] in variables over time

2.6.1 In analysing educational data, educational planners and statisticians very frequently use time-series analyses. We have already been introduced to the concept of a time-series. It is the presentation of a series of data ordered over time. The observation of the manner in which particular variables grow, stay constant, or decline over time can help in describing and explaining how educational systems have behaved in the past. Such observations also play their part in predicting how variables might behave in the future.
2.6.2 An important extension of the fundamental idea of a rate is that of a rate of change over time. In Example 1 we saw that primary level enrolment in Niger was growing over the period 1972-76. At what rate did it grow? Did it grow at a constant rate or at a variable rate? Can we develop concepts to express simply and clearly how variables change in magnitude over time? We now discuss the basic techniques available to demonstrate changes over time.
2.6.3 An initial distinction should be made between absolute and relative changes over a period of time. We shall use the following symbols in analysing the data already presented in Example 1.

    $P_0$ = primary enrolment in the initial year
    $n$ = number of years in the period
    $P_n$ = primary enrolment in the nth year
Example 6

The absolute growth in primary enrolment from year 0 (1972) to year 4 (1976) is simply the difference between enrolment in the two years:

$$P_n - P_0 = 142{,}182 - 94{,}500 = 47{,}682$$

The relative, or percentage, growth in enrolment is the absolute growth over the period expressed as a percentage of the initial enrolment figure:

$$\frac{P_n - P_0}{P_0} \times 100\% = \frac{47{,}682}{94{,}500} \times 100\% = 50.46\% \text{ (to two decimal places)}$$

2.6.4 The reader should note carefully that the percentage growth over the period is the absolute growth expressed as a percentage of the figure in the initial year of the period, not in the final (or any other) year. Thus, for example, a 100% growth over a period would be correctly interpreted as meaning that the absolute figure at the end of the period was double that at the commencement.
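The two measures of Example 6 can be written as short functions. The sketch below is illustrative only; the function names are hypothetical, and the figures are the Niger enrolment data quoted in Example 6.

```python
def absolute_growth(p0: float, pn: float) -> float:
    """Difference between the final and the initial figure."""
    return pn - p0

def relative_growth_percent(p0: float, pn: float) -> float:
    """Absolute growth as a percentage of the INITIAL figure."""
    return (pn - p0) / p0 * 100

p0, pn = 94_500, 142_182          # enrolment in 1972 and in 1976
print(absolute_growth(p0, pn))                      # pupils gained
print(round(relative_growth_percent(p0, pn), 2))    # percent of the 1972 figure

# A figure that doubles has grown by 100% of its initial value.
print(relative_growth_percent(1_000, 2_000))
```

Dividing by the final-year figure instead of the initial-year figure would give a different, and incorrect, percentage.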
Exercise 5

The reader should now do Exercise 5.

2.7 Rates of growth over time

2.7.1 We have introduced the ideas of absolute and relative growth in variables over a period of time. We now consider the question of rates of growth over time. That is, we ask how we can measure the rate of growth per time-period.
2.7.2 Returning to Example 6 above, we saw that the absolute growth in primary enrolment in Niger over the 4-year period 1972-1976 was 47,682 pupils. We may define the average annual absolute growth as:

$$\frac{P_n - P_0}{n}$$

which, in this example, is:

$$\frac{142{,}182 - 94{,}500}{4} = 11{,}920.5 \text{ pupils per year}$$

The reader should note, of course, that this average annual absolute growth was never actually observed between any years. It is an average: a statistical artefact. Nevertheless, like any summary statistic (1), its value lies in its comparability with other similarly calculated statistics, for example, over different time periods or across different countries.
2.7.3 If a variable, such as a population, grew by a constant absolute amount each time-period (say, a year) it would, if plotted on a graph, describe a straight line (i.e. it would display "linear" (2) growth). In general, variables such as populations do not grow (or decline) in a linear fashion. Any constancy in the pattern of change is more likely to be seen in a relative sense. Populations are more likely to show a constant proportionate, or percentage, growth per time-period. Thus the planner or statistician may often be interested to discover by what percentage, on average, a variable grows per year over a period of time. In addition, percentages are more readily comparable one with another. This is true both for comparisons made for different time periods for the same variable and, more importantly, when comparing the growth rates of different variables, which may not even be measured in the same absolute units.

(1) See the discussion of summary statistics in Section [...].
(2) Linear and non-linear functions are discussed in Section [...].
2.7.4 The measure which expresses the average percentage by which a variable grows per year over a period of time is the average annual growth rate. It is important to note immediately that this figure is not obtained by dividing the percentage growth over the whole period by the number of years in the period. It would be wrong, for example, to say that the average annual growth rate of the enrolment data used in Example 6 above could be obtained by dividing 50.46% by 4. The reason for this will become clear after we have examined how the average annual growth rate is in fact calculated. To see this, we shall use the previous notation $P_0$, $P_n$ and n, and introduce "r", the average annual growth rate, expressed in proportional terms.
2.7.5 Assume a population grows by a constant proportion of r per year. If, for example, it grew by one tenth (i.e., 10%) each year, r would equal 0.10. At the beginning of the period under consideration, the size of the population is $P_0$. After one year has elapsed, the population has grown to size $P_1$:

$$P_1 = P_0 + P_0 r = P_0 (1 + r)$$

After another year has elapsed, the population has grown to size $P_2$:

$$P_2 = P_1 + P_1 r = P_1 (1 + r) = P_0 (1 + r)(1 + r) = P_0 (1 + r)^2$$

Continuing in this way, after n years of constant annual proportional growth r the size of the population will be:

$$P_n = P_0 (1 + r)^n$$

If we know $P_0$ and $P_n$ (and of course n) the question now becomes, how do we find r? A little algebraic manipulation shows us that:

$$(1 + r)^n = \frac{P_n}{P_0}$$

$$1 + r = \sqrt[n]{\frac{P_n}{P_0}}$$

$$r = \sqrt[n]{\frac{P_n}{P_0}} - 1$$
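The derivation above can be checked numerically. The following Python sketch is an illustration only: the function names and the starting values (a population of 1,000 growing at 10% for 3 years) are assumptions chosen for the check and are not taken from the enrolment data.

```python
def grow_year_by_year(p0: float, r: float, n: int) -> float:
    """Apply P_next = P + P * r once for each of the n years."""
    size = p0
    for _ in range(n):
        size = size + size * r
    return size

def compound_formula(p0: float, r: float, n: int) -> float:
    """Closed form P_n = P_0 * (1 + r) ** n."""
    return p0 * (1 + r) ** n

def average_annual_growth_rate(p0: float, pn: float, n: int) -> float:
    """Invert the closed form: r = (P_n / P_0) ** (1 / n) - 1."""
    return (pn / p0) ** (1 / n) - 1

p0, r, n = 1_000, 0.10, 3                      # hypothetical values
print(grow_year_by_year(p0, r, n))             # repeated yearly growth
print(compound_formula(p0, r, n))              # same result from the formula
print(average_annual_growth_rate(p0, 1_331, n))  # recovers r (about 0.10)
```

The step-by-step loop and the closed form agree, and taking the nth root recovers the rate that was put in.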
Example 7: Average annual growth rate

Using the enrolment data in Example 6, let us calculate the average annual growth rate, r, over the four year period 1972 - 1976. We have:

$$P_0 = 94{,}500 \qquad P_n = 142{,}182 \qquad n = 4$$

Hence, applying the formula above, we find:

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$
2.7.6 How do we calculate this number? The problem we face is that it involves calculating the "fourth root" of the ratio $P_n / P_0$. The fourth root of this ratio is that number which, when used as a factor four times in a product, equals the ratio (1). For example, the fourth root of 16 is 2, because 2 x 2 x 2 x 2 = 16. But this is an easy example. Methods for calculating roots do exist, but are very time-consuming unless we use the technique of "logarithms". We shall therefore explore this technique before we return to finish the problem of calculating "r" in Example 7 above (2).
3. LOGARITHMS

3.1 The idea of logarithms

3.1.1 To the reader who has never used logarithms, the sight of a table of logarithms, with column upon column of dry numbers, often seems forbidding. Yet a little practice in their use has a great pay-off in terms of time saved in future calculations. It is, the reader may be assured, a small yet extremely worthwhile investment of time to master their use. Readers familiar with logarithms may omit this section.
3.1.2 Let us proceed by example. Consider the number 100. This, of course, equals 10 x 10, which we have seen (in 1.5 above) may be written as $10^2$. Similarly, consider the number 1000. This equals $10 \times 10 \times 10 = 10^3$. We now recall what happens to the powers when we multiply 100 by 1000. From above, we see that this may be written:

$$10^2 \times 10^3$$

which equals:

$$10^5 \ (= 100{,}000)$$

The reader will note that we have added the powers.

(1) See paras. 1.5.1 - 1.5.4 above.
(2) Readers with access to electronic calculators should note that results obtained using calculators may differ slightly from those using logarithms. This is because of rounding errors involved in using four figure logarithms, and rounding performed by calculators.
3.1.3 Now let us define the logarithm ("to the base 10") (1) of 100 to be the power to which we must raise 10 to give us 100: that is, 2. Thus we have,

$$\log_{10} 100 = 2.0000$$

We shall consider here "four figure" logarithms, the four figures (the mantissa) referring to the number of figures after the decimal point. Similarly, if we ask ourselves to which power we must raise 10 to give us 1000, the answer, as we have seen, is 3; hence,

$$\log_{10} 1000 = 3.0000$$

Now consider what happens when we add these two logarithms:

$$\log_{10} 100 + \log_{10} 1000 = 2.0000 + 3.0000 = 5.0000$$

What number has a logarithm of 5.0000? What, in other words, is the "antilogarithm" of 5.0000? The answer is $10^5$, that is, 100,000. Thus we have performed the multiplication of two numbers by adding their respective logarithms and then taking the "antilogarithm" of the result.
3.1.4 We see that the great advantage of the method of logarithms is that the simple process of addition replaces the relatively complex process of multiplication. And, as we shall see, the method also copes with division by the relatively simple process of subtraction.

(1) Logarithms may in principle be based on any positive number other than 1. Those with a base of 10 are known as "common" logarithms.
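The idea that adding logarithms multiplies the numbers, and subtracting them divides, can be tried out directly. The sketch below is illustrative only; it uses Python's standard `math.log10` function in place of printed tables, and the variable names are assumptions.

```python
import math

log_100 = math.log10(100)     # the power of 10 that gives 100
log_1000 = math.log10(1000)   # the power of 10 that gives 1000

# Multiplication: add the logarithms, then take the antilogarithm.
log_product = log_100 + log_1000
product = 10 ** log_product

# Division: subtract the logarithms, then take the antilogarithm.
log_quotient = log_1000 - log_100
quotient = 10 ** log_quotient

print(f"log 100 = {log_100:.4f}, log 1000 = {log_1000:.4f}")
print(f"sum = {log_product:.4f}, antilog = {product:.1f}")
print(f"difference = {log_quotient:.4f}, antilog = {quotient:.1f}")
```

Here the antilogarithm is simply 10 raised to the given power, which is what the table of antilogarithms tabulates.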
3.2 Tables of logarithms

3.2.1 To master the use of logarithms requires a little practice and, of course, a table of logarithms. The reader should now turn to the annexe, where he or she will find a table of common logarithms followed by a table of antilogarithms.

3.2.2 We have seen that the logarithm (to the base 10) of 100 is 2.0000. This is easy to see, because we all know that $100 = 10^2$. But what is the logarithm, purely as an example, of 3.642? or of 49.43? or of 111.1? It is here that the table of logarithms is indispensable. Most such tables are four-figure tables, and these are adequate for the majority of purposes an educational planner would have.
3.2.3 Consider the meaning of the logarithm of 3.642. The question we are asking is: what is the power to which we must raise 10 to give us 3.642? That power is defined as the logarithm (to the base 10) of 3.642; that is,

$$10^{\log_{10} 3.642} = 3.642$$

The table of logarithms provides us with the answer. Turn to the table and look down the first column: it is a column of two figure numbers from 10 to 99. To find the logarithm of 3.642, look down the column until you come to "36". Now move along this row, across the columns, until you come to the first column headed "4"; you will find that the four figure number in row "36", column "4", is 5611. We have not quite finished yet as we wish to allow for our final figure, 2. You will see a second set of columns headed 1-9 to the right. Look under column 2 against the same row (36). The number "2" appears. This should be added to the figure we have so far obtained, 5611. We have now obtained $\log_{10} 3.642$:

$$\log_{10} 3.642 = 0.5613$$

That is to say, from our definition of logarithms to the base 10,

$$10^{0.5613} = 3.642$$

3.2.4 Let us now find the logarithm of 49.43. Proceeding in the same way, you should find:

$$\log_{10} 49.43 = 1.6940$$

The reader will ask: why does the figure "1" appear before the decimal point? This figure must be supplied by you, the user of the tables. The figure is known as the "characteristic", or "index", of a logarithm. The rule to be adopted is this:

    The characteristic of any number greater than one is zero or positive, and is less by one than the number of figures to the left of the decimal point.

Thus, in our first example, the number 3.642, the characteristic of the logarithm is 0, because there is only one figure to the left of the decimal point. In the second example, 49.43, the index is 1. In our third example, 111.1, the index is 2. The reader should confirm for himself or herself that:

$$\log_{10} 111.1 = 2.0457$$

3.2.5 How do we deal with a number with no figures to the left of the decimal point, for example, 0.2327? The rule to follow is this:

    The characteristic of a number less than one is negative, and its magnitude is greater by one than the number of zeros which immediately follow the decimal point.

Thus the reader should satisfy himself or herself that:

$$\log_{10} 0.2327 = \bar{1}.3668$$

And, for example, that

$$\log_{10} 0.0037 = \bar{3}.5682$$

(The figures $\bar{1}$ and $\bar{3}$ are to be spoken as "bar one" and "bar three". The bar shows that only the characteristic is negative; the mantissa remains positive, so that $\bar{1}.3668$ means $-1 + 0.3668$.)
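The two rules for the characteristic can be verified against the examples of this section. The following Python sketch is an illustration, not a replacement for the tables; the function name `characteristic_and_mantissa` is hypothetical, and `math.log10` stands in for the printed four-figure table.

```python
import math

def characteristic_and_mantissa(x: float) -> tuple[int, float]:
    """Split log10(x) into an integer characteristic and a mantissa.

    The mantissa is kept positive (between 0 and 1), so a number
    below one gets a negative characteristic, the "bar" figure.
    """
    if x <= 0:
        raise ValueError("logarithms exist only for positive numbers")
    log = math.log10(x)
    characteristic = math.floor(log)            # e.g. -1 for 0.2327
    mantissa = round(log - characteristic, 4)   # four-figure mantissa
    return characteristic, mantissa

for number in [3.642, 49.43, 111.1, 0.2327, 0.0037]:
    c, m = characteristic_and_mantissa(number)
    print(f"{number}: characteristic {c}, mantissa {m:.4f}")
```

A characteristic of -1 with mantissa 0.3668 corresponds to the bar notation for log 0.2327 given above.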
3.3 Tables of antilogarithms

3.3.1 We have seen that when multiplying numbers, we add their logarithms. We thus obtain another logarithm: what number does this represent? To find out, we use the table of "antilogarithms". Take, for example,

    2.0457

Of what number is this the logarithm? To find this, ignoring for the moment the index figure 2 (for this simply tells us the position of the decimal point in the number we shall find), scan down the first (two-figure) column of the table of antilogarithms. On reaching ".04", move across the table until you reach the first column headed "5". You will see the four figure number, 1109. You still have to allow for the final digit, 7. So move across to the second of the columns headed "7", where you see the number 2. As in the table of logarithms, this is added to the four figure number you have already obtained, 1109. Thus you have found that the antilogarithm of 2.0457 is:

    111.1

Note the position of the decimal point. It has three figures in front of it because the characteristic is positive and equals 2.
3.4 Use of logarithms in multiplication and division

3.4.1 When multiplying numbers, we add their logarithms. When dividing we subtract their logarithms. For example,

$$\frac{1000}{10} = 10^3 \times 10^{-1} = 10^2$$

So in order to divide 1000 by 10, we subtract their logarithms:

$$\log_{10} \left( \frac{1000}{10} \right) = \log_{10} 1000 - \log_{10} 10 = 3.0000 - 1.0000 = 2.0000$$

Looking up the antilogarithm of 2.0000, we find that it equals:

    100.0

which, of course, is the correct answer.
Example 8: Use of logarithms in multiplication

We shall multiply together the first three numbers we discussed above, that is, find the product of:

    3.642 x 49.43 x 111.1

To calculate this product using logarithms, it is good practice to set out the data in two columns as below:

    number      logarithm
    3.642       0.5613
    49.43       1.6940
    111.1       2.0457
    (add)       4.3010

Adding the respective logarithms, we see that they total 4.3010. Of what number is this the logarithm? Look up 0.3010 in your table of antilogarithms. You will find the number "2000". Where should the decimal point be placed? The characteristic is 4; this means, following our rule above, that there should be five figures before the decimal point. Therefore the answer is:

    20000.0
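Example 8 can be imitated on a computer by rounding each logarithm to four figures, as a printed table does. The sketch below is illustrative and rests on an assumption: `math.log10` rounded to four decimal places is used to stand in for the table of logarithms, and rounding to four significant figures stands in for the table of antilogarithms. The helper names are hypothetical.

```python
import math

def four_figure_log(x: float) -> float:
    """Stand-in for the table of logarithms: log10 to 4 decimal places."""
    return round(math.log10(x), 4)

def four_figure_antilog(log_value: float) -> float:
    """Stand-in for the antilogarithm table: 4 significant figures."""
    return float(f"{10 ** log_value:.4g}")

numbers = [3.642, 49.43, 111.1]
logs = [four_figure_log(x) for x in numbers]
log_sum = round(sum(logs), 4)                 # the "(add)" line of the layout

product_via_logs = four_figure_antilog(log_sum)
product_direct = math.prod(numbers)           # exact multiplication for comparison

print(logs, log_sum)
print(product_via_logs, round(product_direct, 2))
```

The small gap between the two results is the rounding error mentioned in the footnote on electronic calculators.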
3.4.2 Let us now consider the following product:

    0.2327 x 0.0037

Recalling our rule about the indexes of numbers less than 1, we may write:

    number      logarithm
    0.2327      $\bar{1}.3668$
    0.0037      $\bar{3}.5682$
    (add)       $\bar{4}.9350$

Proceeding as before, you should now look up 0.9350 in the table of antilogarithms, and find the number 8610. The characteristic is $\bar{4}$; you must therefore place three zeros after the decimal point before the first digit. The answer is therefore:

    0.0008610
3.4.3 It should always be remembered that the "bar" over a characteristic means that the characteristic should be treated as a negative number. Thus if we wish to find the product of:

    106.4 x 0.0039

we should write:

    number      logarithm
    106.4       2.0269
    0.0039      $\bar{3}.5911$
    (add)       $\bar{1}.6180$

The product is found by looking up the antilogarithm of $\bar{1}.6180$:

    0.4150
Example 9: Use of logarithms in division

Let us divide 106.4 by 0.0039:

$$\frac{106.4}{0.0039}$$

We must now subtract the logarithm of the denominator (the figure in the lower part of the ratio) from the logarithm of the numerator (the upper figure):

    number        logarithm
    106.4         2.0269
    0.0039        $\bar{3}.5911$
    (subtract)    4.4358

We find our answer is antilog (4.4358):

    27280.0

Exercise 6

The reader should now do Exercise 6.
3.5 Use of logarithms in finding the powers and roots of numbers

3.5.1 Another extremely valuable use of logarithms is in rapidly finding the powers of a number. Consider a simple exercise: what is the value of $10^3$? We note that $\log_{10} (10^3) = 3 \log_{10} 10$:

$$\log_{10} (10^3) = \log_{10} (10 \times 10 \times 10) = \log_{10} 10 + \log_{10} 10 + \log_{10} 10 = 3 (\log_{10} 10) = 3 (1.0000)$$

We find $\text{antilog}_{10}\ 3.0000 = 1000$.

Similarly, if asked to find, for example, the value of $3.724^4$, we would recognise that:

$$\log_{10} (3.724^4) = 4 (\log_{10} 3.724) = 4 (0.5710) = 2.2840$$

Looking up $\text{antilog}_{10}\ 2.2840$, we find that the answer is:

    192.3
3.5.2 How would we find, for example, the nth root of a number? By dividing the logarithm of that number by n, and proceeding to find the antilogarithm.

Example 10: Use of logarithms in calculating roots

What, for example, is the value of r in Example 7 above?

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$

We proceed by finding the logarithms of the numerator and denominator and subtracting the latter from the former, giving us the figure 0.1775. This figure is now divided by 4. We obtain 0.0444. Taking the antilogarithm, we find the answer:

    number           logarithm
    142,182          5.1529
    94,500           4.9754
    (subtract)       0.1775
    (divide by 4)    0.0444

$$r = \text{antilog}_{10} (0.0444) - 1 = 1.108 - 1 = 0.108$$
3.5.3 For our final example of the use of logarithms, we calculate the root of a number lying between 0 and 1 (note that the logarithm of 1 is defined to be zero, and that logarithms of negative numbers do not exist). We have seen that logarithms of numbers between 0 and 1 have negative characteristics. How do we divide a negative characteristic by the given value of the root? The technique involves splitting the characteristic into two parts. The first part is negative and is chosen to be exactly divisible by the value of the given root. The second, compensating, part is positive and is placed against the mantissa of the logarithm.

3.5.4 In the first of the two examples in Example 11 below, $\bar{2} + 1$ is written for $\bar{1}$, so that the negative part can be divided exactly by 2. In the second example, $\bar{3} + 2$ replaces $\bar{1}$ in order that the negative part should be exactly divisible by 3.
Example 11: Use of logarithms in calculating the roots of numbers between 0 and 1

1. Calculation of the square root of 0.56

The logarithm of 0.56 is $\bar{1}.7482$. In order to divide this by 2, we rewrite it thus: $\bar{2} + 1.7482$. After division by 2, we obtain: $\bar{1} + 0.8741$, which we may write as $\bar{1}.8741$. Taking the antilogarithm we find the answer: 0.7484.

2. Calculation of the cube root of 0.29

The logarithm of 0.29 is $\bar{1}.4624$. In order to divide this by 3, we rewrite it thus: $\bar{3} + 2.4624$. After division by 3, we obtain: $\bar{1} + 0.8208$, which we may write as $\bar{1}.8208$. Taking the antilogarithm we find the answer: 0.6619.

Exercise 7

The reader should now do Exercise 7.
4. THE AVERAGE ANNUAL GROWTH RATE

4.1 Calculation (continued)

4.1.1 We may now return to Example 7. Using the formula developed in 2.7.5, we found that the average annual growth rate of primary enrolment in Niger over the period 1972-76 was given by the expression:

$$r = \sqrt[4]{\frac{142{,}182}{94{,}500}} - 1$$

After introducing logarithms, we saw in Example 10 that r was calculated to be (1.108) - 1 = 0.108. This tells us that over the time-period, enrolment grew at an average annual proportional rate of 0.108.

4.1.2 Usually this figure is expressed in percentage terms. The reader will recall that a percentage figure is obtained by multiplying a proportion by 100. Thus, the average annual percentage growth rate of primary level enrolment in Niger over the four-year period 1972-76 was:

    r = 10.8%
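The result can be confirmed without tables, and the warning of paragraph 2.7.4 against dividing the total percentage growth by the number of years can be made concrete. The Python sketch below is illustrative only; the variable names are assumptions, and the figures are the Niger data of Example 6.

```python
p0, pn, n = 94_500, 142_182, 4     # enrolment in 1972, in 1976, and years between

# Average annual growth rate: the nth root of the ratio, minus one.
r = (pn / p0) ** (1 / n) - 1

# The tempting but WRONG shortcut: total relative growth divided by n.
naive = (pn - p0) / p0 / n

print(f"r     = {r:.4f}  ({100 * r:.1f}% per year)")
print(f"naive = {naive:.4f}  ({100 * naive:.1f}% per year)")

# Compounding r for n years reproduces the final enrolment;
# compounding the naive rate overshoots it.
print(round(p0 * (1 + r) ** n))
print(round(p0 * (1 + naive) ** n))
```

A direct computation gives a rate very slightly below the 0.108 found with four-figure logarithms, which is the kind of small rounding difference that the footnote on electronic calculators anticipates.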
4.2 Use of average annual growth rates

4.2.1 Such growth rates are widely used in comparing the growth (or decline) both of one variable over different time-periods and of different variables. It is worth reminding the reader again that a constant rate of growth, r, does not mean that the variable will increase by a constant absolute quantity each year. It means that it augments itself by a constant proportion (or percentage) each year. This implies that, when r is positive, the absolute increment each year will increase over time, as the base on which the constant proportional growth is calculated each year is itself steadily increasing. (The reader may benefit by carefully re-reading paras. 2.7.3 - 2.7.5 above.)

Exercise 8

The reader should now do Exercise 8.
4.3 Time taken for a variable to increase by a given magnitude or proportion

4.3.1 Frequently, the following sort of question may arise in educational planning: if a variable, for example population, continues to grow at its current average annual rate, how long will it be until the variable is half as big again? or has doubled? The same sort of question could be asked of enrolment. At the present rate of growth, how long until enrolment has doubled? or trebled?

4.3.2 The answer to these questions may be obtained by consideration of the growth formula again:

$$P_n = P_0 (1 + r)^n$$

Previously, we knew $P_0$, $P_n$ and n and sought to calculate r. Now, the problem is that we know $P_0$, $P_n$ and r, and seek to find n, the length of time it takes $P_0$ to grow to size $P_n$ at a given rate r.

4.3.3 To find the value of n in terms of $P_0$, $P_n$ and r, the most straightforward method is simply to take logarithms of the formula:

$$\log P_n = \log P_0 + n \log (1 + r)$$

Re-arranging (recall the "rules" described in [...] above):

$$n \log (1 + r) = \log P_n - \log P_0$$

$$n = \frac{\log P_n - \log P_0}{\log (1 + r)} = \frac{\log \left( \frac{P_n}{P_0} \right)}{\log (1 + r)}$$
Example 12: Calculation of n

In the ten years from 1965 to 1975 the total number of persons enrolled in Africa at the first, second and third levels of education rose from 29.9 million to 52.9 million. This represented an average annual increase of 5.87%. If growth were to continue at this constant rate, how long would it be after 1975 that African enrolment was 50% greater? To answer this, note that when enrolment is 50% greater in n years' time,

$$\frac{P_n}{P_0} = 1.5$$

Therefore, applying the formula for n:

$$n = \frac{\log 1.5}{\log 1.0587} = \frac{0.1761}{0.0248} = 7.10 \text{ years}$$

4.3.4 Example 12 shows us that, in 7.10 years' time from 1975, enrolment at the first, second and third levels of education in Africa will be 50% greater, assuming a constant average annual growth rate over the period of 5.87%.
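The formula for n lends itself to a short function that answers the "half as big again", "doubled" and "trebled" questions of paragraph 4.3.1. The sketch below is illustrative only; the function name `years_to_grow` is hypothetical, and the growth rate is the 5.87% quoted in Example 12.

```python
import math

def years_to_grow(factor: float, r: float) -> float:
    """Years needed to grow by `factor` (= P_n / P_0) at annual rate r.

    Implements n = log(P_n / P_0) / log(1 + r); any base of logarithm
    gives the same ratio, base 10 is used to match the text.
    """
    return math.log10(factor) / math.log10(1 + r)

r = 0.0587   # average annual growth rate from Example 12

for label, factor in [("50% greater", 1.5), ("doubled", 2.0), ("trebled", 3.0)]:
    print(f"{label}: {years_to_grow(factor, r):.2f} years")
```

The unrounded result for a 50% increase is fractionally above the 7.10 years obtained with four-figure logarithms, again because of rounding in the tables.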
Exercise 9
The reader should now do Exercise 9.
5. MATHEMATICAL FUNCTIONS

5.1 Introduction

5.1.1 Why should a practical person interested in educational planning be concerned with seemingly highly theoretical and abstract matters such as "non-linear functions" and other concepts discussed below? It is not, perhaps, immediately obvious! But the reasons are fundamental. The planner (or the analyst of events in the past) is essentially concerned with relationships between variables. One initial problem is to specify these relationships. To "specify" means to decide which variables we believe depend on which other variables. A second problem is actually to "estimate" these hypothesised relationships. Just how, for example, does population depend on the other specified variables? What are the magnitudes of their separate contributions to changes in population? This issue of actually measuring statistical relationships between variables is a topic best dealt with in a statistical context, and is more fully discussed in [...].
5.1.2 A variable is a symbol, such as X, Y, g, K, u, which can take on any of a prescribed set of values. This set of values is called the domain of the variable.

5.1.3 Thus, for example, a promotion rate from one school grade to another is a variable, and could be written "p". It could (in principle) take on any value between 0 and 1, which is therefore its domain. We could write its domain as follows:

$$0 \le p \le 1$$

where "$\le$" means "less than or equal to". Similarly "$\ge$" means "greater than or equal to". The symbols "<" and ">" mean "less than" and "greater than" respectively.

5.1.4 If a variable can theoretically take on any value between two given values, it is called a continuous variable. Otherwise it is called a discrete variable.
5.1.5 The concept of a function is an extremely important one, since it is the mathematician's (and statistician's) way of expressing relationships between variables. Formally, if the values of a variable Y depend on the values of a variable X, we may say: "Y is a function of X", and write in general:

$$Y = f(X)$$

This expresses the idea that, to each value which a variable X can take on, there corresponds one or more values of a variable Y. The reader may come across other letters than "f", such as F. They have just the same meaning. There are a multitude of possible instances of functional dependence. We have already seen, for example, that primary level enrolment in Niger may be seen as a function of time, t; we may write:

$$E = f(t)$$

That is to say, enrolment is a variable which is "functionally dependent" on the variable time. Note that we do not write it the other way round:

$$t = f(E)$$

We do not attach any meaning to the idea that time depends on enrolment levels!
5.1.6 Formally, when we write

$$Y = f(X)$$

we define Y as the dependent variable and X as the independent variable. Sometimes, especially in statistical analysis, X is called the explanatory variable.
5.2 Linear functions

5.2.1 Functional dependence between two variables is frequently suggested by a table (as in Example 1 above). Where there is an exact mathematical correspondence, it may be shown by an exact linear equation connecting the variables, such as

$$Y = 3X - 4$$

This equation is a particular linear function connecting Y and X. It gives, in effect, the rules which govern the linear relationship between X and Y. If we know the values of the variable X, the equation shows us how to find each corresponding value of Y. This function tells us:

    "given the value of X, first multiply it by 3, then subtract 4, and there results the corresponding value of Y".

Thus, for example, when X = 6, we may write:

$$Y = f(6)$$

which is to say,

$$Y = 3(6) - 4$$

$$\therefore Y = 14$$

In other words, the value of the function is Y = 14 when X = 6. Similarly, e.g., as the reader should verify:

    when X = 0,     Y = -4
         X = -2.9,  Y = -12.7
         X = 100,   Y = 296

and so on.
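The "rule" expressed by a function translates naturally into a function in a programming language. The Python sketch below is illustrative only; the names `f` and `linear` are assumptions, with `linear(a, b)` building a function of the general form Y = a + bX.

```python
def f(x: float) -> float:
    """The particular linear function Y = 3X - 4."""
    return 3 * x - 4

def linear(a: float, b: float):
    """Return the linear function Y = a + bX for given coefficients."""
    def rule(x: float) -> float:
        return a + b * x
    return rule

# The values the reader is asked to verify.
for x in [6, 0, -2.9, 100]:
    print(f"X = {x}: Y = {round(f(x), 4)}")

# The same function written in the general form, with a = -4 and b = 3.
g = linear(-4, 3)
print(g(6) == f(6))
```

Rounding is applied only for display, because decimal values such as -2.9 are not held exactly in binary floating point.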
5.2.2 There are many varieties of functions and no attempt can be made here to investigate any other than the most basic. The linear function in two variables, which we looked at, is perhaps the simplest. It is "linear" because if drawn on a graph (Figure 2 below) it is seen to be a straight line.

5.2.3 The general equation of a linear function (a straight line) is usually written:

$$Y = a + bX$$

In our example above,

$$a = -4 \qquad b = 3$$

There is an infinite number of possible linear equations, since both a and b can in principle take on an infinite number of values. a and b are known as the coefficients of the equation. We shall see later, in considering graphs of functions and in the chapter on statistics, that a and b have important practical interpretations in analysing relationships in the field of education.
5.2.4 The concept of a linear function may be extended to two, or more, variables. For example:

$$Y = f(X, Z, K)$$

is a function with 3 independent variables. That is, it stands for a situation in which Y depends on three different variables. A particular instance of this function could be:

$$Y = 4 + 2X - 3Z + 4K \qquad (31)$$

Again, this may be seen as a set of rules for finding Y, given particular values of the variables X, Z and K.
5.3 Non-linear functions

5.3.1 The general equation of a non-linear function of the "second degree" is:

$$Y = a + bX + cX^2 \qquad (33)$$

The presence of a squared term in X makes the graph of the function curved (in fact it is a parabola). Equations in which the highest power of an independent variable is 2 are known as quadratic equations. Perhaps the simplest non-linear equation is:

$$Y = X^2$$

which is shown graphically below, after the reader has been introduced to the concepts of rectangular coordinates and graphs.

Exercise 10

The reader should now do Exercise 10.
6. GRAPHS
6.1 Rectangular co-ordinates
6.1.1 Functions may be depicted graphically. A linear function

$$Y = a + bX$$

in which there is one explanatory variable may be easily drawn, given a and b's values. Consider two mutually perpendicular lines, X'OX and Y'OY, intersecting at O in Figure 1. These lines are called the X and Y axes, respectively. They should be scaled as appropriate for the variables under consideration.
6.1.2 Point O is called the origin. By convention the Y is the vertical, and the X the horizontal, axis. Again by convention the X axis is scaled negatively to the left of the origin, positively to the right. The Y axis is scaled negatively below the origin and positively above.
6.1.3 Consider any point, P. If perpendiculars are dropped from the point to the axes, the values of X and Y where the perpendiculars meet the axes are called the rectangular coordinates of P (or often simply the coordinates of P). The coordinate X is sometimes called the abscissa, and the coordinate Y the ordinate, of the point. Hence, for example, if we look at the point $P_1$ in Figure 1, we see that:

    the abscissa is 2
    the ordinate is -4
    the coordinates are therefore (2, -4)

Similarly, the coordinates of points $P_2$, $P_3$ and $P_4$ are (3, 3), (-2, 1) and (-5, -4) respectively. Notice that the coordinates are always written (X, Y); the X (abscissa) value coming first, by convention. The great usefulness of this technique is that, given the coordinates of a point, we can "plot" it on the figure.
6.2 Plotting the graph of a linear function

6.2.1 Let us see how the linear function we have discussed may be plotted:

$$Y = 3X - 4$$

We shall plot the graph (see Figure 2) over the domain:
-5
X
5
This is a purely arbitrary choice of domain.
over any domain of X we chose.
We could plot the graph
The function, it will be recalled,
tells us the values of Y which correspond to each value of X in X's
domain.
In order to find, therefore, the coordinates to plot on
the graph, we should draw up a table of corresponding values of X
and Y.
Here we have chosen 7 arbitrary points in X's domain:

Function:  Y = 3X - 4
Domain:    -5 ≤ X ≤ 5

    X       Y
   -5     -19
   -3     -13
   -1      -7
    0      -4
    1      -1
    3       5
    5      11

The reader should verify for himself or herself that these are indeed the correct corresponding values of X and Y, by substitution of X = -5, -3, ..., and so on into the equation.
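The substitution can also be checked mechanically. The following is a minimal Python sketch, not part of the original material; the function name `linear` and the variable names are illustrative assumptions.

```python
def linear(x, a=-4, b=3):
    """Return Y = a + b*X; the defaults give the text's function Y = 3X - 4."""
    return a + b * x

# The seven arbitrary points chosen in X's domain, -5 <= X <= 5.
xs = [-5, -3, -1, 0, 1, 3, 5]
table = [(x, linear(x)) for x in xs]

# Print the table of corresponding values of X and Y.
for x, y in table:
    print(f"{x:>3} {y:>4}")
```

Each printed pair is one coordinate (X, Y) to be plotted.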
6.2.2 We now have 7 coordinates: 7 pairs of (X, Y). These points may now be plotted on a graph (see Figure 2). All seven points have been plotted, from (-5, -19) through to (5, 11). The reader may have realised, of course, that because the function is linear, only two points need strictly be plotted. The straight line connecting them will represent the linear function. More points are plotted here simply to familiarise the beginner with the techniques involved.
[Figure 2: Graph of the linear function Y = 3X - 4, -5 ≤ X ≤ 5]
6.3 Interpretation of coefficients

6.3.1 Let us consider again the general linear function,

Y = a + bX

In the function we have plotted,

a = -4
b = 3

There is only one straight line with this particular pair of coefficients, a and b. The reader should ask himself or herself: what is the interpretation of a and b on the graph?
6.3.2 First, consider "a". A little thought will show that, when:

X = 0, then:
Y = a + b(0)
Y = a

That is, a represents the value of Y where the function cuts, or intercepts, the Y axis. a is in fact sometimes called the intercept. In this case, as can be seen:

a = -4
6.3.3 Let us now consider the meaning of "b". b tells us that when X increases (or decreases) by 1, the corresponding value of Y increases (or decreases) by b times 1, or, more simply written, b. That is, b represents the

(change in Y) / (change in X)

This is sometimes called "the rate of change of Y with respect to X". To understand this, first consider the function

Y = 3X - 4

When, for example, X = 3, then Y = 5. Now increase X by 1, to 4. Y now equals 8. It has increased by 3 (from 5 to 8). Therefore, the

(change in Y) / (change in X) = 3 / 1 = 3 = b

To verify that the (change in Y) / (change in X) is constant and equals 3, the reader should change X by a variety of amounts and see the corresponding constant proportional change in Y to X.
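The verification suggested here can be sketched in Python. This is an illustrative check only; the pairs of X values are arbitrary choices and the helper names are assumptions, not taken from the text.

```python
def linear(x, a=-4, b=3):
    """Y = a + b*X, with the text's coefficients a = -4, b = 3."""
    return a + b * x

def slope(x1, x2):
    """(change in Y) / (change in X) between two values of X."""
    return (linear(x2) - linear(x1)) / (x2 - x1)

# Arbitrary pairs of X values, including a decrease in X.
pairs = [(3, 4), (0, 10), (-5, 5), (2, -1.5)]
slopes = [slope(x1, x2) for x1, x2 in pairs]
print(slopes)
```

Whatever pair is chosen, the ratio comes out as b, which is what "constant slope" means for a linear function.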
6.3.4 b, the coefficient of X, is generally referred to as the slope or gradient of the line. Now look at the graph (Figure 2). It has a constant, upward gradient. The gradient or slope of a line is in fact defined as the

(change in Y) / (change in X)

and this is shown on the graph. With a linear function, it is of course constant.
6.3.5 When b is positive, the line slopes upwards from left to right, and changes in both X and Y are in the same direction: both up, or down, together. When b is negative, the line slopes downwards from left to right. As X increases, Y decreases, and vice versa. When b is zero, the line is horizontal. For in that case,

Y = a + (0)X
Y = a

and the line is a horizontal one intercepting the Y axis at value a.
The reader should now attempt Exercise 11, on page 70.
[Figure 3: Graph of the quadratic function Y = X^2, -4 ≤ X ≤ 4]
6.4 Plotting the graph of a quadratic function

6.4.1 Let us now consider a quadratic function. We have seen that the general expression for this is:

Y = a + bX + cX^2

This function has 3 coefficients, a, b and c. We have already seen above the particular quadratic function:

Y = X^2

In this particular function,

a = 0
b = 0
c = 1

Let us draw the graph of this function (see Figure 3). Again we shall choose an arbitrary domain for X, purely for illustrative purposes:

-4 ≤ X ≤ 4

We may now draw up a table of corresponding values of Y and X, as before in 6.2.1 above:
Function:  Y = X^2
Domain:    -4 ≤ X ≤ 4

    X       Y
   -4      16
   -3       9
   -2       4
   -1       1
    0       0
    1       1
    2       4
    3       9
    4      16

6.4.2 These coordinates have been plotted in Figure 3. The reader will see the striking difference between this quadratic function and the linear function of Figure 2. The curvilinear nature of the function Y = X^2 is typical of quadratic functions. The usefulness of non-linear functions will become clear to the reader later, in the discussion of the fitting of curves to observed data. Population growth, for example, when plotted on a graph, often more closely resembles the right-hand portion of the curve in Figure 3 than it does a straight line.
6.4.3 A proper discussion of how to calculate the gradient of the graph in Figure 1 requires some knowledge of the calculus, a relatively advanced branch of mathematics which is not discussed in this introduction. In fact, the gradient of this function can be shown by the methods of calculus to be

2X

Thus, it is negative when X is negative; positive when X is positive; zero when X is zero. It increases in absolute (1) size as a constant proportion of X. Thus, e.g., when X is 1, the slope is 2; when X is 3.5, the slope is 7. The "rate of change of Y with respect to X" is always twice the value of X. It is therefore no longer, as in a linear function, constant and independent of X.
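Without any calculus, the claim that the gradient of Y = X^2 is 2X can be checked numerically by measuring the slope over a very small interval around each X. The sketch below is an illustration under that assumption; the step size `h` and the function names are hypothetical choices, not part of the text.

```python
def f(x):
    """The quadratic function Y = X^2."""
    return x ** 2

def difference_quotient(x, h=1e-6):
    """Estimate the slope at x as (change in Y) / (change in X)
    over the small interval from x - h to x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the estimated slope with 2X at a few values of X.
for x in [-3, -1, 0, 1, 3.5]:
    print(x, round(difference_quotient(x), 4), 2 * x)
```

The estimated slope changes with X, unlike the constant slope of a linear function.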
(1) "Absolute" means here the numerical magnitude, ignoring the sign.

The reader should now attempt Exercise 12, on page 71.

6.5 Plotting graphs of observed data

6.5.1 Let us now illustrate the value of plotting statistical data graphically. As an example, Table 1 below shows gross primary level enrolment ratios (both sexes) for three selected countries for each year 1965-1973. The gross enrolment ratio in primary education is defined as the ratio of total enrolment in primary education, regardless of age, to the population belonging to the age group that, according to national regulations, should be enrolled at this level.
Table 1
Gross enrolment ratios, both sexes, for Ecuador, Iraq and Singapore, 1965-1973

              Gross enrolment ratio (%)
Year      Ecuador      Iraq      Singapore
1965       79.9        73.3        80.2
1966       79.8        77.7        88.1
1967       80.2        79.0        89.1
1968       79.3        77.9        83.7
1969       78.7        79.7        84.1
1970       80.2        80.5        89.3
1971       79.5        81.6        91.7
1972       78.9        83.1        91.2
1973       77.7        84.4        92.5
6.5.2 Consider, for example, the columns headed "Year" and "Ecuador". These columns of data may be interpreted as corresponding respectively to the X and Y variables introduced above. Thus it can be seen that there are 9 pairs of observations (X, Y), i.e. (year, gross enrolment ratio), as follows:

(1965, 79.9)
(1966, 79.8)
(1967, 80.2)
(1968, 79.3)
(1969, 78.7)
(1970, 80.2)
(1971, 79.5)
(1972, 78.9)
(1973, 77.7)

These are the coordinates for plotting on the graph of Figure 4. After plotting, they have been connected by straight lines, to aid the eye. They could, with equal justification, have been connected with a freehand curve. The purpose of connecting them at all is as a visual aid. It is one way of helping the analyst to make interpolations between observed data points. The other two lines have been constructed similarly. The reader should verify this.
[Figure 4: Gross enrolment ratios, both sexes, for Ecuador, Iraq and Singapore, 1965-1973]

6.5.3 The graph is a valuable way of showing the general movement of the enrolment ratios. Simple inspection of the data, as arranged in tabular form in Table 1, does show clearly that there is a general upward trend in rates in both Iraq and Singapore, and a slight downward movement in Ecuador. But the table cannot show so clearly as does the graph of Figure 4 the details of the movements, or the relative gradients of the functional relationships between enrolment and time. The graph shows "at a glance" several features of the data, including:

- the similar rate of increase, taking the period as a whole, of gross enrolment rates in both Singapore and Iraq
- the decline of the rate in Ecuador, except for 1969-1970
- the greater variability of the rate in Singapore than in both Iraq and Ecuador
- the periods in which all three rates increased, or declined, together
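The features listed above can also be checked arithmetically from Table 1. The sketch below is illustrative only. It assumes that "variability" can be summarised by the standard deviation of the year-to-year changes, which is one possible measure and not one the text prescribes; all names are hypothetical.

```python
from statistics import pstdev

years = list(range(1965, 1974))
ratios = {  # gross enrolment ratios (%) from Table 1
    "Ecuador":   [79.9, 79.8, 80.2, 79.3, 78.7, 80.2, 79.5, 78.9, 77.7],
    "Iraq":      [73.3, 77.7, 79.0, 77.9, 79.7, 80.5, 81.6, 83.1, 84.4],
    "Singapore": [80.2, 88.1, 89.1, 83.7, 84.1, 89.3, 91.7, 91.2, 92.5],
}

def yearly_changes(series):
    """Change in the ratio from each year to the next, in percentage points."""
    return [round(b - a, 1) for a, b in zip(series, series[1:])]

changes = {country: yearly_changes(s) for country, s in ratios.items()}

# Change over the whole period, and spread of the yearly changes (assumed measure).
overall = {c: round(s[-1] - s[0], 1) for c, s in ratios.items()}
spread = {c: round(pstdev(ch), 2) for c, ch in changes.items()}

# Pairs of years in which all three ratios rose together or fell together.
together = [
    (years[i], years[i + 1])
    for i in range(len(years) - 1)
    if len({ch[i] > 0 for ch in changes.values()}) == 1
]
print(overall, spread, together)
```

Such figures complement the graph; they do not replace the "at a glance" view it gives.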
6.5.4 The reader will by now appreciate that these three lines may be seen as three particular examples of the function:

E = f(t)

where:
E is the symbol for the gross enrolment ratio
t is the symbol for time

Figure 4 shows that they are not linear functions. The slopes of the lines are not constant throughout the time period. Nevertheless there are certain sub-periods in which they are very nearly constant; for example, in Iraq 1968-1973. As a matter of fact, real-world measurements of variables very rarely do produce exact linear relationships. Human behaviour, and technical change, are not so reliable! But very often observed relationships are approximately linear, or may display a constant proportional rate of growth (1). It can often be assumed, especially for small changes in independent variables, that, for all practical purposes, they are in fact linear - and may remain so in at least the near future. But we are now approaching the discussion of the statistical methods which we can employ to "fit" functions to the often rather scattered data we observe. And that topic is reserved for later treatment.

(1) In which case a graph of the logarithm of the variable plotted against time would display linearity.
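The footnote's remark about logarithms can be illustrated with a short Python sketch. The starting value of 100 and the 5% growth rate are hypothetical figures chosen only for illustration; they do not come from the text.

```python
import math

def grow(start, rate, periods):
    """Values of a variable growing at a constant proportional rate per period."""
    return [start * (1 + rate) ** t for t in range(periods + 1)]

series = grow(start=100.0, rate=0.05, periods=8)  # hypothetical data
logs = [math.log10(v) for v in series]

# Period-to-period steps: they grow for the variable itself,
# but are all equal for its logarithm (a straight line against time).
level_steps = [b - a for a, b in zip(series, series[1:])]
log_steps = [b - a for a, b in zip(logs, logs[1:])]
print(level_steps)
print(log_steps)
```

Every step in the logarithm equals log(1 + rate), which is why the logarithmic graph is linear.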
6.6 Ten practical hints on drawing graphs
1. Use proper sheets of graph paper if you can.

2. Use as much of the sheet as practicable. There is no point in using just a half of it unless other considerations dictate this.

3. Look at the range of values of X and Y. Decide what scale would be best, taking into consideration the size of your sheet.

4. Remember that different choices of scale can make the graph appear to have very different slopes (though of course their mathematical properties are not affected). There is no general rule to adopt here, except perhaps that your graph should be as informative as possible, and not in any way potentially misleading.

5. You don't have to intersect the axes at (0, 0). To do this in many cases is simply to invite blank, empty spaces in your completed graph - helping nobody to understand the relationships between the variables (which, after all, is the fundamental purpose of the graph!)

6. Label the axes clearly.

7. Use a pencil before inking-in. You are almost bound to make some errors.

8. Do not put too many functions on a graph. As a general rule, more than 3 or 4 tend to confuse the eye, and hence to hinder understanding.

9. Give your graph a clear, unambiguous title so that the final user is left in no doubt as to what your graph is showing.

10. Quote the exact source(s) of data on which your graph is based.
The reader should now attempt Exercise 13.

EXERCISES 1-13    pp. 59 - 72
ANSWERS 1-13      pp. 73 - 92
EXERCISE 1

This exercise is designed to familiarise the reader with basic concepts and operations of algebra. Read Sections 1.1 - 1.5 before completing the questions.

1.1 A number is represented by x. Double it, add 29 to the result, and divide by 4y. Write down the expression for the result.

1.2 The product of two numbers is a and one of them is w. What is the other?

1.3 What number must be subtracted from z to get b?

1.4 Given that A = P/C, which of the following expressions are correct, and which incorrect:
   i) C = P/A
   ii) (A + 100) = (P + 100)/C
   iii) (A - C) = (P - C^2)/C

1.5 Eliminate the brackets in the following expressions:
   i) (a + b)(a + c)
   ii) (-x)(2x + y - 3)
   iii) (x - y) - 3(z - x)
   iv) (x + y - z)(-1/x)

1.6 Find the factors of the following expressions, where possible:
   i) 4a^2 + 2ab
   ii) ax + ax^2 + ax^3
   iii) ax + by
   iv) x^3 - x^2

1.7 When a = 2, b = 3, calculate the numerical values of the following:
   i) (a + b)(a^2 - b^2)
   ii) a(a^3 - a^2) / (ab)
iii)
a
2
<0
EXERCISE 2

This exercise provides the reader with practice in solving simple equations. Read Section 1.6 before completing the questions.

2.1 A man is four times the age of his son. In four years' time he will be three times his son's age. How old are they now?

2.2 Solve the following equations for the unknown x:
   i) 3x + 10 = x + 20
   ii) 4x/3 + 11 = 5x/6 + 69
   iii) (2x - 2)/3 + (x + 1)/6 = 3(x + 2)
EXERCISE 3

3.1 Round the following numbers to the indicated accuracy:

          Number          Round to nearest:
   i)     89.3245         thousandth
   ii)    89.3254         thousandth
   iii)   7.299           hundredth
   iv)    1.145           tenth
   v)     27.6            unit
   vi)    3.49            unit
   vii)   150.001         hundred
   viii)  326,000.0       hundred thousand
   ix)    18,000,000      million
   x)     18,500,000      ten million

3.2 Add the numbers 2.25, 6.95, 7.25, 2.15, and 4.55
   (a) directly
   (b) by rounding to the nearest tenth according to the "even integer" criterion
   (c) by rounding so as to increase the digit before the 5
EXERCISE 4

4.1 Table 4.1 below shows, in thousands, enrolment in Africa by level of education, 1960 - 1975. Calculate, for each year, enrolment at each level as a percentage of total enrolment, rounded to 1 decimal place. Sum the percentages by level for each year.

Table 4.1
Enrolment by level of education, 1960 - 1975 (thousands), Africa

Year     First level    Second level    Third level     Total
1960       19,391          1,740            180         21,311
1965       26,534          3,058            306         29,898
1970       33,817          4,905            471         39,193
1974       41,843          7,411            779         50,033
1975       44,243          7,812            865         52,920

4.2 Calculate (rounding to one decimal place):
   i) 1/2% of 72.3
   ii) 10.6% of 10.6
   iii) 110% of 110
   iv) 50% of 100
   v) 10% of 10%
   vi) 0.02% of 64
   vii) 36% of 7000 students
   viii) In a certain school, 80% of all students in Grade 5 in a particular year were promoted from Grade 5 to Grade 6. Of these students, 10% dropped out of Grade 6. What percentage of all Grade 5 students eventually dropped out of Grade 6?
   ix) Two out of 3 students passed an examination. What percentage failed?
   x) By how many percentage points did second level enrolment as a percentage of total enrolment increase from 1965 - 1970? (see Exercise 4.1)
EXERCISE 5

5.1 Table 5.1 below shows public expenditure on education per pupil (in U.S. $ at current market prices) in Latin America, 1960 - 1974. Calculate the percentage increases in expenditure per pupil, to one decimal place, between:
   i) 1960 and 1965
   ii) 1960 and 1970
   iii) 1970 and 1974

Table 5.1
Public expenditure at current market prices on education per pupil, U.S. $, 1960 - 1974, Latin America

Year     U.S. $ per pupil
1960           57
1965           77
1970           97
1974          172

5.2 Table 5.2 shows, in thousands, the (estimated) male and female populations of Afghanistan in 1968 and 1975. Calculate, rounding to two decimal places:
   i) the percentage growth over the period of the male population
   ii) the percentage growth over the period of the female population
   iii) the percentage growth of the total population

Table 5.2
Estimated male and female populations of Afghanistan, 1968 and 1975 (thousands)

            1968     1975
Male        7448     8666
Female      6765     7999
EXERCISE 6

6.1 What is the characteristic of the logarithm of each of the following numbers?
   i) 62.3      ii) 101.9     iii) 100     iv) 10.12     v) 72.9
   vi) 0.03     vii) 1.01     viii) 0.2    ix) 0.0004    x) 310,000

6.2 Are the following logarithms correct or incorrect? Correct where necessary:
   i) log 113.3 = 2.0543
   ii) log 612,000 = 4.7868
   iii) log 0.0071 = 3̄.8513
   iv) log 1.262 = 0.1100
   v) log 1,001,000 = 6.0050

6.3 Are the following antilogarithms correct or incorrect? Correct where necessary:
   i) antilog 1.5572 = 3.608
   ii) antilog 2̄.6672 = 0.4647
   iii) antilog 0.0010 = 1.200
   iv) antilog 4.9999 = 9997.0
   v) antilog 1̄.6990 = 0.5

6.4 Calculate each of the following, using logarithms:
   i) N = (121.4)(0.06) / 0.114
   ii) N = (2.721)(0.0071)(71)
   iii) N = (21.6) ÷ (0.002)
   iv) N = (20.83)(0.0003)
   v) N = 37.2 / 16
EXERCISE 7

7.1 Calculate each of the following, using logarithms:
   i) N = (0.039)^2
   ii) N = (17.3)^5 (1.4)^3
   iii) N = (0.21)^4 / (0.3)^6
   iv) N = (17.6)^2 (199.3)^3
   v) N = (0.0006)^7

7.2 Calculate each of the following, using logarithms:
   i) N = (0.072)^(1/2)
   ii) N = the cube root of 62.7
   iii) N = the sixth root of 1524
   iv) N = the fifth root of 0.73
   v) N = the fourth root of (0.02)(0.13)
EXERCISE 8

8.1 Table 8.1 below shows the absolute numbers of male and female teachers in primary education in Afghanistan in the years 1969 and 1974. Calculate separately for:
   i) males
   ii) females
   iii) total of males and females
the average annual percentage rates of growth over the period, to one decimal place.

Table 8.1
Teachers at primary level, 1969 - 1974, Afghanistan

             1969      1974
Males        9606     14377
Females      1468      3215

8.2 Table 8.2 shows, for Oceania, the absolute numbers (in thousands) of females enrolled at all levels of education in 1960, 1970 and 1975. Calculate, rounding to one decimal place:
   i) the percentage growth of female enrolment over the period
   ii) the average annual rate of growth of female enrolment over the periods:
      a) 1960 - 1975
      b) 1960 - 1970
      c) 1970 - 1975

Table 8.2
Female enrolment at all levels of education, 1960 - 1975, Oceania (thousands)

                     1960     1970     1975
Female enrolment     1465     1983     2184
EXERCISE 9

9.1 Female enrolment at first, second and third level of education in Africa increased from 7.66 million in 1960 to 21.18 million in 1975.
   i) Calculate the average annual growth rate of enrolment.
   ii) Assuming this rate were to continue unchanged after 1975, during what year would female enrolment become 100% greater than in 1975?

9.2 In a certain country, enrolment doubled in 10 years. What was the average annual rate of growth during the period, in percentage terms (to one decimal place)?
EXERCISE 10

10.1 Given the continuous linear function

Y = 1 + 4X,   -4 ≤ X ≤ 4

   i) what is the range of values of Y?
   ii) what is the value of the function when X = 2.4?
   iii) what is the value of X when Y = 0?
   iv) what is the value of the function when X = 6?
   v) what is the value of the dependent variable when the independent variable = 1?

10.2 Given the continuous quadratic function

Y = 1 + 4X^2,   -4 ≤ X ≤ 4

   i) what is the value of the function when X = -4?
   ii) what is the value of the function when X = +4?
   iii) what is the value of the function when X = 0?
   iv) taking this function as a particular example of the general equation of a quadratic function:
       Y = a + bX + cX^2
       what are the values of a, b and c?
   v) When Y = 0, what is the value of X?
EXERCISE 11

11.1 i) Plot the graph of the linear function

Y = 1 + 4X,   -4 ≤ X ≤ 4

   ii) Use the graph to find the values of Y when X = ±3.3.

11.2 Show on your graph of Y = 1 + 4X the intercept "a" and the slope "b".
EXERCISE 12

12.1 i) Plot the graph of the quadratic function

Y = 1 + 4X^2,   -3 ≤ X ≤ +3

   ii) Use the graph to find the value of Y when X = 2.5.

12.2 Show on your graph the intercept a. Demonstrate how the slope changes with X. At which point is the slope at a minimum?
EXERCISE 13

13.1 Table 13.1 gives the total number of pupils enrolled in primary education in Cuba over the period 1968-1974, and their distribution between urban and rural schools.

Table 13.1
Enrolment in primary education, total, urban and rural, Cuba, 1968-1974

Year        Urban        Rural         Total
1968       811,966      585,745      1,397,711
1969       864,370      601,916      1,466,286
1970       926,240      631,905      1,558,145
1971       994,693      669,941      1,664,634
1972     1,053,549      705,618      1,759,167
1973     1,119,961      732,753      1,852,714
1974     1,150,884      748,382      1,899,266

   i) Draw up a new table, with columns showing the absolute data rounded to the nearest 1000 pupils, and columns expressing urban and rural enrolment as percentages of the rounded totals in each year. Round the percentages to one decimal place.
   ii) Using the data in your table, calculate the percentage growth in absolute urban, rural and total enrolment over the whole period, rounded to one decimal place.
   iii) Calculate the average annual rate of growth of urban, rural and total enrolment over the whole period, to one decimal place.

13.2 i) Draw a graph of total, urban and rural enrolment over the whole period, using data rounded to the nearest 1000 pupils which you have calculated in 13.1 i) above.
   ii) If your table and graph were to be included in a report written by you, to which features of the data would you draw your readers' attention?
ANSWERS TO EXERCISE 1

1.1 (2x + 29) / 4y

1.2 Let the number required be v. We know that vw = a. Hence
    v = a/w

1.3 Let the number required be a. We know that (z - a) = b. Hence a = (z - b).

1.4 i) Correct. Multiplying both sides of the formula A = P/C by C, we obtain AC = P. Dividing both sides by A, we obtain C = P/A.
    ii) Incorrect. A + 100 = P/C + 100 = (P + 100C)/C
    iii) Correct. A - C = P/C - C = (P - C^2)/C

1.5 i) (a + b)(a + c) = a(a + c) + b(a + c)
                      = a^2 + ac + ba + bc
       Note that exactly the same result could be obtained by multiplying the first bracket by each term in the second:
       (a + b)(a + c) = a(a + b) + c(a + b)
                      = a^2 + ab + ca + cb
       For ab = ba, ca = ac, and cb = bc. The order in which a number of factors are multiplied does not affect the product; and the order in which numbers are added is also immaterial.
    ii) (-x)(2x + y - 3) = -2x^2 - xy + 3x
       Note carefully the signs.
    iii) (x - y) - 3(z - x) = x - y - 3z + 3x
                            = 4x - y - 3z
    iv) (x + y - z)(-1/x) = -1 - y/x + z/x

1.6 i) 2a(2a + b)
    ii) ax(1 + x + x^2)
    iii) No factors
    iv) x^2 (x - 1)

1.7 i) (a + b)(a^2 - b^2) = (5)(-5) = -25
    ii) a(a^3 - a^2) / (ab) = 8/6
iii)
a
2
+ 2
4
3
ANSWERS TO EXERCISE 2

2.1 Let the age of the father at present = x. His son's age is therefore now x/4. In four years' time, we are told that:
       (x + 4) = 3(x/4 + 4)
       x + 4 = 3x/4 + 12
    ∴ 4x + 16 = 3x + 48
    ∴ x = 32
    That is, the father's age at present is 32, and his son's age is 8.

2.2 i) 3x + 10 = x + 20
       ∴ 2x = 10
       ∴ x = 5
    ii) 4x/3 + 11 = 5x/6 + 69
       Multiplying both sides by 6,
       8x + 66 = 5x + 414
       ∴ 3x = 348
       ∴ x = 116
    iii) (2x - 2)/3 + (x + 1)/6 = 3(x + 2)
       Multiplying both sides by 6,
       2(2x - 2) + (x + 1) = 18(x + 2)
       ∴ 4x - 4 + x + 1 = 18x + 36
       ∴ -13x = 39
       ∴ x = -3
ANSWERS TO EXERCISE 3

3.1 Results of rounding:
    i) 89.324
    ii) 89.325
    iii) 7.30
    iv) 1.1
    v) 28
    vi) 3
    vii) 200
    viii) 300,000
    ix) 18,000,000
    x) 20,000,000

3.2
               (a)       (b)       (c)
              2.25       2.2       2.3
              6.95       7.0       7.0
              7.25       7.2       7.3
              2.15       2.2       2.2
              4.55       4.6       4.6
    Total    23.15      23.2      23.4

Note that method (b) is superior to method (c). Cumulative rounding errors are minimised by method (b).
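The three sums can be reproduced with Python's `decimal` module, which offers both rounding rules. This is a sketch only. It takes the five numbers as they appear in the answer table, and it assumes that `ROUND_HALF_EVEN` corresponds to the "even integer" criterion (b) and `ROUND_HALF_UP` to method (c) for these positive numbers.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# The five numbers of the answer table, held as exact decimals.
numbers = [Decimal(s) for s in ("2.25", "6.95", "7.25", "2.15", "4.55")]

def round_to_tenth(value, rule):
    """Round a Decimal to the nearest tenth using the given rounding rule."""
    return value.quantize(Decimal("0.1"), rounding=rule)

direct = sum(numbers)                                              # method (a)
even = sum(round_to_tenth(n, ROUND_HALF_EVEN) for n in numbers)    # method (b)
up = sum(round_to_tenth(n, ROUND_HALF_UP) for n in numbers)        # method (c)
print(direct, even, up)
```

Exact decimals are used rather than ordinary floating-point numbers so that a value such as 2.15 is treated as ending exactly in 5.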
ANSWERS TO EXERCISE 4

4.1 Table 4.2
Enrolment by level of education, 1960 - 1975, percentage of total, Africa

Year     First level    Second level    Third level    Total
             (%)            (%)             (%)          (%)
1960        91.0            8.2             0.8         100.0
1965        88.7           10.2             1.0          99.9
1970        86.3           12.5             1.2         100.0
1974        83.6           14.8             1.6         100.0
1975        83.6           14.8             1.6         100.0

Note that the total for 1965 adds to 99.9%, due to errors in rounding. The figure should be displayed as correctly summed to 99.9%. It should not be presented, wrongly, as 100.0%.
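The percentages in the table above, including the 99.9% total for 1965, can be recomputed from the enrolment figures of Table 4.1. The following Python sketch is illustrative; the function and variable names are assumptions.

```python
# Enrolment in thousands: (first level, second level, third level), from Table 4.1.
enrolment = {
    1960: (19391, 1740, 180),
    1965: (26534, 3058, 306),
    1970: (33817, 4905, 471),
    1974: (41843, 7411, 779),
    1975: (44243, 7812, 865),
}

def shares(levels):
    """Each level as a percentage of the total, rounded to 1 decimal place."""
    total = sum(levels)
    return [round(100 * n / total, 1) for n in levels]

table = {year: shares(levels) for year, levels in enrolment.items()}

# Sum of the rounded percentages for each year.
sums = {year: round(sum(p), 1) for year, p in table.items()}
print(table)
print(sums)
```

The rounded percentages are summed as they stand, so the rounding error in 1965 remains visible rather than being hidden.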
4.2 i) 0.005 x 72.3 = 0.3615 = 0.4
    ii) 0.106 x 10.6 = 1.1236 = 1.1
    iii) 1.1 x 110 = 121.0
    iv) 0.5 x 100 = 50.0
    v) 0.1 x 10% = 1.0%
    vi) 0.0002 x 64 = 0.0128 = 0.0
    vii) 2520 students
    viii) 0.1 x 80% = 8.0%
    ix) (1 - 2/3) x 100% = 33.3%
    x) Percentage point increase = (12.5 - 10.2) = 2.3 percentage points.
ANSWERS TO EXERCISE 5

5.1 i) (77 - 57)/57 x 100% = 35.1%
    ii) (97 - 57)/57 x 100% = 70.2%
    iii) (172 - 97)/97 x 100% = 77.3%

5.2 i) (8666 - 7448)/7448 x 100% = 16.35%
    ii) (7999 - 6765)/6765 x 100% = 18.24%
    iii) (16665 - 14213)/14213 x 100% = 17.25%
ANSWERS TO EXERCISE 6

6.1 i) 1     ii) 2     iii) 2     iv) 1     v) 1
    vi) 2̄    vii) 0    viii) 1̄    ix) 4̄     x) 5

6.2 The correct answers (using logarithms to the base 10) are:
    i) log 113.3 = 2.0543
    ii) log 612,000 = 5.7868
    iii) log 0.0071 = 3̄.8513
    iv) log 1.262 = 0.1011
    v) log 1,001,000 = 6.0005

6.3 The correct answers are:
    i) antilog 1.5572 = 36.08
    ii) antilog 2̄.6672 = 0.04647
    iii) antilog 0.0010 = 1.002
    iv) antilog 4.9999 = 99970.0
    v) antilog 1̄.6990 = 0.5

6.4 i) log N = log 121.4 + log 0.06 - log 0.114
           log 121.4 = 2.0842
       (+) log 0.06  = 2̄.7782
                       0.8624
       (-) log 0.114 = 1̄.0569
           log N     = 1.8055
       ∴ N = antilog 1.8055 = 63.9
    ii) log N = log 2.721 + log 0.0071 + log 71
              = 0.4348 + 3̄.8513 + 1.8513
              = 0.1374
       ∴ N = antilog 0.1374 = 1.372
    iii) log N = log 21.6 - log 0.002
               = 1.3345 - 3̄.3010
               = 4.0335
       ∴ N = antilog 4.0335 = 10800.0
    iv) log N = log 20.83 + log 0.0003
              = 1.3187 + 4̄.4771
              = 3̄.7958
       ∴ N = antilog 3̄.7958 = 0.006248
    v) log N = log 37.2 - log 16
             = 1.5705 - 1.2041
             = 0.3664
       ∴ N = antilog 0.3664 = 2.325
ANSWERS TO EXERCISE 7

7.1 i) log N = 2 log 0.039
             = 3̄.1822
       ∴ N = antilog 3̄.1822 = 0.001522
    ii) log N = 5 log 17.3 + 3 log 1.4
              = 6.6283
       ∴ N = antilog 6.6283 = 4249000.0
    iii) log N = 4 log 0.21 - 6 log 0.3
               = 0.4262
       ∴ N = antilog 0.4262 = 2.668
    iv) log N = 2 log 17.6 + 3 log 199.3
              = 9.3895
       ∴ N = antilog 9.3895 = 2.452 x 10^9
    v) log N = 7 log 0.0006
             = 2̄3̄.4474
       ∴ N = antilog 2̄3̄.4474 = 2.802 x 10^-23

7.2 i) log N = (1/2) log 0.072
             = (1/2) (2̄.8573)
             = 1̄.4287
       ∴ N = antilog 1̄.4287 = 0.2683
    ii) log N = (1/3) log 62.7
              = (1/3) (1.7973)
              = 0.5991
       ∴ N = antilog 0.5991 = 3.973
    iii) log N = (1/6) log 1524
               = (1/6) (3.1829)
               = 0.5305
       ∴ N = antilog 0.5305 = 3.392
    iv) log N = (1/5) log 0.73
              = (1/5) (1̄.8633)
              = (1/5) (5̄ + 4.8633)
              = 1̄.9727
       ∴ N = antilog 1̄.9727 = 0.9391
    v) log N = (1/4) (log 0.02 + log 0.13)
             = (1/4) (2̄.3010 + 1̄.1139)
             = (1/4) (3̄.4149)
             = (1/4) (4̄ + 1.4149)
             = 1̄.3537
       ∴ N = antilog 1̄.3537 = 0.2258
ANSWERS TO EXERCISE 8

8.1 Using the formula for the average annual rate of growth, where:
       n = 5
       Pn = 14377
       Po = 9606
    we have:
    i) males:
       rm = (14377 / 9606)^(1/5) - 1
          = 1.084 - 1
          = 0.084
       ∴ rm = 8.4%
    ii) females:
       rf = 16.9%
    iii) males and females:
       rt = 9.7%

    Note that the average annual rate of growth of the total, rt, does not equal the (unweighted) average, or arithmetic mean, of the male (rm) and female (rf) rates:
       rt ≠ (rm + rf) / 2
    since the male and female populations comprise different proportions of the total.

8.2 i) (2184 - 1465)/1465 x 100% = 49.1%
    ii) Using the formula as in 8.1 above, by substitution of appropriate values of n, Pn and Po,
       a) r = 2.7%
       b) r = 3.0%
       c) r = 2.0%
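The growth-rate calculation of 8.1 can be sketched in Python. The formula reference in the source is illegible, so the sketch assumes the relation implied by the worked example for males, r = (Pn/Po)^(1/n) - 1; the function name is hypothetical. Results obtained this way may differ in the last decimal place from answers worked with four-figure logarithm tables.

```python
def annual_growth_rate(p0, pn, n):
    """Average annual growth rate r such that pn = p0 * (1 + r) ** n."""
    return (pn / p0) ** (1 / n) - 1

# Teachers at primary level, Afghanistan, 1969 and 1974 (Table 8.1), n = 5 years.
males = annual_growth_rate(9606, 14377, 5)
females = annual_growth_rate(1468, 3215, 5)
total = annual_growth_rate(9606 + 1468, 14377 + 3215, 5)

print(round(100 * males, 1), round(100 * total, 1))

# The rate for the total is not the simple mean of the male and female rates.
print(total, (males + females) / 2)
```

The last line illustrates the note in 8.1: the unweighted mean of the two rates overstates the growth of the total, because females are a small share of all teachers.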
ANSWERS TO EXERCISE 9

9.1 i) Using the formula for the average annual rate of growth, where:
       n = 15
       Pn = 21.18
       Po = 7.66
    we may calculate r = 7.0%
    ii) Using the formula for the number of years:
       n = log (Pn / Po) / log (1 + 0.070)
    but Pn / Po = 2 (i.e., enrolment is 100% greater in year n than in the base year 0), hence:
       n = log 2 / log 1.070
         = 0.3010 / 0.0294
         = 10.24 years
    Thus female enrolment would become 100% greater than in 1975 during the year 1985.

9.2 Using the formula for the average annual rate of growth, where:
       n = 10
       and Pn / Po = 2
    we have:
       r = 2^(1/10) - 1
       ∴ r = 0.072
       ∴ r = 7.2%
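Both calculations of this exercise can be checked with a few lines of Python. This is an illustrative sketch; the function name `years_to_multiply` is an assumption, and base-10 logarithms are used only to mirror the worked answer (any base gives the same ratio).

```python
import math

def years_to_multiply(factor, rate):
    """Number of years n for which (1 + rate) ** n equals factor."""
    return math.log10(factor) / math.log10(1 + rate)

# 9.1 ii): years for enrolment to double at 7.0% a year, counted from 1975.
n = years_to_multiply(2, 0.070)
year = 1975 + math.floor(n)

# 9.2: annual rate at which enrolment doubles in 10 years.
rate = 2 ** (1 / 10) - 1

print(round(n, 2), year, round(100 * rate, 1))
```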
ANSWERS TO EXERCISE 10

10.1 i) Y ranges in value from a maximum when X = 4, for then:
        Y = 1 + 4(4) = 17
        to a minimum when X = -4, for then:
        Y = 1 + 4(-4) = -15
     ii) Y = 1 + 4(2.4) = 10.6
     iii) 0 = 1 + 4X
        ∴ X = -0.25
     iv) This is something of a trick question! The answer is that Y has no value, i.e. the function is undefined. Y only has values corresponding to the domain of X:
        -4 ≤ X ≤ 4
     v) The dependent variable is Y; when X = 1, Y = 1 + 4(1) = 5.

10.2 i) Y = 1 + 4(-4)^2 = 65
     ii) Y = 1 + 4(+4)^2 = 65
     iii) Y = 1 + 4(0)^2 = 1
     iv) a = 1, b = 0, c = 4
     v) X = √(-1/4), which is not a "real" number, for we cannot find the square root of a negative number.
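The point of the "trick question" in 10.1 iv), that a function has no value outside its domain, can be made concrete in Python. The sketch below is one possible illustration; `make_function` is a hypothetical helper, and signalling the problem with a `ValueError` is an assumption of this example.

```python
def make_function(rule, low, high):
    """Return a function that applies `rule` only for low <= X <= high."""
    def f(x):
        if not low <= x <= high:
            # Outside the domain the function is undefined.
            raise ValueError(f"X = {x} is outside the domain {low} to {high}")
        return rule(x)
    return f

linear = make_function(lambda x: 1 + 4 * x, -4, 4)          # Y = 1 + 4X
quadratic = make_function(lambda x: 1 + 4 * x ** 2, -4, 4)  # Y = 1 + 4X^2

print(linear(4), linear(-4), quadratic(-4), quadratic(0))

try:
    linear(6)
except ValueError as error:
    print(error)
```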
ANSWERS TO EXERCISE 11
11.1 i) As the function is linear, it is necessary to plot only two points (coordinates). A line passing through the points will define the function Y = 1 + 4X. Choosing, arbitrarily, two points in the domain of X, we find that when X = 2, Y = 1 + 4(2) = 9; and when X = -3, Y = 1 + 4(-3) = -11. Hence, passing a line through the two points (2, 9) and (-3, -11), we have the required graph (see Fig. 5). Note that Y ranges from +17 (when X = 4) to -15 (when X = -4).

     ii) If you have drawn your graph accurately, you should find that a line drawn vertically from the X axis through X = 3.3 will cut the line Y = 1 + 4X where Y = 14.2. Similarly, when X = -3.3, Y = -12.2.

11.2 On Figure 5 the intercept a may be seen to be the distance between the origin (0, 0) and the point (0, 1). Hence a = 1. The slope b is the gradient of the line. It is the ratio of the increase in Y to a unit increase in X. In the triangle ABC, it is the ratio BC/AB = +4.
[Figure 5: graph of the function Y = 1 + 4X]
ANSWERS TO EXERCISE 12
12.1 i) In Exercise 11.1 (i), only two points were necessary to plot the (linear) function. With a non-linear function, however, more than two points must be plotted in order to be able to draw in the curved line passing through them. We proceed by finding which values of Y correspond to a selection of values in the domain of X:

        X    -3   -2   -1    0    1    2    3
        Y    37   17    5    1    5   17   37

     We now have 7 coordinates and may plot these against a pair of axes (see Figure 6). Note that as there are no negative values of Y, a negative Y axis is unnecessary. The curve (a "parabola") may now be drawn through the points.

     ii) Lines drawn vertically through X = ±2.5 should cut the curve where $Y = 1 + 4(2.5)^2 = 26$.

12.2 The intercept a can be seen to be where the curve cuts the Y axis, at the point (0, 1); hence a = 1. The slope of the curve can be seen to change continuously. From the point (-3, 37) it is negative so long as X is negative, and declines in absolute terms until it reaches a minimum of zero when X = 0. It then becomes positive and increases with X to a maximum when X = 3.
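The table of coordinates and the changing slope described in the answers to Exercise 12 can be reproduced with a minimal sketch. The function name `f` and the use of unit steps in X as a rough stand-in for the slope are illustrative choices, not part of the original answer.

```python
def f(x):
    """The quadratic function of Exercise 12: Y = 1 + 4X^2."""
    return 1 + 4 * x ** 2

# Coordinates for whole-number values of X in the domain -3 to 3
xs = list(range(-3, 4))
table = [(x, f(x)) for x in xs]
for x, y in table:
    print(f"X = {x:>2}   Y = {y:>2}")

# Change in Y for each unit step in X: negative while X is negative,
# shrinking in absolute size towards X = 0, then positive and increasing
steps = [f(x + 1) - f(x) for x in xs[:-1]]
print(steps)

# Reading off the curve at X = 2.5 and X = -2.5
print(f(2.5), f(-2.5))
```

The symmetry of the step values about X = 0 mirrors the symmetry of the parabola about the Y axis.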
[Figure 6: graph of the function $Y = 1 + 4X^2$ (a parabola)]
ANSWERS TO EXERCISE 13
13.1 i) After rounding to the nearest 1000 pupils and then calculating the urban and rural data as percentages of the rounded totals, your table should be as follows:

Table 13.2
Enrolment in primary education: total, urban and rural, in thousands and as percentages of total, Cuba, 1968-1974

                Urban             Rural             Total
Year        (000)    (%)      (000)    (%)      (000)     (%)
1968          812   58.1        586   41.9      1,398   100.0
1969          864   58.9        602   41.1      1,466   100.0
1970          926   59.4        632   40.6      1,558   100.0
1971          995   59.8        670   40.2      1,665   100.0
1972        1,054   59.9        706   40.1      1,759   100.0
1973        1,120   60.4        733   39.6      1,853   100.0
1974        1,151   60.6        748   39.4      1,899   100.0

Note that in 1972 the rounded urban and rural figures do not, because of rounding errors, sum to the rounded total of 1,759.

ii) Percentage growth in urban enrolment, 1968-1974 = $\dfrac{(1151 - 812) \times 100}{812}$ % = 41.7%.

    Percentage growth in rural enrolment, 1968-1974 = $\dfrac{(748 - 586) \times 100}{586}$ % = 27.6%.

    Percentage growth in total enrolment, 1968-1974 = $\dfrac{(1899 - 1398) \times 100}{1398}$ % = 35.8%.

iii) Average annual growth rate of urban enrolment, 1968-1974 = 6.0%
     Average annual growth rate of rural enrolment, 1968-1974 = 4.2%
     Average annual growth rate of total enrolment, 1968-1974 = 5.2%
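The percentages and growth figures in the answer to 13.1 can be recomputed from the rounded enrolment data. The sketch below assumes that the average annual growth rate is the compound rate r satisfying $P_n = P_0 (1 + r)^n$ with n = 6; this assumption reproduces the rural and total rates stated above. The names `share`, `pct_growth` and `annual_rate` are illustrative.

```python
# Rounded enrolment in thousands: year -> (urban, rural, total)
enrolment = {
    1968: (812, 586, 1398),
    1969: (864, 602, 1466),
    1970: (926, 632, 1558),
    1971: (995, 670, 1665),
    1972: (1054, 706, 1759),
    1973: (1120, 733, 1853),
    1974: (1151, 748, 1899),
}

def share(part, whole):
    """Part as a percentage of the whole."""
    return 100 * part / whole

def pct_growth(start, end):
    """Percentage growth over the whole period."""
    return 100 * (end - start) / start

def annual_rate(start, end, years):
    """Average annual compound growth rate, in per cent (assumed definition)."""
    return 100 * ((end / start) ** (1 / years) - 1)

first, last = enrolment[1968], enrolment[1974]
for label, a, b in zip(("urban", "rural", "total"), first, last):
    print(label, round(pct_growth(a, b), 1), round(annual_rate(a, b, 6), 1))

# Urban share of the total in 1968
print(round(share(first[0], first[2]), 1))
```

The same data show the rounding discrepancy noted for 1972: the rounded urban and rural figures add up to 1,760, one more than the rounded total.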
13.2 i) See Figure 7. Note the choice of scale. Read the hints on drawing graphs in 6.6.

     ii) A report on the development of primary level enrolment in Cuba over the period 1968-1974, using the data given in Table 13.X, would bring out at least the following features:

     - the fact that both urban and rural enrolment grew continuously with no period of decline (i.e. enrolment grew "monotonically"), with the consequence that total enrolment also grew continuously;

     - the fact that, as the graph shows, there was a decline in the rate of growth of both urban and rural (and hence total) enrolment in the final period 1973-1974;

     - the fact that urban enrolment grew more rapidly than rural. This may be illustrated by your calculations in answering 13.1 ii) and iii), and can be seen in the greater positive slope of the graph of urban enrolment over time than rural enrolment.
[Figure 7: graph of primary enrolment over time, Cuba, 1968-1974 (see 13.2 i)]
[Table: four-figure common logarithms of the numbers 10 to 99, with mean differences for the fourth figure]
The reader is probably aware that more detailed logarithm tables exist. Had such more detailed tables been used in the calculations included in this chapter, slightly different answers might have been obtained.
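The effect of working with four-figure tables can be illustrated with a short sketch. It regenerates table-style mantissae from Python's `math.log10` and compares a quotient of four-figure logarithms with the full-precision result. The function name `four_figure_log` is illustrative, and the example quotient (log 2 divided by log 1.070) is taken from the answers to Exercise 9.

```python
import math

def four_figure_log(x):
    """Mantissa of the common logarithm of x, rounded to four figures."""
    return round(math.log10(x) % 1, 4)

log_2 = four_figure_log(2)        # table entry for 20
log_107 = four_figure_log(1.07)   # table entry for 107

# Quotient using four-figure table values
n_table = log_2 / log_107

# The same quotient at full precision
n_exact = math.log10(2) / math.log10(1.07)

print(log_2, log_107)
print(round(n_table, 3), round(n_exact, 3))
```

The two quotients differ only in the third decimal place, which is the kind of slight difference the note above refers to.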
[Table: four-figure antilogarithms for mantissae .00 to .99, with mean differences for the fourth figure]
UNIT II - BASIC STATISTICS
by Robin Shannon, Lecturer
Department of Economics,
University of Newcastle-Upon-Tyne
(United Kingdom)
CONTENTS

Introduction

SECTION 1   USEFUL NOTATION
   1.1   The summation sign
   1.2   Subscripts and superscripts

SECTION 2   FIRST STEPS IN DATA ORGANISATION
   2.1   Raw data
   2.2   Data aggregation
   2.3   Tables
   2.4   Time-series tables
   2.5   Geographical tables
   2.6   Frequency tables
   2.7   Two-way tables
   2.8   Ten practical hints on designing tables

SECTION 3   FREQUENCY DISTRIBUTIONS
   3.1   Classifying data
   3.2   Hints for constructing frequency distributions
   3.3   Histograms
   3.4   Frequency polygons
   3.5   Relative frequency distributions
   3.6   Population pyramids
   3.7   Cumulative frequency distributions
   3.8   Frequency curves

SECTION 4   MEASURES OF CENTRAL TENDENCY
   4.1   Summary statistics
   4.2   The arithmetic mean
   4.3   The median
   4.4   The mode

SECTION 5   MEASURES OF DISPERSION
   5.1   The range
   5.2   The standard deviation
   5.3   The variance
   5.4   Comparison of standard deviations
   5.5   Calculation of standard deviation of grouped data

SECTION 6   RELATIONSHIPS BETWEEN VARIABLES
   6.1   Association and causation
   6.2   The importance of the theory of probability
   6.3   Association between variables
   6.4   Covariance
   6.5   The coefficient of linear correlation
   6.6   Spearman's coefficient of rank correlation
   6.7   Causal relationships: regression
   6.8   The method of least squares
   6.9   General formulae of the regression coefficients
   6.10  Interpretation of the estimated regression coefficients
   6.11  Extrapolation
   6.12  Interpolation
   6.13  Non-linear curve-fitting
   6.14  Regression: a summary

EXERCISES AND ANSWERS
   Exercises 1 to 10
Introduction
It has often been said that "you can prove anything with statistics". It has also been said that "there are lies, damned lies and statistics"! Such statements exemplify an attitude to statistics which is all too frequently held - but it is hoped that by the end of this Chapter the reader will be convinced that it is a mistaken attitude!

The word "statistics" may bring to mind a variety of impressions. Most people recognise that numerical data are collected, organised and presented by the statistician. This chapter will have a good deal to say on this very important aspect of the statistician's work. But it is perhaps less widely recognised that statistical methods exist also for the analysis and interpretation of numerical information. Empirical evidence may, through statistical methods, be used to assess hypothesised relationships between variables. Naturally, such methods - sometimes complex and technical - may be misused, whether deliberately or not. But non-statistical methods, too, may be abused. Anybody concerned with using statistics in the educational field should be well aware of the scope - and limitations - of statistical techniques.
The nature of statistics
There exists a body of mathematical statistical theory of a highly formal nature, founded in various aspects of the theory of probability. This background theory cannot be explored here. Our concern is above all with an exposition of statistical methods for the practising educational administrator or planner, involved in such down-to-earth questions as: how many children may be enrolled in primary education in 5 years' time? or, how may intake rates into the educational system be expected to develop over the coming decade?

To answer such basic questions, a number of mathematical and statistical techniques need to be mastered. In the previous Chapter, the basic mathematical techniques necessary for an understanding of the statistics presented here have been developed. The reader should not, therefore, commence this Chapter before having studied that Chapter. The reader is again urged to complete the exercises. Competence in statistics cannot be gained through reading alone. One hour of practical work is probably worth several hours' reading!

This Chapter will develop from an initial introduction to certain notational conventions to an examination of ways of arranging and presenting data. It then moves on to an exposition of a variety of summary statistics, after which the reader is ready to be introduced to the concepts of correlation and regression analysis.
1. USEFUL NOTATION

1.1 The summation sign

1.1.1 In analysis of educational data, it is frequently necessary to add (sum) a series of numbers. For example, the reader will probably be familiar with the general idea of an "average" of a set of numbers. We shall formally introduce several average measures in Section 4 below. To illustrate the use of the summation notation, let us consider the arithmetic mean, which is defined as the sum of a set of data, divided by the number of figures being summed. Suppose we wish to find, for example, the average class size in a particular school. Let class size be denoted by the variable X. If there were, say, 5 classes, their size could be written symbolically as:

   $X_1, X_2, X_3, X_4, X_5$

with each numbered subscript denoting a particular class.

1.1.2 How many students are there in all classes? Clearly, the sum of the 5 classes:

   $X_1 + X_2 + X_3 + X_4 + X_5$    (1)

The average number of students per class is therefore:

   $\dfrac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$    (2)

Generalising, if there were n classes the total number of students would be:

   $X_1 + X_2 + \dots + X_n$    (3)

and the average number of students per class would be:

   $\dfrac{X_1 + X_2 + \dots + X_n}{n}$    (4)
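As a small numerical illustration of expressions (1) to (4), the sketch below uses five hypothetical class sizes; the figures and the variable names are invented for the example and do not come from the text.

```python
# Hypothetical sizes of the five classes, X_1 ... X_5
class_sizes = [32, 28, 35, 30, 25]

# Expression (1): total number of students in all classes
total = sum(class_sizes)

# Expression (2), or (4) with n = 5: average number of students per class
n = len(class_sizes)
average = total / n

print(total, average)
```

Adding a sixth class to the list changes both n and the total, so the same two lines cover the general case in expressions (3) and (4).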
1.1.3 Clearly, this is a very cumbersome and unwieldy way of writing down what is, in essence, a very simple idea. It is here that the summation sign, written $\Sigma$ (pronounced "sigma"), proves so useful. To see how it is used, consider formula (3) above. Utilising the $\Sigma$ notation, this may be re-written:

   $\sum_{i=1}^{n} X_i$    (5)

The figures below and above the $\Sigma$ tell us to sum from i = 1 (the first value of X) to i = n (the last, nth, value of X). Thus, for example, the expression (1) above may be written in the much more convenient shorthand:

   $\sum_{i=1}^{5} X_i$

and similarly formula (2) may be written:

   $\dfrac{\sum_{i=1}^{5} X_i}{5}$

1.1.4 We have dealt with the summation of a variable, X. Note that when we sum a constant, a, n times we obtain:

   $\sum_{i=1}^{n} a = na$

For example, if a = 2 and n = 4, then:

   $\sum_{i=1}^{4} 2 = 2 + 2 + 2 + 2 = 4 \times 2 = na$
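The summation sign translates naturally into a small helper that sums a term over an index running between two limits. The sketch below is only an illustration: the helper name `sigma`, the dictionary `X` and the class sizes in it are assumptions made for the example.

```python
def sigma(term, lower, upper):
    """Sum term(i) for i = lower, lower + 1, ..., upper (both limits included)."""
    return sum(term(i) for i in range(lower, upper + 1))

# Hypothetical class sizes, keyed by their subscript i
X = {1: 32, 2: 28, 3: 35, 4: 30, 5: 25}

# Formula (5) with n = 5: the sum of X_i from i = 1 to 5
total = sigma(lambda i: X[i], 1, 5)

# Summing the constant a = 2 for i = 1 to 4 gives n * a
constant_sum = sigma(lambda i: 2, 1, 4)

print(total, total / 5, constant_sum)
```

Because the term for a constant ignores i altogether, its sum is simply the constant repeated n times, as in 1.1.4.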
1.1.5 When summing expressions involving two or more variables, a little careful thought is necessary in using the notation. For example:

   $\sum_{i=1}^{n} (X_i + Y_i + a) = X_1 + X_2 + \dots + X_n + Y_1 + Y_2 + \dots + Y_n + na$

   $= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + na$

But this approach, which involves taking the $\Sigma$ sign to each term, is only valid where addition or subtraction is concerned. It is not appropriate with multiplication or division. For example:

   $\sum_{i=1}^{n} (X_i Y_i a)$    (6)

does NOT equal:

   $\sum_{i=1}^{n} X_i \cdot \sum_{i=1}^{n} Y_i \cdot \sum_{i=1}^{n} a$    (7)

(The reader should satisfy himself or herself of the truth of this proposition.) The only valid manipulation of (6) is to bring the constant outside the bracket, the expression (6) then becoming:

   $a \sum_{i=1}^{n} (X_i Y_i)$

The reader should now do Exercise 1.
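The rules in 1.1.5 can be checked numerically. The sketch below uses small hypothetical values for X, Y and the constant a; it confirms that the summation sign may be taken to each term of a sum, but not to each factor of a product.

```python
# Hypothetical data: three paired observations and a constant
X = [1, 2, 3]
Y = [4, 5, 6]
a = 2
n = len(X)

# Addition: the sum of (X_i + Y_i + a) equals sum X_i + sum Y_i + n*a
lhs_add = sum(x + y + a for x, y in zip(X, Y))
rhs_add = sum(X) + sum(Y) + n * a

# Multiplication: the sum of (X_i * Y_i * a) ...
sum_of_products = sum(x * y * a for x, y in zip(X, Y))
# ... is not the product of the separate sums
product_of_sums = sum(X) * sum(Y) * (n * a)
# ... but the constant may be brought outside the summation
constant_outside = a * sum(x * y for x, y in zip(X, Y))

print(lhs_add, rhs_add)
print(sum_of_products, product_of_sums, constant_outside)
```

With these figures the sum of products is 64 while the product of the separate sums is 540, so the two expressions plainly differ.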
1.2 Subscripts and superscripts

1.2.1 The first sight of a term such as:

   $X_{ij}^{t}$

can be very off-putting to the beginner in statistics. There seems to be an incomprehensible jumble of lettering which appears designed to confuse rather than clarify. In fact, as so often with techniques in mathematics and statistics, a little time invested in learning the method will pay handsome dividends in time saved later. Notational devices such as subscripts and superscripts are simply a very convenient form of shorthand.

1.2.2 For example, suppose the reader were investigating how many students there had been in each class in all grades in a school over a number of past years. Let us define the variable X as the number of students in a class. Let:

   i = 1, ..., n, where n = number of classes in each grade
   j = 1, ..., k, where k = number of grades
   t = year

Then, for example, we could write:

   $X_{ij}^{t}$ = number of students in the ith class in the jth grade in year t

The reader may appreciate that $X_{ij}^{t}$ is a very convenient way of writing down what would otherwise have to be written, in non-symbolic terms, in a long sentence. For example:

   $X_{24}^{1974}$ = number of students in the second class of the fourth grade in 1974
1.2.3 Recalling our summation notation, the expression which stands for the total of all students in all n classes in the jth grade in year t would be:

   $\sum_{i=1}^{n} X_{ij}^{t} = X_{1j}^{t} + X_{2j}^{t} + \dots + X_{nj}^{t}$

If we wish to sum all the students across not only all classes but also all grades in the year t, we must employ a double $\Sigma$ notation:

   $\sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij}^{t}$    (8)

This conveniently stands for the lengthy expression:

   $X_{11}^{t} + X_{21}^{t} + \dots + X_{i1}^{t} + \dots + X_{n1}^{t}$
   $+ X_{12}^{t} + X_{22}^{t} + \dots + X_{i2}^{t} + \dots + X_{n2}^{t}$
   $+ \dots$
   $+ X_{1j}^{t} + X_{2j}^{t} + \dots + X_{ij}^{t} + \dots + X_{nj}^{t}$
   $+ \dots$
   $+ X_{1k}^{t} + X_{2k}^{t} + \dots + X_{ik}^{t} + \dots + X_{nk}^{t}$

The value of subscripts, superscripts and the $\Sigma$ notation is particularly clear in this case. Assuming there are n classes in each grade, (nk) separate terms have been condensed, thanks to these notational devices, into one simple expression, (8).
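The subscript and superscript notation maps directly onto a keyed data structure. In the sketch below the class sizes are hypothetical, with n = 2 classes in each of k = 4 grades for the year 1974; the dictionary is keyed by the pair (i, j) so that the keys mirror the subscripts.

```python
# Hypothetical class sizes for t = 1974, keyed by (i, j) = (class, grade)
X_1974 = {
    (1, 1): 30, (2, 1): 28,
    (1, 2): 25, (2, 2): 27,
    (1, 3): 22, (2, 3): 24,
    (1, 4): 20, (2, 4): 21,
}
n, k = 2, 4

# A single term: the second class of the fourth grade in 1974
second_class_fourth_grade = X_1974[(2, 4)]

# Single summation: all n classes in grade j
def grade_total(j):
    return sum(X_1974[(i, j)] for i in range(1, n + 1))

# Double summation, expression (8): all classes in all grades
school_total = sum(
    X_1974[(i, j)] for i in range(1, n + 1) for j in range(1, k + 1)
)

print(second_class_fourth_grade, grade_total(4), school_total, len(X_1974))
```

The double summation runs over n × k = 8 separate terms here, which is the condensation that expression (8) achieves in a single line.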
2. FIRST STEPS IN DATA ORGANISATION

2.1 Raw data

2.1.1 Data which have been collected, perhaps by questionnaire, but not yet further organised in any way are often referred to as "raw" data. The educational statistician may have at his disposal a great deal of such data in his basic records which might, for example, be in the form of completed annual questionnaires. The data in these records must be arranged, summarised and presented. The statistician will seek ordered patterns. Although the topic of good questionnaire design cannot be discussed here, it should be pointed out that a well-designed set of questionnaires and carefully maintained basic records are invaluable foundations for basic data organisation and analysis.
©
2.2 DatS aggregation
2.2.1
In drawing up tables (the main types of analytical tables are
discussed in 2.3-2.7 below) varying degrees of data aggregation may
be performed.
Presentation of completely disaggregated data would
in general be an indigestible meal for the final user of the
statistics (1).
For example, consider data on pupil enrolment in
educational institutions at all levels.
The data will probably be
gathered by means of an annual questionnaire, and will give a variety
of information usually including the sex, age and grade of all pupils.
For basic tables these data may be aggregated in a variety of ways.
Pupils may be aggregated by sex, age and grade for each level.
At
a higher level of aggegation, national aggregates may be found.
2.2.2 The criteria for aggregation will always depend on the purposes
of the analysis. Certain basic aggregates will almost always be
made, such as those mentioned. But a number of other aggregates will
be necessary for particular purposes. For example, pupils might be
aggregated by language spoken; by geographical region; by type of
school, or according to many other possible classifications. And
at the highest level, national data may be further aggregated,
for example by geographical group or by levels of economic
development, or still further criteria.
(1) For some purposes of course, the original information, with no
aggregation at all, may be essential.
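The idea of aggregating the same basic records in different ways can be sketched in a few lines of Python. The records, the region names and the helper `aggregate` below are hypothetical and serve only as an illustration; they are not taken from any questionnaire.

```python
from collections import Counter

# Hypothetical pupil records: (region, sex, age, grade). Illustrative only.
records = [
    ("North", "F", 6, 1),
    ("North", "M", 7, 1),
    ("North", "F", 7, 2),
    ("South", "M", 6, 1),
    ("South", "F", 8, 2),
    ("South", "M", 8, 3),
]

def aggregate(rows, key):
    """Count the pupils falling in each group defined by the function `key`."""
    return Counter(key(row) for row in rows)

by_sex = aggregate(records, lambda r: r[1])
by_region_and_grade = aggregate(records, lambda r: (r[0], r[3]))
national_total = len(records)   # the highest level of aggregation

print(dict(by_sex))
print(dict(by_region_and_grade))
print(national_total)
```

The same raw records feed every aggregate, which is why carefully maintained basic records matter: a new classification only needs a new `key`.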
2.3 Tables
2.3.1 In developing ordered patterns from the raw information,
an early step is to construct well-designed tables. There is a
great variety of different ways of drawing up tables. The common
attribute of good tables is that they all seek to show patterns
over space, size or time. Their common objective is to bring order
to the chaos of raw data, and to inform the reader as unambiguously
as possible about an aspect, or aspects, of the structure of the
information collected.
2.3.2 Depending on the nature of the data and the purposes of the
analysis, four common basic types of tables are generally used:
i) those showing a time-series, i.e., comparing a variable
or variables at one period of time with another period
ii) those showing a geographical distribution
iii) frequency tables
iv) two-way tables
2.4 Time-series tables
2.4.1 A time series may be defined as a set of ordered observations
on a quantitative characteristic of an individual or collective
phenomenon taken at different points in time (1). Numerous time series
are presented in the Chapter on Basic Mathematics, and there are
further examples below. Here we stress the main points to bear in
mind when presenting time series in tabular form:
- always specify clearly the dates to which the data refer
- always specify clearly the definitions of the data
- where definitions change in the course of a time-series, bring
this fact to the table-reader's attention. This may be done
either by a footnote or in the figures themselves; e.g., by
using italics.
2.5 Geographical tables
2.5.1 Raw data may be aggregated and presented in tables showing
geographical (e.g. regional, or urban/rural) distribution. Such tables
are often of great value in making inter-regional comparisons. If
they are also time series tables, regional developments over time may
be analysed and compared.
(1) See the Chapter on Basic Mathematics, para. 2.1.1
2.6 Frequency tables
2.6.1 Examples of frequency tables are presented in Section 3
below. A frequency table shows how often a particular characteristic
is present. Often the data are expressed in percentage form, giving
an easily-grasped impression of the pattern which may be present.
2.7 Two-way tables
2.7.1 Two-way tables are very widely used by the educational statistician.
In Example 1 below, data have been taken from a basic Unesco questionnaire
referring to Cameroon in 1975-76. Ages are shown vertically (columns)
and grades horizontally (rows). At each intersection of a column and
row (cell) is the number (absolute frequency) of pupils of a particular
age in a particular grade. Data, which have been collected by the
national authorities, are thus presented already aggregated by single
year of age, sex and grade. Rows and columns have also been aggregated
(see the extreme right hand column and bottom row respectively) and
the grand totals (for males and females together, and females only)
presented in the bottom right hand corner cell.
Example 1: a two-way table presenting enrolment by age, sex and grade
Table 1: Enrolment at the primary level, by age, sex and grade, Cameroon 1975-76.
[The body of Table 1 is illegible in this copy. Row headings: age of
pupils in completed years, listed from "Under 5 years" to "16 years",
followed by "TOTAL all ages"; each age is given for both sexes and for
females. Column headings: Grade 1 to Grade 7 and a total for all grades.]
The table shows, for example, that (in 1975-76) there were 1102 girls in the seventh grade
aged m; and a total of 11028 children in the fourth grade aged 8 years. A total of
1,122,900 children were enrolled at the primary level.
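The layout of a two-way table, with its row totals, column totals and grand total, can be sketched as follows. This is a minimal illustration on hypothetical (age, grade) pairs, not the Cameroon data; ages are listed down the side and grades across the top.

```python
from collections import defaultdict

# Hypothetical (age, grade) pairs, one per pupil. Illustrative only.
pupils = [(6, 1), (6, 1), (7, 1), (7, 2), (8, 2), (8, 3), (8, 3)]

ages = sorted({age for age, _ in pupils})
grades = sorted({grade for _, grade in pupils})

# cell[(age, grade)] holds the absolute frequency for that cell.
cell = defaultdict(int)
for age, grade in pupils:
    cell[(age, grade)] += 1

row_totals = {a: sum(cell[(a, g)] for g in grades) for a in ages}
col_totals = {g: sum(cell[(a, g)] for a in ages) for g in grades}
grand_total = sum(row_totals.values())

header = "Age".ljust(6) + "".join(f"G{g}".rjust(5) for g in grades) + "Total".rjust(7)
print(header)
for a in ages:
    line = str(a).ljust(6) + "".join(str(cell[(a, g)]).rjust(5) for g in grades)
    print(line + str(row_totals[a]).rjust(7))
print("Total".ljust(6) + "".join(str(col_totals[g]).rjust(5) for g in grades)
      + str(grand_total).rjust(7))
```

The grand total can be reached by summing either the row totals or the column totals, which gives a simple check on the accuracy of the table.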
2.7.2 The four basic types of table discussed may of course appear
in a mixed form. It would be possible, for example, to have a table
which presented regional enrolment data over a period of time,
classified two ways, in both frequency and percentage form. But there
would be obvious dangers for clarity in presenting a table of such
complexity! There are a number of simple principles which should be
observed in designing any table. Bearing in mind that the fundamental
purpose of a table is always to help the reader to better comprehend
the patterns and meaning of quantities of numerical data, the main
principles are listed in the following paragraph.
2.8 Ten practical hints on designing tables
- a table should be unambiguous and as simple as possible
- footnotes should be used to explain any omissions, necessary
approximations, or changes in definitions which might occur
over a period of time
- units of measurement should be made clear, particularly any
changes in the units over time
- a table should not be too large. This makes an indigestible
meal for the reader. The over-sized table should either be made
into two or more, or have its data further aggregated
- if groups of data are to be compared they should be placed close
together in the table
- summary statistics (see especially Sections 4 and 5 below) and
percentages, etc., should be placed close to the data from which
they are derived
- generally, a vertical, rather than a horizontal, arrangement of
data is preferable. The eye moves down columns of data more easily
- sections of a table may be separated off by horizontal or vertical
lines, again to aid the eye
- totals should usually be given. These are not only valuable in
themselves, but are often a useful check on the accuracy of their
components
- sources should always be given, preferably immediately beneath the
table
3. FREQUENCY DISTRIBUTIONS
3.1 Classifying data
3.1.1 When bringing some initial order to large quantities of
raw data it is very often useful to distribute the data into
groups. These groups are known as classes or categories. The
number of items of data falling into a class is known as the
class frequency. If the data are arranged in a table by classes,
and the corresponding class frequencies are also tabulated, the
table is called a frequency distribution in tabular form.
Example 2: a frequency distribution
A survey of teachers was carried out in 1973 in Afghanistan.
Table 2 below is a frequency distribution of the ages of all the
teachers in the survey:
Table 2: Ages of a sample of teachers in Afghanistan, 1973
Age (years)      Frequency
< 20                   237
20 < 25               7284
25 < 30               8146
30 < 35               2166
35 < 40               1198
40 < 45                963
45 and over            959
Total                20953
3.1.2 The following important points should be noted about the
so-called grouped data in Example 2.
- the variable under analysis, age, has been grouped into
classes consisting of intervals of 5 years, except for
the first (< 20) and last (45 and over), which are called open
class intervals
- the sign "<" has been used to avoid any ambiguity about
the correct classes into which all individuals fall.
Had the intervals been written 20-25, 25-30, etc., it would be
unclear to which class an individual aged exactly 25 belongs.
- the end numbers are known as the class limits (or boundaries).
For example, referring to the 20 < 25 class interval, 20
is the lower class limit and 25 the upper class limit
- the difference between the upper and lower class limits
of a class interval is the class width: in this example
(apart from the first and last classes), the class width
is a constant 5 years. Equal class width is convenient,
but not essential
- the midpoint of the class interval is known as the class
midpoint. It is obtained by summing the lower and upper
class limits and dividing by two. The class midpoint
of the (30 < 35) class is therefore (30 + 35)/2 = 32.5 years.
3.2 Hints for constructing frequency distributions
- find the largest and smallest values of the variable.
The difference is called the range of the data
- split the range into class intervals. Usually a minimum
of about 5, and a maximum of about 20, classes will be
found best
- find the class frequencies by counting the numbers of
observations belonging to each class interval.
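The three hints can be followed step by step in a short Python sketch. The raw ages and the choice of seven 5-year classes below are hypothetical; only the method (range, class intervals, counting, midpoints) comes from the text.

```python
# Hypothetical raw ages of 12 teachers. Illustrative only.
ages = [19, 22, 24, 25, 27, 28, 29, 31, 33, 36, 41, 47]

# Step 1: the range is the largest value minus the smallest.
data_range = max(ages) - min(ages)

# Step 2: split the range into class intervals. Each class is
# "lower < upper", i.e. lower <= age < upper, so no age is ambiguous.
class_limits = [(15, 20), (20, 25), (25, 30), (30, 35),
                (35, 40), (40, 45), (45, 50)]

# Step 3: count the observations belonging to each class interval.
frequencies = [sum(1 for a in ages if lo <= a < hi) for lo, hi in class_limits]

widths = [hi - lo for lo, hi in class_limits]
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

for (lo, hi), f, m in zip(class_limits, frequencies, midpoints):
    print(f"{lo} < {hi}: frequency {f}, midpoint {m}")
```

Checking that the class frequencies add up to the number of observations is a quick way to confirm that every value has been placed in exactly one class.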
EXERCISE 2
The reader should now do Exercise 2 on page 71.
3.3 Histograms
3.3.1 We have seen that a properly-constructed table is a first
step in the informative presentation of data. One useful graphical
representation is known as a histogram. A histogram is a set of
rectangles with the following characteristics:
- the area of each rectangle is proportional to its class
frequency
- each rectangle has as its base (along the horizontal x
axis) a class interval centred at the class midpoint.
If the class intervals are all of the same width, then the heights
of the rectangles may be made equal to the class frequencies.
If, however, class intervals are not of equal width, the heights
must be adjusted in accordance with the principle that areas
remain proportional to frequencies.
Example 3: construction of a histogram
Figure 1 presents the data from Example 2 in the form of a
histogram. Note that frequencies are scaled on the vertical
(y) axis, and the variable, in this case age, is scaled on
the horizontal (x) axis. Rectangles are of width 5 years,
centred on class midpoints 22.5, 27.5, ..., 42.5. How have
the first and last rectangles been constructed? In order to do
this, it has been necessary to make an assumption about the
lower class limit of the "< 20" open class interval, and about
the upper class limit of the "45 and over" open class interval. For
the purposes of this Example, these have been arbitrarily assumed
to be 15 and 60 respectively. (Ideally the statistician would
like to have further detailed information to determine the true
lower and upper class limits.) Thus the first class midpoint is
(15 + 20)/2 = 17.5 years; the last is (45 + 60)/2 = 52.5 years.
The first has a width of 5 years, as have all classes other than
the last, which has a width of 15 years. In order to preserve the
rule that the area of each rectangle be proportional to its class
frequency, the height of the final rectangle must therefore be
divided by 3 (i.e., 15/5) before plotting on the graph (remembering
that area = width x height). Thus the final rectangle is plotted
at height 959/3 = 319.7.
Figure 1 shows in a strikingly clear fashion how the teachers
were distributed by age, the great majority of them being between
20 and 30 years old.
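The adjustment of heights for unequal class widths can be checked numerically. The sketch below uses the frequencies of Table 2 together with the arbitrary limits of 15 and 60 assumed in the Example; the function name `plotted_height` and the constant `STANDARD_WIDTH` are illustrative choices.

```python
# Table 2, with the open classes closed at the assumed limits 15 and 60.
# Each entry is (lower class limit, upper class limit, class frequency).
classes = [(15, 20, 237), (20, 25, 7284), (25, 30, 8146), (30, 35, 2166),
           (35, 40, 1198), (40, 45, 963), (45, 60, 959)]

STANDARD_WIDTH = 5  # years; the width of every class except the last

def plotted_height(lower, upper, frequency):
    """Height that keeps the rectangle's area proportional to its frequency."""
    return frequency * STANDARD_WIDTH / (upper - lower)

heights = [plotted_height(*c) for c in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]

for (lo, hi, f), h in zip(classes, heights):
    print(f"{lo} < {hi}: frequency {f}, plotted height {h:.1f}")
```

For the classes of standard width the height equals the frequency; only the final, wider class is scaled down, to 959/3.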
[Figure 1: Histogram and frequency polygon showing frequency
distribution of ages of a sample of teachers in Afghanistan, 1973.
Vertical axis: number of teachers (frequency); horizontal axis:
age (years). Source: see Table 2.]
3.4 Frequency polygons
3.4.1 The broken line joining the midpoints of the tops of the
rectangles in the histogram in Figure 1 defines a frequency polygon.
It can be seen that in the case of the first and last class
intervals the line has been extended beyond the original range
(15-60 years) of the variable. This is necessary to ensure that
the area under the frequency polygon is exactly the same as the
area of the rectangles in the histogram.
EXERCISE 3
The reader should now do Exercise 3 on page 72.
3.5 Relative frequency distributions
3.5.1 The relative frequency of a class is the frequency of the
class divided by the total frequency of all classes. Usually it
is expressed as a percentage.
Example 4: a percentage relative frequency distribution
Using the frequency distribution in Table 2, Example 2, a relative
frequency distribution can easily be constructed:
Table 3: Percentage distribution of ages of a sample of teachers
in Afghanistan, 1973
Age (years)      Relative frequency (%)
< 20                    1.1
20 < 25                34.8
25 < 30                38.9
30 < 35                10.3
35 < 40                 5.7
40 < 45                 4.6
45 and over             4.6
Total                 100.0
The sum of the percentage frequencies should obviously be 100%.
However, rounding errors may sometimes cause the total to be, e.g., 99.9%.
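The percentages of Table 3 follow directly from the frequencies of Table 2, as this short sketch shows. Rounding to one decimal place is an assumption made here to match the table.

```python
labels = ["< 20", "20 < 25", "25 < 30", "30 < 35",
          "35 < 40", "40 < 45", "45 and over"]
frequencies = [237, 7284, 8146, 2166, 1198, 963, 959]   # Table 2

total = sum(frequencies)

# Relative frequency = class frequency / total frequency, as a percentage.
percentages = [round(100 * f / total, 1) for f in frequencies]

for label, p in zip(labels, percentages):
    print(f"{label:<12} {p:5.1f}")
print(f"{'Total':<12} {sum(percentages):5.1f}")
```

Because each percentage is rounded separately, the printed total is the sum of rounded values; with other data it could come out slightly above or below 100.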
3.5.2 A graphical representation of relative frequency data
can be obtained simply by changing the scale on the vertical
axis from absolute to relative frequency. The diagram remains
exactly the same. Such graphs are known as relative frequency
or percentage histograms, and relative frequency or percentage
polygons.
3.6 Population pyramids
3.6.1 One very widely-used form of a frequency distribution is
the population pyramid, a graphical representation of the
population classified according to sex and age-group. In effect,
this is a double histogram. However, the rectangles now have
their bases on the vertical axis (age), frequencies (absolute
or relative) being measured along the horizontal axis. This
axis is scaled positively in both directions from the origin;
one side for males, the other for females.
3.6.2 In Example 5 below we present a population pyramid. The
pyramid technique may of course be applied to data other than
population. For example, enrolment by sex and single year of
age, or grade, or by level of education (assuming certain age
limits to levels) could be shown. In many less developed countries
the enrolment pyramid shape is distorted at the lower ages,
because children are late in enrolling. This distortion can in
fact be a useful indicator of how far the educational system has
developed. At a later stage of development (assuming no significant
decline in birth rates) the classic pyramid shape is likely to emerge.
Example 5: construction of a population pyramid
Table 4 presents the population of Indonesia in 1971 by 5-year
age group and sex, as percentages of total population (1). Figure
2 presents the graphical representation of these data.
Table 4: Population of Indonesia by age and sex, as a percentage
of total population, 1971
Age (years)      Males: % of          Females: % of
                 total population     total population
0 < 5                  8.9                  8.7
5 < 10                 7.5                  7.3
10 < 15                6.2                  6.0
15 < 20                4.5                  4.7
20 < 25                3.6                  4.2
25 < 30                3.1                  3.8
30 < 35                3.2                  3.7
35 < 40                2.9                  3.1
40 < 45                2.6                  2.6
45 < 50                2.0                  2.0
50 < 55                1.5                  1.5
55 < 60                1.1                  1.2
60 < 65                0.7                  0.8
65 < 70                0.6                  0.6
70 < 75                0.4                  0.4
75 and over            0.2                  0.2
not known              0.1                  0.1
Total                 49.1                 50.9
(1) [Footnote text illegible in this copy.]
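A rough text version of a population pyramid can be produced from the percentages in Table 4. The sketch below is only an illustration: the scale of one "#" per half a percent (`UNIT`) is an arbitrary assumption, the "not known" group is left out, and the open "75 and over" group is drawn like any other.

```python
# Table 4: percentages of total population, Indonesia 1971
# (the small "not known" group is left out of the pyramid).
age_groups = ["0 < 5", "5 < 10", "10 < 15", "15 < 20", "20 < 25", "25 < 30",
              "30 < 35", "35 < 40", "40 < 45", "45 < 50", "50 < 55", "55 < 60",
              "60 < 65", "65 < 70", "70 < 75", "75 and over"]
males = [8.9, 7.5, 6.2, 4.5, 3.6, 3.1, 3.2, 2.9,
         2.6, 2.0, 1.5, 1.1, 0.7, 0.6, 0.4, 0.2]
females = [8.7, 7.3, 6.0, 4.7, 4.2, 3.8, 3.7, 3.1,
           2.6, 2.0, 1.5, 1.2, 0.8, 0.6, 0.4, 0.2]

UNIT = 0.5  # assumed scale: one "#" per half a percent of total population

def bar(percent):
    """Horizontal bar whose length is proportional to the percentage."""
    return "#" * round(percent / UNIT)

def pyramid_lines():
    # Oldest group first, so that the youngest group forms the base.
    rows = reversed(list(zip(age_groups, males, females)))
    # Male bars are right-aligned so that they grow leftwards from the centre.
    return [f"{bar(m):>18} | {g:^11} | {bar(f)}" for g, m, f in rows]

print(f"{'males':>18} | {'age':^11} | females")
print("\n".join(pyramid_lines()))
```

Age runs up the middle and frequencies are measured outwards in both directions, which is the "double histogram" arrangement described in 3.6.1.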
3.6.3 The population pyramid of Figure 2 has arbitrarily
assumed a maximum age of 100. Pyramids offer a clear visual
appreciation of the difference in age and sex structure of
different populations. The demographic implications of this
figure cannot be discussed here, but it might
be noted in passing that this Indonesian pyramid is typical of
a less developed country with relatively high rates of mortality
and fertility, giving the pyramid a characteristic broad base and
narrow top.
[Figure 2: Population of Indonesia by age and sex, 1971. Vertical
axis: age; horizontal axis: percent of total population, with males
on one side and females on the other. Source: see Table 4.]
3.7 Cumulative frequency distributions
3.7.1 The total frequency of all values less than the upper
class limit of a given class interval is called the cumulative
frequency up to and including that interval. For example, in
Example 2, the cumulative frequency up to and including the class
interval 35 < 40 is 19031. That is, 19031 teachers were aged less
than 40 years. Similarly, considering Table 3 in Example 4, in
relative frequency terms, 90.8% of teachers were aged less than
40.
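The two figures just quoted (19031 teachers, or 90.8%, aged less than 40) can be reproduced by accumulating the class frequencies of Table 2. This is a minimal sketch; the upper limit of 60 for the last class is the arbitrary assumption used elsewhere in this Section.

```python
from itertools import accumulate

upper_limits = [20, 25, 30, 35, 40, 45, 60]            # 60 is an assumed limit
frequencies = [237, 7284, 8146, 2166, 1198, 963, 959]  # Table 2

# Running total: number of teachers aged less than each upper class limit.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
cumulative_pct = [round(100 * c / total, 1) for c in cumulative]

for limit, c, p in zip(upper_limits, cumulative, cumulative_pct):
    print(f"less than {limit}: {c:6d}  {p:5.1f}%")
```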
Example 6: a cumulative percentage frequency distribution
Table 5 presents the cumulative age-distribution of teachers as
calculated from the data in Tables 2 and 3 of Examples 2 and 4, and
assuming arbitrary lower and upper limits of 15 and 60 years.
Table 5: Cumulative distribution of ages of a sample of teachers
in Afghanistan, 1973
Age (years)      Cumulative frequency
                 Absolute        %
15 < 20              237        1.1
20 < 25             7521       35.9
25 < 30            15667       74.8
30 < 35            17833       85.1
35 < 40            19031       90.8
40 < 45            19994       95.4
45 < 60            20953      100.0
3.7.2
The percentage cumulative frequencies have been plotted
on a graph in Figure 3. Percentage cumulative frequencies less
than each upper class limit are plotted against that upper class
limit. The resulting graph is called a (percentage) cumulative
frequency polygon or a (percentage) ogive. Figure 3 is an
example of a "less than" ogive. A "more than" ogive can easily
be derived from a "less than" ogive. In the context of Example
6 a "more than" ogive would be constructed by considering the
frequencies of teachers 15 or more years old, 20 or more, and so
on.
[Figure 3: Percentage cumulative frequency distribution of ages of
a sample of teachers in Afghanistan, 1973. Vertical axis: cumulative
frequency (%); horizontal axis: age (years); the median age is marked.
Source: see Table 5.]
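The points of a "more than" ogive for the teacher data can be derived as sketched below. The lower limit of 15 for the first class is the arbitrary assumption made in Example 6; the variable names are illustrative.

```python
lower_limits = [15, 20, 25, 30, 35, 40, 45]            # 15 is an assumed limit
frequencies = [237, 7284, 8146, 2166, 1198, 963, 959]  # Table 2

total = sum(frequencies)

more_than = []
remaining = total
for frequency in frequencies:
    more_than.append(remaining)   # teachers aged this lower limit or more
    remaining -= frequency        # drop the class just passed

more_than_pct = [round(100 * c / total, 1) for c in more_than]

for limit, c, p in zip(lower_limits, more_than, more_than_pct):
    print(f"{limit} or more: {c:6d}  {p:5.1f}%")
```

Each "more than" figure is the total less the corresponding "less than" figure, so the two ogives carry the same information.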
EXERCISE 4
The reader should now do Exercise 4 on page 73.
3.8 Frequency curves
3.8.1 In Figures 1 and 3 the frequency polygon and the ogive have
been constructed by connecting the appropriate points by straight
lines. In a situation in which the rectangles under the frequency
polygon become progressively narrower (i.e. the class intervals
become ever shorter), the lines joining the midpoints of the tops
of the rectangles come ever more closely to approximate a smooth
curve. Such a curve is called a frequency curve. Similarly, smooth
ogives may be obtained in principle.
3.8.2 Theoretically, we can only obtain a perfectly smooth
frequency curve when dealing with a continuous variable (such as,
e.g., height, weight, time, intelligence, etc.). Whenever a variable
is discrete (such as, e.g., class size, grade enrolment, etc.), a
perfectly smooth frequency curve can never be obtained. Nevertheless,
in practice, discrete variables may very often be treated as if
they were continuous, with very little loss of accuracy. Such
treatment is clearly best justified where class intervals are narrow.
Further, where the data are sample data, the larger the sample
size, the more likely is it that chance fluctuations, causing
large differences between adjacent classes, will be diminished
in size.
3.8.3 There is a variety of characteristically-shaped frequency
curves. Figure 4 shows forms commonly met in the educational
field. Perhaps the most important, both in theoretical and
practical statistics, is the second curve shown, which is
symmetrical and bell-shaped. If perfectly symmetrical, it means
that observations an equal distance on either side of the central
maximum have the same frequency. For example, intelligence, as
measured by standard tests, is approximately symmetrically distributed.
In practical, descriptive statistical work, however, any of these
types of curve may occur. We have already seen in Figure 1 an
instance of the first curve in Figure 4, a frequency polygon skewed
to the right (defined as positively skewed). A U-shaped distribution,
as shown in the fifth curve, could result from graphing the
frequency of deaths (on the vertical axis) against age at death.
Relatively high mortality tends to occur among both the very young
and the old. The seventh curve, S-shaped, might be encountered in
drawing a cumulative frequency curve of pupil age distribution.
We shall meet a curve of this shape again in the discussion of
non-linear curve-fitting in Section 6 below. Particularly useful in
time-series analysis, it is known as the logistic curve. As a mental
exercise, the reader should think of practical examples of the third,
fourth and sixth frequency curves in Figure 4.
[Figure 4: Characteristic shapes of frequency curves. The figure is
illegible in this copy.]
4. MEASURES OF CENTRAL TENDENCY
4.1 Summary statistics
4.1.1 The reader has now been introduced to some of the basic
techniques involved in the organisation and presentation of raw
data. The concept of a frequency distribution has been shown to
be particularly useful. But now the reader should pause and ask
himself or herself: what analytical help do these visual
presentations offer us? And the answer is: whilst they may well
suggest further analysis, alone, they can tell us little of
either a summary nature (e.g. what is the average age of teachers
in Table 2?) or of the relationships between variables (e.g. how
is the average age of teachers in a particular country changing over
time?). In this and the next Section we therefore examine the first
of these issues: how frequency distributions may be usefully summarised.
In Section 6 we shall examine how we may quantify "correlation" and
"regression" relationships between two variables.
4.2 The arithmetic mean
4.2.1 An average is a measure of the central tendency of a set
of data. We wish to define a statistic giving, in one simple
number, a measure of the central tendency of a whole set of
numbers. The "average" of everyday speech is known by the
statistician as the arithmetic mean. In the notation introduced
above, the arithmetic mean is defined as:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{9} \]
that is, it is the summation of the n observed values x_i of the
variable x, divided by the number of values, n. It is frequently
written x̄, pronounced "x bar".
Example 7: calculation of the arithmetic mean from grouped data
Where the data have been grouped in a frequency distribution,
such as is shown in Table 6, modifications are necessary to
formula (9).
Table 6: Frequency distribution of ages of 90 school children
Age (years)      Frequency
6 < 8                30
8 < 10               23
10 < 12              22
12 < 14              12
14 < 16               3
Total                90
If the reader tried to apply formula (9) to these data, he or
she would immediately wonder: what are the appropriate values
of x (age) to use? To solve this problem, a numerical value
representative of each interval must be chosen. The assumption
is made that observed ages in each interval are evenly spread
throughout the interval. So (in the absence of information to
the contrary), we take the class midpoint as the representative
value. Hence Table 6 is re-written as follows:
Table 7
Age (years)      Frequency      Class midpoint      f_j x_j
                 f_j            x_j
6 < 8                30               7                210
8 < 10               23               9                207
10 < 12              22              11                242
12 < 14              12              13                156
14 < 16               3              15                 45
Total                90                                860
The arithmetic mean, x̄, is now calculated from the following
formula:
\[ \bar{x} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} \tag{10} \]
where:
\[ \sum_{j=1}^{k} f_j = n \]
k = number of classes
j = 1, ..., k
Substituting in formula (10) the sums from Table 7:
\[ \bar{x} = \frac{860}{90} = 9.6 \text{ years} \]
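The calculation in Example 7 can be retraced in Python. The sketch below takes the class limits and frequencies of Table 6, uses class midpoints as representative values, and applies formula (10); the variable names are illustrative.

```python
# Table 6: (lower class limit, upper class limit, frequency f_j).
classes = [(6, 8, 30), (8, 10, 23), (10, 12, 22), (12, 14, 12), (14, 16, 3)]

# Class midpoints x_j stand in for the unknown individual ages.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
frequencies = [f for _, _, f in classes]

n = sum(frequencies)                                            # sum of f_j
weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))  # sum of f_j x_j

mean_age = weighted_sum / n   # formula (10)

print(f"sum f_j x_j = {weighted_sum:.0f}, n = {n}, mean = {mean_age:.1f} years")
```

The result is only as good as the assumption that ages are evenly spread within each class; with the individual ages available, formula (9) would be used instead.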
4.2.2 In the event that one, or both, of the lowest and highest
class intervals are open-ended (see Examples 2 and 3 and para.
3.1.2 above for examples of open-ended class intervals), the
statistician must make assumptions (in the lack of further
information) about the respective lower and upper class limits.
Having made these explicit assumptions, the necessary class
midpoints may then be calculated.
4.2.3 The arithmetic mean is one, very important, measure of
central tendency. Its usefulness is underlined by its following
properties:
- it has a clear intuitive meaning
- it is easy to calculate
- it uses all the observations
- it is widely used in comparing frequency distributions'
central tendencies. The reader will have come across
many such examples. Average educational attainment
scores, average ages, average class size, average time
spent in school, are but four examples of
comparative summary statistics used both to describe
situations and to investigate possible relationships.
Other measures of central tendency are, however, also available for
particular purposes. They include the median and the mode.
EXERCISE 5
The reader should now do Exercise 5.
4.3 The median
4.3.1 The median is the middle value of a set of values of a
variable where the values are placed in order of magnitude. Thus,
for example, if eleven individuals were ordered by their age:
2, 3, 7, 8, 9, 12, 14, 15, 16, 19, 20
the median age is 12, for there are 5 younger and 5 older. This
example refers to an odd number of values. If there were an even
number of values, say:
6, 8, 14, 17
the median is calculated by taking the arithmetic mean of the
middle two values. In this example, the median equals:
(8 + 14)/2 = 11 years
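Both cases, an odd and an even number of values, can be handled by one small function. The name `median_of` is an illustrative choice; the two data sets are those used in the text.

```python
def median_of(values):
    """Middle value of the ordered data; mean of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_of([2, 3, 7, 8, 9, 12, 14, 15, 16, 19, 20]))
print(median_of([6, 8, 14, 17]))
```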
4.3.2 When dealing with frequency distributions, the median
value is that value above and below which lie 50% of all the
values. Referring back to Figure 3, we can see that one way of
finding the median of a frequency distribution is to use the
percentage ogive. A horizontal line has been drawn from the
50% point on the cumulative percentage frequency axis. A
perpendicular drawn from the point at which the line cuts the
ogive shows the median age to be 26.8 years.
4.3.3
The advantage of the median as an averaging measure is that it overcomes a problem which the arithmetic mean cannot avoid. Relatively very large or very small figures in a set of values can "distort" an arithmetic mean from being a "representative" statistic.
4.4 The mode
4.4.1
The final measure of central tendency examined here is the mode. It is, simply, the most frequently occurring value in a set of values. Thus, for example, the mode in a series such as the following:
2, 4, 12, 4, 17, 7, 4
is 4, since it occurs most frequently. In graphical frequency distributions the mode is easily recognisable. Referring to Figure 4, the modes are the values of the variable at which the frequency curves are at their peaks. By definition, a set of data may have only one mode. Frequency distributions with two peaks, or with more than two, are loosely referred to as bimodal and multi-modal respectively. In the former there are two distinct peaks; in the latter, several.
4.4.2
The mode is less widely used than the arithmetic mean or median. It shares the median's advantage of not being influenced by very high or low values. But it is not very useful for further mathematical treatment; it has no clear mathematical formula to define it.
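As a brief illustration, the mode of the series in 4.4.1 can be found by counting how often each value occurs. The sketch below uses `multimode` from Python's standard `statistics` module (available from Python 3.8), which returns every value tied for the highest frequency. The second series is a hypothetical one, added only to show what a loosely "bimodal" set of values looks like.

```python
from statistics import multimode

# The series from the text: 4 occurs three times, more than any other value.
series = [2, 4, 12, 4, 17, 7, 4]
modes = multimode(series)

# Hypothetical series (not from the text) in which two values tie
# for the highest frequency, i.e. a loosely "bimodal" set of values.
two_peaks = [1, 1, 2, 3, 3]
modes_two_peaks = multimode(two_peaks)

print("mode(s) of the series:", modes)
print("mode(s) of the hypothetical series:", modes_two_peaks)
```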
EXERCISE 6
The reader should now do Exercise 6.
The measures of central tendency discussed, particularly the arithmetic mean, are single numbers which summarise one important property of a frequency distribution. Another property of a set of data is its spread or dispersion. Figure 5 shows the importance of defining a measure of dispersion in order to distinguish and compare frequency distributions. The upper of the two diagrams shows two frequency distributions having the same arithmetic mean \(\bar{x}\), but nevertheless having very different appearances. They have markedly differing dispersions, denoted here by the symbols \(s_1\) and \(s_2\). The lower of the two diagrams shows, conversely, two frequency distributions having different means, \(\bar{x}_1\) and \(\bar{x}_2\); but they have an identical dispersion, s.
5.1.2
How can a measure of the dispersion, s, of a set of data be developed? Perhaps the most obvious measure of dispersion is the range. The range is defined as the difference between the largest and smallest figures of a set of data. But, whilst useful for some purposes, the range is determined by only these two, extreme, values. It is not affected by any of the values lying in between. A good measure of dispersion would take account of all the observed values of the variable.
Figure 5
Upper diagram: two symmetrical frequency distributions having an identical mean \(\bar{x}\) but two different dispersions, \(s_1\) and \(s_2\).
Lower diagram: two symmetrical frequency distributions having two different means, \(\bar{x}_1\) and \(\bar{x}_2\), but the same dispersion s.
5.2 The standard deviation
The most commonly used measure of dispersion of a set of data is the standard deviation. It is usually referred to by the letter "s". The formula for this is:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \tag{11} \]
Why, the reader may well ask, this unpleasant-looking formula? Let us say it in words: it is the square root of the sum of the squared deviations of all the values from their arithmetic mean, averaged by dividing by the number of values, n.
The fundamental idea, therefore, is to measure dispersion by measuring the deviations of all the values of the variable from their average value. The wider the spread, the greater would we wish our summary measure of dispersion to be. But if the actual, signed, deviations from the mean were simply summed, the total would be found to be zero. This is simply a property of the arithmetic mean. Take, for example, the following 5 values (numbers of new entrants, in thousands, to the primary level of education in Togo 1970-1974):
45, 53, 56, 53, 56
The mean of these numbers is 52.6. If the deviations of these numbers from their mean are listed, they are:
(45-52.6), (53-52.6), (56-52.6), (53-52.6), (56-52.6)
which is:
-7.6, 0.4, 3.4, 0.4, 3.4
Adding the deviations, we see that they sum to zero. The sum of the (signed) deviations from the arithmetic mean always equals zero.
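The property just described can be verified numerically. The following is a minimal Python sketch using the Togo figures quoted above; the variable names are illustrative. Because computers store decimal fractions only approximately, the total is compared with zero within a small tolerance rather than exactly.

```python
# New entrants (thousands) to primary education, Togo 1970-1974, from the text.
entrants = [45, 53, 56, 53, 56]

mean = sum(entrants) / len(entrants)

# Signed deviation of each value from the arithmetic mean.
deviations = [x - mean for x in entrants]
total = sum(deviations)

print("mean:", mean)
print("deviations:", [round(d, 1) for d in deviations])
# The total is zero apart from possible floating-point rounding.
print("sum of deviations is (numerically) zero:", abs(total) < 1e-9)
```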
5.2.4
If we simply ignored the sign of the deviations, we could consider the sum of the absolute deviations:
(7.6 + 0.4 + 3.4 + 0.4 + 3.4) = 15.2
and use this, divided by n = 5, as our measure of dispersion:
15.2 / 5 = 3.04 thousands
This is known as the mean absolute deviation of a set of data. The mean absolute deviation is:
\[ \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \tag{12} \]
Note that the "absolute value" of any number a is written |a|. It means, in effect, that we always regard it as positive; any negative sign is ignored.
5.3 The variance
5.3.1
This measure has mathematical drawbacks, however, which arise mainly from our having, somewhat arbitrarily, ignored the signs of the deviations. If instead we eliminate the negative signs by summing the squared deviations and averaging them, we obtain a measure of dispersion called the variance:
\[ \frac{\sum_{i=1}^{n} d_i^2}{n} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \tag{13} \]
Example 8: calculation of the variance
Let us calculate the variance of the figures given above, referring to new primary school entrants in Togo, 1970-1974. We set up a working table as below:
Table 9
  \(x_i\)        \((x_i - \bar{x})\)    \((x_i - \bar{x})^2\)
  45             -7.6                   57.76
  53             +0.4                    0.16
  56             +3.4                   11.56
  53             +0.4                    0.16
  56             +3.4                   11.56
  \(\sum x_i = 263\)                    \(\sum (x_i - \bar{x})^2 = 81.20\)
  \(\bar{x} = 52.6\)
The variance is, from (13) above:
\[ \frac{81.20}{5} = 16.24 \text{ (thousand pupils)}^2 \]
This measure of dispersion (compare it with the range (11 thousand) and the mean deviation (3.04 thousand)) is, however, measured in the rather strange units of (thousands of pupils) squared: a peculiar and hard-to-comprehend measure! In order to reduce the variance to the original unit of measurement, thousands of pupils, we therefore take the square root and obtain:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \tag{11} \]
which is our definition of the standard deviation as presented above. Using the result from Example 8 above:
\[ s = \sqrt{16.24} = 4.03 \text{ thousand pupils} \]
We have, to summarise, four measures of the dispersion of the set of 5 figures presented:
  range                  11 thousand
  mean deviation         3.04 thousand
  variance               16.24 (thousand) squared
  standard deviation     4.03 thousand
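The four measures summarised above can be reproduced in a few lines of Python. This is only a sketch of formulas (11), (12) and (13) applied to the Togo figures; the variable names are illustrative. Note that, as in the text, the sums are divided by n (not by n - 1).

```python
from math import sqrt

# New entrants (thousands), Togo 1970-1974, from the text.
entrants = [45, 53, 56, 53, 56]
n = len(entrants)
mean = sum(entrants) / n

# Range: largest value minus smallest value.
value_range = max(entrants) - min(entrants)
# Formula (12): mean absolute deviation.
mean_abs_dev = sum(abs(x - mean) for x in entrants) / n
# Formula (13): variance (average squared deviation).
variance = sum((x - mean) ** 2 for x in entrants) / n
# Formula (11): standard deviation (square root of the variance).
std_dev = sqrt(variance)

print("range:", value_range)
print("mean absolute deviation:", round(mean_abs_dev, 2))
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 2))
```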
5.3.4
Of these alternative measures of dispersion, the standard deviation is, for the majority of purposes, the most useful. It has the following important properties:
- it is straightforward to calculate
- it uses all the observations
- it measures dispersion from the arithmetic mean. This gives a smaller measure than from any other average, such as the median
- in more advanced analysis of frequency distributions, the standard deviation plays an important mathematical role
- it allows comparisons of dispersion to be made between different frequency distributions (or within a distribution as it changes over time).
5.4.1
The last point in the list above raises the question of the comparison of dispersions. Say, for example, the earnings received by male and female teachers were being compared. The dispersion of male and female earnings could be an issue of some interest to various parties. It would, however, be potentially misleading simply to compare the standard deviation of male teachers' earnings, \(s_m\), with that of female teachers' earnings, \(s_f\). The difficulty arises because absolute dispersions would be under comparison; the mean salary of male teachers (\(\bar{x}_m\)) could be considerably higher than that of female teachers (\(\bar{x}_f\)). If, for example,
\[ \bar{x}_m > \bar{x}_f \quad \text{and} \quad s_m = s_f \]
the equality of the absolute measures of dispersion would be misleading. What is needed is a measure of relative dispersion. This is provided by the coefficient of variation, V, which is defined as the standard deviation divided by the mean:
\[ V = \frac{s}{\bar{x}} \tag{14} \]
Thus, in the case under discussion:
\[ V_m = \frac{s_m}{\bar{x}_m} \quad \text{and} \quad V_f = \frac{s_f}{\bar{x}_f} \]
but
\[ \bar{x}_m > \bar{x}_f \quad \text{and} \quad s_m = s_f, \quad \text{therefore} \quad V_f > V_m \]
which is a more meaningful comparison. Generally, the coefficient of variation is expressed in percentage terms.
Example 9: calculation of the coefficient of variation
In a hypothetical country, male teachers earn, on average, $95.00 per week, with a variance of 100.00 (dollars squared). The figures for females are $75.00 and 64.00 (dollars squared) respectively. Let us compare the dispersion of male and female earnings. In absolute terms:
\[ s_m = \sqrt{100} = 10.00 \text{ dollars}, \qquad s_f = \sqrt{64} = 8.00 \text{ dollars}, \qquad s_m > s_f \]
However, in relative terms, using the coefficient of variation, V:
\[ V_m = \frac{s_m}{\bar{x}_m} \times 100\% = \frac{10}{95} \times 100\% = 10.5\% \]
\[ V_f = \frac{s_f}{\bar{x}_f} \times 100\% = \frac{8}{75} \times 100\% = 10.7\% \]
\[ V_f > V_m \]
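Example 9 can be re-worked in Python as a check on the arithmetic. The helper function below is a hypothetical name introduced only for this sketch; it applies formula (14) and expresses the result as a percentage.

```python
from math import sqrt

def coefficient_of_variation(std_dev, mean):
    """Formula (14), expressed as a percentage: V = (s / mean) x 100."""
    return std_dev / mean * 100

# Figures from Example 9: mean weekly earnings and variance of earnings.
s_m = sqrt(100.0)   # standard deviation, male teachers
s_f = sqrt(64.0)    # standard deviation, female teachers

v_m = coefficient_of_variation(s_m, 95.0)
v_f = coefficient_of_variation(s_f, 75.0)

print("absolute dispersion: s_m =", s_m, " s_f =", s_f)
print("relative dispersion: V_m =", round(v_m, 1), "%  V_f =", round(v_f, 1), "%")
```

Although the male earnings have the larger standard deviation, the female earnings show the (slightly) larger relative dispersion.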
5.5 Calculation of standard deviation of grouped data
5.5.1
Where data are in grouped form, adjustment to formula (11) is necessary in calculating s. As before in calculating the arithmetic mean from grouped data, the class mid-points must be found. Denoting class mid-points by \(x_j\) and class frequencies by \(f_j\), the formula for s now becomes:
\[ s = \sqrt{\frac{\sum_{j=1}^{k} f_j (x_j - \bar{x})^2}{\sum_{j=1}^{k} f_j}} \tag{15} \]
It can be shown that this equals:
\[ s = \sqrt{\frac{\sum_{j=1}^{k} f_j x_j^2}{n} - \left( \frac{\sum_{j=1}^{k} f_j x_j}{n} \right)^2} \tag{16} \]
where \(n = \sum_{j=1}^{k} f_j\) is the total number of observations. Finally, if
\[ d_j = (x_j - A) \]
where \(d_j\) are the deviations of \(x_j\) from an arbitrary constant A, it can be shown that (16) may be written:
\[ s = \sqrt{\frac{\sum_{j=1}^{k} f_j d_j^2}{n} - \left( \frac{\sum_{j=1}^{k} f_j d_j}{n} \right)^2} \tag{17} \]
This last "short-cut" formula (17) should be used in calculations. It saves a considerable amount of time.
(See Example 10 below.)
EXERCISE 7
The reader should now do Exercise 7.
Example 10: calculation of s with grouped data
In Table 9 below, years of service of a sample of teachers in Afghanistan in 1973 are given. Column (1) groups the years of service, column (2) shows the class frequencies. Columns (3) - (6) have been calculated from these original data. Class mid-points, \(x_j\), have been calculated (column (3)). An arbitrary constant, A = 15, has then been chosen to ease calculation, and column (4) calculated: \(d_j = (x_j - A)\). Next, columns (5) and (6) have been calculated. Column (5) is (column (2) x column (4)). Column (6) is (column (5) x column (4)). Finally the necessary sums have been calculated. The standard deviation is found to be:
s = 6.45 years of service.
Table 9: Years of service of a sample of teachers, Afghanistan 1973
  (1)          (2)          (3)            (4)             (5)           (6)
  Years of     frequency    class          \(d_j\)         \(f_j d_j\)   \(f_j d_j^2\)
  service      \(f_j\)      mid-points     \((x_j - A)\)
                            \(x_j\)
  0 < 6        10990         3             -12             -131880       1582560
  6 < 12        6828         9              -6              -40968        245808
  12 < 18       1402        15               0                   0             0
  18 < 24        880        21               6                5280         31680
  24 < 30        668        27              12                8016         96192
  30 < 40        185        35              20                3700         74000
  Totals       20953                                       -155852       2030240
Hence, from (17):
\[ s = \sqrt{\frac{2030240}{20953} - \left( \frac{-155852}{20953} \right)^2} \]
s = 6.45 years of service.
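The working of Example 10 can be checked with the following Python sketch. It computes s twice: once with the short-cut formula (17), using the arbitrary constant A = 15 as in the text, and once directly from formula (15). The two results should agree, whatever value of A is chosen. Variable names are illustrative.

```python
from math import sqrt

# Class mid-points and frequencies from Table 9 (years of service).
midpoints = [3, 9, 15, 21, 27, 35]
freqs = [10990, 6828, 1402, 880, 668, 185]
A = 15                      # arbitrary constant chosen in the text
n = sum(freqs)              # total number of teachers in the sample

# Columns (4), (5) and (6) of the working table.
d = [x - A for x in midpoints]
sum_fd = sum(f * dj for f, dj in zip(freqs, d))
sum_fd2 = sum(f * dj ** 2 for f, dj in zip(freqs, d))

# Short-cut formula (17).
s_shortcut = sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Direct formula (15), using the grouped arithmetic mean.
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s_direct = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

print("n =", n, " sum f.d =", sum_fd, " sum f.d^2 =", sum_fd2)
print("s (short-cut) =", round(s_shortcut, 2))
print("s (direct)    =", round(s_direct, 2))
```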
6. RELATIONSHIPS BETWEEN VARIABLES
6.1 Introduction
6.1.1
Up to now, we have considered techniques for the description and summarisation of distributions of a single variable. As indicated earlier, however, the practical statistician will also seek to analyse relationships between variables. In considering how variables relate to one another, it is valuable to make an early distinction between relationships involving association and those involving causation. Below we shall consider important statistics (1) which measure association: covariance and correlation. These measures are designed to quantify how two variables vary, that is change, together (if at all). No assumptions will be made or inferred about any causation involved between two variables.
(1) See para. 6.2.2 below.
6.1.2
However, in circumstances in which causal relationships are hypothesised, the very important statistical technique of regression analysis is available for quantifying such relationships. The methods of regression are discussed in 6.7.
6.2 The importance of the theory of probability
6.2.1
It was pointed out in the Introduction that there exists a body of statistical theory of a relatively formal nature founded in the mathematical theory of probability. That theory is too advanced to be discussed here. Nevertheless, it should be pointed out at this stage that further development than is possible here of statistical association and regression analysis would be founded in an explicit probability framework. Such a framework enables the statistician formally to incorporate the unavoidable existence of errors from various sources into his or her analyses. Sources of error include human errors made in the process of actual collection and data-processing. Unavoidable errors also occur in sampling. Much of the formal theory of probability has been developed in order to enable the statistician to quantify the probable size of errors resulting from random (1) sampling from populations. By sampling we mean collecting data on only a part of a population, "population" not necessarily referring to people, but to any totality under investigation.
6.2.2
The statistician will take a sample or samples from a population, perhaps in the course of a survey, in order to draw probability inferences from his or her samples about unknown population quantities in which he or she is interested (such as means, variances, etc.). The unknown characteristics of a population are referred to as the population parameters. The known corresponding sample quantities are called sample statistics (or, briefly, statistics). Such sample statistics are used in making probability estimates of the unknown population parameters. If the sample is taken in a random fashion, the statistician will be able to make estimates of the population parameters which incorporate quantified statements about the probability of their accuracy.
(1) A (simple) random sample is one in which each member of the population has an equal, non-zero chance of selection.
6.2.3
Of course, the statistician does not rely on random sample surveys only. Nevertheless, in assessing the relationships between variables (as we discuss below), he or she may well make explicit assumptions about the probabilistic behaviour of the errors in the data which are to be analysed. By doing so, it is possible to improve on simply making point estimates (i.e., estimates composed of one, particular figure) of parameters, which give no indication of the estimated likely accuracy. The statistician, utilising formal probability theory together with certain explicit assumptions about the probability distributions of the errors involved, can make interval estimates. That is, it is possible to make quantified statements about the probability that the population parameter under investigation lies in a specified range around the known sample statistic.
6.2.4
In the remainder of Section 6, we cannot treat the analysis of correlation and regression within an explicit framework of probability theory. A more advanced treatment of these topics, however, would. The reader should nevertheless bear in mind that the statistics developed below, such as the coefficient of linear correlation and the coefficients of linear regression, when applied to actual data, are point estimates of the unknown population parameters (1). In our introductory treatment which follows, our estimates appear exact. In a more advanced statistical treatment, they would have attached to them quantified statements of probability about their accuracy.
(1) Unless, of course, the data are from a complete enumeration (or census) of a population. (Even then, human error can creep in, though sampling error is absent).
6.3 Association between variables
6.3.1
What is meant by a statistical "association" between two variables? The underlying idea is very simple. Assume that we have n pairs of values of two variables, x and y:
\[ (x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_n, y_n) \]
As discussed in the Chapter on Basic Mathematics, Section 6 on graphs, these pairs may be plotted on a graph. The pairs of values may be obtained by observation over a period of time (time series), or they may be observed at one point in time (cross-sectional analysis).
6.3.2
For example, in a time series analysis of the association between two variables, at each time period t (t = 1, ..., n, where there are n time-periods), \(x_t\) and \(y_t\) might be observed, and a pair of values \((x_t, y_t)\) obtained. These n pairs could then be plotted on a graph in a so-called scatter diagram. A specific example could be to observe, over a period of time, n pairs of values of the crude birth rate (x) and income per head (y), and plot them on a scatter diagram.
6.3.3
In a cross-sectional analysis, n pairs of values are observed of the two variables at one point in time. For example, the crude birth rate (x) and income per head (y) could be observed in n different regions of a country. n pairs of values would again result, and could be plotted on a scatter diagram.
6.3.4
Figure 6 presents the 5 general patterns which may result from plotting pairs of observations \((x_i, y_i)\). The following points should be noted:
- scatter 1 shows a positive association; as x increases, so (generally) does y
- scatter 2 shows a negative association; as x increases, y (generally) decreases
- scatter 3 shows a perfect positive linear association
- scatter 4 shows a perfect negative linear association
- scatter 5 shows no, or very little, linear association
[Figure 6: scatter diagrams 1 to 5]
6.3.5
The underlying meaning of association is simply whether x and y vary around their respective means together. If, where \(x_i\) is greater or less than \(\bar{x}\), the other member of the pair, \(y_i\), also tends to be respectively greater or less than \(\bar{y}\), we define a positive association. For example, in scatter 1, where \(y_i\) exceeds \(\bar{y}\), corresponding values of \(x_i\) also generally exceed \(\bar{x}\). Similarly, when \(y_i\) is less than \(\bar{y}\), \(x_i\) is generally less than \(\bar{x}\). We have a positive association. Scatter 2 shows a negative association, because when \(x_i\) is above (or below) its mean, \(y_i\) tends to be below (or above) its mean. Scatters 3 and 4 show perfect positive and negative linear associations respectively, for given changes in x are always associated with a constant change in y (and vice-versa). Scatter 5 shows no, or very little, evidence of any linear association at all between x and y.
6.4 Covariance
6.4.1
How may we measure this fundamental idea of strength of association? The simplest useful statistic is the covariance. This is defined as follows:
\[ \text{covariance of } x \text{ and } y = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \tag{18} \]
6.4.2
Formula (18) tells us to multiply, and then average, the deviations of the x and y values from their respective means. Consider Figure 7. If we apply formula (18), we will get positive products in sectors 2 and 4, and negative products in sectors 1 and 3. If these answers are summed and averaged according to formula (18):
- when all or most of the points are in sectors 2 and 4, as in scatter 1 of Figure 6, a high positive sum is obtained
- when all or most of the points are in sectors 1 and 3, a high negative sum is obtained
- when some of the points are in 1 and 3, and some in 2 and 4, a small positive or negative sum results, since the positive and negative items will tend to cancel out.
[Figure 7: scatter diagram divided into four numbered sectors]
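The three cases listed above can be illustrated with a small Python sketch of formula (18). The function name and the three short data sets are hypothetical, invented purely for illustration; they are not taken from the text.

```python
def covariance(xs, ys):
    """Formula (18): average product of the deviations from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical illustrative data (not from the text).
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 6]    # y tends to rise with x
y_falling = [6, 4, 5, 4, 2]   # y tends to fall as x rises
y_mixed = [4, 2, 6, 2, 4]     # positive and negative products cancel out

cov_rising = covariance(x, y_rising)
cov_falling = covariance(x, y_falling)
cov_mixed = covariance(x, y_mixed)

print("rising: ", round(cov_rising, 2))
print("falling:", round(cov_falling, 2))
print("mixed:  ", round(cov_mixed, 2))
```

The sign of the result shows the direction of the association; the third data set has been constructed so that the products of deviations cancel.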
6.4.3
If we were to calculate only the covariance, a problem would be that the size of the absolute number which would result would depend both on the number of observations, n, and the specific location of the data with respect to the means. Ideally, we would like to transform the absolute answer into a relative form in order to allow comparisons of linear association in differing situations.
6.5 The coefficient of linear correlation
6.5.1
It can be shown that if we divide the covariance of x and y by the standard deviation of the x distribution and by the standard deviation of the y distribution, we obtain the coefficient of linear correlation (1), r:
\[ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{19} \]
(1) Sometimes called the Pearsonian, or product-moment, coefficient of linear correlation.
This expression has the extremely important property that:
\[ -1 \le r \le +1 \]
r is a dimensionless number, i.e., it is a pure number with no associated units.
6.5.2
In terms of the scatter diagrams presented in Figure 6:
- scatter 1 indicates positive linear correlation; 0 < r < 1
- scatter 2 indicates negative linear correlation; -1 < r < 0
- scatter 3 indicates perfect positive linear correlation; r = +1
- scatter 4 indicates perfect negative linear correlation; r = -1
- scatter 5 indicates zero (or approximately zero) linear correlation; r = 0 (approximately)
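6.5.3
It can be shown that, for more rapid computational purposes, formula (19) reduces to:
\[ r = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right] \left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]}} \tag{20} \]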
6.5.4
Two important points about r should be remembered. First, it is simply a measure of association. It is not a measure of a causal relationship. Two variables may have a very high correlation, but may have no causal connection whatever. For example, the number of pupils enrolled in primary school in Brazil and the price of coffee have shown a positive correlation over recent years. But this is no evidence of a causal relationship.
6.5.5
Secondly, r is a measure of linear association. Two variables may be highly associated in a non-linear way. But formula (19) will not necessarily indicate any association.
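The second point can be made concrete with a hypothetical example, not taken from the text: five pairs in which y is exactly the square of x. The two variables are perfectly associated, but not in a linear way, and the coefficient of linear correlation comes out as zero. The Python sketch below applies the definition of r directly; the function name is illustrative.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Coefficient of linear correlation: sum of products of deviations
    divided by the square root of the product of the sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data: y depends exactly on x, but through a curve (y = x squared).
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]

r_curve = linear_correlation(x, y)
print("r for y = x squared:", round(r_curve, 2))
```

A scatter diagram of such data would show the association at once, which is one reason for plotting the data before calculating r.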
Example 11: calculation of correlation coefficient, r
Assume that we wish to calculate the coefficient of correlation between real income per capita and the gross primary level enrolment ratio in a country over a period of 5 years. Table 10 shows data for a hypothetical country.
Table 10: real income per capita and gross primary level enrolment ratio over a 5-year period
  Year    Real income per capita             Gross primary level
          (thousands of units of currency)   enrolment ratio (%)
          \(x_i\)                            \(y_i\)
  1       10                                 61
  2       11                                 62
  3       14                                 66
  4       13                                 66
  5       17                                 70
Looking at formula (20) for r, we see that we need to set out a working table as follows:
Table 11
          (1)        (2)        (3)            (4)          (5)
          \(x_i\)    \(y_i\)    \(x_i y_i\)    \(x_i^2\)    \(y_i^2\)
          10         61          610           100          3721
          11         62          682           121          3844
          14         66          924           196          4356
          13         66          858           169          4356
          17         70         1190           289          4900
  Sums    65         325        4264           875          21177
Hence, from (20):
\[ r = \frac{5(4264) - 65(325)}{\sqrt{\left[ 5(875) - (65)^2 \right] \left[ 5(21177) - (325)^2 \right]}} = 0.99 \text{ (to two decimal places)} \]
The example shows that there is a positive, and very high, linear correlation between real per capita income (x) and the gross primary level enrolment ratio (y). It does not of itself demonstrate that x causes y, or that y causes x (or indeed that they are both caused by some third factor).
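The working table of Example 11 lends itself to a short Python check. The sketch below builds the five column sums of Table 11 from the data of Table 10 and then applies the computational formula (20). Variable names are illustrative.

```python
from math import sqrt

# Data from Table 10 (hypothetical country).
x = [10, 11, 14, 13, 17]   # real income per capita
y = [61, 62, 66, 66, 70]   # gross primary level enrolment ratio (%)
n = len(x)

# Column sums of the working table (Table 11).
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Computational formula (20).
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 2))
```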
6.6 Spearman's coefficient of rank correlation
6.6.1
In some circumstances it may not be possible to obtain the precise values of variables, or for other reasons it may only be possible to rank (i.e. list in order) the variables in terms of size, importance, placing or some other attribute. The data may be ranked using the numbers 1, 2, ..., n. If two variables x and y are ranked in such a way, the coefficient of rank correlation between x and y is given by:
\[ r_{rank} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{21} \]
where:
\(d_i\) = difference between ranks of corresponding values of x and y
n = number of pairs \((x_i, y_i)\)
The coefficient calculated from formula (21) is known as Spearman's rank correlation coefficient. As with the coefficient of linear correlation, \(r_{rank}\) lies between -1 and +1.
Example 12: rank correlation
Assume we wished to calculate the rank correlation between the proportion of total population living in urban areas (x) and the pupil-teacher ratio at the elementary level (y), in seven countries in 1975. Table 12 below shows the ranking of the seven countries for each variable and two further columns giving the difference \(d_i\) and the squared difference \(d_i^2\) between rankings.
Table 12: rankings of seven countries by percentage urban population (x) and elementary level pupil-teacher ratio (y), 1975
  Country         Rank                   \(d_i\)            \(d_i^2\)
                  \(x_i\)     \(y_i\)    \((x_i - y_i)\)
  Afghanistan     6           4           2                  4
  Congo           1           1           0                  0
  India           3           2           1                  1
  Indonesia       4           5          -1                  1
  Nepal           7           7           0                  0
  Philippines     2           6          -4                 16
  Sudan           5           3           2                  4
                                         \(\sum d_i^2 = 26\)
Applying formula (21):
\[ r_{rank} = 1 - \frac{6(26)}{7(49 - 1)} = 0.54 \]
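Example 12 can be checked in the same way. The Python sketch below takes the two columns of ranks from Table 12, forms the squared rank differences and applies formula (21). Variable names are illustrative.

```python
# Ranks from Table 12, in the order: Afghanistan, Congo, India,
# Indonesia, Nepal, Philippines, Sudan.
rank_x = [6, 1, 3, 4, 7, 2, 5]   # percentage urban population
rank_y = [4, 1, 2, 5, 7, 6, 3]   # elementary level pupil-teacher ratio
n = len(rank_x)

# Sum of squared differences between the paired ranks.
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))

# Formula (21): Spearman's rank correlation coefficient.
r_rank = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("sum of squared rank differences:", sum_d2)
print("r_rank =", round(r_rank, 2))
```

Since only the squared differences enter formula (21), the sign attached to each individual difference does not affect the result.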
EXERCISE 8
The reader should now do Exercise 8.
6.7 Causal relationships: regression analysis
6.7.1
Regression analysis is one of the most powerful tools at the statistician's disposal for quantifying causal relationships. The reader will recall the discussion of mathematical functions in Section 5 of the Basic Mathematics Chapter, involving two or more variables. The general equation of a linear function was introduced:
\[ y = a + b x \tag{22} \]
In this expression, an example of which was plotted as a graph:
- y is the "dependent" variable
- x is the "independent" variable
- a and b are the "coefficients"
6.7.2
The primary function of regression analysis is to estimate the line which in some sense (defined below) best fits the scatter of data that has been observed. In the real world, observed data, if plotted on a scatter diagram, rarely lie conveniently on a straight line. But regression analysis offers a method of calculating the coefficients, a and b, which are estimates of the parameters of the underlying functional relationship hypothesised to have given rise to the observed data points. The reader will recall, from the discussion of the general linear function (22), that once a and b have been quantified, the equation represents a unique straight line which may be drawn as a graph.
6.7.3 Let us proceed by example to explain the fundamental ideas of regression. Imagine that you were investigating the relationship between two variables:

t = time
and
y = intake rate of pupils into first grade of primary level

Assume you had the following table of data, collected over the period 1968-1974 (seven annual observations):

Table 13: Intake rate, 1968-1974

Year    t_i        Intake rate (%), y_i
1968    t_1 = 1    y_1 = 26.49
1969    t_2 = 2    y_2 = 26.46
1970    t_3 = 3    y_3 = 27.67
1971    t_4 = 4    y_4 = 29.32
1972    t_5 = 5    y_5 = 30.97
1973    t_6 = 6    y_6 = 30.48
1974    t_7 = 7    y_7 = 31.74

6.7.4 The first step in investigating the relationship between t and y is to draw a "scatter diagram". This is presented in Figure 8. Two things are immediately evident:

- there is a positive correlation between t and y
- the data points appear to be reasonably well-approximated by a straight line.

We make the hypothesis that a linear functional relationship exists between the two variables.

[Figure 8: Scatter diagram of intake rate and time. Vertical axis: intake rate (%); horizontal axis: t = time. Source: see Table 13.]

It is important to note that, although we hypothesise an underlying linear relationship between y and t, the observed data do not lie exactly on a straight line. This is because, in reality, other variables in addition to t have their influence on y. However, we make the assumption that their separate influences are small and tend to cancel out. Their net effect may therefore be regarded as purely a chance variable, causing the random fluctuations of the observed data points around the true linear relationship. The regression line is our estimate of the true, underlying linear relationship.
6.7.5 The question now arises: how do we actually fit a straight line to this scatter of points? Which is the "best" straight line to fit? These questions, the reader should by now appreciate, are equivalent to asking: what are the estimated values of the coefficients a and b which will give us the best-fitting straight line?

6.7.6 In other words, we wish to estimate the following equation:

$$ y = a + b t \qquad (23) $$

Notice that the equation is not:

$$ t = a + b y \qquad (24) $$

In regression analysis, the dependent variable is on the left hand side. The "explanatory" (independent) variable is on the right hand side. We do not believe that time is dependent on the intake rate!

6.7.7 How, then, are a and b to be calculated? One unsatisfactory way would be to draw, by eye, that straight line through the scatter of points which seemed, subjectively, to be the best-fitting line. This has the disadvantage that different persons (and even the same person on different occasions) will obtain different results.
6.8 The method of least squares

6.8.1 A much more satisfactory, and widely-used, method is that of estimating the so-called regression line by the technique of ordinary least squares. The method of ordinary least squares gives that straight line which, when drawn through the scatter of points, minimises the sum of the squares of the (vertical) deviations of the points from the line.

6.8.2 It can be demonstrated, by mathematical techniques too advanced to be used here, that the values of b and a which give the line having this "best-fitting" property are defined by the following formulae:

$$ b = \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^{n} (t_i - \bar{t})^2} \qquad (25) $$

$$ a = \bar{y} - b \bar{t} \qquad (26) $$

where
i = 1, 2, ..., n
n = number of data points
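6.9 General formulae of the regression coefficients

6.9.1 When estimating the linear regression of y on x, that is, the regression line:

$$ y = a + b x \qquad (22) $$

the ordinary least squares formulae for a and b are:

$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (27) $$

$$ a = \bar{y} - b \bar{x} \qquad (28) $$

For more rapid calculation, the following equivalent formula for b may be used:

$$ b = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \qquad (29) $$

6.9.2 It may be noted that the formula for the regression coefficient b is equivalent to:

$$ b = \frac{\text{covariance of x and y}}{\text{variance of x}} $$

that is, the covariance formula divided by the variance formula given earlier in this Unit.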
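The equivalence of the deviation form of b, the rapid form and the covariance/variance ratio can be seen in a few lines of algebra. The sketch below assumes that the covariance and the variance are both defined with the same divisor n (an assumption made here for illustration; any common divisor cancels in the ratio).

```latex
% Expanding the sums of deviations about the means:
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
   = \sum_{i=1}^{n} x_i y_i - \frac{\sum x_i \sum y_i}{n},
\qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2
   = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum x_i\right)^2}{n}

% Multiplying numerator and denominator by n gives the rapid form:
b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {n \sum x_i^2 - \left(\sum x_i\right)^2}

% Dividing numerator and denominator by n gives the ratio form:
b = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}
         {\frac{1}{n}\sum (x_i - \bar{x})^2}
  = \frac{\text{covariance of } x \text{ and } y}{\text{variance of } x}
```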
Example 13: Estimation of regression line

Let us calculate a and b, given the data in Table 13 above. We shall first estimate b, using (25) above. The following working table is drawn up:

Table 14

(1)     (2)       (3)          (4)          (5)                    (6)
t_i     y_i       (t_i - t̄)   (y_i - ȳ)   (t_i - t̄)(y_i - ȳ)   (t_i - t̄)^2
1       26.49       -3          -2.53          7.59                  9
2       26.46       -2          -2.56          5.12                  4
3       27.67       -1          -1.35          1.35                  1
4       29.32        0           0.30          0                     0
5       30.97        1           1.95          1.95                  1
6       30.48        2           1.46          2.92                  4
7       31.74        3           2.72          8.16                  9

Sums:   28      203.13                        27.09                 28
Means:  t̄ = 4   ȳ = 29.02

Hence, from formula (25):

b = 27.09 / 28.0 = 0.968

and from formula (26):

a = 29.02 - 0.968 (4) = 25.15

Hence the estimated regression line is:

$$ y = 25.15 + 0.968 t \qquad (30) $$
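The working table of Example 13 translates directly into code. The following is a minimal sketch in Python, assuming the data of Table 13; the function name `least_squares` is an illustrative choice, not part of the original text. Because it works with unrounded means, the coefficients may differ from the hand calculation in the last decimal place.

```python
def least_squares(t, y):
    """Return (a, b) for the least squares line y = a + b t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Column (5) of the working table: products of deviations
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    # Column (6) of the working table: squared deviations of t
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    return a, b


# Data from Table 13 (t = 1 corresponds to 1968)
t = [1, 2, 3, 4, 5, 6, 7]
y = [26.49, 26.46, 27.67, 29.32, 30.97, 30.48, 31.74]

a, b = least_squares(t, y)
print(f"a = {a:.4f}, b = {b:.4f}")
```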
6.10 Interpretation of the estimated regression coefficients

6.10.1 The straight line, y = 25.15 + 0.968t, may now be drawn on a graph. This has been done in Figure 9. With a straight line, only 2 points are necessary. Hence the line has been drawn from the point at which it intercepts the vertical axis, (0, 25.15), through the point (4, 29.02). The latter is the point of means (t̄, ȳ). All regression lines pass through the point of the means of the variables. The line has the property of all regression lines: it minimises the sum of squares of the vertical distances from the observed points to the line. That is, it is the line which minimises

$$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

where $\hat{y}_i$ denotes the value on the line at $t_i$. It is in this sense the best-fitting line.

6.10.2 In the Basic Mathematics chapter we discussed the interpretation of the coefficients of a linear equation, and if necessary the reader should refer back to this.

6.10.3 a is the point on the vertical axis at which the line intercepts the axis. The line has been extrapolated back in time from the period during which data were observed (t = 1, 2, ..., 7) to the point (t = 0, y = 25.15).

6.10.4 b is the slope, or gradient, of the regression line. Note that in this case it is positive and equals 0.968. This is to be interpreted as saying: for a unit change in the variable t, there is an average +0.968 unit change in y. Thus, for example, as time changes by one year, the regression line tells us to expect an average increase of 0.968 percentage points in the intake rate. In 3 years, we can expect a change of (3 x 0.968) percentage points in the intake rate; and so on.

EXERCISE 9
The reader should now do Exercise 9.
[Figure 9: Regression of intake rate on time, and regression line y = 25.15 + 0.968 t, showing the period of observed data and the period of extrapolation. Horizontal axis: t = time. Source: see Table 13.]
6.11 Extrapolation

6.11.1 Figure 9 shows one of the most important uses of the regression method. This is its value in forecasting. For example, given the equation calculated in Example 13 of the regression line of intake rate on time:

$$ y = 25.15 + 0.968 t \qquad (30) $$

we can substitute in values of t beyond those observed (i.e. beyond the period t = 1, 2, ..., 7). For example, the regression line, if extrapolated beyond its period of observation, passes through the point (10, 34.83), as is shown in Figure 9. For, when t = 10 (in 1977):

y = 25.15 + 10 (0.968) = 34.83%

This is our forecast of the intake rate in 1977.

6.11.2 The forecast is based on two very important assumptions: firstly, that the observed trend is linear, and secondly that it can be extrapolated safely in a linear fashion. The section below on non-linear curve-fitting discusses the first of these assumptions. As regards the confidence that can be placed in simple linear extrapolation, it must be said that this should only be done with care and thought. All forecasting, however done, is subject to error and uncertainty. Beyond a few years into the future the uncertainties become so considerable that relatively sophisticated statistical techniques tend to lose their advantages over informed guesswork.
6.12 Interpolation

6.12.1 The regression line allows us not only to extrapolate but also to interpolate. That is, if we wished to have an estimate of the intake rate at some time within the observation period other than at those times at which it was actually observed, by substitution of the appropriate value of t we obtain a value of y. For example, when t = 3.5,

y = 25.15 + 0.968 (3.5) = 28.538%

This has also been shown on Figure 9.
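Extrapolation and interpolation are the same operation: substituting a value of t into the estimated equation. The short Python sketch below uses the rounded coefficients quoted in the text (a = 25.15, b = 0.968); the function name `predict` is a hypothetical helper introduced only for illustration.

```python
def predict(t, a=25.15, b=0.968):
    """Value of the estimated regression line y = a + b t at time t."""
    return a + b * t


# Extrapolation: t = 10 lies beyond the observed period t = 1, ..., 7
forecast_1977 = predict(10)

# Interpolation: t = 3.5 lies between two observed points
estimate_mid = predict(3.5)

print(round(forecast_1977, 2), round(estimate_mid, 3))
```

The code gives no warning when t lies far outside the observed period, so the cautions in 6.11.2 about linear extrapolation apply to any value it returns.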
6.12.2 The reader is reminded that linear regression of a dependent variable on time as the independent variable is a very important, but not the only, use of the least squares method. Regression analysis deals not only with the estimation of trends in time-series data, but more generally with the estimation of causal relationships between any quantifiable variables. Further, non-linear functions may be estimated. And regression analysis is not limited to functions involving only one independent variable. However, we can deal here only with bivariate regression, in which there is one dependent, and one independent, variable. Nevertheless, the more advanced techniques of multiple regression allow the statistician to estimate functions of (in principle) any number of variables:

$$ y = f(x_1, x_2, \ldots, x_n) \qquad (31) $$

6.12.3 In fact, practical considerations will usually limit the analysis to only a few independent variables. In particular, lack of sufficient data is almost always the constraint facing the practicing statistician. Other things being equal, the more data (observations) available the better. More confidence can be placed in statistical estimates the larger the number of observations, n.

6.12.4 This is intuitively reasonable, but can only be rigorously demonstrated by mathematical techniques in the theory of probability which are too advanced to be presented here. As a rule of thumb, in estimating bivariate linear functions, though it is possible with as few as 3 observations, little confidence can be attached to the results unless at least 5 observations are available. More are always to be welcomed, particularly if the data are well scattered.
6.13 Non-linear curve-fitting

6.13.1 When analysing the relationship between two variables, the hypothesis that the data, when plotted on a scatter diagram, may be approximated by a straight line may become obviously unreasonable. A non-linear function may be called for. In the Chapter on Basic Mathematics the idea of a non-linear function was introduced in 5.3. The simplest non-linear function is

$$ y = a + b x^2 \qquad (32) $$

the graph of which describes a parabola (1). This function may easily be fitted to data if the scatter seems to have a parabolic form. By a simple transformation of the variable $x^2$, we obtain the linear function (22), and can apply the least squares formulae (27) and (28) for a and b above. We transform $x^2$ by simply re-naming it by a symbol in the first power such as w, hence (32) becomes:

$$ y = a + b w \qquad (33) $$

which has exactly the linear form of (22).

(1) See Figure 3 in the Chapter on Basic Mathematics for the graph of y = x^2.
6.13.2 Application of the least squares formulae for b and a respectively gives:

$$ b = \frac{\sum_{i=1}^{n} (w_i - \bar{w})(y_i - \bar{y})}{\sum_{i=1}^{n} (w_i - \bar{w})^2} $$

$$ a = \bar{y} - b \bar{w} $$

Having now estimated a and b, we may transform w back to $x^2$ and plot the estimated, curved, regression line. We may say we have "fitted" a parabola to the data by least squares.
6.13.3 Variables which are growing exponentially over time when plotted on a graph will show a curve with an ever-increasing positive slope. The variable is growing over time at a constant proportional rate of growth (1) and has an equation:

$$ y = a b^x \qquad (34) $$

Again, by using a simple transformation to obtain a linear form, the method of ordinary least squares regression may be applied. In this case we transform (34) by taking logarithms:

$$ \log y = \log a + (\log b) x \qquad (35) $$

6.13.4 If we now write:

log y = Y
log a = A
log b = B
x = X

equation (35) may be written:

$$ Y = A + B X \qquad (36) $$

which may be seen to be a linear equation. A and B may now be estimated by application of the least squares formulae. First we may estimate B, and then transform back (by taking its antilogarithm) to find b. Similarly we find a. We have therefore calculated the least squares regression line:

$$ y = a b^x \qquad (37) $$

That is, we have fitted an exponential function to the data. The fitted curve may be used for interpolation and prediction, as described above.

(1) See Chapter on Basic Mathematics, Section 4.
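The two transformations described in this section (re-naming x^2 as w for the parabola, and taking logarithms for the exponential) can be sketched in Python as follows. The data are hypothetical and constructed to lie exactly on the assumed curves, so that the fit should recover the coefficients used to generate them; the helper `least_squares` and all variable names are illustrative, and logarithms to base 10 are an assumption.

```python
import math


def least_squares(x, y):
    """Return (a, b) of the least squares line y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


x = [1, 2, 3, 4, 5]

# Hypothetical data lying exactly on the parabola y = 2 + 0.5 x^2
y_parabola = [2 + 0.5 * xi ** 2 for xi in x]
w = [xi ** 2 for xi in x]               # transformation w = x^2
a_par, b_par = least_squares(w, y_parabola)

# Hypothetical data growing by 5% per period: y = 100 (1.05)^x
y_exp = [100 * 1.05 ** xi for xi in x]
Y = [math.log10(yi) for yi in y_exp]    # Y = log y
A, B = least_squares(x, Y)              # fits Y = A + B X
a_exp, b_exp = 10 ** A, 10 ** B         # antilogarithms give a and b

print(a_par, b_par, a_exp, b_exp)
```

With real data the points will not lie exactly on the curve. The exponential fit then minimises squared deviations of log y rather than of y itself, which is a property of the transformation method.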
6.13.5 The final non-linear function discussed here is the logistic function. The general form of this function may be seen in the seventh frequency curve of Figure 4 above. This is a useful functional form to fit to certain data. It can be seen that the curve begins at a low value of y with a slope close to zero. The slope increases steadily, then begins to decrease again towards zero at the "saturation" level of y. Certain data may be expected to behave like this over time. For example, consider the net enrolment ratio for primary education. This is defined as the ratio between the number of pupils at this level who belong to the official age-group and the total number of children in this age-group. In percentage terms, it therefore has a theoretical minimum of 0% and a maximum of 100%. It cannot, that is, grow without limit. Over a long period of time, it may be expected to develop approximately according to the logistic curve shown in Figure 4. Eventually the saturation level is reached beyond which the ratio cannot go. One general formulation of the logistic function is:
k
I
(3^)
6.13.6 After logarithmic transformation, and after a necessary assumption has been made about the value of k, this function may again be estimated by the method of least squares. However, readers who wish to utilise the logistic function in their work are recommended to consult a more advanced text.
6.14 Regression: a summary

6.14.1 Regression analysis is an extremely valuable tool in the statistician's kitbag. However, like any tool, it can be misused. The most obvious danger is that it is applied in circumstances where there is no underlying theoretical justification for the causal relationship which is estimated. It is always possible to regress any quantified variable on any other such variable. But there can be no useful interpretation of the estimated coefficients unless the causal relationship under investigation has been carefully set out and justified from the beginning.

6.14.2 The reader should always remember that, however seemingly sophisticated the mathematical and statistical techniques are, the results depend on good original data. Poor, inaccurate and insufficient data cannot produce results in which the educational planner can have any confidence. The development and maintenance of a good data-base are the first and last tasks of the statistician.

6.14.3 Despite these warnings, the regression method is the best available for quantifying causal relationships. It is reasonably easily understood. The method of least squares is an intuitively attractive method of fitting a line to observed data. It lends itself to simple methods of interpolation and extrapolation which are important at a number of stages in the production of educational projections.

6.14.4 Regression analysis, like all statistical techniques, can be misapplied and misinterpreted. There are choices which face the statistician in using the technique, to which there are no straightforward, technical, answers. For example, is the data-base satisfactory? Should some data be rejected, or further information sought? In deciding on the form of functional relationships, which variables should be included, which excluded? Should a linear or non-linear function be fitted to the data? In making these and other important analytical decisions, the statistician will always be guided by experience and intuition as well as by understanding of purely technical methods.

EXERCISE 10
The reader should now do Exercise 10.
EXERCISES 1 - 10
ANSWERS TO EXERCISES 1 - 10
EXERCISE 1

1.1 Write each of the following as summations:

i) $x_1 + x_2 + x_3 + x_4 + x_5$
ii) $(z_1 - 4) + (z_2 - 4) + (z_3 - 4)$
iii) $x_1^2 + x_2^2 + x_3^2$
iv) $x_1^2 f_1 + x_2^2 f_2 + x_3^2 f_3 + x_4^2 f_4 + x_5^2 f_5$
v) $y_4 + y_5 + y_6 + y_7 + y_8$
vi) $3b_1^2 + 3b_2^2 + 3b_3^2 + 3b_4^2$
vii) $(x_3 - y_3) + (x_4 - y_4) + (x_5 - y_5)$
viii) $4(x_{?} + x_9)$ (first subscript illegible in the source)
ix) $3((x_3 - y_3) + (x_4 - y_4))$
x) $x_5 f_5 + x_6 f_6 + x_7 f_7 + x_8 f_8$

1.2 Write each of the following without the summation signs:

i) $\sum_{i=3}^{4} x_i^3$
ii) $\sum_{i=1}^{5} (x_i + y_i)$
iii) $\sum_{i=1}^{3} 4^i$
iv) $\sum_{i=9}^{12} f_i x_i$
v) $\sum_{i=2}^{5} x_i y_i$

1.3 Given $x_1 = 3$, $x_2 = 2$, $x_3 = 4$ and $y_1 = 6$, $y_2 = 4$, $y_3 = 7$, calculate:

i) $\sum_{i=1}^{3} x_i$
ii) $\sum_{i=2}^{3} x_i y_i$
iii) $\sum_{i=1}^{3} (x_i + y_i)$
iv) $\sum_{i=1}^{3} 2 y_i$
v) $\sum_{i=1}^{3} x_i^2 y_i$
EXERCISE 2

2.1 50 students took an examination and obtained the following scores:

 9  21  43  39  44  31  41  27  71  78
61  63  89  76  54  37  57  69  42  66
62  40   0  51  57  57  65  56  88  80
44  64  51  59  48  63  57  44  18  69
52  41  50  58  49  84  79  57  99  57

Divide the scores into 10 equal class intervals and construct the frequency distribution. Write down a column of the class mid-points.
EXERCISE 3

3.1 Using the table of frequencies you have calculated in Exercise 2, construct a histogram after having aggregated the first 3 classes. Draw in the frequency polygon.
EXERCISE 4

4.1 Table 4.1 below gives the years of service of a sample of teachers in Afghanistan in 1973.

Table 4.1

Years of service    Frequency
 0 <  6               10990
 6 < 12                6828
12 < 18                1402
18 < 24                 880
24 < 30                 668
30 < 40                 185
Total                 20953

i) calculate the percentage frequencies in each class (to one decimal place)
ii) calculate the percentage cumulative frequencies (commencing with the 0 < 6 years of service class frequency)
iii) draw on a graph the percentage cumulative frequency polygon (ogive).
iv) from your graph, what percentage of teachers do you estimate have given less than 15 years of service? 5 years of service or more?
EXERCISE 5

5.1 Using the data presented in Exercise 4, calculate, to one decimal place, the average number of years of service given by teachers.

EXERCISE 6

6.1 Use the percentage cumulative frequency polygon you constructed in Exercise 4.1(iii) to calculate the median number of years of service of teachers.

6.2 What is the class exhibiting the modal frequency in Table 4.1, Exercise 4?

6.3 What is the value of the mode in the set of data presented in Exercise 2.1?
EXERCISE 7

7.1 Using formula (17) calculate the standard deviation, s, of the set of 50 examination scores given in Exercise 2.1.

7.2 Calculate the arithmetic mean score, x̄.

7.3 Calculate the coefficient of variation.
EXERCISE 8

8.1 Using the data in Table 8.1 below, calculate the coefficient of linear correlation, r, between x and y where:

x_i = distance of home of pupil i from school (kms)
y_i = number of days absent by pupil i from school in a year

Table 8.1

x_i    y_i
 1      5
 3     15
 2      4
 1      7
 4     20
 7     23
 8     17
 3      4

8.2 Rank the data in Table 8.1 in ascending order. (Where there are ties, assign to each of the tied observations the ranks which they jointly occupy). Calculate Spearman's coefficient of rank correlation, r_rank.
EXERCISE 9

9.1 Over the period 1970-1976, total primary level enrolment in thousands in Gabon was as follows:

Table 9.1

Year    t        Total primary level enrolment (thousands), E_t
1970    t = 1     94
1971    t = 2    101
1972    t = 3    106
1973    t = 4    110
1974    t = 5    114
1975    t = 6    121
1976    t = 7    129

Calculate, using formula (29) for ease of computation, the linear regression of primary level enrolment, E_t, on time, t.

9.2 Interpret carefully the meaning of the estimated coefficients, a and b.
EXERCISE 10

10.1 Using the equation of the regression line calculated in Exercise 9, draw it on a graph.

10.2 By linear extrapolation of the regression line, use the graph to predict E_t in 1977 and 1980. Check the accuracy of your answers by substitution of the appropriate values of t into the estimated regression equation.
ANSWERS TO EXERCISE 1

1.1
i) $\sum_{i=1}^{5} x_i$
ii) $\sum_{i=1}^{3} z_i - 3(4)$
iii) $\sum_{i=1}^{3} x_i^2$
iv) $\sum_{i=1}^{5} x_i^2 f_i$
v) $\sum_{i=4}^{8} y_i$
vi) $3 \sum_{i=1}^{4} b_i^2$
vii) $\sum_{i=3}^{5} x_i - \sum_{i=3}^{5} y_i$
viii) $4 \sum_{i=?}^{9} x_i$ (lower limit illegible in the source)
ix) $3 \left( \sum_{i=3}^{4} x_i - \sum_{i=3}^{4} y_i \right)$
x) $\sum_{i=5}^{8} x_i f_i$

1.2
i) $x_3^3 + x_4^3$
ii) $x_1 + y_1 + x_2 + y_2 + x_3 + y_3 + x_4 + y_4 + x_5 + y_5$
iii) $4 + 4^2 + 4^3$
iv) $f_9 x_9 + f_{10} x_{10} + f_{11} x_{11} + f_{12} x_{12}$
v) $x_2 y_2 + x_3 y_3 + x_4 y_4 + x_5 y_5$

1.3
i) $x_1 + x_2 + x_3 = 3 + 2 + 4 = 9$
ii) $x_2 y_2 + x_3 y_3 = 2(4) + 4(7) = 36$
iii) $x_1 + y_1 + x_2 + y_2 + x_3 + y_3 = 3 + 6 + 2 + 4 + 4 + 7 = 26$
iv) $2(y_1 + y_2 + y_3) = 2(6 + 4 + 7) = 34$
v) $x_1^2 y_1 + x_2^2 y_2 + x_3^2 y_3 = 9(6) + 4(4) + 16(7) = 182$
ANSWER TO EXERCISE 2

2.1 Frequency distribution of scores of 50 students:

Scores       Frequency    Class mid-points
 0 <  10         2              5
10 <  20         1             15
20 <  30         2             25
30 <  40         3             35
40 <  50        10             45
50 <  60        14             55
60 <  70         9             65
70 <  80         4             75
80 <  90         4             85
90 < 100         1             95
Total           50
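The tally behind this frequency distribution can be reproduced with a short script. The following Python sketch assumes the 50 scores of Exercise 2.1 and class intervals of width 10 starting at 0; the variable names are illustrative.

```python
# The 50 examination scores from Exercise 2.1
scores = [
    9, 21, 43, 39, 44, 31, 41, 27, 71, 78,
    61, 63, 89, 76, 54, 37, 57, 69, 42, 66,
    62, 40, 0, 51, 57, 57, 65, 56, 88, 80,
    44, 64, 51, 59, 48, 63, 57, 44, 18, 69,
    52, 41, 50, 58, 49, 84, 79, 57, 99, 57,
]

width = 10
n_classes = 10

# Class k covers scores from k*width up to (but excluding) (k+1)*width
frequencies = [0] * n_classes
for score in scores:
    frequencies[score // width] += 1

midpoints = [k * width + width / 2 for k in range(n_classes)]

for k in range(n_classes):
    lower, upper = k * width, (k + 1) * width
    print(f"{lower:3d} < {upper:3d}   {frequencies[k]:3d}   {midpoints[k]:5.1f}")
```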
ANSWER TO EXERCISE 3

3.1 See Figure 10. Note that the rectangles of the histogram are centred on the class mid-points (5, 15, ..., 95). The frequency polygon, joining the mid-points of the tops of the rectangles, is continued at the ends of the distribution so that the area under the frequency polygon is exactly equal to the area of the histogram. The height of the first rectangle is 1.66 (i.e. 5/3) to obey the rule that area is proportional to frequency.

[Figure 10: Histogram and frequency polygon of the examination scores; see Exercise 3.]
ANSWERS TO EXERCISE 4

4.1 i) and ii): percentage frequencies and percentage cumulative frequencies are given in Table 4.2 below:

Table 4.2

Years of service    Percentage frequency    Percentage cumulative frequency
 0 <  6                   52.4                       52.4
 6 < 12                   32.6                       85.0
12 < 18                    6.7                       91.7
18 < 24                    4.2                       95.9
24 < 30                    3.2                       99.1
30 < 40                    0.9                      100.0
Total                    100.0

iii) See Figure 11. Note that the following points have been plotted: (6, 52.4), (12, 85.0), ..., (40, 100.0).

iv) See Figure 11. The graph indicates that approximately 89% have given less than 15 years service, and approximately 56% have given 5 years service or more (i.e., (100 - 44)%).
[Figure 11: Percentage cumulative frequency polygon (ogive) of years of service of teachers; see Exercise 4.]
ANSWER TO EXERCISE 5

5.1 The reader should construct the following table:

Table 5.1

Years of service    Frequency f_j    Class mid-points x_j    f_j x_j
 0 <  6                10990                 3                32970
 6 < 12                 6828                 9                61452
12 < 18                 1402                15                21030
18 < 24                  880                21                18480
24 < 30                  668                27                18036
30 < 40                  185                35                 6475
Totals                 20953                                 158443

The arithmetic mean,

$$ \bar{x} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \frac{158443}{20953} = 7.6 \text{ years} $$
ANSWERS TO EXERCISE 6

6.1 By dropping a perpendicular from the point on the polygon (see Figure 11, in Answers to Exercise 4) at which a line drawn from the 50% value on the vertical axis intercepts it, we see that the median is approximately 5.7 years.

6.2 The first class, (0 < 6) years of service, shows the highest frequency, 10990.

6.3 The modal score is 57; it occurs 6 times, which is more frequently than any other value.
ANSWERS TO EXERCISE 7

7.1 The reader should set out a working table similar to Table 9 in Example 10. An arbitrary origin, A = 55, has been chosen.

Table 7.1 Examination scores of 50 students

Scores       Frequency f_j    Class mid-points x_j    (x_j - A) = d_j    f_j d_j    f_j d_j^2
 0 <  10          2                  5                    -50             -100        5000
10 <  20          1                 15                    -40              -40        1600
20 <  30          2                 25                    -30              -60        1800
30 <  40          3                 35                    -20              -60        1200
40 <  50         10                 45                    -10             -100        1000
50 <  60         14                 55                      0                0           0
60 <  70          9                 65                     10               90         900
70 <  80          4                 75                     20               80        1600
80 <  90          4                 85                     30              120        3600
90 < 100          1                 95                     40               40        1600
Totals           50                                                        -30       18300

From formula (17),

$$ s = \sqrt{\frac{\sum_{j=1}^{k} f_j d_j^2}{n} - \left( \frac{\sum_{j=1}^{k} f_j d_j}{n} \right)^2} = \sqrt{\frac{18300}{50} - \left( \frac{-30}{50} \right)^2} = 19.1 $$

7.2 Using the formula

$$ \bar{x} = A + \frac{\sum_{j=1}^{k} f_j d_j}{n} = 55 + \frac{-30}{50} = 54.4 $$

7.3 The coefficient of variation,

$$ V = \left( \frac{s}{\bar{x}} \cdot 100 \right)\% = \left( \frac{19.1}{54.4} \cdot 100 \right)\% = 35.1\% $$
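The grouped-data calculation for Exercise 7 can be sketched in Python as follows. The sketch assumes the class frequencies and mid-points of the frequency distribution from Exercise 2, an arbitrary origin A = 55, and a standard deviation defined with the divisor n (an assumption about the form of the formula used). Results may differ slightly from the hand calculation because no intermediate rounding takes place.

```python
import math

frequencies = [2, 1, 2, 3, 10, 14, 9, 4, 4, 1]
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
A = 55  # arbitrary origin

n = sum(frequencies)
# Deviations of the class mid-points from the arbitrary origin
d = [x - A for x in midpoints]
sum_fd = sum(f * dj for f, dj in zip(frequencies, d))
sum_fd2 = sum(f * dj ** 2 for f, dj in zip(frequencies, d))

mean = A + sum_fd / n
std_dev = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
coeff_variation = 100 * std_dev / mean

print(n, sum_fd, sum_fd2)
print(round(mean, 1), round(std_dev, 1))
```

Choosing A close to the centre of the distribution keeps the deviations small, which is the main advantage of the method for hand calculation; the results do not depend on the value of A.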
ANSWERS TO EXERCISE 8

8.1 The reader should set out a working table as below in Table 8.2, similar to Table 11 in Example 11.

Table 8.2

x_i    y_i    x_i y_i    x_i^2    y_i^2
 1      5        5         1        25
 3     15       45         9       225
 2      4        8         4        16
 1      7        7         1        49
 4     20       80        16       400
 7     23      161        49       529
 8     17      136        64       289
 3      4       12         9        16

Sums:
Σx_i = 29    Σy_i = 95    Σx_i y_i = 454    Σx_i^2 = 153    Σy_i^2 = 1549

From formula (20)

$$ r = \frac{8(454) - 29(95)}{\sqrt{[8(153) - (29)^2][8(1549) - (95)^2]}} $$

r = +0.77

There is a moderate positive linear correlation between distance of pupils' homes from school and the number of days absent per year.
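The arithmetic for the coefficient of linear correlation can be checked with the short Python sketch below, which applies the computational form used in this answer to the data of Table 8.1. The variable names are illustrative.

```python
import math

# Data from Table 8.1
x = [1, 3, 2, 1, 4, 7, 8, 3]        # distance from school (kms)
y = [5, 15, 4, 7, 20, 23, 17, 4]    # days absent in a year

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 2))
```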
8.2 A working table as below should be set out, similar to Table 12 in Example 12.

Table 8.3

Rank x_i    Rank y_i    d_i = (x_i - y_i)    d_i^2
   1           3              -2               4
   4           5              -1               1
   3           1               2               4
   1           4              -3               9
   6           7              -1               1
   7           8              -1               1
   8           6               2               4
   4           1               3               9

Sum of d_i^2 = 33

Applying formula (21):

$$ r_{rank} = 1 - \frac{6(33)}{8(64 - 1)} = 0.61 $$
ANSWERS TO EXERCISE 9

9.1 In order to use formula (29), the reader should set out a working table as below:

Table 9.2

t     E_t     t^2     t E_t
1      94       1       94
2     101       4      202
3     106       9      318
4     110      16      440
5     114      25      570
6     121      36      726
7     129      49      903

Sums:
Σt = 28    ΣE_t = 775    Σt^2 = 140    Σt E_t = 3253

Means:
t̄ = 4    Ē = 110.71

From formula (29), (substituting t for x_i and E_t for y_i), we obtain:

$$ b = \frac{7(3253) - 28(775)}{7(140) - (28)^2} = 5.46 $$

Having calculated b, we may now substitute the values of b, t̄ and Ē into formula (28) to obtain a:

a = 110.71 - 5.46 (4) = 88.87

Hence the equation of the estimated regression line is:

E_t = 88.87 + 5.46 t

9.2 Interpretation of a

Note that when t = 0, by substitution into the equation of the estimated regression line, it can be seen that E_t = 88.87. That is, we have found a, the intercept of the regression line on the vertical (E_t) axis. It is our estimation of what enrolment was (in thousands) at time t = 0 (1969), assuming a backward linear projection of the regression line.

Interpretation of b

b is the slope of the estimated regression line. The estimation of b = 5.46 implies that a unit increase in time, t (one year), gives rise on average to a 5.46 unit (thousand) increase in primary level enrolment, E_t, in Gabon, over the period of observation, 1970-76.
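The rapid formula for b used in Exercise 9 can be sketched in Python as below, using the Gabon enrolment data of Table 9.1. The function name `slope_rapid` is an illustrative choice. Because the code carries full precision rather than rounding b and the mean to two decimal places, its value of a differs slightly from the hand-calculated 88.87.

```python
def slope_rapid(x, y):
    """Slope b from the sums of x, y, xy and x squared (rapid formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


# Table 9.1: t = 1 corresponds to 1970; enrolment in thousands
t = [1, 2, 3, 4, 5, 6, 7]
enrolment = [94, 101, 106, 110, 114, 121, 129]

b = slope_rapid(t, enrolment)
a = sum(enrolment) / len(enrolment) - b * sum(t) / len(t)

print(f"E_t = {a:.2f} + {b:.2f} t")
# Forecasts by direct substitution (t = 8 is 1977, t = 11 is 1980)
print(round(a + b * 8, 1), round(a + b * 11, 1))
```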
ANSWERS TO EXERCISE 10

10.1 See Figure 12. The regression line may be drawn by passing it through the points (0, a) and (t̄, Ē), i.e., (0, 88.87) and (4, 110.71).

10.2 In 1977, t = 8. The graph shows E_t to be approximately 132.5. In 1980, t = 11. The graph shows E_t to be approximately 149.

Graphical methods are inevitably limited in their accuracy. Predicted values are best calculated by direct substitution of the values of t into the regression equation. Thus, when t = 8 (1977):

E_8 = 88.87 + 5.46(8)
E_8 = 132.55 (thousands)

when t = 11 (1980):

E_11 = 88.87 + 5.46(11)
E_11 = 148.93 (thousands)
[Figure 12: Regression line of primary level enrolment on time, drawn through the points (0, 88.87) and (4, 110.71); see Answers to Exercise 10.]