STAM101 :: Lecture 11 :: Attributes
Contingency table – 2x2 contingency table – Test for independence of attributes – test for goodness of fit of mendalian ratio
Test based on distribution
In case of attributes we can not employ the parametric tests such as F and t. Instead we have to apply test. When we want to test whether a set of observed values are in agreement with those expected on the basis of some theories or hypothesis. The statistic provides a measure of agreement between such observed and expected frequencies.
ChiSquare
The test has a number of applications. It is used to
 Test the independence of attributes
 Test the goodness of fit
 Test the homogeneity of variances
 Test the homogeneity of correlation coefficients
 Test the equaslity of several proportions.
In genetics it is applied to detect linkage.
Applications
– test for goodness of fit
A very powerful test for testing the significance of the discrepancy between theory and experiment was given by Prof. Karl Pearson in 1900 and is known as “chisquare test of goodness of fit “.
If 0i, (i=1,2,…..,n) is a set of observed (experimental frequencies) and Ei (i=1,2,…..,n) is the corresponding set of expected (theoretical or hypothetical) frequencies, then,
It follows a distribution with n1 d.f. In case of only one tailed test is used.
Example
In plant genetics, our interest may be to test whether the observed segregation ratios deviate significantly from the mendelian ratios. In such situations we want to test the agreement between the observed and theoretical frequency, such test is called as test of goodness of fit.
Conditions for the validity of test:
test is an approximate test for large values of ‘n’ for the validity of test of goodness of fit between theory and experiment, the following conditions must be satisfied.
 The sample observations should be independent.
2. Constraints on the cell freqrequency, if any, should be linear.
Example:=.
3. N, the total frequency should be reasonably large, say greater then (>) 50.
4. No theoretical cell frequency should be less than (<)5. If any theoretical cell frequency is <5, then for the application of  test, it is pooled with the preceding or scecceeding frequency so that the pooled frequency is more than 5 and finally adjust for degree’s of freedom lost in pooling.
Example1
The number of yiest cells counted in a haemocytometer is compared to the theoretical value is given below. Does the experimental result support the theory?
No. of Yeast cells in the square 
Obseved Frequency 
Expected Frequency 
0 
103 
106 
1 
143 
141 
2 
98 
93 
3 
42 
41 
4 
8 
14 
5 
6 
5 
Solution
H0: the experimental results support the theory
H1: the esperimental results does not support the theory.
Level of significance=5%
Test Statistic:
Oi 
Ei 
OiEi 
(OiEi)2 
(OiEi)2/Ei 
103 
106 
3 
9 
0.0849 
143 
141 
2 
4 
0.0284 
98 
93 
5 
25 
0.2688 
42 
41 
1 
1 
0.0244 
8 
14 
6 
36 
2.5714 
6 
5 
1 
1 
0.2000 
400 
400 


3.1779 
\=3.1779
Table value
(61=5 at 5 % l.os)= 11.070
Inference
<tab
We accept the null hypothesis.
(i.e) there is a good correspondence between theory and experiment.
test for independence of attributes
At times we may consider two charactertistics on attributes simultaneously. Our interest will be to test the association between these two attributes
Example: An entomologist may be interested to know the effectiveness of different concentrations of the chemical in killing the insects. The concentrations of chemical form one attribute. The state of insects ‘killed & not killed’ forms another attribute. The result of this experiment can be arranged in the form of a contingency table. In general one attribute may be divided into m classes as A 1,A 2, …….A m and the other attribute may be divided into n classes as B 1,B 2, ……B n . Then the contingency table will have m x n cells. It is termed as m x n contingency table
A B 
A1 
A2 
… 
Aj 
… 
Am 
Row Total 
B1 
O11 
O12 
… 
O1j 

O1m 
r1 
B2 
O21 
O22 
… 
O2j 

O2m 
r2 
. 







Bi 
Oij 
Oi2 
… 
Oij 

Oim 
ri 
. 







Bn 
On1 
On2 
… 
Onj 

Onm 
rk 
Column Total 
c1 
c2 
… 
cj 
… 
cm 
n= 
where Oij’s are observed frequencies.
The expected frequencies corresponding to Oij is calculated as . The is computed as
where
Oij – observed frequencies
Eij – Expected frequencies
n= number of rows
m= number of columns
It can be verified that
This is distributed as with (n1) (m1) d.f.
2x2 – contingency table
When the number of rows and numberof columns are equal to 2 it is termed as 2 x 2 contingency table .It will be in the following form

B1 B2 
Row Total 
A1 A2 
a b c d 
a+b r1 c+d r2 
Column 
a+c b+d c1 c2 
a+b+c+d 
Where a, b, c and d are cell frequancies c1 and c2 are column totals, r1 and r2 are row totals and n is the total number of observations.
In case of 2 x 2 contigency table can be directly found using the short cut formula,
The d.f associated with is (21) (21) =1
Yates correction for continuity
If anyone of the cell frequency is < 5, we use Yates correction to make as continuous. The yares correction is made by adding 0.5 to the least cell frequency and adjusting the other cell frequencies so that the column and row totals remain same . suppose, the firat cell frequency is to be corrected then the consigency table will be as follows:

B1 
B2 
Row Total 
A1 A2 
a 
b 
a+b=r1 
c 
d 
c+d =r2 

Column 
a+c=c1 
b+d=c2 
n = a+b+c+d 
Then use the  statistic as
The d.f associated with is (21) (21) =1
Exapmle 2
The severity of a disease and blood group were studied in a research projest. The findings sre given in the following table, knowmn as the m xn contingency table. Can this severity of the condition and blood group are associated.
Severity of a disease classified by blood group in 1500 patients.
Condition 
Blood Groups 
Total 

O 
A 
B 
AB 

Severe 
51 
40 
10 
9 
110 
Moderate 
105 
103 
25 
17 
250 
Mild 
384 
527 
125 
104 
1140 
Total 
540 
670 
160 
130 
1500 
Solution
H0: The severity of the disease is not associated with blood group.
H1: The severity of the disease is associated with blood group.
Calculation of Expected frequencies
Condition 
Blood Groups 
Total 

O 
A 
B 
AB 

Severe 
39.6 
49.1 
11.7 
9.5 
110 
Moderate 
90.0 
111.7 
26.7 
21.7 
250 
Mild 
410.4 
509.2 
121.6 
98.8 
1140 
Total 
540 
670 
160 
130 
1500 
Test statistic:
The d.f. associated with the is (31)(41) = 6
Calculations
Oi 
Ei 
OiEi 
(OiEi)2 
(OiEi)2/Ei 
51 
39.6 
11.4 
129.96 
3.2818 
40 
49.1 
9.1 
82.81 
1.6866 
10 
11.7 
1.7 
2.89 
0.2470 
9 
9.5 
0.5 
0.25 
0.0263 
105 
90.0 
15 
225.00 
2.5000 
103 
111.7 
8.7 
75.69 
0.6776 
25 
26.7 
1.7 
2.89 
0.1082 
17 
21.7 
4.7 
22.09 
1.0180 
384 
410.4 
26.4 
696.96 
1.6982 
527 
509.2 
17.8 
316.84 
0.6222 
125 
121.6 
3.4 
11.56 
0.0951 
104 
98.8 
5.2 
27.04 
0.2737 
Total 
12.2347 
\=12.2347
Table value of for 6 d.f. at 5% level of significance is 12.59
Inference
<tab
We accept the null hypothesis.
The severity of the disease has no association with blood group.
Example 3
In order to determine the possible effect of a chemical treatment on the rate of germination of cotton seeds a pot culture experiment was conducted. The results are given below
Chemical treatment and germination of cotton seeds

Germinated 
Not germinated 
Total 
Chemically Treated 
118 
22 
140 
Untreated 
120 
40 
160 
Total 
238 
62 
300 
Does the chemical treatrment improve the germination rate of cotton seeds?
Solution
H0:The chemical treatment does not improve the germination rate of cotton seeds.
H1: The chemical treatment improves the germination rate of cotton seeds.
Level of significance = 1%
Test statistic
Table value
(1) d.f. at 1 % L.O.S = 6.635
Inference
<tab
We accept the null hypothesis.
The chemical treatmentwill not improve the germination rate of cotton seeds significantly.
Example 4
In an experiment on the effect of a growth regulator on fruit setting in muskmelon the following results were obtained. Test whether the fruit setting in muskmelon and the application of growth regulator are independent at 1% level.

Fruit set 
Fruit not set 
Total 
Treated 
16 
9 
25 
Control 
4 
21 
25 
Total 
20 
30 
50 
Solution
H0:Fruit setting in muskmelon does not depend on the application of growth regulator.
H1: Fruit setting in muskmelon depend on the application of growth regulator.
Level of significance = 1%
After Yates correction we have

Fruit set 
Fruit not set 
Total 
Treated 
15.5 
9.5 
25 
Control 
4.5 
20.5 
25 
Total 
20 
30 
50 
Tet statistic
Table value
(1) d.f. at 1 % level of significance is 6.635
Inference
>tab
We reject the null hypothesis.
Fruit setting in muskmelon is influenced by the growth regulator. Application of growth regulator will increase fruit setting in musk melon.
Download this lecture as PDF here 