Franz Hilpold, Evaluationsstelle für die deutsche Schule in Südtirol, Burgstall, Italy Email: Franz.Hilpold@gmail.com
Lecture at the International Educational Science Congress in Samos, July 2011
In the social sciences we often try to measure things that cannot be measured directly (so-called latent variables). Researchers might, for example, be interested in measuring the teacher-student relationship. This relationship, however, cannot be measured directly because it has many facets and the term “relationship” is not clearly defined. The items in our questionnaires refer directly to school-related topics and always remain within the boundaries of the predefined quality framework. Since each of the nine questionnaire types generates a large number of items, it is to be expected that combining the resulting variables will reveal a number of less obvious motives, reasons, latent characteristics and attitudes of the people who are directly concerned with school matters. Indeed, the exploratory factor analysis applied to this data set, which actually consists of nine disjoint data sets that cannot be merged, yielded some very interesting findings:
| | v1 | v2 | v3 | v4 | v5 | v6 | v7 | v8 | v9 | v10 |
|---|---|---|---|---|---|---|---|---|---|---|
| v1 | 1 | | | | | | | | | |
| v2 | 0,34 | 1 | | | | | | | | |
| v3 | 0,41 | 0,12 | 1 | | | | | | | |
| v4 | 0,28 | 0,16 | 0,22 | 1 | | | | | | |
| v5 | 0,78 | 0,22 | 0,35 | 0,28 | 1 | | | | | |
| v6 | 0,67 | 0,28 | 0,17 | 0,33 | 0,59 | 1 | | | | |
| v7 | 0,43 | 0,11 | 0,16 | 0,57 | 0,15 | 0,22 | 1 | | | |
| v8 | 0,31 | 0,41 | 0,19 | 0,61 | 0,09 | 0,13 | 0,68 | 1 | | |
| v9 | 0,72 | 0,40 | 0,32 | 0,28 | 0,63 | 0,61 | 0,31 | 0,43 | 1 | |
| v10 | 0,19 | 0,21 | 0,35 | 0,26 | 0,13 | 0,47 | 0,36 | 0,22 | 0,36 | 1 |
We consider only the triangle below the diagonal, since the matrix is symmetric.
The significance level of each correlation is not stated here, but it must still be considered. Correlations that are not significant are marked in italics. Variable pairs with no significant correlations were not considered.
There is clearly a rather strong correlation among v1, v5, v6 and v9. The same is true for v4, v7 and v8.
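The grouping read off the matrix by eye can also be reproduced mechanically. The following is a minimal sketch, not the procedure used in the lecture: it copies the lower triangle of the matrix above and links every pair of variables whose correlation reaches a cut-off. The cut-off of 0.55 and the helper names `strong_pairs` and `groups` are assumptions chosen purely for illustration.

```python
# Lower triangle of the 10 x 10 correlation matrix shown above (v1..v10).
LOWER = [
    [1.0],
    [0.34, 1.0],
    [0.41, 0.12, 1.0],
    [0.28, 0.16, 0.22, 1.0],
    [0.78, 0.22, 0.35, 0.28, 1.0],
    [0.67, 0.28, 0.17, 0.33, 0.59, 1.0],
    [0.43, 0.11, 0.16, 0.57, 0.15, 0.22, 1.0],
    [0.31, 0.41, 0.19, 0.61, 0.09, 0.13, 0.68, 1.0],
    [0.72, 0.40, 0.32, 0.28, 0.63, 0.61, 0.31, 0.43, 1.0],
    [0.19, 0.21, 0.35, 0.26, 0.13, 0.47, 0.36, 0.22, 0.36, 1.0],
]
THRESHOLD = 0.55  # assumed cut-off for "rather strong"; not taken from the lecture


def strong_pairs(lower, threshold):
    """Return (i, j, r) for every pair below the diagonal with r >= threshold."""
    return [
        (i, j, r)
        for i, row in enumerate(lower)
        for j, r in enumerate(row[:i])
        if r >= threshold
    ]


def groups(lower, threshold):
    """Merge strongly correlated variables into groups (connected components)."""
    parent = list(range(len(lower)))

    def find(x):
        # Follow parent links until the representative of the group is reached.
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j, _ in strong_pairs(lower, threshold):
        parent[find(i)] = find(j)

    members = {}
    for v in range(len(lower)):
        members.setdefault(find(v), []).append(f"v{v + 1}")
    # Variables without any strong partner form no group and are dropped.
    return sorted(g for g in members.values() if len(g) > 1)


print(groups(LOWER, THRESHOLD))
# [['v1', 'v5', 'v6', 'v9'], ['v4', 'v7', 'v8']]
```

With this cut-off the two groups named in the text emerge, while v2, v3 and v10 stay unassigned. A different cut-off would change the grouping, which is one reason why the formal factor analysis is needed.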
V1: My child is provided with the opportunity to develop his/her personal abilities and talents.
V5: The rapport between teachers and students is characterised by respect and approachability.
V6: In my view gifted students’ talents are fostered appropriately.
V9: My daughter’s/son’s teachers adopt replicable evaluation standards.
It is already becoming obvious that the items within such a group could somehow be combined, but it is at the discretion of the researcher to find a generic title under which the four items are summarised. Possible titles could be “Fostering children’s talents” or, maybe even better, “Respecting a child’s personality”. This is not a factor yet, since a further series of steps is required.
The second group of variables (v4, v7 and v8) is more difficult to combine:
V4: The school provides the pupils with ample opportunity to catch up on failed educational objectives.
V7: As far as I know the school is well organised and well administered.
V8: The school offers a lot of extracurricular activities.
The whole process of an exploratory factor analysis shall be explained here using a concrete example. For this purpose we choose a parent questionnaire for lower secondary school. It contains 30 questions, one of which is an open-answer question: “Please feel free to write down any additional comments or suggestions”. This question is not considered here. The other 29 questions reflect the quality framework, on the assumption that the parents know what is going on in the school. The questionnaire was distributed to 3282 parents and 2888 valid questionnaires were returned, which corresponds to a return rate of 88%.
First of all we create a correlation matrix for all variables (items). Unfortunately it cannot be shown here in full, as it is a 29 x 29 matrix.
| Correlation | ANS_E | NIV_E | INT_E | ABL_E | SEL_E | PER_E | KLA_E | LER_E |
|---|---|---|---|---|---|---|---|---|
| ANS_E | 1,000 | ,476 | ,390 | ,244 | ,176 | ,368 | ,409 | ,426 |
| NIV_E | ,476 | 1,000 | ,474 | ,286 | ,177 | ,370 | ,540 | ,508 |
| INT_E | ,390 | ,474 | 1,000 | ,221 | ,238 | ,402 | ,400 | ,393 |
| ABL_E | ,244 | ,286 | ,221 | 1,000 | ,029 | ,228 | ,263 | ,349 |
| SEL_E | ,176 | ,177 | ,238 | ,029 | 1,000 | ,408 | ,221 | ,105 |
| PER_E | ,368 | ,370 | ,402 | ,228 | ,408 | 1,000 | ,463 | ,377 |
| KLA_E | ,409 | ,540 | ,400 | ,263 | ,221 | ,463 | 1,000 | ,528 |
| LER_E | ,426 | ,508 | ,393 | ,349 | ,105 | ,377 | ,528 | 1,000 |
| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | ,962 |
| Bartlett's Test of Sphericity | Approx. Chi-Square | 22022,034 |
| | df | 406 |
| | Sig. | ,000 |
There are two more criteria that are of importance and must be added to the settings. First, the Kaiser-Meyer-Olkin criterion (KMO) indicates how adequate the items are for factor analysis. It takes values between 0 and 1, and the higher the value, the more adequate the items are. For our data the KMO value is 0.962, so the level of adequacy is very good. The second criterion is the Bartlett test, which we use only for its level of significance. The reported significance is 0.000 (that is, p < 0.001), which indicates that in the population at least some of the variables must be correlated. The null hypothesis, according to which all correlation coefficients in the population are zero, can therefore be rejected. This can also be seen, however, by looking directly at the R-matrix and the corresponding significance matrix.
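To make the Bartlett test more tangible, here is a minimal sketch of the usual formula, chi-square = -[(n - 1) - (2p + 5)/6] · ln|R| with p(p - 1)/2 degrees of freedom, where p is the number of items and n the number of respondents. It is applied only to the 8 x 8 excerpt of the correlation matrix shown above, so the chi-square value it prints is not the one in the table. The function names are illustrative assumptions, and the statistics package used for the lecture may differ in detail.

```python
import math

# Excerpt (first 8 of 29 items) of the correlation matrix shown above.
R = [
    [1.000, 0.476, 0.390, 0.244, 0.176, 0.368, 0.409, 0.426],
    [0.476, 1.000, 0.474, 0.286, 0.177, 0.370, 0.540, 0.508],
    [0.390, 0.474, 1.000, 0.221, 0.238, 0.402, 0.400, 0.393],
    [0.244, 0.286, 0.221, 1.000, 0.029, 0.228, 0.263, 0.349],
    [0.176, 0.177, 0.238, 0.029, 1.000, 0.408, 0.221, 0.105],
    [0.368, 0.370, 0.402, 0.228, 0.408, 1.000, 0.463, 0.377],
    [0.409, 0.540, 0.400, 0.263, 0.221, 0.463, 1.000, 0.528],
    [0.426, 0.508, 0.393, 0.349, 0.105, 0.377, 0.528, 1.000],
]
N_RESPONDENTS = 2888  # valid questionnaires reported in the text


def log_det(matrix):
    """Log-determinant of a positive definite matrix via Cholesky factorisation."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    # det(R) is the squared product of the Cholesky diagonal.
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def bartlett_sphericity(matrix, n_obs):
    """Return (chi_square, degrees_of_freedom) for H0: all correlations are zero."""
    p = len(matrix)
    chi_square = -((n_obs - 1) - (2 * p + 5) / 6.0) * log_det(matrix)
    df = p * (p - 1) // 2
    return chi_square, df


chi2, df = bartlett_sphericity(R, N_RESPONDENTS)
print(f"excerpt: chi-square = {chi2:.1f}, df = {df}")
print("df for all 29 items:", 29 * 28 // 2)
```

For the full set of 29 items the degrees of freedom are 29 · 28 / 2 = 406, which agrees with the df value in the table above.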
There are several statistical methods for determining factors. As mentioned above we have chosen Principal Component Analysis. We can imagine the set of items as an N-dimensional coordinate system in which, at first, the dimension equals the number of items. The next step is to reduce the dimensionality by extracting the component (=factor) which contributes most to the total variance. The next principal component to be extracted is the one contributing most to the residual variance, and so forth. Theoretically one could extract as many factors as there are items, which, of course, is hardly reasonable if we want to achieve a reduction in dimension.
In order to decide how many factors to create, we look at the factor loadings. The factor loading of a single variable in Principal Component Analysis is the correlation between the factor and the variable. In order to allocate the variables to a factor we consider the items that have a large loading on that specific factor. The ideal would be the so-called simple structure, where each variable loads on only one factor and the loading is large.
The successive extraction follows mathematical algorithms, which may have the consequence that the factors cannot be meaningfully interpreted. Since the extraction of factors is about explaining total variance and residual variance, an unfavourable distribution of the factor loadings is a common result. Rotating the axes leads to a better distribution of the factor loadings and thus improves the possibilities of interpretation. There are a number of different rotations that could be applied. We choose orthogonal rotation, which, in contrast to oblique rotation, preserves the independence of the factors.
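The successive extraction described above can be sketched in a few lines. The example below is an illustration only, not the routine of the statistics package used for the lecture. It runs on the 8 x 8 excerpt of the correlation matrix, so its numbers differ from the tables for all 29 items. It finds the component that explains most variance by power iteration, converts it into loadings (eigenvector times the square root of the eigenvalue), subtracts the explained part and repeats on the residual matrix. The names `dominant_component` and `extract_components` are assumptions, and no rotation is performed.

```python
import math

# Excerpt (first 8 of 29 items) of the correlation matrix shown above.
R = [
    [1.000, 0.476, 0.390, 0.244, 0.176, 0.368, 0.409, 0.426],
    [0.476, 1.000, 0.474, 0.286, 0.177, 0.370, 0.540, 0.508],
    [0.390, 0.474, 1.000, 0.221, 0.238, 0.402, 0.400, 0.393],
    [0.244, 0.286, 0.221, 1.000, 0.029, 0.228, 0.263, 0.349],
    [0.176, 0.177, 0.238, 0.029, 1.000, 0.408, 0.221, 0.105],
    [0.368, 0.370, 0.402, 0.228, 0.408, 1.000, 0.463, 0.377],
    [0.409, 0.540, 0.400, 0.263, 0.221, 0.463, 1.000, 0.528],
    [0.426, 0.508, 0.393, 0.349, 0.105, 0.377, 0.528, 1.000],
]


def dominant_component(matrix, iterations=2000):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue belonging to the converged vector.
    eigenvalue = sum(
        vec[i] * matrix[i][j] * vec[j] for i in range(n) for j in range(n)
    )
    return eigenvalue, vec


def extract_components(matrix, k):
    """Extract k components one after another from the residual matrix."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    components = []
    for _ in range(k):
        eigenvalue, vec = dominant_component(residual)
        loadings = [math.sqrt(eigenvalue) * v for v in vec]
        components.append((eigenvalue, loadings))
        # Deflation: remove the variance this component already explains.
        residual = [
            [residual[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)
        ]
    return components


for number, (eigenvalue, loadings) in enumerate(extract_components(R, 3), start=1):
    share = 100.0 * eigenvalue / len(R)
    print(f"component {number}: eigenvalue {eigenvalue:.3f} ({share:.1f} % of variance)")
    print("  loadings:", [round(value, 3) for value in loadings])
```

Each printed loading is the correlation between an item and the unrotated component, and the sum of the squared loadings of a component equals its eigenvalue.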
| Item Code | Initial | Extraction | Item Code | Initial | Extraction |
|---|---|---|---|---|---|
| ANS_E | 1,000 | ,467 | BEW_E | 1,000 | ,454 |
| NIV_E | 1,000 | ,595 | PRÜ_E | 1,000 | ,502 |
| INT_E | 1,000 | ,536 | RUE_E | 1,000 | ,440 |
| ABL_E | 1,000 | ,371 | STÖ_E | 1,000 | ,608 |
| SEL_E | 1,000 | ,743 | RES_E | 1,000 | ,606 |
| PER_E | 1,000 | ,590 | LVH_E | 1,000 | ,503 |
| KLA_E | 1,000 | ,592 | UMF_E | 1,000 | ,598 |
| LER_E | 1,000 | ,567 | UMG_E | 1,000 | ,561 |
| EVA_E | 1,000 | ,722 | SGE_E | 1,000 | ,600 |
| ZIE_E | 1,000 | ,352 | FZI_E | 1,000 | ,642 |
| LBG_E | 1,000 | ,526 | VER_E | 1,000 | ,524 |
| LSW_E | 1,000 | ,530 | ELT_E | 1,000 | ,629 |
| LST_E | 1,000 | ,477 | UNT_E | 1,000 | ,621 |
| FEE_E | 1,000 | ,525 | STP_E | 1,000 | ,395 |
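The "Extraction" column is not explained in the text. It appears to contain the communality of each item, that is, the share of the item's variance reproduced by the five extracted components. Under that reading (an assumption), each value should equal the sum of the item's squared loadings. The sketch below checks this for three items, using loadings copied from the rotated component matrix reported later in this text. The helper name `communality` is illustrative.

```python
# Rotated loadings on components 1-5 for three items, copied from the
# rotated component matrix reported later in this text.
LOADINGS = {
    "ELT_E": [0.662, 0.128, 0.396, 0.129, -0.005],
    "SEL_E": [0.092, 0.021, 0.021, 0.053, 0.855],
    "STP_E": [0.108, 0.199, 0.584, 0.016, 0.052],
}
# "Extraction" values of the same items from the table above.
REPORTED = {"ELT_E": 0.629, "SEL_E": 0.743, "STP_E": 0.395}


def communality(loadings):
    """Sum of squared loadings of one item across all extracted components."""
    return sum(value * value for value in loadings)


for item, loadings in LOADINGS.items():
    print(f"{item}: computed {communality(loadings):.3f}, reported {REPORTED[item]:.3f}")
```

The computed and reported values agree up to the rounding of the published loadings, which supports reading the column as communalities.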
The eigenvalues of the factors tell us what portion of the total variance of all variables is explained by each factor.
Figure: (excerpt)
| Component | Initial Eigenvalues: Total | % of Variance | Cumulative % | Extraction Sums of Squared Loadings: Total | % of Variance | Cumulative % | Rotation Sums of Squared Loadings: Total | % of Variance | Cumulative % |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10,648 | 36,719 | 36,719 | 10,648 | 36,719 | 36,719 | 5,035 | 17,362 | 17,362 |
| 2 | 1,808 | 6,234 | 42,953 | 1,808 | 6,234 | 42,953 | 3,371 | 11,625 | 28,987 |
| 3 | 1,226 | 4,229 | 47,181 | 1,226 | 4,229 | 47,181 | 2,944 | 10,152 | 39,139 |
| 4 | 1,158 | 3,992 | 51,174 | 1,158 | 3,992 | 51,174 | 2,278 | 7,855 | 46,993 |
| 5 | 1,000 | 3,448 | 54,622 | 1,000 | 3,448 | 54,622 | 2,212 | 7,629 | 54,622 |
| 6 | ,837 | 2,888 | 57,510 | | | | | | |
| 7 | ,821 | 2,831 | 60,341 | | | | | | |
| 8 | ,807 | 2,783 | 63,123 | | | | | | |
| 9 | ,756 | 2,607 | 65,731 | | | | | | |
| 10 | ,706 | 2,436 | 68,167 | | | | | | |
| 11 | ,681 | 2,347 | 70,514 | | | | | | |
| 12 | ,659 | 2,272 | 72,785 | | | | | | |
| 13 | ,626 | 2,160 | 74,945 | | | | | | |
| 14 | ,599 | 2,064 | 77,009 | | | | | | |
| 15 | ,564 | 1,946 | 78,955 | | | | | | |
From this table we see that by extracting 5 components (=factors), 54.62 % of the total variance is explained after rotation. The total variance explained is the same before and after rotation; only the distribution of the loadings across the individual factors has changed.
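The arithmetic behind the table is simple enough to check by hand. Each of the 29 standardised items contributes a variance of 1, so a component's share of the total variance is its eigenvalue divided by 29. The sketch below recomputes the percentages from the rounded eigenvalues in the table and confirms that the rotation only redistributes the explained variance. The function names are illustrative assumptions.

```python
N_ITEMS = 29  # each standardised item contributes a variance of 1

# "Total" columns for the five extracted components, copied from the table.
EIGENVALUES = [10.648, 1.808, 1.226, 1.158, 1.000]  # before rotation
ROTATED_SUMS = [5.035, 3.371, 2.944, 2.278, 2.212]  # after rotation


def percent_of_variance(values, n_items):
    """Share of the total variance explained by each component, in percent."""
    return [100.0 * value / n_items for value in values]


def cumulative(values):
    """Running total of a list of numbers."""
    total, result = 0.0, []
    for value in values:
        total += value
        result.append(total)
    return result


before = cumulative(percent_of_variance(EIGENVALUES, N_ITEMS))
after = cumulative(percent_of_variance(ROTATED_SUMS, N_ITEMS))
print("cumulative % before rotation:", [round(value, 2) for value in before])
print("cumulative % after rotation: ", [round(value, 2) for value in after])
```

Both series end at about 54.62 %, although the individual components contribute very differently before and after rotation. Small deviations in the third decimal place come from the rounding of the published eigenvalues.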
Let us now take a look at the single items and their loading on the factors:
| Item | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|
| ELT_E | ,662 | ,128 | ,396 | ,129 | -,005 |
| LBG_E | ,652 | ,221 | ,042 | ,126 | ,185 |
| PRÜ_E | ,639 | ,105 | ,166 | ,221 | ,081 |
| LSW_E | ,638 | ,220 | ,029 | ,147 | ,226 |
| FEE_E | ,629 | ,212 | ,285 | ,048 | ,019 |
| UNT_E | ,603 | ,069 | ,491 | ,094 | ,049 |
| BEW_E | ,592 | ,217 | ,200 | ,103 | ,078 |
| KLA_E | ,557 | ,417 | ,116 | ,227 | ,207 |
| RUE_E | ,501 | ,250 | ,277 | ,102 | ,197 |
| LVH_E | ,499 | ,380 | ,276 | ,160 | ,086 |
| ZIE_E | ,443 | ,305 | ,129 | ,199 | ,076 |
| NIV_E | ,299 | ,655 | ,192 | ,113 | ,163 |
| INT_E | ,199 | ,631 | ,089 | ,061 | ,294 |
| ANS_E | ,224 | ,585 | ,191 | ,074 | ,183 |
| LER_E | ,357 | ,580 | ,283 | ,140 | ,051 |
| LST_E | ,286 | ,573 | ,131 | ,177 | -,135 |
| ABL_E | ,032 | ,462 | ,334 | ,186 | -,100 |
| SGE_E | ,211 | ,268 | ,641 | ,262 | ,057 |
| VER_E | ,338 | ,084 | ,610 | ,156 | ,079 |
| FZI_E | ,362 | ,366 | ,598 | ,129 | ,057 |
| STP_E | ,108 | ,199 | ,584 | ,016 | ,052 |
| ERG_E | ,398 | ,347 | ,415 | ,196 | ,279 |
| STÖ_E | ,166 | ,251 | ,013 | ,712 | ,100 |
| UMF_E | ,100 | -,002 | ,207 | ,702 | ,230 |
| RES_E | ,433 | ,266 | ,108 | ,580 | ,016 |
| UMG_E | ,282 | ,154 | ,401 | ,545 | -,011 |
| SEL_E | ,092 | ,021 | ,021 | ,053 | ,855 |
| EVA_E | ,189 | ,116 | ,035 | ,102 | ,813 |
| PER_E | ,202 | ,291 | ,221 | ,353 | ,540 |
The variables are distributed disjointly on the factors and the desired one-dimensionality is given.
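One way to read the table is to allocate each item to the component on which it has its largest absolute loading and to note any sizeable secondary loadings. The sketch below does this for six rows copied from the table. It is an illustration, not the allocation procedure of the lecture, and the cut-off of 0.40 for a noteworthy secondary loading is an assumption.

```python
# Six rows copied from the rotated component matrix above (components 1-5).
ROTATED_LOADINGS = {
    "ELT_E": [0.662, 0.128, 0.396, 0.129, -0.005],
    "KLA_E": [0.557, 0.417, 0.116, 0.227, 0.207],
    "NIV_E": [0.299, 0.655, 0.192, 0.113, 0.163],
    "ERG_E": [0.398, 0.347, 0.415, 0.196, 0.279],
    "UMF_E": [0.100, -0.002, 0.207, 0.702, 0.230],
    "SEL_E": [0.092, 0.021, 0.021, 0.053, 0.855],
}
CROSS_LOADING = 0.40  # assumed cut-off for a noteworthy secondary loading


def primary_factor(loadings):
    """1-based number of the component with the largest absolute loading."""
    return max(range(len(loadings)), key=lambda k: abs(loadings[k])) + 1


def cross_loadings(loadings, cutoff):
    """Components other than the primary one whose loading reaches the cut-off."""
    main = primary_factor(loadings)
    return [
        k + 1
        for k, value in enumerate(loadings)
        if k + 1 != main and abs(value) >= cutoff
    ]


for item, loadings in ROTATED_LOADINGS.items():
    extra = cross_loadings(loadings, CROSS_LOADING)
    note = f", also loads on {extra}" if extra else ""
    print(f"{item}: factor {primary_factor(loadings)}{note}")
```

Every item ends up on exactly one factor, but items such as KLA_E and ERG_E show that the primary loading is not always far ahead of the others.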
Our next task is to find a name for the factors by interpreting their meaning. We have chosen the following interpretation:
Factor 1: The teachers are responsive to requests of parents (mean = 4,08)
| Item | Loading |
|---|---|
| There is readiness to talk among teachers and parents. | ,662 |
| The teachers are responsive to my child’s strengths and weaknesses. | ,652 |
| As far as I can tell the way of testing is fair. | ,639 |
| In my opinion teachers are considerate of pupils who take more time. | ,638 |
| The parents are well informed about their children’s progress in their learning and their personal development. | ,629 |
| Parents have their say on school issues that concern them closely. | ,603 |
Factor 2: In my view gifted pupils' talents are fostered appropriately (mean = 4,01)
| Item | Loading |
|---|---|
| Tuition is of good subject-specific quality. | ,655 |
| My child finds the topics covered in the lessons appealing and challenging. | ,631 |
| The achievement demanded from pupils is appropriate. | ,585 |
| As far as I know the teaching methods and forms of learning adopted by the teachers are varied. | ,580 |
| In my view gifted pupils' talents are fostered appropriately. | ,573 |
Factor 3: School management
Factor 4: Good manners
Factor 5: Working autonomously
Taken in sequence, the factors show what parents are most interested in. The factors are in hierarchical order: most important is that teachers respond to the parents' wishes. Second most important for parents is that the children are challenged appropriately.
Limitations
_______________________________
1 In the English-language specialist literature, PCA is often not classified as a special variant of factor analysis but is considered an independent method.