Franz Hilpold, Evaluationsstelle für die deutsche Schule in Südtirol, Burgstall, Italy Email: Franz.Hilpold@gmail.com
Lecture at the International Educational Science Congress in Samos, July 2011
In the social sciences we often try to measure things that cannot be measured directly (so-called latent variables). Researchers might, for example, be interested in measuring the teacher-student relationship. This relationship, however, cannot be measured directly because it has many facets and the term “relationship” is not clearly defined. The items in our questionnaires refer directly to school-related topics and always remain within the boundaries of the predefined quality framework. Since each of the nine questionnaire types generates a large number of items, it is to be expected that combining the resulting variables will reveal a number of less obvious motives, reasons, latent characteristics and attitudes of the people who are directly concerned with school matters. Indeed, the exploratory factor analysis applied to this data set, which actually consists of nine disjoint and immiscible data sets, furnished some very interesting findings:

       v1     v2     v3     v4     v5     v6     v7     v8     v9     v10
v1     1
v2     0,34   1
v3     0,41   0,12   1
v4     0,28   0,16   0,22   1
v5     0,78   0,22   0,35   0,28   1
v6     0,67   0,28   0,17   0,33   0,59   1
v7     0,43   0,11   0,16   0,57   0,15   0,22   1
v8     0,31   0,41   0,19   0,61   0,09   0,13   0,68   1
v9     0,72   0,40   0,32   0,28   0,63   0,61   0,31   0,43   1
v10    0,19   0,21   0,35   0,26   0,13   0,47   0,36   0,22   0,36   1
We consider only the triangle below the diagonal.
The significance level of each correlation is not stated here, but it must still be taken into account. Correlations that are not significant are marked in italics. Variable pairs without a significant correlation were not considered.
There is evidently a rather strong correlation among v1, v5, v6 and v9; the same is true for v4, v7 and v8.
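The grouping read off the matrix by eye can also be reproduced mechanically. The following is a minimal sketch in plain Python, not part of the original analysis: it links two variables whenever their correlation reaches a cut-off and collects the linked variables into groups. The cut-off of 0.5 and the helper name `correlated_groups` are our own assumptions for illustration.

```python
# Lower triangle of the 10x10 correlation matrix shown above (v1..v10),
# with decimal commas written as decimal points.
LOWER = [
    [1.00],
    [0.34, 1.00],
    [0.41, 0.12, 1.00],
    [0.28, 0.16, 0.22, 1.00],
    [0.78, 0.22, 0.35, 0.28, 1.00],
    [0.67, 0.28, 0.17, 0.33, 0.59, 1.00],
    [0.43, 0.11, 0.16, 0.57, 0.15, 0.22, 1.00],
    [0.31, 0.41, 0.19, 0.61, 0.09, 0.13, 0.68, 1.00],
    [0.72, 0.40, 0.32, 0.28, 0.63, 0.61, 0.31, 0.43, 1.00],
    [0.19, 0.21, 0.35, 0.26, 0.13, 0.47, 0.36, 0.22, 0.36, 1.00],
]


def correlated_groups(lower, threshold=0.5):
    """Group variables that are linked by correlations >= threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    n = len(lower)
    parent = list(range(n))  # union-find forest: one tree per group

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    # Merge the groups of every strongly correlated pair below the diagonal.
    for i in range(n):
        for j in range(i):
            if abs(lower[i][j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(f"v{i + 1}")
    # Variables without any strong partner form singletons and are dropped.
    return sorted(g for g in groups.values() if len(g) > 1)


GROUPS = correlated_groups(LOWER)
print(GROUPS)
```

With this cut-off the sketch returns the same two groups that are named in the text; a different cut-off would of course change the grouping.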
V1: My child is provided with the opportunity to develop his/her personal abilities and talents.
V5: The rapport between teachers and students is characterised by respect and approachability.
V6: In my view gifted students’ talents are fostered appropriately.
V9: My daughter’s/son’s teachers adopt replicable evaluation standards.
It is already becoming apparent that the items in each of the two groups could somehow be combined, but it is at the discretion of the researcher to find a generic title under which the four items of the first group are summarised. Possible titles could be “Fostering children’s talents” or, maybe even better, “Respecting a child’s personality”. This is not a factor yet, since a further series of steps is required.
The second group of variables (marked in green here) is more difficult to combine:
V4: The school provides the pupils with ample opportunity to catch up on failed educational objectives.
V7: As far as I know the school is well organised and well administered.
V8: The school offers a lot of extracurricular activities.
The whole process of an exploratory factor analysis will be explained here using a concrete example. For this purpose we choose a parent questionnaire from lower secondary school. It contains 30 questions, one of which is an open-answer question: “Please feel free to write down any additional comments or suggestions”. It will not be considered in this context. The other 29 questions reflect the quality framework, on the assumption that the parents know what is going on in the school. The questionnaire was distributed to 3282 parents and 2888 valid questionnaires were returned, which corresponds to a return rate of 88%.
First of all we create a correlation matrix for all variables (items). Unfortunately it cannot be shown here in full, as it is a 29x29 matrix; the excerpt covers eight of the items.

Correlation   ANS_E   NIV_E   INT_E   ABL_E   SEL_E   PER_E   KLA_E   LER_E
ANS_E         1,000   ,476    ,390    ,244    ,176    ,368    ,409    ,426
NIV_E         ,476    1,000   ,474    ,286    ,177    ,370    ,540    ,508
INT_E         ,390    ,474    1,000   ,221    ,238    ,402    ,400    ,393
ABL_E         ,244    ,286    ,221    1,000   ,029    ,228    ,263    ,349
SEL_E         ,176    ,177    ,238    ,029    1,000   ,408    ,221    ,105
PER_E         ,368    ,370    ,402    ,228    ,408    1,000   ,463    ,377
KLA_E         ,409    ,540    ,400    ,263    ,221    ,463    1,000   ,528
LER_E         ,426    ,508    ,393    ,349    ,105    ,377    ,528    1,000
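How such a correlation matrix is obtained from the raw answers can be sketched in a few lines. The example is only an illustration under stated assumptions: the original answers are not reproduced here, so the data are simulated, and both the sample of 200 respondents and the 1 to 5 answer scale are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical stand-in for the real answers: 200 respondents, 29 items,
# each rated on a 1-5 scale (both figures are assumptions). A shared
# "general satisfaction" component makes the simulated items correlate.
rng = np.random.default_rng(seed=0)
n_respondents, n_items = 200, 29
general = rng.normal(size=(n_respondents, 1))
noise = rng.normal(size=(n_respondents, n_items))
answers = np.clip(np.rint(3.5 + 0.8 * general + noise), 1, 5)

# Rows are respondents and columns are items, hence rowvar=False.
R = np.corrcoef(answers, rowvar=False)

print(R.shape)                  # one row and one column per item
print(np.round(R[:3, :3], 3))   # top-left corner of the matrix
```

The result is a symmetric matrix with ones on the diagonal, of which the table above shows the corresponding excerpt for the real data.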
Kaiser-Meyer-Olkin Measure of Sampling Adequacy                         ,962
Bartlett's Test of Sphericity          Approx. Chi-Square          22022,034
                                       df                                406
                                       Sig.                             ,000
There are two more criteria that are of importance and must be added to the settings. First of all, the Kaiser-Meyer-Olkin criterion (KMO) indicates how adequate the items are for factor analysis. The KMO takes values between 0 and 1, and the higher the value, the more adequate the items are for factor analysis. For our data the KMO value is 0.962, so the level of adequacy is very good. The second criterion is the Bartlett test, which we use only to determine the level of significance. The significance is reported as 0.000 (that is, p < 0.001) and thus indicates that in the population there must be correlations between at least some of the variables. Therefore the null hypothesis, according to which all correlation coefficients in the population are zero, can be rejected. This can also be observed, however, by looking directly at the R-matrix and the respective significance matrix.
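Both criteria can be computed directly from a correlation matrix. The following is a minimal sketch using the textbook formulas, not the SPSS routine used for the study. It assumes NumPy, uses only the eight-item excerpt of the correlation matrix, and takes the 2888 valid questionnaires as the number of observations, so its results will differ from the values reported for all 29 items. The function names `kmo` and `bartlett` are our own.

```python
import numpy as np

# Eight-item excerpt of the correlation matrix (decimal commas as points).
R = np.array([
    [1.000, .476, .390, .244, .176, .368, .409, .426],
    [.476, 1.000, .474, .286, .177, .370, .540, .508],
    [.390, .474, 1.000, .221, .238, .402, .400, .393],
    [.244, .286, .221, 1.000, .029, .228, .263, .349],
    [.176, .177, .238, .029, 1.000, .408, .221, .105],
    [.368, .370, .402, .228, .408, 1.000, .463, .377],
    [.409, .540, .400, .263, .221, .463, 1.000, .528],
    [.426, .508, .393, .349, .105, .377, .528, 1.000],
])


def kmo(corr):
    """KMO: squared correlations relative to squared correlations
    plus squared partial correlations (off-diagonal elements only)."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


def bartlett(corr, n_obs):
    """Bartlett's test of sphericity: chi-square statistic and df."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


KMO_VALUE = kmo(R)
CHI2, DF = bartlett(R, n_obs=2888)  # n_obs assumed equal to valid returns
print(f"KMO = {KMO_VALUE:.3f}, chi-square = {CHI2:.1f}, df = {DF}")
```

The degrees of freedom depend only on the number of items: for 29 items, 29 · 28 / 2 = 406, which agrees with the df reported in the table.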
There are several statistical methods for determining factors. As mentioned above, we have chosen Principal Component Analysis. We can imagine the set of items as an N-dimensional coordinate system in which, at first, the dimension equals the number of items. The next step is to reduce the dimensionality by extracting the component (= factor) which contributes the most to the total variance. The next principal component to be extracted is the one contributing the most to the residual variance, and so forth. Theoretically one could extract as many factors as there are items, which, of course, is hardly reasonable if we want to achieve a reduction in dimension.
In order to decide how many factors to create, we look at the factor loadings. The factor loading of a single variable in Principal Component Analysis is the correlation between the factor and the variable. In order to allocate the variables to a factor, we consider the items that have a large loading on that specific factor. The ideal would be the so-called simple structure, where each variable loads on only one factor and the loading is large.
The successive extraction follows mathematical algorithms, which may have the consequence that the factors cannot be meaningfully interpreted. Since the extraction of factors is about explaining total variance and residual variance, an unfavourable distribution of the factor loadings is a common result. Rotating the axes leads to a better distribution of the factor loadings and thus improves the possibilities of interpretation. There are a number of different rotations that could be applied. We choose orthogonal rotation, which, in contrast to oblique rotation, preserves the independence of the factors.
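The extraction step can be sketched as an eigendecomposition of the correlation matrix. This is a minimal illustration, not the SPSS procedure used for the study: it assumes NumPy, works on the eight-item excerpt only, and extracts two components purely for demonstration. The function name `pca_loadings` and the choice `n_factors=2` are our own assumptions.

```python
import numpy as np

# Eight-item excerpt of the correlation matrix (decimal commas as points).
R = np.array([
    [1.000, .476, .390, .244, .176, .368, .409, .426],
    [.476, 1.000, .474, .286, .177, .370, .540, .508],
    [.390, .474, 1.000, .221, .238, .402, .400, .393],
    [.244, .286, .221, 1.000, .029, .228, .263, .349],
    [.176, .177, .238, .029, 1.000, .408, .221, .105],
    [.368, .370, .402, .228, .408, 1.000, .463, .377],
    [.409, .540, .400, .263, .221, .463, 1.000, .528],
    [.426, .508, .393, .349, .105, .377, .528, 1.000],
])


def pca_loadings(corr, n_factors):
    """Return all eigenvalues (descending) and the unrotated loadings
    of the first n_factors principal components."""
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh: symmetric input
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # A loading is the correlation between an item and a component:
    # eigenvector entry scaled by the square root of the eigenvalue.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, loadings


EIGVALS, LOADINGS = pca_loadings(R, n_factors=2)
COMMUNALITIES = np.sum(LOADINGS ** 2, axis=1)  # variance explained per item

print(np.round(EIGVALS, 3))
print(np.round(LOADINGS, 3))
print(np.round(COMMUNALITIES, 3))
```

The eigenvalues sum to the number of items, and the squared loadings in each column sum to the eigenvalue of that component, which is why eigenvalues can be read as shares of the total variance.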
Item Code   Initial   Extraction      Item Code   Initial   Extraction
ANS_E       1,000     ,467            BEW_E       1,000     ,454
NIV_E       1,000     ,595            PRÜ_E       1,000     ,502
INT_E       1,000     ,536            RUE_E       1,000     ,440
ABL_E       1,000     ,371            STÖ_E       1,000     ,608
SEL_E       1,000     ,743            RES_E       1,000     ,606
PER_E       1,000     ,590            LVH_E       1,000     ,503
KLA_E       1,000     ,592            UMF_E       1,000     ,598
LER_E       1,000     ,567            UMG_E       1,000     ,561
EVA_E       1,000     ,722            SGE_E       1,000     ,600
ZIE_E       1,000     ,352            FZI_E       1,000     ,642
LBG_E       1,000     ,526            VER_E       1,000     ,524
LSW_E       1,000     ,530            ELT_E       1,000     ,629
LST_E       1,000     ,477            UNT_E       1,000     ,621
FEE_E       1,000     ,525            STP_E       1,000     ,395
The eigenvalues of the factors tell us which portion of the total variance of all variables is explained by each factor.
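The relation between an eigenvalue and the explained share of variance is simple arithmetic: with standardised items the total variance equals the number of items, so each eigenvalue is divided by 29. The sketch below, in plain Python, applies this to the first six eigenvalues reported in the "Initial Eigenvalues" column of the variance table; the function name `variance_shares` is our own. Because the published eigenvalues are rounded to three decimals, the recomputed percentages can differ from the table in the last digit.

```python
# First six eigenvalues from the "Initial Eigenvalues" column of the
# variance table (decimal commas as points); 29 items were analysed.
EIGENVALUES = [10.648, 1.808, 1.226, 1.158, 1.000, 0.837]
N_ITEMS = 29


def variance_shares(eigenvalues, n_items):
    """Return (percent of variance, cumulative percent) per component."""
    shares, running = [], 0.0
    for value in eigenvalues:
        pct = value / n_items * 100   # share of the total variance
        running += pct
        shares.append((pct, running))
    return shares


SHARES = variance_shares(EIGENVALUES, N_ITEMS)
for k, (pct, cum) in enumerate(SHARES, start=1):
    print(f"component {k}: {pct:6.3f} %   cumulative {cum:6.3f} %")
```

For the first five components the cumulative share comes to about 54.6 %, in line with the table.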
Figure: (excerpt)
            Initial Eigenvalues                      Extraction Sums of Squared Loadings      Rotation Sums of Squared Loadings
Component   Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %
1           10,648   36,719          36,719          10,648   36,719          36,719          5,035    17,362          17,362
2           1,808    6,234           42,953          1,808    6,234           42,953          3,371    11,625          28,987
3           1,226    4,229           47,181          1,226    4,229           47,181          2,944    10,152          39,139
4           1,158    3,992           51,174          1,158    3,992           51,174          2,278    7,855           46,993
5           1,000    3,448           54,622          1,000    3,448           54,622          2,212    7,629           54,622
6           ,837     2,888           57,510
7           ,821     2,831           60,341
8           ,807     2,783           63,123
9           ,756     2,607           65,731
10          ,706     2,436           68,167
11          ,681     2,347           70,514
12          ,659     2,272           72,785
13          ,626     2,160           74,945
14          ,599     2,064           77,009
15          ,564     1,946           78,955
From this table we see that by extracting 5 components (= factors), 54.62 % of the total variance is explained after rotation. The total explained variance is the same before and after rotation; only the distribution of the loadings across the individual factors has changed.
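That an orthogonal rotation redistributes the loadings without changing the total explained variance can be checked numerically. The sketch below implements varimax, one common orthogonal rotation; the text does not state which orthogonal rotation was actually applied, so varimax is an assumption. The small loading matrix is hypothetical and serves only as input; NumPy is assumed.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (items x factors).

    Returns the rotated loadings and the rotation matrix.
    """
    n_factors = loadings.shape[1]
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # SVD-based update towards a larger variance of squared loadings.
        target = rotated ** 3 - rotated * np.mean(rotated ** 2, axis=0)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion - criterion < tol:   # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings of six items on two components.
UNROTATED = np.array([
    [0.70, 0.40],
    [0.60, 0.50],
    [0.65, 0.45],
    [0.60, -0.50],
    [0.70, -0.40],
    [0.65, -0.45],
])

ROTATED, ROTATION = varimax(UNROTATED)

print(np.round(ROTATED, 3))
print("variance per factor before:", np.round(np.sum(UNROTATED ** 2, axis=0), 3))
print("variance per factor after: ", np.round(np.sum(ROTATED ** 2, axis=0), 3))
```

The sum of squared loadings per factor changes, but the total over all factors and the sum per item stay the same, which mirrors the identical cumulative percentages before and after rotation in the table.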
Let us now take a look at the individual items and their loadings on the factors:

            Component
Item        1       2       3       4       5
ELT_E       ,662    ,128    ,396    ,129    ,005
LBG_E       ,652    ,221    ,042    ,126    ,185
PRÜ_E       ,639    ,105    ,166    ,221    ,081
LSW_E       ,638    ,220    ,029    ,147    ,226
FEE_E       ,629    ,212    ,285    ,048    ,019
UNT_E       ,603    ,069    ,491    ,094    ,049
BEW_E       ,592    ,217    ,200    ,103    ,078
KLA_E       ,557    ,417    ,116    ,227    ,207
RUE_E       ,501    ,250    ,277    ,102    ,197
LVH_E       ,499    ,380    ,276    ,160    ,086
ZIE_E       ,443    ,305    ,129    ,199    ,076
NIV_E       ,299    ,655    ,192    ,113    ,163
INT_E       ,199    ,631    ,089    ,061    ,294
ANS_E       ,224    ,585    ,191    ,074    ,183
LER_E       ,357    ,580    ,283    ,140    ,051
LST_E       ,286    ,573    ,131    ,177    ,135
ABL_E       ,032    ,462    ,334    ,186    ,100
SGE_E       ,211    ,268    ,641    ,262    ,057
VER_E       ,338    ,084    ,610    ,156    ,079
FZI_E       ,362    ,366    ,598    ,129    ,057
STP_E       ,108    ,199    ,584    ,016    ,052
ERG_E       ,398    ,347    ,415    ,196    ,279
STÖ_E       ,166    ,251    ,013    ,712    ,100
UMF_E       ,100    ,002    ,207    ,702    ,230
RES_E       ,433    ,266    ,108    ,580    ,016
UMG_E       ,282    ,154    ,401    ,545    ,011
SEL_E       ,092    ,021    ,021    ,053    ,855
EVA_E       ,189    ,116    ,035    ,102    ,813
PER_E       ,202    ,291    ,221    ,353    ,540
The variables are distributed disjointly over the factors, and the desired one-dimensionality is given.
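The allocation of items to factors can be made explicit by assigning each item to the factor on which it loads most strongly. The following plain-Python sketch does this for a few rows of the rotated component matrix. The helper `assign_items` and the gap of 0.1 used to flag items whose two largest loadings lie close together are our own assumptions, not criteria stated in the analysis.

```python
# A few rows of the rotated component matrix (decimal commas as points).
LOADINGS = {
    "ELT_E": [.662, .128, .396, .129, .005],
    "NIV_E": [.299, .655, .192, .113, .163],
    "SGE_E": [.211, .268, .641, .262, .057],
    "UMF_E": [.100, .002, .207, .702, .230],
    "SEL_E": [.092, .021, .021, .053, .855],
    "ERG_E": [.398, .347, .415, .196, .279],
}


def assign_items(loadings, cross_gap=0.1):
    """Map each item to (strongest factor, flag for weak separation).

    cross_gap is an illustrative assumption: an item is flagged when its
    two largest absolute loadings differ by less than this value.
    """
    result = {}
    for item, row in loadings.items():
        ranked = sorted(range(len(row)), key=lambda k: abs(row[k]), reverse=True)
        best, second = ranked[0], ranked[1]
        ambiguous = abs(row[best]) - abs(row[second]) < cross_gap
        result[item] = (best + 1, ambiguous)   # factors are numbered from 1
    return result


ASSIGNED = assign_items(LOADINGS)
for item, (factor, ambiguous) in ASSIGNED.items():
    note = " (loadings close together)" if ambiguous else ""
    print(f"{item}: factor {factor}{note}")
```

Most of the selected items separate clearly. ERG_E is an example of an item whose loadings on factors 1 and 3 are nearly equal, so its allocation rests on a small difference.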
Our next task is to find a name for the factors by interpreting their meaning. We have chosen the following interpretation:
Factor 1: The teachers are responsive to requests of parents (mean = 4,08)
There is readiness to talk among teachers and parents.   ,662
The teachers are responsive to my child’s strengths and weaknesses.   ,652
As far as I can tell the way of testing is fair.   ,639
In my opinion teachers are considerate of pupils who take more time.   ,638
The parents are well informed about their children’s progress in their learning and their personal development.   ,629
Parents have their say on school issues that concern them closely.   ,603

Factor 2: In my view gifted pupils' talents are fostered appropriately (mean = 4,01)
Tuition is of good subject-specific quality.   ,655
My child finds the topics covered in the lessons appealing and challenging.   ,631
The achievement demanded from pupils is appropriate.   ,585
As far as I know the teaching methods and forms of learning adopted by the teachers are varied.   ,580
In my view gifted pupils' talents are fostered appropriately.   ,573

Factor 3: School management
Factor 4: Good manners
Factor 5: Working autonomously
The order of the factors shows what parents are most interested in. The factors are in hierarchical order: most important is that teachers respond to the parents’ wishes. Second most important for parents is that the children are challenged appropriately.
Limitations
_______________________________
1 In English specialist literature PCA is often not classified as a special variant of factor analysis, but considered an independent method.
Coolican, H. (2004) ‘Research methods and statistics in psychology’, London, Hodder Arnold (4th edn).
Bortz, J., Döring, N. (1995) ‘Forschungsmethoden und Evaluation für Sozialwissenschaftler’, Berlin, Heidelberg, New York, Springer-Verlag.
Baur, N., Fromm, S. (2008) ‘Datenanalyse mit SPSS für Fortgeschrittene’, Wiesbaden, VS Verlag für Sozialwissenschaften.