Factor Analysis of Cross-Classified Data
In many scientific fields, notably psychology and the other social sciences, we are interested in quantities, such as intelligence or social status, that cannot be measured directly. It is often possible, however, to measure other quantities that reflect the underlying variable of interest. Factor analysis attempts to explain the correlations among observable variables in terms of underlying factors that are themselves not directly observable. For example, the correlations among scores on a series of tests may be explained by an underlying factor such as intelligence.
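A minimal simulation of how a single unobserved factor can produce correlated test scores. It assumes one standardized latent factor f and four tests with hypothetical loadings λ_i. Each observed score is generated as x_i = λ_i f + e_i, with Var(f) = 1 and Var(e_i) = 1 − λ_i², so the model implies corr(x_i, x_j) = λ_i λ_j for i ≠ j. The function name `simulate_one_factor` and all numbers are illustrative assumptions.

```python
import numpy as np


def simulate_one_factor(loadings, n, seed=0):
    """Simulate n standardized scores x_i = lambda_i * f + e_i driven by one latent factor f."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    f = rng.standard_normal(n)  # latent factor: used to generate the data, never "observed"
    # Unique (error) parts, scaled so that every observed score has unit variance.
    e = rng.standard_normal((n, lam.size)) * np.sqrt(1.0 - lam**2)
    return f[:, None] * lam[None, :] + e  # n x p matrix of observed scores


lam = np.array([0.8, 0.7, 0.6, 0.5])  # hypothetical loadings of four tests
X = simulate_one_factor(lam, n=200_000)

R = np.corrcoef(X, rowvar=False)  # sample correlations of the observed scores
implied = np.outer(lam, lam)      # model-implied correlations lambda_i * lambda_j
np.fill_diagonal(implied, 1.0)
print(np.round(R - implied, 3))   # differences should be close to zero everywhere
```

The tests are correlated only because they share f. As n grows, the off-diagonal entries of R approach λ_i λ_j.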
At first glance, factor analysis closely resembles principal component analysis (PCA). Both use linear combinations of variables to explain sets of observations on many variables. In PCA, the observed variables are themselves the quantities of interest, and combining them into principal components is primarily a tool for simplifying their interpretation. PCA is merely a transformation of the data; it makes no assumptions about the form of the covariance matrix. Factor analysis, on the other hand, assumes that the data come from a statistical model that can be expressed in terms of a few underlying but unobservable random quantities, called factors, plus additional sources of variation, called errors. Factor analysis can be regarded as an extension of PCA, and both can be viewed as attempts to approximate the covariance matrix. PCA and factor analysis are widely applied in fields such as psychology, economics, sociology, meteorology, medicine, political science, taxonomy and archaeology. Both have also been used successfully in acoustic and phonetic research on tongue position by Harshman et al. (1977), Jackson (1988), Nix et al. (1996), and Stone et al. (1997).
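PCA treats the data as "merely a transformation", while the factor model assumes a structured covariance. This difference becomes visible when both methods are fitted to the same correlation matrix. The sketch below assumes a one-factor correlation matrix R = λλᵀ + Ψ with Ψ diagonal and hypothetical loadings. It compares first-principal-component loadings with loadings from iterated principal axis factoring. Principal axis factoring is one standard factor-analysis estimation method, chosen here for brevity. It is not necessarily the estimation method used in this thesis, and the helper names are hypothetical.

```python
import numpy as np


def top_eigenpairs(S, k):
    """Return the k largest eigenvalues and eigenvectors of a symmetric matrix S."""
    vals, vecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]
    return vals[idx], vecs[:, idx]


def pca_loadings(R, k):
    """Loadings of the first k principal components of a correlation matrix R."""
    vals, vecs = top_eigenpairs(R, k)
    return vecs * np.sqrt(vals)


def principal_axis_factoring(R, k, max_iter=1000, tol=1e-10):
    """Iterated principal axis factoring: fit R ~ L L^T + diag(psi)."""
    # Start from squared multiple correlations as communality estimates.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # diagonal now holds common variance only
        vals, vecs = top_eigenpairs(reduced, k)
        L = vecs * np.sqrt(np.clip(vals, 0.0, None))
        h2_new = np.sum(L**2, axis=1)  # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, 1.0 - h2  # loadings and unique (error) variances psi


# Hypothetical one-factor correlation matrix: R = lam lam^T + diag(1 - lam^2).
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

L_fa, psi = principal_axis_factoring(R, k=1)
L_pca = pca_loadings(R, k=1)
print("factor loadings:", np.round(np.abs(L_fa[:, 0]), 3))
print("PCA loadings:   ", np.round(np.abs(L_pca[:, 0]), 3))
```

The factor model recovers λ and assigns the remaining variance to Ψ. PCA also tries to reproduce the unit diagonal, so its loadings absorb part of the error variance. Their squared sum equals the largest eigenvalue of R, which exceeds λᵀλ whenever Ψ is positive.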
The PARAFAC model was pioneered by Harshman et al. (1977). It is a technique for extracting “articulatory prime” shapes from the data, allowing non-orthogonal components to be scaled differently for different speakers. The central question behind the PARAFAC model is how a small set of prime shapes can be adapted to capture the large variation in sound production across speakers without requiring a large number of parameters for every combination of speaker and sound. PCA may reduce the dimension of the data well, but it does not separate out individual speaker differences. The PARAFAC model, by contrast, succeeds in decomposing tongue shape data into tongue shape factors. This thesis introduces PCA, factor analysis and the PARAFAC model. A model hierarchy is then defined and applied to coronal tongue cross-section ultrasound data from multiple subjects, collected in the laboratory of Dr. M. Stone. We also discuss how the assumptions that define each model can be interpreted for the tongue data. Finally, we present data-analytic results to determine which model is adequate.
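The PARAFAC idea of shared prime shapes scaled differently for each speaker can be written as a three-way decomposition. It applies to an array X with hypothetical modes (points along the tongue contour) × (sounds) × (speakers):

X[i, j, k] ≈ Σ_r A[i, r] B[j, r] C[k, r]

Here column r of A is a prime shape, B[j, r] is its weight for sound j, and C[k, r] is its scale for speaker k. Under mild conditions (Kruskal's condition), this decomposition is unique up to permutation and scaling of the components. That uniqueness is what allows non-orthogonal factors to be recovered without a rotation step. The code below is a generic alternating least squares (ALS) sketch on synthetic data. It is not Harshman's original program, and the names `parafac_als` and `khatri_rao` as well as the array sizes are hypothetical.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product: row j*K + k holds B[j, r] * C[k, r]."""
    J, R = B.shape
    K = C.shape[0]
    return np.einsum("jr,kr->jkr", B, C).reshape(J * K, R)


def parafac_als(X, rank, n_iter=500, n_starts=5, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares."""
    I, J, K = X.shape
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding, columns j*K + k
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding, columns i*K + k
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding, columns i*J + j
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):  # random restarts guard against poor local minima
        A = rng.standard_normal((I, rank))
        B = rng.standard_normal((J, rank))
        C = rng.standard_normal((K, rank))
        for _ in range(n_iter):
            # Each update is a linear least-squares solve with the other two factors fixed.
            A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
            B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
            C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        fit = np.einsum("ir,jr,kr->ijk", A, B, C)
        err = np.linalg.norm(X - fit) / np.linalg.norm(X)  # relative reconstruction error
        if best is None or err < best[0]:
            best = (err, A, B, C)
    return best


# Synthetic, exactly rank-2 array (all sizes hypothetical).
rng = np.random.default_rng(1)
A_true = rng.standard_normal((10, 2))  # 10 points along the tongue contour
B_true = rng.standard_normal((6, 2))   # 6 sounds
C_true = rng.standard_normal((5, 2))   # 5 speakers
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)

err, A, B, C = parafac_als(X, rank=2)
print(f"relative error: {err:.2e}")
```

With two components the model uses (10 + 6 + 5) × 2 = 42 parameters for 300 cells, because speaker differences enter only through the rows of C. This is the parameter economy described above.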