4. Factor Models and Dimensionality Reduction
4.1. Factor Analysis
Factor models assume that the returns of a large number of assets can be explained by a smaller number of unobserved, or hidden, common factors.
- Model: yᵢₜ = aᵢ + bᵢ₁f₁ₜ + … + bᵢₖfₖₜ + εᵢₜ
- fₖₜ are the common factors.
- bᵢₖ are the factor loadings, representing the sensitivity of asset i to factor k.
- εᵢₜ is the idiosyncratic (asset-specific) residual.
- Assumptions: Factors and residuals have zero mean, and residuals are uncorrelated with factors. In classical factor models, residuals are also mutually uncorrelated.
- Fundamental Relationship: The covariance matrix of asset returns (Σ) can be decomposed as Σ = BB′ + Ψ, where B is the matrix of factor loadings and Ψ is the diagonal matrix of residual variances. This form assumes the factors are standardized: mutually uncorrelated with unit variance.
- Factor Indeterminacy: The model is identified only up to rotation — for any orthogonal matrix Q, (BQ)(BQ)′ = BB′, so the loadings and factors cannot be uniquely determined from the covariance structure alone. Instead, factor scores are estimated to approximate the factors.
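The decomposition Σ = BB′ + Ψ can be checked numerically. The sketch below (all parameter values are hypothetical, chosen only for illustration) simulates returns from the factor model with standardized factors and compares the sample covariance to the model-implied one:

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_factors, n_obs = 10, 3, 100_000

# hypothetical loadings B and diagonal residual variances Psi
B = rng.normal(scale=0.5, size=(n_assets, n_factors))
psi = rng.uniform(0.5, 1.5, size=n_assets)

# factors: zero mean, unit variance, mutually uncorrelated
f = rng.normal(size=(n_obs, n_factors))
# idiosyncratic residuals, uncorrelated with the factors
eps = rng.normal(size=(n_obs, n_assets)) * np.sqrt(psi)

# model: y_it = b_i1 f_1t + ... + b_ik f_kt + eps_it (intercepts a_i = 0)
y = f @ B.T + eps

# sample covariance vs. model-implied covariance BB' + Psi
sigma_hat = np.cov(y, rowvar=False)
sigma_model = B @ B.T + np.diag(psi)
max_err = np.max(np.abs(sigma_hat - sigma_model))
```

For large samples the maximum elementwise gap between the two matrices shrinks toward zero, illustrating that the factor structure fully determines the asset covariance matrix.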
4.2. Principal Components Analysis (PCA)
PCA is a statistical technique for data reduction, not a formal statistical model. It transforms a set of correlated variables into a set of linearly uncorrelated variables called principal components.
- Process: PCA finds the eigenvectors and eigenvalues of the data’s covariance matrix. The principal components are the projections of the (demeaned) data onto the eigenvectors, ordered so that the first component captures the largest variance.
- Interpretation: The eigenvalues represent the variance explained by each principal component. Typically, a small number of components corresponding to the largest eigenvalues can explain most of the variation in the original data.
- Key Differences from Factor Analysis:
- PCA is a data transformation; factor analysis assumes an underlying statistical model.
- Principal components are observable linear combinations of the original data; factors are generally unobserved.
- The residuals from a PCA representation are generally correlated, whereas in classical factor analysis they are not.
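The PCA procedure described above can be sketched directly with an eigendecomposition; the data-generating setup below (two latent drivers plus noise) is hypothetical and serves only to produce correlated variables:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_vars = 5_000, 6

# hypothetical correlated data: two latent drivers plus small noise
latent = rng.normal(size=(n_obs, 2))
mixing = rng.normal(size=(2, n_vars))
X = latent @ mixing + 0.3 * rng.normal(size=(n_obs, n_vars))

# demean, then eigendecompose the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # re-sort largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# principal components = data projected onto the eigenvectors
pcs = Xc @ eigvecs

# eigenvalues give the variance explained by each component
explained = eigvals / eigvals.sum()

# components are linearly uncorrelated: their covariance is diagonal
pc_cov = np.cov(pcs, rowvar=False)
```

With two latent drivers, the first two eigenvalues account for nearly all of the variance, and the off-diagonal entries of `pc_cov` are zero up to floating-point error — illustrating both the data-reduction property and the uncorrelatedness of the components.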