5. Model Building, Selection, and Implementation
5.1. Model Estimation Methods
Several methods are used to estimate the parameters of econometric models:
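- Least Squares (LS) Method: A best-fit technique that minimizes the sum of squared residuals.
  - Ordinary Least Squares (OLS): Used for standard linear regression. Under the Gauss-Markov assumptions (linearity in parameters, exogenous regressors, homoscedastic and uncorrelated errors, no perfect multicollinearity), it is the Best Linear Unbiased Estimator (BLUE).
  - Weighted Least Squares (WLS): Used when residuals are heteroscedastic (have non-constant variance) but uncorrelated; each observation is weighted by the inverse of its error variance.
  - Generalized Least Squares (GLS): Used when residuals are heteroscedastic, autocorrelated, or both. It uses the full error covariance matrix; WLS is the special case in which that matrix is diagonal.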
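- Maximum Likelihood Estimation (MLE): Selects the parameter values that maximize the likelihood (probability or probability density) of observing the actual sample data, given an assumed probability distribution. In a linear regression with independent, normally distributed errors, the MLE of the coefficients is identical to the OLS estimate; only the error-variance estimate differs (MLE divides the sum of squared residuals by n rather than n - k).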
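- Method of Moments (MOM): Estimates parameters by equating the theoretical moments of a distribution (which are functions of the parameters) to the empirical moments calculated from the sample. The Generalized Method of Moments (GMM) extends the idea to cases with more moment conditions than parameters, combining them through a weighting matrix, and requires weaker distributional assumptions than MLE.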
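A minimal, dependency-free sketch of several of these estimators for a single-regressor model y = a + b*x, run on simulated data. The data-generating process, the function names `ols`, `wls`, `gaussian_loglik`, and `gamma_mom`, and all parameter values are illustrative assumptions. The sketch assumes the WLS error variances are known, checks numerically that the Gaussian log-likelihood peaks at the OLS coefficients, and applies the method of moments to a gamma distribution. GLS is left out because it needs the full error covariance matrix.

```python
import math
import random

def ols(x, y):
    """OLS for y = a + b*x: minimizes the sum of squared residuals (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def wls(x, y, w):
    """WLS: minimizes sum(w_i * e_i**2); efficient when w_i = 1 / Var(e_i)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b

def gaussian_loglik(a, b, x, y):
    """Normal log-likelihood of (a, b) with sigma^2 concentrated out as SSR / n."""
    n = len(x)
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * (math.log(2 * math.pi * ssr / n) + 1)

def gamma_mom(sample):
    """Method of moments for Gamma(shape k, scale theta):
    match mean = k*theta and variance = k*theta**2 to the sample moments."""
    n = len(sample)
    m = sum(sample) / n
    v = sum((s - m) ** 2 for s in sample) / n
    return m * m / v, v / m  # (k_hat, theta_hat)

# Simulated data (assumption): true a = 1, b = 2, error s.d. rising with x.
random.seed(0)
x = [i / 10 for i in range(200)]
sd = [0.5 + 0.5 * xi for xi in x]
y = [1.0 + 2.0 * xi + random.gauss(0, s) for xi, s in zip(x, sd)]

a_ols, b_ols = ols(x, y)
a_wls, b_wls = wls(x, y, [1 / s ** 2 for s in sd])  # assumes variances are known

# With normal errors, no nearby (a, b) has a higher likelihood than the OLS point.
best = gaussian_loglik(a_ols, b_ols, x, y)
assert all(gaussian_loglik(a_ols + da, b_ols + db, x, y) <= best
           for da in (-0.01, 0.0, 0.01) for db in (-0.01, 0.0, 0.01))

# Method of moments on draws from Gamma(shape=2, scale=3).
k_hat, theta_hat = gamma_mom([random.gammavariate(2.0, 3.0) for _ in range(20000)])
print(f"OLS b={b_ols:.3f}  WLS b={b_wls:.3f}  MoM k={k_hat:.2f} theta={theta_hat:.2f}")
```

Both OLS and WLS are unbiased here; WLS should typically land closer to the true slope because it down-weights the noisiest observations.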
5.2. Model Selection and Key Pitfalls
The process of choosing the best model is fraught with challenges because financial data are scarce relative to the complexity of the processes that generate them.
- Overfitting: This occurs when a model is overly complex and fits the random noise in the training data rather than the underlying structural process. Such models perform well in-sample but have poor out-of-sample forecasting ability.
- Data Snooping: This occurs when the same data are used both to develop (train or select) a model and to evaluate it, instead of validating it out of sample. Reusing the data this way leads to a severe upward bias in perceived performance.
- Survivorship Bias: This bias arises when a sample is selected based on criteria from the end of a period, thus excluding entities that failed to “survive.” For example, a database of currently listed stocks omits those that went bankrupt, biasing historical return estimates upward.
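- Model Selection Criteria: To combat overfitting, criteria like the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are used. Each balances goodness of fit against a penalty for the number of estimated parameters, and the model with the lowest value is preferred. BIC's penalty (ln n per parameter, for n observations) exceeds AIC's (2 per parameter) once n ≥ 8, so BIC tends to select more parsimonious models.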
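The sketch below illustrates how AIC and BIC guard against overfitting, assuming Gaussian errors so that -2 ln L = n[ln(2π·SSR/n) + 1]. It fits polynomials of increasing degree to simulated data whose true model is quadratic. The data, the helper names `solve`, `poly_ols_ssr`, and `aic_bic`, and the parameter count k = degree + 2 (coefficients plus the error variance) are assumptions for illustration. In-sample SSR can only fall as the degree rises; the criteria trade that gain against the parameter penalty.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def poly_ols_ssr(x, y, degree):
    """OLS fit of y on 1, x, ..., x**degree via the normal equations; returns the SSR."""
    m = degree + 1
    X = [[xi ** p for p in range(m)] for xi in x]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(m)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic_bic(ssr, n, k):
    """AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L for a Gaussian model with k parameters."""
    neg2loglik = n * (math.log(2 * math.pi * ssr / n) + 1)
    return neg2loglik + 2 * k, neg2loglik + k * math.log(n)

# Simulated data (assumption): the true model is quadratic.
random.seed(1)
n = 100
x = [-1 + 2 * i / (n - 1) for i in range(n)]
y = [1 + 0.5 * xi - 2 * xi ** 2 + random.gauss(0, 0.3) for xi in x]

results = {}
for d in range(1, 7):
    ssr = poly_ols_ssr(x, y, d)
    results[d] = (ssr, *aic_bic(ssr, n, k=d + 2))  # d + 1 coefficients + error variance

best_aic = min(results, key=lambda d: results[d][1])
best_bic = min(results, key=lambda d: results[d][2])
for d, (ssr, aic, bic) in results.items():
    print(f"degree {d}: SSR={ssr:.3f}  AIC={aic:.1f}  BIC={bic:.1f}")
print("AIC picks degree", best_aic, "| BIC picks degree", best_bic)
```

Because BIC's per-parameter penalty is larger here (ln 100 > 2), it can never pick a higher degree than AIC on the same set of nested fits.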
5.3. Formulating Investment Strategies
A disciplined quantitative research process is essential for converting econometric models into profitable investment strategies.
- Three Phases of Quantitative Research:
  - Ex Ante Justification: The process must begin with a justification based on sound financial economic theory, not just a pattern observed in the data.
  - Survivorship-Free Sample: All backtesting must be performed on data that is free from survivorship bias to accurately reflect historical conditions.
  - Model Estimation: The methodology should be chosen carefully, with a preference for simplicity and parsimony (using fewer explanatory variables).
- Common Fallacies:
  - Overmining of Data: Testing too many variables or models on the same dataset almost guarantees finding spurious patterns.
  - Statistical Significance ≠ Alpha: A statistically significant relationship in a historical backtest does not guarantee future excess returns. The economic magnitude of the effect and transaction costs are critical.
- Safeguards Against Data Snooping:
  - Rigorous Out-of-Sample Testing: Strictly separating training, testing, and confirmation periods.
  - Testing Against Random Walk: A valid strategy should produce no excess returns when tested on artificially generated random data.
  - Independent Risk Control: Risk management (e.g., minimizing tracking error) should be an independent overlay applied after stock selection, not an ad hoc adjustment to the return prediction model.
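A minimal sketch of two of these safeguards, a chronological training/testing/confirmation split and a random-walk placebo test, applied to a hypothetical rule that goes long or short according to the sign of the trailing return, with an optional proportional cost per unit change in position. The rule, the 60/20/20 split, the cost level, the simulated returns, and all function names are assumptions for illustration, not a recommended strategy.

```python
import random
import statistics

def trailing_sign_signal(returns, lookback):
    """Hypothetical rule: position at t is +1 if the sum of the previous `lookback`
    returns is positive, else -1. Uses only data before t (no look-ahead)."""
    sig = [0] * len(returns)
    for t in range(lookback, len(returns)):
        sig[t] = 1 if sum(returns[t - lookback:t]) > 0 else -1
    return sig

def mean_strategy_return(returns, lookback, cost=0.0):
    """Average per-period return net of a proportional cost per unit change in position."""
    sig = trailing_sign_signal(returns, lookback)
    pnl = [sig[t] * returns[t] - cost * abs(sig[t] - sig[t - 1])
           for t in range(lookback, len(returns))]
    return statistics.mean(pnl)

def chronological_split(series, train=0.6, test=0.2):
    """Split a time series, in order, into training, testing, and confirmation periods."""
    i = round(len(series) * train)
    j = i + round(len(series) * test)
    return series[:i], series[i:j], series[j:]

def random_walk_placebo(lookback, cost=0.0, n_periods=500, n_paths=200, vol=0.01, seed=42):
    """Apply the rule to simulated random-walk prices (i.i.d. zero-mean returns).
    Returns the mean and standard deviation of the per-path average returns."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_paths):
        rets = [rng.gauss(0.0, vol) for _ in range(n_periods)]
        means.append(mean_strategy_return(rets, lookback, cost))
    return statistics.mean(means), statistics.stdev(means)

avg, sd = random_walk_placebo(lookback=10)
avg_cost, _ = random_walk_placebo(lookback=10, cost=0.001)
print(f"gross mean {avg:.6f} (s.d. across paths {sd:.6f}); net of costs {avg_cost:.6f}")

train_part, test_part, confirm_part = chronological_split(list(range(1000)))
print(len(train_part), len(test_part), len(confirm_part))
```

On i.i.d. zero-drift returns the rule's gross average return should be statistically indistinguishable from zero, and costs push it below zero. A rule that appears profitable on such placebo data would point to look-ahead or another flaw in the backtest.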