Answer Key
- The development of financial econometrics was made possible by three key factors: (1) the availability of data at any desired frequency, including the transaction level; (2) the availability of powerful desktop computers at an affordable cost; and (3) the availability of off-the-shelf econometric software. The combination of these elements put advanced econometrics within the reach of most financial firms.
- The three steps are model selection, model estimation, and model testing. Model selection involves choosing a family of models based on statistical properties and financial theory. Model estimation uses sample data to determine the model’s optimal parameters. Model testing, or backtesting, assesses the model’s forecasting ability on new data not used during estimation.
- The data generating process (DGP) refers to the underlying quantitative law or relationship that a financial econometric model seeks to capture. The basic principle is that, as in the physical sciences, there are relationships between financial variables that hold consistently at different points in time and across asset classes.
- A spurious regression is one in which the variables appear to be related but are not; the apparent correlation is caused by accidental comovements. The problem arises when all the variables included in the regression system are nonstationary, which may lead a researcher to conclude that a relationship exists when it is merely a statistical artifact.
- The coefficient of determination (R²) is a measure of the goodness-of-fit of a regression, indicating the percentage of variation in the dependent variable that is explained by the independent variable(s). A value of R² = 0 signifies that there is no discernible linear relationship, while a value of R² = 1 signifies a perfect fit in which all of the variation is explained by the model.
- Multicollinearity is a problem that arises from high correlations among the independent variables in a multiple regression. Its primary consequence is that it prevents the researcher from obtaining insight into the true contribution of each individual independent variable: the estimated regression coefficients become imprecise and unstable, and under perfect collinearity they cannot be uniquely determined at all.
- The key distinction is that in a multiple regression model, all variables (both dependent and explanatory) are observable. In contrast, a factor model assumes that the observed variables can be represented as a multiple linear regression on a number of unobserved, or hidden, variables known as factors.
- Cointegration exists if there is a linear combination of two or more nonstationary variables that is stationary. Its presence suggests that the variables share long-run links and a common stochastic trend, meaning they may deviate from each other in the short run but are likely to return to a long-run equilibrium relationship.
- In a standard ARCH model, the current conditional variance (volatility) is modeled as a function only of past squared returns. A GARCH model generalizes this by modeling the current conditional variance as a function not only of past squared returns but also of its own past values.
- Survivorship bias is a bias found in samples that are selected based on criteria valid at the last date of the sample period, thereby excluding entities that did not survive until that date. This is a problem because it often leads to overestimation of past returns and performance, as poorly performing funds or companies that closed down are systematically excluded from the analysis.
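
The answers on spurious regression and R² can be illustrated with a small simulation. The sketch below is a minimal example, not part of the original answer key: it assumes NumPy is available, and the helper names `r_squared` and `mean_r2`, the sample sizes, and the seed are all hypothetical choices made for illustration. It regresses one simulated random walk on another, independent one, then repeats the regression on their first differences, which are stationary.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = np.sum(resid ** 2)            # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot


def mean_r2(n_sims=200, n_obs=500, seed=0):
    """Average R^2 over simulated pairs of independent random walks.

    Returns (mean R^2 in levels, mean R^2 in first differences).
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    levels, diffs = [], []
    for _ in range(n_sims):
        # Two independent random walks: nonstationary by construction.
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        levels.append(r_squared(x, y))
        # First differences recover the stationary innovations.
        diffs.append(r_squared(np.diff(x), np.diff(y)))
    return float(np.mean(levels)), float(np.mean(diffs))


if __name__ == "__main__":
    lvl, dif = mean_r2()
    print(f"mean R^2, levels:      {lvl:.3f}")
    print(f"mean R^2, differences: {dif:.3f}")
```

Because the two series are generated independently, any sizable R² in levels is a statistical artifact of nonstationarity; the regression on differences should show an R² close to zero.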
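
The ARCH/GARCH distinction described above can also be written as two short recursions. This is a hedged sketch under stated assumptions: it covers only the simplest ARCH(1) and GARCH(1,1) cases, the function names and the parameters `omega`, `alpha`, and `beta` are illustrative and not taken from the answer key, and starting the recursion at the unconditional variance is one common convention among several.

```python
import numpy as np


def arch1_variance(returns, omega, alpha):
    """ARCH(1): sigma2[t] = omega + alpha * r[t-1]**2."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    # Assumption: start at the unconditional variance (requires alpha < 1).
    sigma2[0] = omega / (1.0 - alpha)
    sigma2[1:] = omega + alpha * r[:-1] ** 2
    return sigma2


def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1): sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    # Assumption: start at the unconditional variance (requires alpha + beta < 1).
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(r)):
        # The beta term feeds past variance back in; ARCH has no such term.
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


if __name__ == "__main__":
    r = [0.0, 0.1, -0.2]
    print(arch1_variance(r, omega=0.01, alpha=0.1))
    print(garch11_variance(r, omega=0.01, alpha=0.1, beta=0.8))
```

Setting `beta = 0` in `garch11_variance` reproduces `arch1_variance`, which is the sense in which GARCH generalizes ARCH.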