Glossary of Key Terms
| Term | Definition |
| --- | --- |
| Active Portfolio Strategy | A strategy that uses available information and forecasting techniques to seek better performance than a portfolio that is simply broadly diversified. |
| Active Return | The portfolio’s actual return minus the benchmark’s actual return. |
| Adjusted R-squared (Adjusted R²) | A redefined version of the coefficient of determination that penalizes the inclusion of independent variables that do not contribute to the explanatory power of the model. |
| Akaike Information Criterion (AIC) | A criterion for model selection that is calculated for every candidate model, where the “best” model is the one with the smallest AIC. It involves a trade-off between the model’s goodness of fit and a penalty for the number of estimated parameters. |
| Autocorrelation | The correlation between the value of a variable at time t and values of the same variable at other times. It is also referred to as serial correlation and lagged correlation. |
| Autoregressive Conditional Heteroscedasticity (ARCH) Model | A model of conditional heteroscedasticity. In its simplest form, ARCH(1), the conditional variance of a variable at time t equals a constant term plus the product of a constant and the square of the previous time period’s return. |
| Autoregressive Moving Average (ARMA) Process | A combination of an autoregressive (AR) and moving average (MA) process used to describe the time series structure of data, modeled as a combination of past values and/or past errors. |
| Basis | The difference between the cash price and the futures price. |
| Basis Risk | The uncertainty about the basis at the time a hedge is to be lifted. Hedging involves the substitution of basis risk for price risk. |
| Bayesian Information Criterion (BIC) | A model selection criterion, also known as the Schwarz information criterion, that is similar to AIC but imposes a greater penalty for the number of parameters. |
| Best Linear Unbiased Estimate (BLUE) | An estimator that has the lowest possible variance among all linear unbiased estimators, a property of the Ordinary Least Squares (OLS) estimator according to the Gauss-Markov theorem. |
| Beta Factor (β) | A measure of the sensitivity of a security’s return to the market, formally defined as the covariance of the security with the market divided by the market’s variance. |
| Breakdown Bound (or Point) | The largest possible fraction of observations for which there is a bound on the change of an estimate when that fraction of the sample is altered without restrictions. |
| Categorical Variables | Variables that represent group membership, used to cluster input data into different groups. |
| Characteristic Line | A model for security excess returns, expressing the excess return of a security as a linear function of the market’s excess return. |
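| Chow Test | An F-test used to gauge whether all dummy variables in a regression are collectively irrelevant; equivalently, whether the regression coefficients are the same across the groups or subperiods that the dummy variables distinguish. |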
| Coefficient of Determination (R²) | A measure of the goodness-of-fit of a regression line, indicating the percentage of the variation in the dependent variable that is explained by the explanatory variable(s). |
| Cointegration | A property of two or more nonstationary variables where there exists a linear combination of them that is stationary. It suggests the variables share long-run links and a common stochastic trend. |
| Conditional Frequency Distribution | The distribution of one component of a bivariate dataset given a certain value for the other component. |
| Confidence Interval | A random interval, computed from a sample, that contains the true but unknown parameter value with a pre-specified probability called the confidence level. |
| Contingency Coefficient | A measure based on the chi-square statistic used to determine the degree of dependence for any type of data, including qualitative data. |
| Correlation Coefficient | A measure of the degree of correlation obtained by dividing the covariance by the product of the respective standard deviations of the component variables. It can take on any value from -1 to 1. |
| Covariance | A measure that indicates whether two variables vary in the same or opposite direction. |
| Cross Hedging | The practice of hedging with a futures contract that is different from the underlying asset being hedged. |
| Cross-Sectional Data | Data collected by observing many subjects (such as individuals, firms, or countries) at the same point in time. |
| Data Generating Process (DGP) | The basic tenet of quantitative science that there are relationships that do not change regardless of the moment or the place under consideration; the underlying “law” a model seeks to capture. |
| Data Snooping | The failure to perform an out-of-sample validation on a separate test set; performing training and tests on the same data set. |
| Dichotomous Variable | An explanatory variable that distinguishes between only two categories. |
| Dummy Variable | A numerical variable that can assume the value of either 0 or 1, used to represent a dichotomous categorical variable in a regression. |
| Durbin-Watson Test | The most popular statistical test for the presence of autocorrelation of residuals in a regression model. |
| Dynamic Asset Allocation | An asset allocation strategy where the asset mix is mechanistically shifted in response to changing market conditions. |
| Econometrics | The branch of economics that draws heavily on statistics for testing and analyzing economic relationships. |
| Empirical Duration | A measure of an asset’s interest-rate sensitivity estimated empirically from historical returns using regression analysis. It is also referred to as regression-based duration. |
| Estimator | A function computed from sample data that approximates a parameter to be estimated. |
| Factor Analysis | The process of estimating factor models, where observed variables are explained by a smaller number of unobserved, hidden variables called factors. |
| Factor Indeterminacy | The problem in factor models where factors cannot be uniquely determined from the data. |
| Factor Model | A statistical model where observed variables are represented as a multiple linear regression on a number of unobserved, or hidden, variables called factors. |
| Financial Econometrics | The science of modeling and forecasting financial data such as asset prices, asset returns, interest rates, and risk exposure. It has been described as the econometrics of financial markets. |
| Forward-Looking Tracking Error | An estimate of tracking error that reflects the portfolio’s risk going forward, computed using factor risk models. It is also referred to as predicted or ex-ante tracking error. |
| Generalized Autoregressive Conditional Heteroscedasticity (GARCH) Model | A model where volatility depends not only on the past values of the process (squared returns) but also on the past values of volatility itself. |
| Generalized Least Squares (GLS) | An estimation method that applies when residuals are heteroscedastic, autocorrelated, or both. |
| Hedging | The employment of futures contracts as a substitute for a transaction to be made in the cash market, with the goal of transferring price risk. |
| Heteroscedasticity | A condition where the variance of the error terms in a regression model is not constant across observations. |
| Homoscedasticity | The assumption that the variance of the error terms in a regression model is constant across all observations. |
| Instrumental Variables (IV) | An estimation approach used when regressors are correlated with errors. It involves using new variables (instruments) that are correlated with the regressors but independent of the errors. |
| Least Squares (LS) Method | A best-fit estimation technique that chooses the parameters that minimize the sum of the squares of the distances between observed data and the model’s predictions. |
| Least Trimmed of Squares (LTS) Estimator | A robust estimation method that identifies a certain number of points with the largest residuals as outliers, discards them, and then performs a least squares estimation on the trimmed data set. |
| Likelihood | In Maximum Likelihood Estimation, the probability (or probability density) of the observed sample under the assumed distribution, viewed as a function of the distribution’s parameters. |
| Linear Probability Model | A regression model where the dependent variable is categorical, interpreted as a model of the probability of an outcome. Its main drawback is that predicted probabilities can be outside the [0,1] range. |
| Logit Regression Model | A probability model for a categorical dependent variable where the predicted probability is given by the cumulative distribution function of the standard logistic distribution. |
| Long Hedge | A hedge undertaken to protect against rising prices of future intended purchases, executed by buying a futures contract. |
| M-Estimators | A class of robust estimators obtained by minimizing an objective function of the residuals, which can be defined to give less weight to outliers. |
| Market Capitalization (“Market Cap”) | A measure of a firm’s size in terms of the total market value of its common stock, found by multiplying the number of common stock shares outstanding by the price per share. |
| Maximum Likelihood (ML) Estimation | An estimation method that involves maximizing the likelihood of the sample given an assumption about the underlying distribution of the data. |
| Mean | A measure of the center of data, given by the sum of all values divided by the number of values. |
| Median | A measure of the center of data that divides the data by value into a lower half and an upper half. |
| Method of Moments (MOM) | An estimation method that estimates the parameters of a probability distribution by equating its theoretical moments with the empirical moments computed from the sample. |
| Mode | The value that occurs most often in a data set. |
| Model Risk | The risk that a model is misspecified and will not work as expected in the future, leading to forecasting errors. |
| Multicollinearity | A problem in multiple regression that arises from high correlations among the independent variables, preventing insight into the true contribution of each variable. |
| Multiple Coefficient of Determination | The coefficient of determination (R²) in a multiple linear regression, measuring the percentage of variation in the dependent variable explained by all independent variables. |
| Multiple Linear Regression | A regression model with more than one independent variable. |
| Nonstationary Variable | A variable that may wander arbitrarily far from its mean and exhibits a stochastic trend. |
| Normal Distribution | A continuous probability distribution characterized by two parameters: mean (μ) and standard deviation (σ). It is also referred to as the Gaussian distribution. |
| Ordinary Least Squares (OLS) | The application of the least squares method to simple or multiple regressions under standard assumptions, where parameters are estimated by minimizing the sum of squared residuals. |
| Overfitting | A phenomenon where a model matches the unpredictable noise in the sample data too precisely, leading to poor out-of-sample forecasting abilities. |
| Passive Portfolio Strategy | A strategy that involves minimal expectational input and instead relies on diversification to match the performance of some market index. |
| Policy Asset Allocation | A long-term asset allocation decision in which an investor seeks to assess an appropriate long-term “normal” asset mix. |
| Polytomous Variable | An explanatory variable that distinguishes between more than two categories. |
| Principal Components Analysis (PCA) | A data-reduction technique used to parsimoniously represent data by transforming a set of correlated variables into a set of linearly uncorrelated variables called principal components. |
| Probit Regression Model | A probability model for a categorical dependent variable where the predicted probability is given by the cumulative standard normal distribution function. |
| Qualitative Data | Data obtained by ascribing to each item a non-numerical attribute. |
| Quantitative Data | Data where the value of a variable is numerical. |
| Quantile Regression | A regression tool that aims at minimizing the weighted sum of absolute deviations from a specific conditional quantile, allowing for analysis of relationships across the entire distribution of the dependent variable. |
| Random Walk | A process where the next period’s value is determined by the previous period’s value plus a random change (disturbance). The best estimate for the following period’s value is the current period’s value. |
| Regression Hyperplane | In a multiple linear regression with k independent variables, the estimated k-dimensional hyperplane that expresses the linear functional relationship between the dependent and independent variables. |
| Resistant Estimator | An estimator that is insensitive to changes in a single observation. |
| Robust Statistics | A field of statistics that aims to find descriptive concepts and models that are little affected by outliers, small changes in the sample, or mistakes in distributional assumptions. |
| Short Hedge | A hedge used to protect against a decline in the future cash price of an underlying asset, executed by selling a futures contract. |
| Simple Linear Regression | A regression model with only one independent variable. It is also called a univariate regression. |
| Skewness | A measure of the asymmetry of a distribution. The Pearson skewness is defined as three times the difference between the mean and the median (mean minus median), divided by the standard deviation. |
| Spurious Regression | A problem arising from data that seemingly are correlated but actually are not, often occurring when all variables in a regression are nonstationary. |
| Standard Deviation | A measure of variation defined as the positive square root of the variance. Its units correspond to the original units of the data. |
| Stationary Variable | A variable whose mean and variance are constant and whose autocorrelation depends only on the lag length. It exhibits mean reversion and displays no stochastic trend. |
| Stepwise Exclusion Regression | A model-building method that begins by including all independent variables and then sequentially eliminates the insignificant ones until only significant variables remain. |
| Stepwise Inclusion Regression | A model-building method that begins with no independent variables and adds them one at a time based on their contribution to the regression’s explanatory power. |
| Survivorship Bias | A bias exhibited by samples selected based on criteria valid at the last date in the sample time series, which ignores entities that ceased to exist before that date. |
| Tactical Asset Allocation | Active strategies that seek to enhance performance by opportunistically shifting the asset mix of a portfolio in response to changing patterns of reward in capital markets. |
| Time Series Data | Observations of the same variable or quantity of interest collected over successive periods of time. |
| Tracking Error | A measure of the dispersion of a portfolio’s returns relative to the returns of its benchmark; it is the standard deviation of the portfolio’s active return. |
| Trimmed Mean | A robust estimator of the center of a distribution, calculated by removing a certain percentage of the lowest and highest observations from the sample before computing the mean. |
| Variance | A measure of variation that averages the squared deviations from the mean. |
| Vector Autoregressive (VAR) Model | A model used to describe the dynamic interrelationship among several time series variables, where each variable is explained by its own lagged values and the lagged values of all other variables in the system. |
| Weighted Least Squares (WLS) | An estimation method used when residuals are heteroscedastic. It seeks the minimum of the sum of squared weighted residuals. |
| Winsorized Mean | A robust estimator of the center where the most extreme observations are not removed but are replaced by the values of the most extreme observations that remain in the sample. |
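
Several of the definitions above are computational recipes, so a short Python sketch may help make them concrete. It covers the beta factor, tracking error, the trimmed mean, and the winsorized mean, using only the standard library. The function names, the parameter `k` (the number of observations trimmed or replaced in each tail), and the sample figures are illustrative assumptions rather than anything taken from the glossary, and sample (n - 1) statistics are an assumed convention.

```python
from statistics import mean, stdev, variance


def beta(security, market):
    """Sample covariance of the security with the market, divided by the market's variance."""
    ms, mm = mean(security), mean(market)
    cov = sum((s - ms) * (m - mm) for s, m in zip(security, market)) / (len(market) - 1)
    return cov / variance(market)


def tracking_error(portfolio, benchmark):
    """Standard deviation of the active return (portfolio return minus benchmark return)."""
    active = [p - b for p, b in zip(portfolio, benchmark)]
    return stdev(active)


def trimmed_mean(data, k):
    """Mean after removing the k lowest and k highest observations (requires 2 * k < len(data))."""
    s = sorted(data)
    return mean(s[k:len(s) - k])


def winsorized_mean(data, k):
    """Mean after replacing the k lowest and k highest observations
    with the most extreme values that remain (requires 2 * k < len(data))."""
    s = sorted(data)
    core = s[k:len(s) - k]
    return mean([core[0]] * k + core + [core[-1]] * k)


# Hypothetical data: one outlier (100) dominates the ordinary mean.
sample = [1, 2, 3, 4, 100]
print(mean(sample))                # 22
print(trimmed_mean(sample, 1))     # 3
print(winsorized_mean(sample, 1))  # 3
```

In this sample the single outlier pulls the ordinary mean to 22, while both robust estimators stay at 3, which is the behavior the Trimmed Mean and Winsorized Mean entries describe.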