Part I: The Foundations of Econometric Analysis
1.0. The Core Principles of Econometrics
1.1. Introduction: Bridging Economic Theory and Empirical Analysis
Econometrics is the application of statistical methods to economic data. It serves as the strategic tool economists employ to move beyond the qualitative predictions of economic theory into the realm of quantitative estimation and hypothesis testing. While economic theory provides the essential structure for evaluating relationships—predicting, for instance, that higher interest rates will reduce investment spending, ceteris paribus—econometrics provides the means to quantify that relationship, estimating precisely how much investment might fall for a given increase in interest rates. Within this discipline, we distinguish between theoretical econometricians, who are akin to mathematicians developing new statistical procedures, and applied econometricians, who utilize those tools to examine real-world economic phenomena.
The central goal of econometric analysis is to evaluate economic relationships by linking an economic model to empirical data. This process commences with economic theory, which helps identify an outcome of interest (the dependent variable) and the causal factors (the independent variables) that are logically connected to it. For example, a microeconomic model might posit that economic profits in a competitive market will induce new firms to enter, while a macroeconomic model might suggest that higher interest rates reduce investment spending. In both cases, econometrics provides the framework to test these theoretical predictions with actual data. A key strength of this discipline is that it is “school of thought neutral.” A properly applied econometric approach allows the data to speak for itself, providing an objective method to test the predictions of various economic theories for consistency with real-world observations. The following sections will establish the fundamental statistical concepts that underpin all econometric work.
1.2. Probability and Random Variables: A Review
At its core, econometrics seeks to explain outcomes with uncertain values, making a firm grasp of probability theory essential. Economic variables such as wages, profits, or demand are considered random variables. We classify these variables into two types: discrete random variables, which take on a countable set of distinct outcomes, such as the number of jobs an individual has held; and continuous random variables, which can take on any real value within a range, yielding infinitely many possible outcomes, such as an hourly wage.
To describe a random variable, we use a Probability Density Function (PDF), which assigns probabilities to all of its possible values. A PDF has two key properties: the probability of any single event, f(X), must be between 0 and 1, inclusive (0 ≤ f(X) ≤ 1), and the sum of probabilities for all possible events must equal 1. For a discrete variable, the PDF can be depicted in a table or a bar graph. Consider an experiment of tossing three coins where the random variable X is the number of heads. There are eight possible outcomes, leading to the following PDF, which shows a 3/8 probability of observing either one or two heads:
| Number of Heads (X) | Probability f(X) |
| --- | --- |
| 0 | 1/8 = 0.125 |
| 1 | 3/8 = 0.375 |
| 2 | 3/8 = 0.375 |
| 3 | 1/8 = 0.125 |
| Total | 8/8 = 1.0 |
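To make the table concrete, here is a minimal Python sketch (standard library only) that enumerates the eight equally likely outcomes and recovers the PDF:

```python
from collections import Counter
from itertools import product

# Enumerate all 2^3 = 8 equally likely outcomes of tossing three coins.
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# f(X) = favorable outcomes / total outcomes, reproducing the table above.
pdf = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(pdf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```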
Building on the PDF, the Cumulative Distribution Function (CDF) gives the sum of probabilities up to a certain value of the random variable. For a simple 2-coin-toss experiment, where X is the number of heads, the CDF is constructed by summing the probabilities of each outcome and all smaller outcomes. If f(0)=0.25, f(1)=0.50, and f(2)=0.25, the corresponding CDF values are F(0)=0.25, F(1)=0.75 (0.25+0.50), and F(2)=1.00 (0.75+0.25).
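The same construction in code, a short sketch for the 2-coin example using a running sum:

```python
from itertools import accumulate

# PDF of X = number of heads in two coin tosses.
x_values = [0, 1, 2]
pdf = [0.25, 0.50, 0.25]

# The CDF is the running sum of the PDF: F(x) = P(X <= x).
cdf = list(accumulate(pdf))
print(dict(zip(x_values, cdf)))  # {0: 0.25, 1: 0.75, 2: 1.0}
```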
To analyze relationships, we use bivariate (or joint) probability density, f(X, Y), which gives the probability of two events occurring simultaneously. From this, we derive conditional probability density, f(Y|X) = f(X, Y) / f(X), which calculates the probability of Y occurring given that X has occurred. This leads to the concept of statistical independence, which holds if f(Y|X) = f(Y), meaning the occurrence of one event does not statistically affect the other. These concepts are fundamental to understanding how economic variables interact.
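As a sketch of these definitions, the following code works with a hypothetical joint PDF for two binary variables (the probabilities are illustrative, not from the source), derives the marginal and conditional densities, and checks independence:

```python
# Hypothetical joint PDF f(X, Y) for two binary random variables.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.20,
}

# Marginal densities: sum the joint PDF over the other variable.
f_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
f_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional density: f(Y|X) = f(X, Y) / f(X).
f_y_given_x = {(x, y): joint[(x, y)] / f_x[x] for (x, y) in joint}

# Statistical independence holds if f(Y|X) = f(Y) for every pair.
independent = all(abs(f_y_given_x[(x, y)] - f_y[y]) < 1e-12 for (x, y) in joint)
print(independent)  # True: this joint PDF happens to factor as f(X) * f(Y)
```

From this theoretical basis, we now turn to the practical application of summarizing data.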
1.3. Descriptive Statistics and Summary Measures
Descriptive statistics are measurements that summarize sample data and allow us to make inferences about the population of interest. Key summary measures for a random variable include:
- Expected Value (Mean): This is a measure of central tendency, representing the weighted average value of a random variable, where the weights are the probabilities of each outcome.
- Variance and Standard Deviation: These are measures of dispersion that quantify the spread in the data. Variance is the average squared difference between the value of a random variable and its mean. The standard deviation is the square root of the variance, measured in the same units as the variable itself.
- Covariance: This measures the direction of the relationship between two random variables. A positive covariance indicates the variables tend to move in the same direction, while a negative value indicates they move in opposite directions. Because its magnitude depends on the units of measurement, covariance conveys direction but not strength.
- Correlation: The correlation coefficient measures both the strength and direction of a linear relationship between two variables. It standardizes the covariance to fall between -1 (perfectly negative linear relationship) and +1 (perfectly positive linear relationship). A value near 0 indicates no clear linear relationship. Crucially, the correlation coefficient captures only linear associations. As illustrated in the source text's Figure 2-9, a strong nonlinear relationship, such as a U-shape, can exist between two variables even when their correlation coefficient is close to zero; the sketch after this list reproduces that pattern numerically.
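A brief numerical sketch of that last point, assuming NumPy is available: a perfect U-shaped relationship y = x² on a symmetric grid yields a correlation coefficient of essentially zero.

```python
import numpy as np

# A symmetric grid of x-values and a perfect U-shaped relationship y = x^2.
x = np.linspace(-3, 3, 101)
y = x ** 2

# Mean, variance, and standard deviation as summary measures of y.
print(y.mean().round(3), y.var().round(3), y.std().round(3))

# Covariance captures direction; here symmetry makes it essentially zero.
cov_xy = np.cov(x, y)[0, 1]

# Correlation standardizes covariance to the [-1, 1] range.
corr_xy = np.corrcoef(x, y)[0, 1]

print(round(cov_xy, 6), round(corr_xy, 6))  # both ~0, though y is a function of x
```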
In econometrics, it is vital to distinguish between a parameter, a descriptive measure calculated from population data, and an estimator (or statistic), the corresponding measure calculated from sample data. Since we almost always work with samples, we rely on estimators to make inferences about population parameters. A “good” estimator possesses three desirable properties:
- Unbiasedness: An estimator is unbiased if its expected value, that is, the average of its estimates over repeated samples drawn the same way, equals the true population parameter.
- Efficiency: An estimator is efficient if it has the smallest variance among all unbiased estimators of the parameter.
- Consistency: An estimator is consistent if it converges to the true parameter value as the sample size grows larger (see the simulation sketch after this list).
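A small simulation sketch (assuming NumPy) of unbiasedness and consistency for the sample mean; the population, an exponential with mean 5, is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean = 5.0  # known population parameter (illustrative)

for n in (10, 100, 10_000):
    # 1,000 repeated samples of size n from an exponential population.
    samples = rng.exponential(scale=true_mean, size=(1_000, n))
    estimates = samples.mean(axis=1)  # the sample-mean estimator

    # Unbiasedness: the estimates center on 5.0 at every sample size.
    # Consistency: their spread shrinks toward zero as n grows.
    print(n, estimates.mean().round(3), estimates.std().round(3))
```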
Understanding these properties allows us to evaluate the quality of our statistical estimates before using them to test formal hypotheses about economic relationships.
1.4. Distributions for Hypothesis Testing
To make valid inferences from sample data, a firm understanding of key probability distributions is necessary. These distributions serve as the theoretical foundation for hypothesis testing in econometrics.
The Normal Distribution is a continuous, symmetrical, and bell-shaped distribution. Its properties are often summarized by the empirical rule: for a normally distributed variable, approximately 68% of measurements fall within one standard deviation of the mean, 95% fall within two standard deviations, and 99.7% fall within three.
The Standard Normal Distribution (Z-distribution) is a special case of the normal distribution with a mean of 0 and a variance of 1. Any normally distributed variable X can be standardized to a Z-variable using the formula Z = (X − μ) / σ, where μ and σ are the mean and standard deviation of X. This allows us to use a single standard table to calculate probabilities for any normal variable.
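A quick check of both ideas using only the standard library; the normal CDF can be expressed through math.erf, and the N(100, 15) example values below are hypothetical:

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Empirical rule: P(-k <= Z <= k) for k = 1, 2, 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(std_normal_cdf(k) - std_normal_cdf(-k), 4))
# Prints approximately 0.6827, 0.9545, 0.9973.

# Standardization: for a hypothetical X ~ N(mu=100, sigma=15), the
# probability that X <= 130 comes from Z = (X - mu) / sigma.
mu, sigma = 100, 15
z = (130 - mu) / sigma
print(z, round(std_normal_cdf(z), 4))  # z = 2.0, probability ~0.9772
```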
One of the most important concepts in statistics is the Central Limit Theorem (CLT). It states that for a sufficiently large sample size, the distribution of the sample mean will be approximately normal, regardless of the underlying distribution of the population. This profound result provides the theoretical justification for why the normal distribution is so central to statistical inference, as it allows us to test hypotheses about population means using sample data even when the population’s distribution is unknown.
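A minimal simulation sketch of the CLT (assuming NumPy): sample means drawn from a heavily skewed population behave approximately normally once the sample size is moderate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential with mean 1, heavily right-skewed (skewness = 2).
# Draw 10,000 independent samples of size n = 50 and take each sample's mean.
n = 50
means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

# CLT predictions: center near 1, spread near 1/sqrt(50) ~ 0.141,
# and far less skew than the population itself.
standardized = (means - means.mean()) / means.std()
print(means.mean().round(3))                # ~1.0
print(means.std().round(3))                 # ~0.141
print((standardized ** 3).mean().round(2))  # ~0.3, versus 2 for the population
```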
From the normal distribution, we derive other critical distributions for hypothesis testing:
- Chi-squared (χ²) distribution: This distribution is derived from the sum of squared independent standard normal random variables, with degrees of freedom equal to the number of terms in the sum. It is non-negative, and its shape, which depends on its degrees of freedom, is typically right-skewed.
- t-distribution: This distribution is derived from the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom. It is bell-shaped and symmetrical like the normal distribution but has thicker tails, especially in small samples.
- F-distribution: This distribution is derived from the ratio of two independent chi-squared random variables, each divided by its respective degrees of freedom. It is non-negative and right-skewed (the simulation after this list generates all three distributions from standard normal draws).
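The following sketch (assuming NumPy) builds each distribution directly from standard normal draws, mirroring the definitions above, and checks the results against known moments:

```python
import numpy as np

rng = np.random.default_rng(1)
draws, k1, k2 = 100_000, 5, 10  # simulation size and degrees of freedom

# Chi-squared(k): sum of k squared independent standard normals.
chi2_k1 = (rng.standard_normal((draws, k1)) ** 2).sum(axis=1)
chi2_k2 = (rng.standard_normal((draws, k2)) ** 2).sum(axis=1)

# t(k1): standard normal over the square root of chi-squared(k1) / k1.
z = rng.standard_normal(draws)
t_k1 = z / np.sqrt(chi2_k1 / k1)

# F(k1, k2): ratio of independent chi-squareds, each over its own df.
f_k1_k2 = (chi2_k1 / k1) / (chi2_k2 / k2)

# Sanity checks against known moments: E[chi2(k)] = k, E[t(k)] = 0 for k > 1,
# and E[F(k1, k2)] = k2 / (k2 - 2) for k2 > 2.
print(chi2_k1.mean().round(2))  # ~5
print(t_k1.mean().round(2))     # ~0
print(f_k1_k2.mean().round(2))  # ~1.25
```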
These distributions are the essential workhorses for conducting the hypothesis tests that are central to the primary tool of econometrics: regression analysis.