Statistical Measures of Asset Returns

Quantitative Methods

Measures of Shape of a Distribution

Learning Outcome Statement:

interpret and evaluate measures of skewness and kurtosis to address an investment problem

Summary:

This LOS focuses on understanding the measures of skewness and kurtosis, which describe the shape of a distribution beyond the basic measures of central tendency and dispersion. Skewness indicates the degree of asymmetry of a distribution around its mean, with positive skewness indicating a long right tail, and negative skewness a long left tail. Kurtosis measures the 'tailedness' of the distribution, with high kurtosis indicating heavy tails and low kurtosis indicating light tails. These measures help in assessing the risks and characteristics of investment returns that are not apparent from mean and variance alone.

Key Concepts:

Skewness

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. Positive skewness indicates a distribution with an asymmetric tail extending towards more positive values, while negative skewness indicates a distribution with a tail extending towards more negative values.

Kurtosis

Kurtosis is a measure of the tailedness of the probability distribution of a real-valued random variable. High kurtosis means that the distribution has heavy tails and a sharp peak near the mean; low kurtosis means that the distribution has light tails and a flatter peak.

Excess Kurtosis

Excess kurtosis is the kurtosis of the distribution minus 3, which adjusts the measure to facilitate comparison with the normal distribution (which has a kurtosis of 3). Positive excess kurtosis indicates a leptokurtic distribution with heavier tails than a normal distribution, and negative excess kurtosis indicates a platykurtic distribution with lighter tails.

Formulas:

Sample Skewness

\text{Skewness} \approx \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^3

This formula calculates the approximate skewness of a sample, which measures the asymmetry of the distribution around its mean. Cubing the standardized deviations preserves their sign, so large positive deviations push the measure above zero and large negative deviations push it below zero.

Variables:
n: sample size
X_i: ith data point
\bar{X}: sample mean
s: sample standard deviation
Units: unitless
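
As a rough illustration of the skewness formula, the following Python sketch uses only the standard library. The function name `sample_skewness` and the return figures are hypothetical, chosen for illustration; `s` is assumed to be the sample standard deviation with an n - 1 divisor.

```python
import statistics

def sample_skewness(returns):
    """Approximate sample skewness: mean of the cubed standardized deviations."""
    n = len(returns)
    mean = statistics.fmean(returns)
    s = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)
    return sum(((x - mean) / s) ** 3 for x in returns) / n

# Hypothetical monthly returns in percent; one large gain creates a long right tail.
right_tailed = [1.0, 2.0, 1.5, 0.5, 2.5, 1.0, 12.0]
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]

skew_right = sample_skewness(right_tailed)  # positive
skew_sym = sample_skewness(symmetric)       # essentially zero
```

A single large gain in an otherwise tight series yields a positive value, while the symmetric series yields a value of essentially zero.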

Sample Excess Kurtosis

K_E \approx \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^4 \right] - 3

This formula calculates the excess kurtosis of a sample, which measures the tailedness of the distribution relative to a normal distribution. It adjusts the kurtosis by subtracting 3 to compare it directly with the normal distribution.

Variables:
n: sample size
X_i: ith data point
\bar{X}: sample mean
s: sample standard deviation
Units: unitless
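
The excess kurtosis formula can be sketched the same way. The function name `sample_excess_kurtosis` and both data series are hypothetical; `s` is again assumed to be the sample standard deviation with an n - 1 divisor.

```python
import statistics

def sample_excess_kurtosis(returns):
    """Approximate sample excess kurtosis: mean fourth standardized deviation minus 3."""
    n = len(returns)
    mean = statistics.fmean(returns)
    s = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)
    kurtosis = sum(((x - mean) / s) ** 4 for x in returns) / n
    return kurtosis - 3.0

# Hypothetical series: mostly flat returns with two extreme observations (heavy tails).
fat_tailed = [-10.0] + [0.0] * 10 + [10.0]
# Hypothetical series: evenly spread returns with no extremes (light tails).
thin_tailed = [-2.0, -1.0, 0.0, 1.0, 2.0]

ke_fat = sample_excess_kurtosis(fat_tailed)    # positive (leptokurtic)
ke_thin = sample_excess_kurtosis(thin_tailed)  # negative (platykurtic)
```

With samples this small the numbers are only indicative; the approximation is intended for large n.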

Measures of Central Tendency and Location

Learning Outcome Statement:

calculate, interpret, and evaluate measures of central tendency and location to address an investment problem

Summary:

This LOS focuses on understanding and applying measures of central tendency and location, which are crucial for analyzing data distributions in finance. Central tendency measures, including the mean, median, and mode, describe where data is centered. Measures of location, such as quartiles and percentiles, help identify values at or below which certain proportions of data lie. The content also discusses handling outliers and the implications of using different measures on investment decisions.

Key Concepts:

Arithmetic Mean

The arithmetic mean is calculated by summing all observations and dividing by the number of observations. It is sensitive to outliers.

Median

The median is the middle value in a sorted dataset. It is less affected by outliers compared to the mean.

Mode

The mode is the most frequently occurring value in a dataset. A dataset can be unimodal, bimodal, or have no mode.

Outliers

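Outliers are extreme values that can distort statistical measures, particularly the mean. An analyst can leave them in the data, delete them (for example, by using a trimmed mean), or replace them with less extreme values (for example, by using a winsorized mean).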
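
Deleting and replacing extreme values are commonly implemented as trimming and winsorizing. The sketch below is one possible reading of those two options, not a prescribed procedure; the helper names `trimmed` and `winsorized`, the cutoff `k`, and the data are all hypothetical.

```python
import statistics

def trimmed(values, k):
    """Delete the k smallest and k largest observations."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def winsorized(values, k):
    """Replace the k smallest and k largest observations with the nearest retained values."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(x, low), high) for x in ordered]

# Hypothetical returns in percent with one extreme loss and one extreme gain.
returns = [-20.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 50.0]

raw_mean = statistics.fmean(returns)
trimmed_mean = statistics.fmean(trimmed(returns, 1))
winsorized_mean = statistics.fmean(winsorized(returns, 1))
```

Both adjusted means sit closer to the bulk of the data than the raw mean, which the single extreme gain pulls upward.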

Quantiles

Quantiles are values that divide a dataset into equal-sized, contiguous subsets. Common quantiles include quartiles, quintiles, deciles, and percentiles.

Interquartile Range (IQR)

IQR is the range between the first quartile (25th percentile) and the third quartile (75th percentile) and is used to measure statistical dispersion.

Formulas:

Sample Mean

\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}

Calculates the average of a sample.

Variables:
X_i: value of the ith observation
n: total number of observations
Units: unit of the data points
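
To see how the three measures of central tendency respond to an outlier, the following sketch applies Python's standard `statistics` module to a small hypothetical return series.

```python
import statistics

# Hypothetical returns in percent; 40.0 is an outlier.
returns = [2.0, 3.0, 3.0, 4.0, 5.0, 40.0]

mean = statistics.fmean(returns)        # pulled upward by the outlier
median = statistics.median(returns)     # average of the two middle values
modes = statistics.multimode(returns)   # list of the most frequent value(s)
```

Here the mean is 9.5 while the median is 3.5 and the only mode is 3.0, so the mean alone would overstate the typical return.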

Interquartile Range

\text{IQR} = Q_3 - Q_1

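Measures the spread of the middle 50% of the data. Because it ignores the lowest and highest quarters of the observations, it is less influenced by outliers than the range.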

Variables:
Q_3: third quartile (75th percentile)
Q_1: first quartile (25th percentile)
Units: unit of the data points
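
Quartile conventions differ across textbooks and software, so the sketch below is only one way to obtain Q_1 and Q_3. It assumes the `exclusive` method of Python's `statistics.quantiles`, which locates each quartile by interpolating at a position based on n + 1; the return series is hypothetical.

```python
import statistics

# Hypothetical returns in percent (11 observations).
returns = [-4.0, -2.5, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.5, 9.0]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(returns, n=4, method="exclusive")
iqr = q3 - q1
```

With 11 observations the cut points fall exactly on the 3rd, 6th, and 9th sorted values, giving Q_1 = -1.0, Q_3 = 3.0, and an IQR of 4.0. The largest observation (9.0) has no effect on the result.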

Correlation between Two Variables

Learning Outcome Statement:

interpret correlation between two variables to address an investment problem

Summary:

This LOS explores the concept of correlation between two variables, primarily focusing on its calculation, interpretation, and limitations. It covers the use of scatter plots to visually assess relationships, the mathematical computation of covariance and correlation, the properties of correlation, and the limitations of correlation analysis including the potential for spurious correlations and the influence of outliers.

Key Concepts:

Scatter Plot

A scatter plot is a graphical representation used to visualize the relationship between two variables. The degree of clustering indicates the strength of the relationship, with tighter clustering suggesting a stronger relationship.

Covariance

Covariance is a measure that indicates the extent to which two variables change together. A positive covariance indicates that the variables tend to move in the same direction, while a negative covariance indicates they move in opposite directions.

Correlation Coefficient

The correlation coefficient, ranging from -1 to +1, is a normalized measure of the strength and direction of the linear relationship between two variables. Values close to +1 or -1 indicate strong linear relationships, while values near 0 indicate weak or no linear relationship.

Properties of Correlation

Correlation properties include its bounded range from -1 to +1, interpretation of different values (0 indicating no linear relationship, +1 perfect positive linear relationship, -1 perfect negative linear relationship), and its use in assessing the strength of linear relationships.

Limitations of Correlation Analysis

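Correlation analysis can be misleading when the data contain outliers or the relationship is nonlinear, because the coefficient is sensitive to extreme observations and captures only linear association. Correlation also does not imply causation. Spurious correlations can arise from chance relationships in a particular dataset, from calculations that mix each of the two variables with a third variable, or from a third variable that drives both variables in question.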
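
Two of these limitations can be demonstrated with small hypothetical datasets. The helper `pearson_r` below is an illustrative implementation of the sample correlation coefficient, not a reference one.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation; the n - 1 divisors cancel, so raw sums suffice."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear: y = x^2 is fully determined by x, yet the linear correlation is zero.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_nonlinear = pearson_r(x_sym, [x ** 2 for x in x_sym])

# Outlier: a perfectly negative relationship turns strongly positive
# after one extreme point is added.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]
r_clean = pearson_r(x, y)
r_outlier = pearson_r(x + [20.0], y + [20.0])
```

A scatter plot would reveal both problems at a glance, which is one reason to inspect the data visually before relying on the coefficient.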

Formulas:

Sample Covariance

s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}

This formula sums the products of the paired deviations of observations from their respective sample means and divides by n - 1, providing a measure of how two variables move together in a sample.

Variables:
s_{XY}: sample covariance
X_i: value of variable X in the sample
Y_i: value of variable Y in the sample
n: sample size
\bar{X}: sample mean of X
\bar{Y}: sample mean of Y
Units: product of the units of X and Y (e.g., returns squared if both variables are returns)
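
A minimal sketch of the sample covariance calculation follows. The function name `sample_covariance` and the paired observations are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sum of paired deviation products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired returns in percent.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

cov_xy = sample_covariance(x, y)
# Rescaling X by 100 rescales the covariance by 100 as well.
cov_scaled = sample_covariance([100.0 * v for v in x], y)
```

The rescaled result shows why covariance is hard to interpret on its own: its magnitude depends on the units of both variables.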

Sample Correlation Coefficient

r_{XY} = \frac{s_{XY}}{s_X s_Y}

This formula standardizes the covariance by the product of the standard deviations of the variables, providing a dimensionless measure of the linear relationship between the variables.

Variables:
r_{XY}: sample correlation coefficient
s_{XY}: sample covariance
s_X: standard deviation of X
s_Y: standard deviation of Y
Units: dimensionless
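
The standardization step can be sketched as follows, with the covariance computed inline and the standard deviations taken from Python's `statistics.stdev` (n - 1 divisor). The function name `sample_correlation` and the data are hypothetical.

```python
import statistics

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical paired returns in percent.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r_xy = sample_correlation(x, y)
# Rescaling X leaves the correlation unchanged because the measure is dimensionless.
r_scaled = sample_correlation([100.0 * v for v in x], y)
```

For this data the covariance is 2.0 and each sample variance is 2.5, so the correlation is 2.0 / 2.5 = 0.8 regardless of how X is scaled.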

Measures of Dispersion

Learning Outcome Statement:

calculate, interpret, and evaluate measures of dispersion to address an investment problem

Summary:

This LOS focuses on understanding and applying various measures of dispersion, which describe the variability of data around a central value. These measures include the range, mean absolute deviation (MAD), sample variance, sample standard deviation, downside deviation, and coefficient of variation. Each measure provides insights into the spread or variability of data, which is crucial for assessing risk and making informed investment decisions.

Key Concepts:

Range

The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset. It provides a quick sense of the spread but is sensitive to outliers and does not describe the overall distribution shape.

Mean Absolute Deviation (MAD)

MAD measures the average of the absolute deviations from the mean, providing a summary of dispersion that uses all data points and avoids issues with negative deviations canceling out positive ones.

Sample Variance

Sample variance is the sum of the squared deviations from the sample mean divided by n - 1. Squaring the deviations prevents positive and negative deviations from canceling each other out.

Sample Standard Deviation

This is the square root of the sample variance and provides a dispersion measure in the same units as the data, making it easier to interpret in the context of the original data set.

Downside Deviation

This measure focuses on the variability of returns below a target or minimum acceptable return, reflecting investor concerns about potential losses rather than overall variability.
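
No formula for downside deviation is listed in this section, so the sketch below rests on an assumption: it uses the target semideviation convention, which squares only the deviations of observations at or below the target B, sums them, divides by n - 1, and takes the square root. The function name, the data, and the target of 0.0 are hypothetical.

```python
import math

def target_downside_deviation(returns, target):
    """Target semideviation (assumed convention): only returns at or below the target count."""
    n = len(returns)
    shortfall_sq = sum((r - target) ** 2 for r in returns if r <= target)
    return math.sqrt(shortfall_sq / (n - 1))

# Hypothetical returns in percent; the target is a minimum acceptable return of 0%.
returns = [-3.0, 1.0, 4.0, -1.0, 6.0]
dd = target_downside_deviation(returns, target=0.0)
```

Only the two negative returns contribute, giving sqrt((9 + 1) / 4), or about 1.58, which is well below the ordinary standard deviation of the same series.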

Coefficient of Variation (CV)

CV is a normalized measure of dispersion relative to the mean, useful for comparing the variability of datasets with different means or units of measurement. It expresses the amount of risk per unit of return.

Formulas:

Range

\text{Range} = \text{Maximum value} - \text{Minimum value}

Calculates the difference between the maximum and minimum values to determine the overall spread of the data.

Variables:
Maximum value: The highest value in the dataset
Minimum value: The lowest value in the dataset
Units: Same as the data units

Mean Absolute Deviation

\text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |X_i - \overline{X}|

Averages the absolute deviations from the mean, providing a measure of dispersion that is not affected by the direction of the deviations.

Variables:
n: Number of observations
X_i: Individual observation
\overline{X}: Sample mean
Units: Same as the data units
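
The following sketch illustrates why absolute values are needed: the signed deviations from the mean always sum to zero, while the absolute deviations do not. The function name `mean_absolute_deviation` and the data are hypothetical.

```python
import statistics

def mean_absolute_deviation(values):
    """Average absolute deviation from the sample mean."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Hypothetical returns in percent.
returns = [1.0, 2.0, 3.0, 4.0, 10.0]

mean = statistics.fmean(returns)
signed_sum = sum(x - mean for x in returns)  # deviations cancel out
mad = mean_absolute_deviation(returns)
```

With a mean of 4.0, the absolute deviations are 3, 2, 1, 0, and 6, so the MAD is 12 / 5 = 2.4.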

Sample Variance

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2

Sums the squared deviations from the sample mean and divides by n - 1 rather than n. Squaring addresses the issue of negative and positive deviations canceling each other out.

Variables:
n: Number of observations
X_i: Individual observation
\overline{X}: Sample mean
Units: Squared units of the data

Sample Standard Deviation

s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2}

Provides a measure of dispersion in the same units as the data by taking the square root of the sample variance.

Variables:
n: Number of observations
X_i: Individual observation
\overline{X}: Sample mean
Units: Same as the data units
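
The variance and standard deviation formulas can be checked against Python's standard library, whose `statistics.variance` and `statistics.stdev` also use the n - 1 divisor. The function name `sample_variance` and the data are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical returns in percent.
returns = [1.0, 2.0, 3.0, 4.0, 10.0]

var = sample_variance(returns)   # in percent squared
std = math.sqrt(var)             # back in percent

# Cross-check against the standard library.
lib_var = statistics.variance(returns)
lib_std = statistics.stdev(returns)
```

The squared deviations sum to 50, so the variance is 50 / 4 = 12.5 and the standard deviation is about 3.54.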

Coefficient of Variation

CV = \frac{s}{\overline{X}}

Normalizes the standard deviation by the mean to provide a relative measure of dispersion, useful for comparing datasets with different scales or units.

Variables:
s: Sample standard deviation
\overline{X}: Sample mean
Units: Unitless (ratio)
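
As an illustration of comparing dispersion across series with different means, the sketch below computes the CV for two hypothetical funds. The function name and the zero-mean guard are assumptions added for the example.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical annual returns in percent for two funds.
fund_a = [4.0, 6.0, 8.0, 10.0, 12.0]  # mean 8, s = sqrt(10)
fund_b = [1.0, 2.0, 3.0, 4.0, 10.0]   # mean 4, s = sqrt(12.5)

cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)
```

The two standard deviations are similar (about 3.16 and 3.54), but fund B carries roughly 0.88 units of dispersion per unit of mean return against roughly 0.40 for fund A.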