Friday, January 8, 2010

Study Session 2 - Reading 7 Statistical Concepts and Market Returns

LOS 7a - Descriptive vs. Inferential Statistics
Descriptive statistics allow one to analyse and summarise large data sets, turning data into information.
Inferential statistics involves making forecasts, estimates or judgments about a larger group (the population) from samples, and is founded on probability theory.

Measurement scales, from weakest to strongest: Nominal, Ordinal, Interval, Ratio

LOS 7b - Frequency Distribution
  1. Define the intervals - must be exhaustive and not overlap
  2. Assign the observations to their relevant intervals
  3. Count the observations

LOS 7c - Relative Frequency, Cumulative Frequency
  • Absolute frequency = the # of observations in each interval (e.g. 2, 3, 5)
  • Relative frequency = % of observations in each interval (e.g. 20%, 30%, 50%)
  • Cum. Abs. Freq. = the running total of obs up to and including each interval (e.g. 2, 5, 10)
  • Cum. Rel. Freq. = the running total of relative frequency up to and including each interval (e.g. 20%, 50%, 100%)
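The three steps and the four frequency measures above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the interval edges and the return figures are all made up, chosen so the counts match the 2, 3, 5 example.

```python
def frequency_table(observations, edges):
    """Build a frequency distribution from interval boundaries.

    Interval i covers edges[i] <= x < edges[i + 1]; the last interval
    also includes its upper edge. Observations outside every interval
    are ignored, so the edges should span the whole data set.
    """
    counts = [0] * (len(edges) - 1)
    for x in observations:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break

    total = sum(counts)
    rows, cum_abs = [], 0
    for i, count in enumerate(counts):
        cum_abs += count  # running total of observations so far
        rows.append({
            "interval": (edges[i], edges[i + 1]),
            "abs": count,
            "rel": count / total,
            "cum_abs": cum_abs,
            "cum_rel": cum_abs / total,
        })
    return rows


# Hypothetical returns in %, sorted only for readability.
returns = [-8, -3, 1, 4, 9, 11, 12, 15, 18, 20]
table = frequency_table(returns, edges=[-10, 0, 10, 20])
for row in table:
    print(row)
```

The last row's cumulative relative frequency should always come out at 100%, which is a quick check that the intervals really are exhaustive.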

LOS 7d - Histograms

Graphical representation (either bar or polygon) of a frequency distribution. Intervals go on the x axis, frequency (usually absolute) on the y axis.

LOS 7e - Define, calculate and interpret measures of central tendency, including the population mean, sample mean, arithmetic mean, weighted average or mean, geometric mean, harmonic mean, median, and mode

All of these are essentially measures of expected return for stocks or portfolios, with the exception of the harmonic mean, which is used largely in dollar cost averaging.

population/sample mean = simply the arithmetic mean of all the observations. It will often be used as the Expected Value or Expected Return when referring to stock prices or returns. The sum of all deviations from the mean equals zero.

weighted mean/average is used where the observations have unequal influence on the mean. Multiply the values by their weights and then sum them all. Often used to find the Expected Return of a portfolio, where different stocks have different weights so their returns are combined using a weighted average.

Note: The weighted average appears in many guises in other formulas where values are averaged but not weighted equally, e.g. the variance of a portfolio where stocks have different weights.
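A minimal sketch of the portfolio case, assuming the weights are expressed as fractions that sum to 1. The helper name `weighted_mean` and the three-stock portfolio are hypothetical.

```python
def weighted_mean(values, weights):
    """Multiply each value by its weight and sum the products."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical portfolio: 50% in a stock returning 10%, 30% in one
# returning 4%, 20% in one returning -2%.
stock_returns = [0.10, 0.04, -0.02]
weights = [0.5, 0.3, 0.2]
portfolio_return = weighted_mean(stock_returns, weights)
print(f"Expected portfolio return: {portfolio_return:.2%}")
```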

median - midpoint. Middle observation. If there are an even number of observations, the median is the average of the middle two observations.

geometric mean is often used when calculating investment returns over multiple periods or when measuring compound growth rates. To calculate, take the nth root of the product of the n observations:

$$G = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n}$$

$$1 + R_G = \sqrt[n]{(1 + R_1)(1 + R_2) \dots (1 + R_n)}$$

The first is the general formula for geometric mean. The second is used for calculating returns (which is quite a common use).
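To see why the returns version matters, here is a small sketch comparing it with the arithmetic mean. The function name and the two-year return series are invented for illustration, and the code assumes no single-period return is below -100%.

```python
import math


def geometric_mean_return(returns):
    """Compound the periodic returns, then take the nth root."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical: up 50% in year one, down 50% in year two.
annual_returns = [0.50, -0.50]
arithmetic = sum(annual_returns) / len(annual_returns)
geometric = geometric_mean_return(annual_returns)

print(f"Arithmetic mean: {arithmetic:.2%}")
print(f"Geometric mean:  {geometric:.2%}")
```

The arithmetic mean says the investor broke even, but $1 actually ended up as $0.75. The geometric mean is the one that reflects the compounded loss.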

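harmonic mean is used for dollar cost averaging. Divide the number of obs by the sum of the reciprocals of the obs, so...

$$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

harmonic mean < geometric mean < arithmetic mean (for positive values that are not all equal)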
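The link to dollar cost averaging can be checked with a short sketch: invest the same amount at each price and the average cost per share works out to the harmonic mean of the prices. The prices and the $1,000 instalment below are hypothetical.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


# Hypothetical: buy $1,000 of the same stock at three different prices.
prices = [10.0, 20.0, 40.0]
invested_per_period = 1000.0

shares_bought = sum(invested_per_period / p for p in prices)
average_cost = invested_per_period * len(prices) / shares_bought

arithmetic = sum(prices) / len(prices)
geometric = (prices[0] * prices[1] * prices[2]) ** (1 / 3)

print(f"Average cost per share: {average_cost:.4f}")
print(f"Harmonic mean of prices: {harmonic_mean(prices):.4f}")
print(f"Geometric mean: {geometric:.4f}, arithmetic mean: {arithmetic:.4f}")
```

The average cost lands below the simple average price because a fixed dollar amount buys more shares when the price is low.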

LOS 7f - Quartiles and other 'iles
These split the sorted data into groups that each hold an equal share of the observations (4 groups for quartiles, 5 for quintiles, 100 for percentiles). Remember, no overlapping.

To locate the position of the observation at a given percentile, y, with n data points sorted in ascending order (e.g. find the observation located at the 30th percentile):

$$L_y = (n + 1) \times \frac{y}{100}$$
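A rough sketch of the location rule L = (n + 1) × y/100. When the location is not a whole number, this version interpolates linearly between the two neighbouring observations, which is a common convention but an assumption here rather than something stated in these notes. The function name and data are made up.

```python
def percentile(sorted_data, y):
    """Value at the y-th percentile using the (n + 1) * y / 100 location rule."""
    n = len(sorted_data)
    location = (n + 1) * y / 100  # 1-based position in the sorted data
    if location <= 1:
        return sorted_data[0]
    if location >= n:
        return sorted_data[-1]
    lower = int(location)        # whole part of the location
    fraction = location - lower  # how far towards the next observation
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)


# Hypothetical data, already sorted in ascending order (n = 9).
data = [2, 4, 7, 9, 12, 15, 18, 21, 25]
p30 = percentile(data, 30)  # location 3.0, the third observation
p75 = percentile(data, 75)  # location 7.5, halfway between 7th and 8th
print(p30, p75)
```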




LOS 7g - Define, calculate, and interpret 1) a range and a mean absolute deviation and 2) the variance and standard deviation of a population and of a sample

  • Range = (max value - min value)
  • mean absolute deviation = average of the absolute values of all deviations from the mean.
  • population/sample variance = measures volatility/risk and is the average of the squared deviations from the mean. For a population, divide the sum of squared deviations by N; for a sample, divide by n - 1. The average can be found arithmetically or using a weighted average as appropriate to the problem.

standard deviation is the most common expression of risk and is simply the square root of the variance. The σ is useful because it is expressed in the same units as the observations i.e. if your observations are in $ and cents then so is your σ.
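The dispersion measures above can be worked through side by side. This is a sketch with made-up data; the function name `dispersion` is hypothetical, and it computes both the population (divide by n) and sample (divide by n - 1) versions so the difference is visible.

```python
import math


def dispersion(data):
    """Range, mean absolute deviation, variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    sum_squares = sum(d * d for d in deviations)
    return {
        "range": max(data) - min(data),
        "mad": sum(abs(d) for d in deviations) / n,
        "pop_var": sum_squares / n,           # population: divide by n
        "sample_var": sum_squares / (n - 1),  # sample: divide by n - 1
        "pop_std": math.sqrt(sum_squares / n),
        "sample_std": math.sqrt(sum_squares / (n - 1)),
    }


# Hypothetical observations with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
stats = dispersion(data)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Note that the sample variance is always a little larger than the population variance for the same numbers, because it divides by a smaller denominator.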


LOS 7h - Calculate and interpret the proportion of observations falling within a specified number of standard deviations of the mean using Chebyshev's inequality

Chebyshev's inequality tells you the % of obs that lie within k standard deviations of the mean is at least 1 - 1/k^2, for any k > 1.

Works for any distribution, gives a minimum % and yields the following key markers:
  • at least 36% within ±1.25 standard deviations of the mean
  • at least 56% within ±1.50 standard deviations of the mean
  • at least 75% within ±2 standard deviations of the mean
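The key markers can be reproduced directly from the formula. A minimal sketch, with a hypothetical function name; k = 3 and k = 4 are added only to show how the bound tightens.

```python
def chebyshev_min_proportion(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


bounds = {k: chebyshev_min_proportion(k) for k in (1.25, 1.5, 2, 3, 4)}
for k, proportion in bounds.items():
    print(f"within ±{k} standard deviations: at least {proportion:.0%}")
```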

LOS 7i - Define, calculate, and interpret the coefficient of variation and the Sharpe ratio

Coefficient of variation is a measure of dispersion in a distribution relative to the mean and allows us to make direct comparison of dispersion across different sets of data. Allows us to measure risk (variability) per unit of expected return whereas Sharpe measures return per unit of risk.

CV = standard deviation of x/average value of x

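Sharpe Ratio measures excess return per unit of risk and is the risk premium divided by the standard deviation. Portfolios with large Sharpe ratios are preferred because they give more return per unit of risk. To calculate:

$$\text{Sharpe ratio} = \frac{\bar{R}_p - R_f}{\sigma_p}$$

where $\bar{R}_p$ is the portfolio's mean return, $R_f$ is the risk-free rate and $\sigma_p$ is the standard deviation of the portfolio's returns.

Very similar to Safety First Ratio.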
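A quick sketch putting the two ratios next to each other. The two portfolios and the 3% risk-free rate are invented for illustration, and the function names are hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Dispersion per unit of mean: lower is better for an investor."""
    return std_dev / mean


def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess return per unit of risk: higher is better."""
    return (mean_return - risk_free_rate) / std_dev


# Hypothetical portfolios and risk-free rate.
risk_free = 0.03
portfolio_a = {"mean": 0.10, "std": 0.20}
portfolio_b = {"mean": 0.06, "std": 0.09}

cv_a = coefficient_of_variation(portfolio_a["std"], portfolio_a["mean"])
cv_b = coefficient_of_variation(portfolio_b["std"], portfolio_b["mean"])
sharpe_a = sharpe_ratio(portfolio_a["mean"], risk_free, portfolio_a["std"])
sharpe_b = sharpe_ratio(portfolio_b["mean"], risk_free, portfolio_b["std"])

print(f"CV:     A = {cv_a:.2f}, B = {cv_b:.2f}")
print(f"Sharpe: A = {sharpe_a:.2f}, B = {sharpe_b:.2f}")
```

With these made-up numbers, B looks better on CV while A looks better on Sharpe. The two measures can disagree because Sharpe works with the return in excess of the risk-free rate, whereas CV uses the whole mean.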


LOS 7j - Define and interpret skewness, explain the meaning of a positively or negatively skewed return distribution and describe the relative locations of the mean, median, and mode for a nonsymmetrical distribution


OK. Skew is just like we use it in common speech. If we say "that will skew the results" it means throw them off in one direction or another. Positive skew says that there are positive outliers, so the distribution is humped to the left with a long right tail. Negative skew is humped to the right with a long left tail of negative possibilities.
  • Positive skew = mean > median > mode
  • Negative skew = mean < median < mode
  • For a symmetrical distribution, they are equal.
NB put the three measures in alphabetical order and the arrows point in the direction of skew.


LOS 7k - Define and interpret measures of sample skewness and kurtosis
  • kurtosis measures the peakedness of a distribution; a normal dist. has kurtosis = 3
  • leptokurtic is more peaked, mesokurtic is normal and platykurtic is less peaked than normal
  • lept = leap, meso = same, plat = flat

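To calculate skew (approximation for large n, where s is the sample standard deviation):

$$S_K \approx \frac{1}{n} \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$$

Note: the deviations are cubed, which allows for a positive or negative result. The formula for kurtosis is the same formula but to the fourth power instead of cubed. Excess kurtosis is the result minus 3.

To calculate kurtosis:

$$K \approx \frac{1}{n} \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4}$$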
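A sketch of both calculations, assuming the simplified large-sample forms: the average cubed (or fourth-power) deviation divided by the sample standard deviation raised to the same power. Exact small-sample versions carry extra adjustment terms in n that are left out here. The function name and data sets are made up.

```python
import math


def skew_and_kurtosis(data):
    """Approximate sample skewness, kurtosis and excess kurtosis."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divide by n - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    skew = sum((x - mean) ** 3 for x in data) / (n * s ** 3)
    kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4)
    return skew, kurtosis, kurtosis - 3  # excess kurtosis is measured against 3


# Hypothetical data: one symmetric set, one with a large positive outlier.
symmetric = [-2, -1, 0, 1, 2]
right_tailed = [1, 1, 2, 2, 3, 10]

sym_skew, sym_kurt, sym_excess = skew_and_kurtosis(symmetric)
rt_skew, rt_kurt, rt_excess = skew_and_kurtosis(right_tailed)
print(f"symmetric:    skew = {sym_skew:.3f}, excess kurtosis = {sym_excess:.3f}")
print(f"right-tailed: skew = {rt_skew:.3f}, excess kurtosis = {rt_excess:.3f}")
```

Cubing keeps the sign of each deviation, so the single large positive outlier pushes skew above zero, while the symmetric set cancels out to zero.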


LOS 7l: Discuss the use of arithmetic mean or geometric mean when determining investment returns
  • use geometric mean for measures of past performance over multiple years/periods as it gives us the compounded rate
  • use arithmetic mean as estimator of next year's returns
