cfa with no experience: Study Session 3 - Reading 10 Sampling and Estimation

LOS 10a. define simple random sampling, sampling error, and a sampling distribution, and interpret sampling error;

simple random sampling: every item has an equal chance of being selected
can be done by assigning a number to each item and using random numbers to select; systematically choosing every nth item gives an approximately random sample
sampling error = the difference between the sample stat (e.g. sample mean) and the corresponding population parameter (e.g. pop mean) i.e. how (un)representative the sample stat is
sampling distribution of the sample stat is the probability distribution of all possible values of that stat, computed from a set of equal-sized samples randomly drawn from the same population
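
To make sampling error and the sampling distribution concrete, here is a minimal simulation sketch. The population (10,000 made-up returns), the sample size of 50 and the helper name `sampling_error` are all assumptions for illustration, not from the reading.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up annual returns (in %)
population = [rng.gauss(8.0, 20.0) for _ in range(10_000)]
pop_mean = statistics.fmean(population)


def sampling_error(sample_size):
    """Draw one simple random sample; return sample mean minus population mean."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample) - pop_mean


# Sampling distribution of the mean: many equal-sized samples, same population
sample_means = [
    statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)
]

print(f"population mean:           {pop_mean:.2f}")
print(f"one sampling error (n=50): {sampling_error(50):+.2f}")
print(f"mean of the sample means:  {statistics.fmean(sample_means):.2f}")
```

Each individual sample mean misses the population mean by some sampling error, but the sample means taken together cluster around it.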

LOS 10b. distinguish between simple random and stratified random sampling;

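simple random is just random (or systematic) sampling from the whole population
stratified random sampling is proportionate - the population is split into categories (strata) and a simple random sample is drawn from each one in proportion to its size, ensuring that the combined sample contains a representative number of observations from each category e.g. different stocks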
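
A rough sketch of proportionate stratified sampling, assuming a hypothetical universe of 1,000 stocks tagged by sector (the sectors, their sizes and the function name `stratified_sample` are invented for the example):

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical universe: 1,000 stocks tagged by sector (made-up proportions)
population = (
    [("technology", i) for i in range(600)]
    + [("financials", i) for i in range(300)]
    + [("utilities", i) for i in range(100)]
)


def stratified_sample(items, total_n):
    """Simple random sample from each stratum, sized in proportion to the stratum."""
    strata = {}
    for category, item_id in items:
        strata.setdefault(category, []).append((category, item_id))
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(items))
        sample.extend(rng.sample(members, k))
    return sample


sample = stratified_sample(population, 50)
counts = {}
for category, _ in sample:
    counts[category] = counts.get(category, 0) + 1
print(counts)
```

A plain simple random sample of 50 could end up with too few or too many utilities by chance; the stratified version fixes each category's share in advance.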

LOS 10c. distinguish between time-series and cross-sectional data;

time-series is looking at one category across multiple time periods
cross-sectional is looking at multiple categories during one single time period

LOS 10d. interpret the central limit theorem and describe its importance;

central limit theorem states that for a large enough sample size n (usually > 30) from a pop with a mean μ and a variance σ², the prob distribution for the sample mean will be approx. normal with a mean μ and a variance of σ²/n
Theory allows us to use normal distribution to test hypotheses about pop mean, regardless of distrib. of the pop
As the sample size grows, the sample stats become closer to the pop parameters
The sampling distribution of the sample mean will be approximately normal.
The mean of that sampling distribution will be equal to the population mean (μ).
The variance of that sampling distribution will be equal to the population variance (σ²) divided by the size of the sample (n).
Thus the central limit theorem can help make probability estimates for a sample of a non-normal population (e.g. skewed, lognormal), based on the fact that the sample mean for large sample sizes will be approximately normally distributed.
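
A quick way to check the theorem is to simulate it. The sketch below assumes a skewed stand-in population (exponential with mean 1 and variance 1) and a sample size of 50; both choices are mine, not from the reading.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewed population stand-in: exponential with mean 1 and variance 1
mu, sigma2 = 1.0, 1.0
n = 50          # sample size, above the usual rule-of-thumb threshold
trials = 5_000  # number of samples drawn

sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma2 / n})")
```

Even though single draws are heavily skewed, the sample means should centre on μ with variance close to σ²/n.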

LOS 10e. calculate and interpret the standard error of the sample mean;

standard error is the standard deviation (of the pop or, if not available, the sample) divided by the square root of the sample size
the sample mean and standard error can be used to calculate approximate confidence intervals for the mean i.e. the actual pop mean will lie between a and b with 95% confidence
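
The standard error calculation as a small sketch. The numbers (σ = 20 with n = 100, and the nine-item sample) are invented for illustration.

```python
import math
import statistics


def standard_error(std_dev, n):
    """Standard deviation divided by the square root of the sample size."""
    return std_dev / math.sqrt(n)


# Population standard deviation known: sigma = 20, n = 100
se_known = standard_error(20.0, 100)

# Population standard deviation not available: use the sample standard deviation
sample = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0, 6.0, 4.0, 5.0]
se_sample = standard_error(statistics.stdev(sample), len(sample))

print(f"standard error, known sigma: {se_known:.3f}")
print(f"standard error, from sample: {se_sample:.3f}")
```

Note that `statistics.stdev` is the sample standard deviation (n-1 in the denominator), which is the one to use when the population value is not available.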

LOS 10f. distinguish between a point estimate and a confidence interval estimate of a population parameter;

point estimate is a single sample value used to estimate a pop parameter e.g. the sample mean is a point estimate of the pop mean
confidence interval gives a range of values within which the actual value of a parameter is expected to lie with probability 1 - α (α is the level of significance)

LOS 10g. identify and describe the desirable properties of an estimator;

unbiased = the expected value of the estimator is equal to the parameter you are trying to estimate
efficient = variance of its sampling distribution is smaller than that of all other unbiased estimators
consistent = as sample size grows, estimator accuracy increases i.e. standard error decreases

LOS 10h. explain the construction of confidence intervals;

confidence intervals are the point estimate ± (reliability factor * standard error)
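
A minimal sketch of that construction using the z-score as the reliability factor. The inputs (sample mean 80, population standard deviation 15, n = 36) and the function name `z_confidence_interval` are assumptions for the example.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Point estimate +/- (reliability factor * standard error)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # leaves alpha/2 in the upper tail
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(80.0, 15.0, 36)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

Here the standard error is 15 / √36 = 2.5 and the reliability factor is about 1.96, so the interval is roughly 80 ± 4.9.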

LOS 10i. describe the properties of Student’s t-distribution and calculate and interpret its degrees of freedom;

Student's t-distribution is used when sample size is small (< 30) and the population variance is unknown
It results in more conservative (wider) confidence intervals (curve is less peaked than the normal, with fatter tails)
t-distribution is symmetrical
defined by degrees of freedom (df) calculated by n-1 (sample size minus one)
t distribution converges to z distribution as sample size (degrees of freedom) becomes sufficiently large
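
The convergence is easy to see by lining up t-values against the z-value. In this sketch the t-values are two-tailed 95% figures copied from a standard t-table (the standard library has no t quantile function), so treat them as rounded table entries.

```python
from statistics import NormalDist

z_value = NormalDist().inv_cdf(0.975)  # two-tailed 95% z, about 1.96

# Two-tailed 95% t-values from a standard t-table, keyed by degrees of freedom
t_table = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

gaps = {}
for df, t_value in t_table.items():
    gaps[df] = t_value - z_value
    print(f"df={df:>3}  t={t_value:.3f}  z={z_value:.3f}  gap={gaps[df]:.3f}")
```

The gap shrinks as df grows, which is why the t-based interval is noticeably wider only for small samples.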

LOS 10j. calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size;

here we are trying to calculate the probability of the pop mean being within a certain range of values based on the sample mean distribution
when available, use population parameters to calculate the confidence interval
the calculation for when distribution is normal with known variance is:
x̄ ± z_α/2 * (σ / √n)
where x̄ is the sample mean,
z_α/2 is the reliability factor i.e. the z-score that leaves α/2 in the upper tail,
e.g. z_α/2 = 1.65 for 90% confidence (sig. level is 10% i.e. 5% in each tail) - might want to just think of this as 10% instead of thinking about the tails bit
and the last part, σ / √n, is the standard error

So for example, if you have a sample mean test score of 80% with a standard error of 5, then at 95% confidence the reliability factor is 1.96 and the interval is 80 ± 1.96 * 5, so the true pop mean would be between 70.2% and 89.8% with 95% confidence

when variance is unknown, use t distribution:
x̄ ± t_α/2 * (s / √n)
here s is the sample standard deviation and the t_α/2 part is the t-statistic corresponding to a t-distributed random variable with n-1 degrees of freedom
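
A sketch of the t-based interval with made-up inputs (sample mean 80, sample standard deviation 8, n = 16). The t-value 2.131 is the two-tailed 95% table entry for 15 degrees of freedom; it is passed in by hand because the standard library has no t quantile function.

```python
import math


def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Sample mean +/- t * (s / sqrt(n)); t_critical must match df = n - 1."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


n = 16
degrees_of_freedom = n - 1  # 15
# Two-tailed 95% t-value for 15 df, from a standard t-table
low, high = t_confidence_interval(80.0, 8.0, n, t_critical=2.131)
print(f"df={degrees_of_freedom}: 95% confidence interval {low:.2f} to {high:.2f}")
```

With the z-value of 1.96 the same inputs would give a narrower interval, which is the "more conservative" effect of the t-distribution.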

Rules of thumb for when to use t or z

if normal w/ known pop variance then use z statistic
if normal w/ unknown variance then use t statistic (with a large sample, z is an acceptable approximation)
if distribution is non-normal then small sample sizes do not work
non-normals only work with large samples: use z if you know the variance, t if you do not
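
The rules of thumb can be written as a small decision function. This is only a sketch: the function name `reliability_factor` is made up, and treating n >= 30 as "large" is the usual convention rather than a hard rule.

```python
def reliability_factor(normal_population, variance_known, n):
    """Return which statistic to use: 'z', 't', or 'not available'."""
    large_sample = n >= 30  # conventional cut-off, an assumption here
    if not normal_population and not large_sample:
        return "not available"  # non-normal with a small sample does not work
    if variance_known:
        return "z"
    return "t"  # with a large sample, z is an acceptable approximation


for case in [(True, True, 10), (True, False, 10),
             (False, False, 10), (False, True, 100), (False, False, 100)]:
    print(case, "->", reliability_factor(*case))
```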

LOS 10k. discuss the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

data-mining bias = repeatedly searching a data set until a pattern turns up, which overestimates the significance of that pattern; test the pattern on out-of-sample data to confirm or deny the overestimation
sample selection bias = systematic exclusion of data from analysis, usually because it is unavailable (creates non-random samples)
survivorship bias = a form of sample selection bias where failed items drop out of the data, e.g. using only surviving mutual funds in the sample overstates average performance
look-ahead bias = basing the test at a point in time on data not available at that time
time-period bias = the relation holds only in the time period tested and does not hold over other time periods
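
Survivorship bias is the easiest of these to show with numbers. The sketch below uses invented fund returns and a hypothetical rule that funds losing more than 5% get closed and vanish from the database.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical universe: 1,000 funds with made-up annual returns (in %)
all_funds = [rng.gauss(5.0, 10.0) for _ in range(1_000)]

# Hypothetical closure rule: funds that lose more than 5% drop out of the data
survivors = [r for r in all_funds if r > -5.0]

mean_all = statistics.fmean(all_funds)
mean_survivors = statistics.fmean(survivors)

print(f"funds: {len(all_funds)}, survivors: {len(survivors)}")
print(f"mean return, all funds:      {mean_all:.2f}")
print(f"mean return, survivors only: {mean_survivors:.2f}")
```

Averaging only the survivors drops the worst results, so the sample mean overstates what an investor in the full universe would have earned.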

Tuesday, January 12, 2010
