LOS K requires us to:
describe the issues regarding the selection of the appropriate sample size, data mining bias, sample selection bias, survivorship bias, look-ahead bias, and time period bias.
1. Sample Size Selection
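a. As discussed, the confidence interval, which is calculated using the following formula:
$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ (or $\bar{X} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$ when σ is unknown),
is affected by: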
i. z or t,
ii. α, and
iii. the standard error, i.e. $\sigma/\sqrt{n}$ or $s/\sqrt{n}$.
Thus, the larger the standard error, the wider the confidence interval. Because n appears in the denominator of the standard error, a larger n reduces the standard error and therefore narrows the confidence interval.
b. Thus, there are inherent trade-offs in selecting a sample based on both statistical and economic factors:
i. A larger sample may increase precision, because with a large sample we can use z rather than t reliability factors (t critical values are larger for small samples), and because the estimated standard error falls.
ii. However, a larger sample also costs more to collect, and beyond some point the added cost may exceed the benefit of the extra precision.
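A minimal Python sketch of this trade-off, assuming a known population standard deviation so that the z reliability factor applies. The function name `confidence_interval`, the 8% sample mean, the 20% standard deviation, and the cost of 10 per observation are all hypothetical inputs chosen for illustration. Because the standard error scales with $1/\sqrt{n}$, each quadrupling of n only halves the interval width, while the sampling cost grows in proportion to n.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sigma, n, alpha=0.05):
    """Two-sided CI for a population mean with known sigma:
    mean +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor, about 1.96 for alpha = 0.05
    se = sigma / math.sqrt(n)                # standard error falls with the square root of n
    return mean - z * se, mean + z * se


COST_PER_OBSERVATION = 10.0  # hypothetical cost of collecting one observation

# Hypothetical inputs: sample mean of 8%, population standard deviation of 20%.
for n in (25, 100, 400, 1600):
    lower, upper = confidence_interval(0.08, 0.20, n)
    print(f"n={n:5d}  CI width={upper - lower:.4f}  cost={n * COST_PER_OBSERVATION:8.0f}")
```

Going from n = 25 to n = 1600 multiplies the cost by 64 but narrows the interval only by a factor of 8, which is the diminishing return that makes very large samples uneconomical.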
2. Data Mining Bias
a. Data mining is the practice of searching the same data set over and over for patterns until something significant turns up.
By random chance alone, such a search will eventually find a relationship that appears significant but does not hold in any other data set.
b. Data mining is typically not motivated by a theory or hypothesis. Spurious significant results can also come from data narrowing, e.g. selectively dropping outliers, or otherwise torturing the data until it confesses.
c. To verify a relationship and detect data-mining bias, we can conduct out-of-sample tests on data that was not used to find the relationship.
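A small Python simulation of data mining followed by an out-of-sample check, under the assumption that the returns and all 500 candidate predictors are independent random noise, so every "significant" result is false by construction. Roughly 5% of the predictors should pass the in-sample test at the 5% level, and only about 5% of those should pass again out of sample. The function names, the sample size of 60, and the critical value of 2.0 are illustrative choices, not part of the original notes.

```python
import math
import random


def corr_t_stat(x, y):
    """t-statistic for H0: correlation = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def noise(rng, n):
    """n independent standard normal draws with no real relationship to anything."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mine_then_validate(num_predictors=500, n=60, critical=2.0, seed=42):
    """Search many pure-noise predictors for 'significant' links to pure-noise returns,
    then re-test only the discoveries on a fresh (out-of-sample) draw.
    critical=2.0 approximates the 5% two-sided t critical value for n - 2 = 58 df."""
    rng = random.Random(seed)
    returns_in = noise(rng, n)
    discoveries = [k for k in range(num_predictors)
                   if abs(corr_t_stat(noise(rng, n), returns_in)) > critical]
    returns_out = noise(rng, n)
    survivors = [k for k in discoveries
                 if abs(corr_t_stat(noise(rng, n), returns_out)) > critical]
    return discoveries, survivors


found, kept = mine_then_validate()
print(f"{len(found)} of 500 noise predictors look significant in-sample; "
      f"{len(kept)} remain significant out-of-sample")
```

The in-sample "discoveries" are exactly the kind of result a data-mining search reports; the out-of-sample re-test removes almost all of them.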
3. Sample Selection Bias
a. Sample selection bias results when certain observations or variables are systematically excluded from the analysis, often because the data are unavailable.
b. Survivorship bias is a particular kind of sample selection bias in which we observe only the firms that have succeeded and therefore survived.
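A hypothetical Python simulation of survivorship bias, assuming funds with independent, normally distributed annual returns (6% mean, 20% standard deviation) and assuming that any fund whose ending wealth falls below 0.7 per unit invested closes and disappears from the database. All parameters and the function name `survivorship_gap` are invented for illustration. Averaging only the surviving funds overstates the return of the full population.

```python
import random
from statistics import mean


def survivorship_gap(num_funds=2000, years=10, mu=0.06, sigma=0.20, cutoff=0.7, seed=7):
    """Simulate funds with i.i.d. normal annual returns. A fund whose ending wealth
    (per 1 unit invested) is below `cutoff` is assumed to close and vanish from the database.
    Returns (mean return of all funds, mean return of survivors, number of survivors)."""
    rng = random.Random(seed)
    all_means, survivor_means = [], []
    for _ in range(num_funds):
        rets = [rng.gauss(mu, sigma) for _ in range(years)]
        wealth = 1.0
        for r in rets:
            wealth *= 1.0 + r
        all_means.append(mean(rets))
        if wealth >= cutoff:  # only these funds would appear in a survivors-only database
            survivor_means.append(mean(rets))
    return mean(all_means), mean(survivor_means), len(survivor_means)


full, survivors_only, count = survivorship_gap()
print(f"all funds: {full:.4f}  survivors only ({count}): {survivors_only:.4f}")
```

Because the funds are dropped on the basis of their own poor results, the survivors are not a random sample, and their average is biased upward.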
4. Look-Ahead Bias
a. Look-ahead bias occurs when a model is tested using data that would not have been available on the test date, and the model is then used for predictions.
b. It may be particularly pronounced when using accounting data, which is typically reported with a lag in time.
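A minimal Python sketch of avoiding look-ahead bias with lagged accounting data, assuming each record carries both its fiscal period end and the date the figure was publicly reported. The EPS values, the dates, and the function `latest_eps` are hypothetical. Keying on the report date gives a point-in-time value; keying on the period end uses a number the market had not yet seen.

```python
from datetime import date

# Hypothetical quarterly EPS records: (fiscal period end, public report date, EPS).
EPS_RECORDS = [
    (date(2022, 12, 31), date(2023, 2, 15), 0.98),
    (date(2023, 3, 31), date(2023, 5, 10), 1.05),
    (date(2023, 6, 30), date(2023, 8, 9), 1.12),
]


def latest_eps(as_of, records, use_report_date=True):
    """Return the most recent EPS figure usable on `as_of`.
    use_report_date=True keys on the public report date (no look-ahead);
    use_report_date=False keys on the fiscal period end, which leaks unreleased data."""
    key = 1 if use_report_date else 0
    usable = [r for r in records if r[key] <= as_of]
    if not usable:
        return None  # nothing had been reported yet
    return max(usable, key=lambda r: r[key])[2]


test_date = date(2023, 4, 15)
print("point-in-time EPS:", latest_eps(test_date, EPS_RECORDS))
print("look-ahead EPS   :", latest_eps(test_date, EPS_RECORDS, use_report_date=False))
```

On 15 April 2023 the first quarter had ended but its EPS was not reported until May, so only the point-in-time lookup (0.98) reflects what an investor could actually have used.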
5. Time Period Bias
a. Time-period bias occurs when the model is estimated on data from a time period that is not representative of the data across all time periods.
b. Too short a time period increases the likelihood of period-specific results.
c. Too long a time period increases the chance of including a regime change, so that the sample mixes observations from different underlying distributions.
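A hypothetical Python sketch of time-period bias, assuming monthly returns whose true mean drops from 1.0% to -0.5% at the midpoint of a 20-year sample. The regime change, all parameters, and the function `regime_series` are invented for illustration. A short window taken from either half gives a period-specific estimate, while the full-period mean blends two distributions and describes neither regime.

```python
import random
from statistics import mean


def regime_series(n_per_regime=120, mu1=0.010, mu2=-0.005, sigma=0.02, seed=3):
    """Hypothetical monthly returns whose true mean shifts from mu1 to mu2 halfway through."""
    rng = random.Random(seed)
    first = [rng.gauss(mu1, sigma) for _ in range(n_per_regime)]
    second = [rng.gauss(mu2, sigma) for _ in range(n_per_regime)]
    return first + second


returns = regime_series()
early, late, full = mean(returns[:120]), mean(returns[120:]), mean(returns)
print(f"first regime: {early:.4f}  second regime: {late:.4f}  full period: {full:.4f}")
```

Which estimate a researcher reports depends entirely on the period chosen, which is the core of time-period bias.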