Statistics

Much of this lecture is definitions, to enable you to understand, discuss,
and design statistical tests.

Types of statistics:
* descriptive: given data, compute its mean, min, max, standard deviation, etc.
* inferential: given a subset of data, infer information about the whole data.
Inferential statistics is our focus.

Terminology:
* population: all the data in the world.  Sometimes infinite.  It is
  impossible or infeasible to examine it all.
* sample: a finite subset of the population that is examined.

Goal: given a sample, infer facts about the population.

Types of data:
* categorical
  * nominal -- categories have no order.
    Example: blue-eyed, brown-eyed, green-eyed.
  * ordinal -- categories are ordered, but there is no fixed relationship
    between them.
    Example: 1st place, 2nd place, 3rd place.  The gap between 1st and 2nd
    could be bigger or smaller than the gap between 2nd and 3rd.
* quantitative
  * interval -- fixed distance between measurements, but an arbitrary zero
    value.
    Example: temperature in Fahrenheit or centigrade.
  * ratio -- fixed distance between measurements, and a meaningful zero value.
    Example: temperature in Kelvin.

Types of variables:
* independent: controlled (and typically varied) by the experimenter.
  Think of them as inputs.
* dependent: measured by the experimenter.  Think of them as outputs.
There is a somewhat related use of the term "independent" (in "independent,
identically distributed", below) that may be confusing.

Hypothesis: a guess or theory about the world that the scientist wishes to
validate or falsify.

Null hypothesis:
Let population 1 be users in the control treatment, and let mu_1 be their
mean score.  Let population 2 be users in the experimental treatment, and
let mu_2 be their mean score.  You can use any statement as your hypothesis,
such as:
* mu_2 < mu_1
* mu_2 = mu_1
* 1.5 * mu_2 = mu_1
The "null hypothesis" says that your experimental treatment has no effect.
It is called H_0, and is written as "mu_2 = mu_1".

The scientist often believes or hopes that the experimental treatment is
better than the control treatment -- in other words, that mu_2 > mu_1.
However, the scientist generally tests the null hypothesis H_0: mu_2 = mu_1.
Why?  Why not test the hypothesis that the scientist really believes?  The
reason is that it is easier to falsify a hypothesis than to support one.

===========================================================================

IID means "independent, identically distributed".  (This is a different use
of "independent" than in "independent variable" above.)

IID vs. Markov chain (a simulation contrasting the two appears after the
table below):
* Repeated throws of a fair die are IID.
* Markov examples:
  * a random walk: each element is +/- 1 compared to the previous value
  * letter frequency: the expected next letter depends on the current one.
    Most likely letters overall: e t a o i n s r h.
    Most likely letters after the letter e: r s n d a.
    Most common digraphs: TH, HE, AN, IN, ER, ON, RE, ED, ND, HA, AT, EN, ES,
    OF, NT, EA, TI, TO, IO, LE, IS, OU, AR, AS, DE, RT, VE.
    Morse code was designed around letter frequencies (as of 1830):
    e ".", t "-", a ".-", o "---", i "..", n "-.", s "...", r ".-.",
    then m "--".
* Other non-IID examples: choosing from a bag without replacement; many
  samples from the same field, when many fields exist.

Why do we care?  Statistical formulas generally assume IID samples, e.g., for:
* estimating means
* estimating variance
Example of a violation: estimating a mean, but choosing only from a
subpopulation.

Given a set of data, what do we think the mean is?  What is your confidence
in that?  (95% confidence interval)

Consider two baseball players:

    Mr. HotAndCold:
      year 1:  4.78
      year 2:  1.23
      year 3:  3.45
      year 4:  1.02
      year 5:  4.55
      year 6:  3.22
      year 7:  3.44
      year 8:  4.55
      Average:         Range:

    Mr. Consistent:
      year 9:  3.45
      year 10:
      year 11:
      year 12:
      year 13:
      year 14:
      year 15:
      year 16:
      Average:         Range:
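Using the eight season values listed for Mr. HotAndCold, here is a minimal
sketch (in Python, assuming NumPy and SciPy are available) of the descriptive
statistics above, plus one inferential question: a 95% confidence interval
for the player's "true" mean, treating the seasons as IID draws from a normal
population.

  import numpy as np
  from scipy import stats

  # The eight seasons from the table above.
  hot_and_cold = np.array([4.78, 1.23, 3.45, 1.02, 4.55, 3.22, 3.44, 4.55])

  # Descriptive statistics: summarize the data we have.
  mean = hot_and_cold.mean()                        # 3.28
  spread = hot_and_cold.max() - hot_and_cold.min()  # range = 3.76

  # Inferential statistics: a 95% confidence interval for the population
  # mean, using the t distribution since the variance is estimated.
  n = len(hot_and_cold)
  sem = stats.sem(hot_and_cold)   # standard error of the mean
  low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

  print(f"mean = {mean:.2f}, range = {spread:.2f}")
  print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")

The wider the interval, the lower our confidence in any point estimate of the
mean; a consistent player with the same mean but smaller spread would yield a
narrower interval from the same number of seasons.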
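The confidence-interval formula above assumes IID samples.  Here is a small
simulation (Python with NumPy; the sample sizes are arbitrary) contrasting
IID die throws with a Markov-chain random walk: the running mean of the die
rolls settles near the true mean of 3.5, while the running mean of the random
walk need not settle at all, because each value depends on the previous one.

  import numpy as np

  rng = np.random.default_rng(0)  # seed chosen arbitrarily
  n = 10_000

  # IID: repeated throws of a fair six-sided die.
  die = rng.integers(1, 7, size=n)

  # Markov chain: a random walk, each element +/- 1 from the previous value.
  walk = np.cumsum(rng.choice([-1, 1], size=n))

  for k in (10, 100, 1_000, 10_000):
      print(f"after {k:>6} samples: "
            f"die mean = {die[:k].mean():5.2f}, "
            f"walk mean = {walk[:k].mean():8.1f}")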
Practical importance:
* effect size estimation
* explanation of causality
* effects downstream

Power = the probability that a test will reject a false null hypothesis
      = the probability of detecting a real effect
      = 1 - probability of a type II error (false negative)
      = 1 - \beta
      = P(reject H_0 | H_1 is true)
(Power is also known as sensitivity.)

\beta  = probability of a type II error (false negative)
\alpha = probability of a type I error (false positive); typically .05

Power is sometimes referred to as \pi.  \pi = .8 is a standard for adequacy.
.8 means a 4-to-1 tradeoff between \beta and \alpha: e.g., \beta = .2,
\alpha = .05.

How to control power:
* more data (a bigger sample size)
* design: a more powerful statistical test
* equal numbers of observations in each group
* paired vs. non-paired tests
* a different p-value threshold
* the magnitude of the effect in the population

Power lets you compute the sample size needed to achieve a given probability
of detecting an effect; a sketch appears at the end of these notes.  [What's
the opposite?  Significance tests, p values.]

Many trials conclude with "There was no statistically significant difference
in adverse effects between groups" without noting that there was insufficient
data to detect any but the largest differences.
* meta-analysis is useful here

Example: a coin is biased to land heads 60% of the time.  (A simulation
checking these numbers appears at the end of these notes.)
* 10 flips: 20% chance of noticing the bias
* 100 flips: 50+% chance of noticing
* 1000 flips: ~100% chance of noticing

Nice reading on power: https://www.statisticsdonewrong.com/power.html

t-test: compares two means (averages).
* Assumes the data come from normal distributions with equal variance.

Questions a t-test can answer:
* what is the mean?  (does the mean have a particular value?)
* confidence intervals
* are the means equal?

Paired vs. unpaired tests:
* paired:
  * reduces inter-subject variability
  * used for:
    * same subject: "before-after" studies, two treatments
    * matched pairs

t-test: 1-sided vs. 2-sided tests
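A minimal sketch of these t-tests (Python with NumPy and SciPy; the group
sizes, means, and variances are made up for illustration):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  control = rng.normal(loc=3.0, scale=1.0, size=30)       # population 1, mu_1
  experimental = rng.normal(loc=3.5, scale=1.0, size=30)  # population 2, mu_2

  # Unpaired, 2-sided t-test of H_0: mu_2 = mu_1
  # (assumes normal distributions with equal variance).
  t_stat, p_two_sided = stats.ttest_ind(experimental, control, equal_var=True)

  # 1-sided version: only mu_2 > mu_1 counts as evidence against H_0.
  _, p_one_sided = stats.ttest_ind(experimental, control, equal_var=True,
                                   alternative='greater')

  # Paired t-test, e.g., a "before-after" study on the same subjects;
  # pairing removes inter-subject variability.
  before = rng.normal(loc=3.0, scale=1.0, size=30)
  after = before + rng.normal(loc=0.5, scale=0.3, size=30)
  _, p_paired = stats.ttest_rel(after, before)

  print(f"2-sided p = {p_two_sided:.4f}")
  print(f"1-sided p = {p_one_sided:.4f}")
  print(f"paired  p = {p_paired:.4f}")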
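Power analysis answers the sample-size question raised in the power section:
given an expected effect size, a significance level, and a power target,
solve for n.  A sketch using statsmodels (the effect size of 0.5 is an
assumed "medium" effect, not a number from these notes):

  from statsmodels.stats.power import TTestIndPower

  # Solve for the per-group sample size of an unpaired t-test with
  # alpha = .05 and the standard power target pi = .8.
  n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                            alpha=0.05,
                                            power=0.8,
                                            alternative='two-sided')
  print(f"need about {n_per_group:.0f} subjects per group")  # ~64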
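Finally, the biased-coin power numbers can be checked by simulation (Python
with NumPy and SciPy; the exact percentages depend on the significance level
and on whether the test is 1-sided or 2-sided):

  import numpy as np
  from scipy.stats import binomtest

  rng = np.random.default_rng(2)  # seed chosen arbitrarily
  alpha = 0.05
  trials = 2_000  # simulated experiments per sample size

  for flips in (10, 100, 1_000):
      # Each experiment: flip a 60%-heads coin `flips` times, then run a
      # 2-sided binomial test against the fair-coin hypothesis p = 0.5.
      heads = rng.binomial(flips, 0.6, size=trials)
      rejected = sum(binomtest(int(h), flips, p=0.5).pvalue < alpha
                     for h in heads)
      print(f"{flips:>5} flips: estimated power = {rejected / trials:.0%}")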