Statistics

Much of this lecture is definitions, to enable you to understand, discuss,
and design statistical tests.

Types of statistics:
* descriptive: given data, compute its mean, min, max, standard deviation, etc.
* inferential: given a subset of data, infer information about the whole data.
Inferential statistics is our focus.

Terminology:
* population: all the data in the world.  Sometimes infinite.  It is
  impossible or infeasible to examine it all.
* sample: a finite subset of the population that is examined.

Goal: given a sample, infer facts about the population.

Types of data:
* categorical
  * nominal -- categories have no order.
    Example: blue-eyed, brown-eyed, green-eyed.
  * ordinal -- categories are ordered, but there is no fixed relationship
    between them.
    Example: 1st place, 2nd place, 3rd place.  The gap between 1st and 2nd
    could be bigger or smaller than the gap between 2nd and 3rd.
* quantitative
  * interval -- fixed distance between measurements, but an arbitrary zero
    value.
    Example: temperature in Fahrenheit or centigrade.
  * ratio -- fixed distance between measurements, and a meaningful zero value.
    Example: temperature in Kelvin.

Types of variables:
* independent: controlled (and typically varied) by the experimenter.
  Think of them as inputs.
* dependent: measured by the experimenter.  Think of them as outputs.
There is a somewhat related use of the term "independent" (in "independent,
identically distributed", below) that may be confusing.

Hypothesis: a guess or theory about the world that the scientist wishes to
validate or falsify.

Null hypothesis:
Let population 1 be users in the control treatment, and let mu_1 be their
mean score.  Let population 2 be users in the experimental treatment, and
let mu_2 be their mean score.  You can use any statement as your hypothesis,
such as:
* mu_2 < mu_1
* mu_2 = mu_1
* 1.5 * mu_2 = mu_1
The "null hypothesis" says that your experimental treatment has no effect.
It is called H_0, and is written as "mu_2 = mu_1".

The scientist often believes or hopes that the experimental treatment is
better than the control treatment -- in other words, that mu_2 > mu_1.
However, the scientist generally tests the null hypothesis H_0: mu_2 = mu_1.
Why?  Why not test the hypothesis that the scientist really believes?  The
reason is that it is easier to falsify a hypothesis than to support one.

===========================================================================

IID means "independent, identically distributed".  (This is a different use
of "independent" than in "independent variable" above.)

IID vs. Markov chain (a simulation contrasting the two appears after the
table below):
* Repeated throws of a fair die are IID.
* Markov examples:
  * a random walk: each element is +/- 1 compared to the previous value
  * letter frequency: the expected next letter depends on the current one.
    Most likely letters overall: e t a o i n s r h.
    Most likely letters after the letter e: r s n d a.
    Most common digraphs: TH, HE, AN, IN, ER, ON, RE, ED, ND, HA, AT, EN, ES,
    OF, NT, EA, TI, TO, IO, LE, IS, OU, AR, AS, DE, RT, VE.
    Morse code was designed around letter frequencies (as of 1830):
    e ".", t "-", a ".-", o "---", i "..", n "-.", s "...", r ".-.",
    then m "--".
* Other non-IID examples: choosing from a bag without replacement; many
  samples from the same field, when many fields exist.

Why do we care?  Statistical formulas generally assume IID samples, e.g., for:
* estimating means
* estimating variance
Example of a violation: estimating a mean, but choosing only from a
subpopulation.

Given a set of data, what do we think the mean is?  What is your confidence
in that?  (95% confidence interval)

Consider two baseball players:

    Mr. HotAndCold:
      year 1:  4.78
      year 2:  1.23
      year 3:  3.45
      year 4:  1.02
      year 5:  4.55
      year 6:  3.22
      year 7:  3.44
      year 8:  4.55
      Average:         Range:

    Mr. Consistent:
      year 9:  3.45
      year 10:
      year 11:
      year 12:
      year 13:
      year 14:
      year 15:
      year 16:
      Average:         Range:
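Using the eight season values listed for Mr. HotAndCold, here is a minimal
sketch (in Python, assuming NumPy and SciPy are available) of the descriptive
statistics above, plus one inferential question: a 95% confidence interval
for the player's "true" mean, treating the seasons as IID draws from a normal
population.

  import numpy as np
  from scipy import stats

  # The eight seasons from the table above.
  hot_and_cold = np.array([4.78, 1.23, 3.45, 1.02, 4.55, 3.22, 3.44, 4.55])

  # Descriptive statistics: summarize the data we have.
  mean = hot_and_cold.mean()                        # 3.28
  spread = hot_and_cold.max() - hot_and_cold.min()  # range = 3.76

  # Inferential statistics: a 95% confidence interval for the population
  # mean, using the t distribution since the variance is estimated.
  n = len(hot_and_cold)
  sem = stats.sem(hot_and_cold)   # standard error of the mean
  low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

  print(f"mean = {mean:.2f}, range = {spread:.2f}")
  print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")

The wider the interval, the lower our confidence in any point estimate of the
mean; a consistent player with the same mean but smaller spread would yield a
narrower interval from the same number of seasons.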
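The confidence-interval formula above assumes IID samples.  Here is a small
simulation (Python with NumPy; the sample sizes are arbitrary) contrasting
IID die throws with a Markov-chain random walk: the running mean of the die
rolls settles near the true mean of 3.5, while the running mean of the random
walk need not settle at all, because each value depends on the previous one.

  import numpy as np

  rng = np.random.default_rng(0)  # seed chosen arbitrarily
  n = 10_000

  # IID: repeated throws of a fair six-sided die.
  die = rng.integers(1, 7, size=n)

  # Markov chain: a random walk, each element +/- 1 from the previous value.
  walk = np.cumsum(rng.choice([-1, 1], size=n))

  for k in (10, 100, 1_000, 10_000):
      print(f"after {k:>6} samples: "
            f"die mean = {die[:k].mean():5.2f}, "
            f"walk mean = {walk[:k].mean():8.1f}")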
Practical importance:
* effect size estimation
* explanation of causality
* effects downstream

Power = the probability that a test will reject a false null hypothesis
      = the probability of detecting a real effect
      = 1 - probability of a type II error (false negative)
      = 1 - \beta
      = P(reject H_0 | H_1 is true)
(Power is also known as sensitivity.)

\beta  = probability of a type II error (false negative)
\alpha = probability of a type I error (false positive); typically .05

Power is sometimes referred to as \pi.  \pi = .8 is a standard for adequacy.
.8 means a 4-to-1 tradeoff between \beta and \alpha: e.g., \beta = .2,
\alpha = .05.

How to control power:
* more data (a bigger sample size)
* design: a more powerful statistical test
* equal numbers of observations in each group
* paired vs. non-paired tests
* a different p-value threshold
* the magnitude of the effect in the population

Power lets you compute the sample size needed to achieve a given probability
of detecting an effect; a sketch appears at the end of these notes.  [What's
the opposite?  Significance tests, p values.]

Many trials conclude with "There was no statistically significant difference
in adverse effects between groups" without noting that there was insufficient
data to detect any but the largest differences.
* meta-analysis is useful here

Example: a coin is biased to land heads 60% of the time.  (A simulation
checking these numbers appears at the end of these notes.)
* 10 flips: 20% chance of noticing the bias
* 100 flips: 50+% chance of noticing
* 1000 flips: ~100% chance of noticing

Nice reading on power: https://www.statisticsdonewrong.com/power.html

t-test: compares two means (averages).
* Assumes the data come from normal distributions with equal variance.

Questions a t-test can answer:
* what is the mean?  (does the mean have a particular value?)
* confidence intervals
* are the means equal?

Paired vs. unpaired tests:
* paired:
  * reduces inter-subject variability
  * used for:
    * same subject: "before-after" studies, two treatments
    * matched pairs

t-test: 1-sided vs. 2-sided tests
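A minimal sketch of these t-tests (Python with NumPy and SciPy; the group
sizes, means, and variances are made up for illustration):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  control = rng.normal(loc=3.0, scale=1.0, size=30)       # population 1, mu_1
  experimental = rng.normal(loc=3.5, scale=1.0, size=30)  # population 2, mu_2

  # Unpaired, 2-sided t-test of H_0: mu_2 = mu_1
  # (assumes normal distributions with equal variance).
  t_stat, p_two_sided = stats.ttest_ind(experimental, control, equal_var=True)

  # 1-sided version: only mu_2 > mu_1 counts as evidence against H_0.
  _, p_one_sided = stats.ttest_ind(experimental, control, equal_var=True,
                                   alternative='greater')

  # Paired t-test, e.g., a "before-after" study on the same subjects;
  # pairing removes inter-subject variability.
  before = rng.normal(loc=3.0, scale=1.0, size=30)
  after = before + rng.normal(loc=0.5, scale=0.3, size=30)
  _, p_paired = stats.ttest_rel(after, before)

  print(f"2-sided p = {p_two_sided:.4f}")
  print(f"1-sided p = {p_one_sided:.4f}")
  print(f"paired  p = {p_paired:.4f}")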
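Power analysis answers the sample-size question raised in the power section:
given an expected effect size, a significance level, and a power target,
solve for n.  A sketch using statsmodels (the effect size of 0.5 is an
assumed "medium" effect, not a number from these notes):

  from statsmodels.stats.power import TTestIndPower

  # Solve for the per-group sample size of an unpaired t-test with
  # alpha = .05 and the standard power target pi = .8.
  n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                            alpha=0.05,
                                            power=0.8,
                                            alternative='two-sided')
  print(f"need about {n_per_group:.0f} subjects per group")  # ~64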
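Finally, the biased-coin power numbers can be checked by simulation (Python
with NumPy and SciPy; the exact percentages depend on the significance level
and on whether the test is 1-sided or 2-sided):

  import numpy as np
  from scipy.stats import binomtest

  rng = np.random.default_rng(2)  # seed chosen arbitrarily
  alpha = 0.05
  trials = 2_000  # simulated experiments per sample size

  for flips in (10, 100, 1_000):
      # Each experiment: flip a 60%-heads coin `flips` times, then run a
      # 2-sided binomial test against the fair-coin hypothesis p = 0.5.
      heads = rng.binomial(flips, 0.6, size=trials)
      rejected = sum(binomtest(int(h), flips, p=0.5).pvalue < alpha
                     for h in heads)
      print(f"{flips:>5} flips: estimated power = {rejected / trials:.0%}")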