Constructing Ensembles
Bagging
- Run the learning algorithm k times, each time on a training set of m examples drawn randomly with replacement from the original set of m examples (a bootstrap sample)
- Each training set contains about 63.2% of the distinct original examples, the rest being duplicates (see the sketch below)
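A minimal sketch of bagging in Python, assuming NumPy and scikit-learn's DecisionTreeClassifier as a stand-in base learner (the notes do not fix either choice); labels are assumed to be small non-negative integers so the majority vote can use bincount:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    # Train k classifiers, each on a bootstrap sample of the m examples.
    rng = np.random.default_rng(seed)
    m = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, m, size=m)  # m draws with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Majority vote over the k classifiers, one column per example.
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

Each bootstrap sample includes a given example with probability 1 - (1 - 1/m)^m ≈ 1 - 1/e ≈ 63.2%, which is where the figure above comes from.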
Cross-validated committees
- Divide the examples into k disjoint subsets
- Train k classifiers, each on the original data minus one of the k subsets, i.e. on the remaining (k-1)/k of the examples (see the sketch below)
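A sketch of the committee construction under the same assumptions (NumPy, scikit-learn trees); each member trains on roughly (k-1)/k of the data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cv_committee(X, y, k=5, seed=0):
    # Split indices into k disjoint folds, then train k classifiers,
    # each on the original data minus one fold.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    models = []
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        models.append(DecisionTreeClassifier().fit(X[train_idx], y[train_idx]))
    return models

Predictions can be combined by the same majority vote used for bagging.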
Boosting
- Maintain a probability distribution over the set of training examples
- On each iteration, use the distribution to sample a training set
- Use the resulting classifier's error rate to modify the distribution, increasing the weight of misclassified examples
- Successive classifiers thus face harder and harder learning problems (see the sketch below)
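A sketch of the sampling variant of boosting (AdaBoost-style weight updates), assuming binary labels in {-1, +1} and depth-1 scikit-learn trees as the weak learner; none of these choices is fixed by the notes:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, rounds=10, seed=0):
    # y is assumed to take values in {-1, +1}.
    rng = np.random.default_rng(seed)
    m = len(X)
    D = np.full(m, 1.0 / m)                 # start with a uniform distribution
    models, alphas = [], []
    for _ in range(rounds):
        idx = rng.choice(m, size=m, p=D)    # sample a training set from D
        h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = h.predict(X)
        eps = D[pred != y].sum()            # weighted error on the full set
        if eps == 0 or eps >= 0.5:          # weak learner perfect or failed
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)      # up-weight misclassified examples
        D /= D.sum()                        # renormalize to a distribution
        models.append(h)
        alphas.append(alpha)
    return models, alphas

def boost_predict(models, alphas, X):
    # Weighted vote of the weak classifiers.
    return np.sign(sum(a * h.predict(X) for a, h in zip(alphas, models)))

Because misclassified examples gain weight, each sampled training set over-represents the examples earlier classifiers got wrong, which is what makes the successive learning problems harder.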