Constructing Ensembles
Bagging
- Run the learning algorithm k times, each time on a training set of m examples drawn randomly with replacement from the original set of m examples (a bootstrap sample)
- Each training set contains about 63.2% of the distinct original examples, the rest being duplicates (see the sketch below)
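A minimal sketch of bagging in Python, assuming NumPy and scikit-learn's DecisionTreeClassifier as a stand-in base learner (the notes do not fix either choice); labels are assumed to be small non-negative integers so the majority vote can use bincount:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    # Train k classifiers, each on a bootstrap sample of the m examples.
    rng = np.random.default_rng(seed)
    m = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, m, size=m)  # m draws with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Majority vote over the k classifiers, one column per example.
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

Each bootstrap sample includes a given example with probability 1 - (1 - 1/m)^m ≈ 1 - 1/e ≈ 63.2%, which is where the figure above comes from.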
Cross-validated committees
- Divide the examples into k disjoint subsets
- Train k classifiers, each on the original data minus one of the k subsets, i.e. on the remaining (k-1)/k of the examples (see the sketch below)
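A sketch of the committee construction under the same assumptions (NumPy, scikit-learn trees); each member trains on roughly (k-1)/k of the data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cv_committee(X, y, k=5, seed=0):
    # Split indices into k disjoint folds, then train k classifiers,
    # each on the original data minus one fold.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    models = []
    for i in range(k):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        models.append(DecisionTreeClassifier().fit(X[train_idx], y[train_idx]))
    return models

Predictions can be combined by the same majority vote used for bagging.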
Boosting
- Maintain a probability distribution over the set of training examples
- On each iteration, use the distribution to sample a training set
- Use the resulting classifier's error rate to modify the distribution, increasing the weight of misclassified examples
- Successive classifiers thus face harder and harder learning problems (see the sketch below)
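A sketch of the sampling variant of boosting (AdaBoost-style weight updates), assuming binary labels in {-1, +1} and depth-1 scikit-learn trees as the weak learner; none of these choices is fixed by the notes:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, rounds=10, seed=0):
    # y is assumed to take values in {-1, +1}.
    rng = np.random.default_rng(seed)
    m = len(X)
    D = np.full(m, 1.0 / m)                 # start with a uniform distribution
    models, alphas = [], []
    for _ in range(rounds):
        idx = rng.choice(m, size=m, p=D)    # sample a training set from D
        h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        pred = h.predict(X)
        eps = D[pred != y].sum()            # weighted error on the full set
        if eps == 0 or eps >= 0.5:          # weak learner perfect or failed
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D *= np.exp(-alpha * y * pred)      # up-weight misclassified examples
        D /= D.sum()                        # renormalize to a distribution
        models.append(h)
        alphas.append(alpha)
    return models, alphas

def boost_predict(models, alphas, X):
    # Weighted vote of the weak classifiers.
    return np.sign(sum(a * h.predict(X) for a, h in zip(alphas, models)))

Because misclassified examples gain weight, each sampled training set over-represents the examples earlier classifiers got wrong, which is what makes the successive learning problems harder.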