Ensemble Methods

DM 352 Syllabus | DM 552 Syllabus

last updated 23-Jun-2021

Some notes taken from Vadim Smolyakov, Ensemble Learning to Improve Machine Learning Results

Ensemble Methods

Construct a set of classifiers from the training data

Predict class label of test records by combining the predictions made by multiple classifiers


Why they work

Suppose there are 25 base classifiers.

Each classifier has error rate, ε = 0.35

Assume errors made by classifiers are uncorrelated

Probability that the ensemble classifier makes a wrong prediction:

which is better than 0.35.

General Approach

Types of ensemble methods

Manipulate data distribution in choosing a training data set for the individual models

Manipulate input features -- good for data sets with highly redundant features

Manipulate class labels -- partition multiple class values into two sets

Manipulate the learning algorithms



Boostrap AGgregation

Sampling with replacement


Build classifier on each bootstrap sample

Each sample has probability (1 – 1/n)n of being selected

Consider 1-dimensional data set:

Classifier is a decision stump

Decision rule: x ≤ k versus x > k

Split point k is chosen based on entropy


Assume test set is the same as the original data

Use majority vote to determine class of ensemble classifier

Comparisons of bagging for decision tree and K-nearest neighbor

Accuracy: 0.63 (+/- 0.02) [Decision Tree]
Accuracy: 0.70 (+/- 0.02) [K-NN]
Accuracy: 0.64 (+/- 0.01) [Bagging Tree]
Accuracy: 0.59 (+/- 0.07) [Bagging K-NN]

The decision tree bagging ensemble achieved higher accuracy in comparison to the k-NN bagging ensemble.
K-NN are less sensitive to perturbation on training samples and therefore they are called stable learners.


An iterative procedure to adaptively change the distribution of training data determined by the sampling process by focusing more on previously misclassified records.

Initially, all N records are assigned equal weights

Unlike bagging, weights may change at the end of each boosting round

Example 4 is hard to classify-- its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

AdaBoost Algorithm (Adaptive Boosting)


Consider 1-dimensional data set:

Classifier is a decision stump

Decision rule: x ≤ k versus x > k

Split point k is chosen based on entropy


Random Forests

A commonly used class of ensemble algorithms are forests of randomized trees.

In random forests, each tree in the ensemble is built from a sample drawn with replacement (i.e. a bootstrap sample) from the training set. In addition, instead of using all the features, a random subset of features is selected, further randomizing the tree.

As a result, the bias of the forest increases slightly, but due to the averaging of less correlated trees, its variance decreases, resulting in an overall better model.

The J48 (C4.5) or Apriori algorithm is determinant.

Random forest algorithm

Resulting set of decision trees have a lack of correlation.

The model then is applied to a test instance by taking a majority vote from each tree. That is, the instance is evaluated by each tree and the resulting value is the majority of the trees' conclusion.


Empirical comparison among Ensemble Methods