Ensemble Learning¶

1. Voting Classifers¶

Aggregate the predications of each classifier and predict the class that gets the most votes, called hard voting classifier
Ensemble method work best when the classifiers are as independent from one another as possible
hard voting, aggregate the predictions of each classifier and predict the class that gets the most votes
soft voting, predict the class with the highest class probability, averaged over all the individual classifiers, it often archieves higher performance than hard voting, need all classifiers are able to estimate class probabilities

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(gamma="scale", random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')

from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.896
VotingClassifier 0.912

2. Bagging and Pasting¶

Use the same model for every classifier and train them on different random subsets of the training sets
bagging, sampling is performed with replacement
pasting, sampling is performed without replacement
bagging has higer bias and lower variance than pasting, results in better models

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, random_state=42) # bootstrap = True, samples with replacement
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.904

# Out-of-Bag Evaluation
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, oob_score=True)
bag_clf.fit(X_train, y_train)
print(bag_clf.oob_score_)
y_pred = bag_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.928
0.92

Random Patches method, sample both training instances and features, max_samples, boostrap=True, max_feature, bootstrap_features=True
Random Subspaces method, keeping all training instace but sampling features, boostrap=False, max_samples=1.0, boostrap_featreus=True, max_features=0.6
Random Forest is trained via the bagging method max_samples=1.0 or use RandomForestClassifer

# Random Forest with BaggingClassifier
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

0.92

# Random Forest with RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42)
rnd_clf.fit(X_train, y_train)

y_pred_rf = rnd_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_rf))
rnd_clf.feature_importances_ # feature importance

0.912

array([0.42253629, 0.57746371])

Extra-Trees, using random thresholds for each feature when considering for splitting, rather than searching for the best possible thresholds
Trade more bias for lower variance

from sklearn.ensemble import ExtraTreesClassifier
extra_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42)
extra_clf.fit(X_train, y_train)

y_pred_extra = extra_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_extra))

extra_clf.feature_importances_ # feature importance

0.912

array([0.42504529, 0.57495471])

3. Boosting¶

Train predictors sequentially, each trying to correct its predecessor

AdaBoost
- train a base classifier, and use it to make predictions on the training set
- increases the relative weight of misclassified trainning instances, train a second classfier using the updated weights, makes predictions on the training set
- and so on
- the algorithm stops when the desired number of predictors is reached, or when a perfect predictor is found
- make decision, computes the predictions of all the predictors and weights them using the predictor weights $\alpha_{j}$, the predicted class is the one that receiveds the majority of weighted votes
- too many estimators can make the model overfitted for boosting, bagging doest not

from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(random_state=42),
    n_estimators=200, algorithm="SAMME.R", learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)

y_pred_ada = ada_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_ada))

ada_clf.feature_importances_ # feature importance

0.856

array([0.43562701, 0.56437299])

Gradient Boosting
- adding predictors sequentially to an ensemble
- instead of tweaking the instance weights, fit the new predictor to the residual errors

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(random_state=0, n_estimators = 200, learning_rate=0.05)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

clf.feature_importances_ # feature importance

0.888

array([0.42212231, 0.57787769])

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import mean_squared_error
import numpy as np

clf = GradientBoostingClassifier(random_state=0, n_estimators = 200, learning_rate=0.05)
clf.fit(X_train, y_train)

# search for the best number of estimators
errors = [mean_squared_error(y_test, y_pred) for y_pred in clf.staged_predict(X_test)]
best_n = np.argmin(errors)+1

clf = GradientBoostingClassifier(random_state=0, n_estimators = best_n, learning_rate=0.05)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

clf.feature_importances_ # feature importance

0.92

array([0.35868635, 0.64131365])

Stochastic Gradient Boosting
- Use the fraction of training instances for training each tree

XGBoost
- Extreme Gradient Boosting

import xgboost
xgb_clf = xgboost.XGBClassifier(random_state=42)
xgb_clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

clf.feature_importances_ # feature importance

0.92

array([0.35868635, 0.64131365])

4. Stacking¶

Train predictors with the first half of the training set
Predict using the trained predictors with the second half of the training set
Use the predicted values from trained predictors as the new features
Use the generated new features and the target values to train a predictor (blender)