Ensemble

AdaBoost Ensemble Learning

This explanation covers the concept of AdaBoost and its implementation in AdaBoost.mqh for MQL5, highlighting the flexibility of using different weak learners (such as decision trees or logistic regression).

**I. AdaBoost Theory (Ensemble Learning Approach)**

AdaBoost, short for Adaptive Boosting, is an ensemble learning algorithm that combines multiple weak learners (models only slightly better than random guessing) into a strong learner (a model with substantially improved predictive performance). It achieves this by:

  1. Initializing weights for each data point: Initially, all data points have equal weight.
  2. Iteratively training weak learners:
    • In each iteration, a weak learner is trained on a re-weighted dataset:
      • If the previous learner misclassified a point, that point's weight is increased.
      • If the point was classified correctly, its weight is decreased. This focuses subsequent learners on the "harder" data points.
    • The current weak learner's own weight (its say in the final vote) is then computed from its error on the weighted data, as formalized below.
  3. Combining the weak learners:
    • The final prediction of the ensemble is made by taking a weighted majority vote (classification) or a weighted average (regression) of the individual weak learner predictions, with higher weights given to more accurate learners.
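
For binary labels y_i ∈ {−1, +1}, the classic AdaBoost formulation makes these steps precise (AdaBoost.mqh may differ in implementation details):

```latex
\epsilon_t = \textstyle\sum_i w_i \,[\,h_t(x_i) \neq y_i\,]        % weighted error of learner t
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}          % vote weight of learner t
w_i \leftarrow w_i\, e^{-\alpha_t y_i h_t(x_i)} / Z_t              % point re-weighting (Z_t normalizes)
H(x) = \operatorname{sign}\Big(\textstyle\sum_{t=1}^{T} \alpha_t h_t(x)\Big)  % final ensemble prediction
```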

**II. AdaBoost.mqh Documentation:**

The AdaBoost class provides functionalities for implementing the AdaBoost algorithm using either decision trees or logistic regression as weak learners. A short usage sketch follows the function listings below.

**A. Common functionalities (present in both DecisionTree and LogisticRegression namespaces)**

  • AdaBoost(uint n_estimators=50, int random_state=42, bool bootstrapping=true): Constructor; allows setting hyperparameters (number of weak learners, random state for reproducibility, and enabling/disabling bootstrapping during training).
  • ~AdaBoost(void): Destructor.
  • void fit(matrix &x, vector &y): Trains the ensemble model using the provided data (x - independent variables, y - dependent variables).
  • int predict(vector &x): Predicts the class label (for classification) for a new data point (x).
  • vector predict(matrix &x): Predicts class labels (for classification) for multiple new data points (x).
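
A minimal usage sketch for a script (the include path and toy dataset are assumptions; adapt them to your project layout):

```mql5
#include <MALE5\Ensemble\AdaBoost.mqh>  // include path is an assumption

void OnStart()
  {
   matrix x = {{1,2},{2,3},{3,4},{4,5}};   // toy independent variables
   vector y = {0, 1, 0, 1};                // toy class labels

   // 50 decision-tree weak learners, fixed seed, bootstrapping enabled
   DecisionTree::AdaBoost boost(50, 42, true);
   boost.fit(x, y);                        // train the ensemble

   vector preds = boost.predict(x);        // predicted labels for every row of x
   Print("in-sample predictions: ", preds);
  }
```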

**B. Namespace-specific functionalities:**

  • DecisionTree namespace:
    • CDecisionTreeClassifier *weak_learners[];: Stores weak learner pointers (decision trees) for memory management.
    • CDecisionTreeClassifier *weak_learner;: Internal pointer to the currently trained weak learner.
  • LogisticRegression namespace:
    • CLogisticRegression *weak_learners[];: Stores weak learner pointers (logistic regression models) for memory management.
    • CLogisticRegression *weak_learner;: Internal pointer to the currently trained weak learner.

**III. Flexibility of Weak Learners:**

The key takeaway here is that the AdaBoost class is not limited to using decision trees as weak learners. The provided examples showcase its usage with both decision trees and logistic regression. This demonstrates the flexibility of the AdaBoost framework, where any model capable of making predictions (classification or regression) can be used as a weak learner.
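
Switching weak learners is a one-line change at construction time; reusing the toy setup from the sketch above:

```mql5
// Same API, different weak learner: logistic regression instead of decision trees
LogisticRegression::AdaBoost boost(50, 42, true);
boost.fit(x, y);
vector preds = boost.predict(x);
```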

Random Forest Classification and Regression

This explanation covers the CRandomForestClassifier and CRandomForestRegressor classes in MQL5, which implement random forests for classification and regression tasks, respectively.

**I. Random Forest Theory (Ensemble Learning Approach)**

A random forest is an ensemble learning method that combines multiple decision trees into a single model to improve predictive performance. Each decision tree is trained on a random subset of features (independent variables) and a bootstrapped sample of the data (drawn randomly with replacement, so each tree sees a slightly different dataset). Predictions from all trees are then aggregated through a majority vote (classification) or averaging (regression) to make the final prediction. This process reduces the variance of the model and helps prevent overfitting.
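
To make the aggregation step concrete, here is an illustrative sketch of majority voting and averaging over per-tree predictions (not the internals of the MQL5 classes, which are documented below):

```mql5
// Illustrative only: combine the predictions that n trees made for one sample.
double AggregateClassification(const vector &tree_preds)
  {
   double best_label = tree_preds[0];
   ulong  best_count = 0;
   for(ulong i = 0; i < tree_preds.Size(); i++)   // majority vote
     {
      ulong count = 0;
      for(ulong j = 0; j < tree_preds.Size(); j++)
         if(tree_preds[j] == tree_preds[i])
            count++;
      if(count > best_count)
        {
         best_count = count;
         best_label = tree_preds[i];
        }
     }
   return best_label;
  }

double AggregateRegression(const vector &tree_preds)
  {
   return tree_preds.Mean();                      // simple average of tree outputs
  }
```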

**II. CRandomForestClassifier Class:**

This class provides functionalities for implementing a random forest for classification tasks; a usage sketch follows the function listings below.

Public Functions:

  • CRandomForestClassifier(uint n_trees=100, uint minsplit=NULL, uint max_depth=NULL, int random_state=-1): Constructor; allows setting hyperparameters (number of trees, minimum samples per split, maximum tree depth, and random state for reproducibility).
  • ~CRandomForestClassifier(void): Destructor.
  • void fit(matrix &x, vector &y, bool replace=true, errors_classifier err=ERR_ACCURACY): Trains the model on the provided data (x - independent variables, y - class labels).
    • replace controls whether bootstrapping samples with replacement (true) or not (false).
    • err specifies the error metric used for internal training evaluation (default: ERR_ACCURACY).
  • double predict(vector &x): Predicts the class label for a new data point (x).
  • vector predict(matrix &x): Predicts class labels for multiple new data points (x).

Internal Functions:

  • ConvertTime(double seconds): Converts seconds to a human-readable format (not relevant for core functionality).
  • err_metric(errors_classifier err, vector &actual, vector &preds): Calculates the specified error metric (e.g., accuracy) on given data (not directly exposed to users).
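
A minimal training/prediction sketch (the include path and toy data are assumptions):

```mql5
#include <MALE5\Ensemble\RandomForest.mqh>  // include path is an assumption

void OnStart()
  {
   matrix x = {{5.1,3.5},{4.9,3.0},{6.2,2.9},{5.9,3.0}}; // toy features
   vector y = {0, 0, 1, 1};                              // toy class labels

   // 100 trees, minsplit=2, max_depth=5, fixed seed for reproducibility
   CRandomForestClassifier forest(100, 2, 5, 42);
   forest.fit(x, y);                  // bootstraps with replacement by default

   vector sample = x.Row(0);          // classify a single data point
   Print("predicted class: ", forest.predict(sample));
  }
```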

**III. CRandomForestRegressor Class:**

This class implements a random forest for regression tasks. It inherits from CRandomForestClassifier and overrides the relevant functions with regression-specific behavior; a usage sketch follows the function listings below.

Public Functions:

  • CRandomForestRegressor(uint n_trees=100, uint minsplit=NULL, uint max_depth=NULL, int random_state=-1): Constructor (same as for the classifier).
  • ~CRandomForestRegressor(void): Destructor (same as for the classifier).
  • void fit(matrix &x, vector &y, bool replace=true, errors_regressor err=ERR_R2_SCORE): Trains the model (same as for the classifier, but the default error metric is ERR_R2_SCORE).
  • double predict(vector &x): Predicts the continuous value for a new data point (x).
  • vector predict(matrix &x): Predicts continuous values for multiple new data points (x).

Internal Functions:

  • Same as in CRandomForestClassifier.
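
The regressor follows the same pattern; only the targets and return values are continuous (reusing the toy matrix x from the classifier sketch above):

```mql5
vector targets = {1.2, 0.8, 2.5, 2.1};       // toy continuous targets

CRandomForestRegressor reg(100, 2, 8, 42);   // hyperparameters chosen for illustration
reg.fit(x, targets);                         // default internal metric: ERR_R2_SCORE

vector sample = x.Row(1);
double value = reg.predict(sample);          // continuous prediction
Print("predicted value: ", value);
```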

**IV. Key Points:**

  • Both classes use decision trees as base learners to build the random forest.
  • Hyperparameter tuning (number of trees, minimum samples per split, maximum depth) can significantly impact performance.
  • Random forests offer improved generalization and reduced variance compared to single decision trees (quantified below).
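
The variance-reduction claim can be made precise: for B trees, each with variance σ² and pairwise correlation ρ, the variance of their average is

```latex
\operatorname{Var}\Big(\tfrac{1}{B}\textstyle\sum_{b=1}^{B} T_b(x)\Big)
   = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

so adding trees (larger B) shrinks the second term, while bootstrapping and feature randomness shrink ρ in the first.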
