
Ensemble Methods: Wisdom of Crowds

Ensemble methods combine multiple weak models into a strong one through bagging (parallel, independent) or boosting (sequential, corrective).

Instructor

Here's something that surprises many people: on structured, tabular data, random forests and gradient-boosted trees often beat neural networks. That isn't because trees are smarter; it's because combining many simple models often works better than relying on one complex one.

Think about how you handle unreliable API calls in production. You don't trust a single service; you call several, compare the results, and take the consensus. Ensemble methods apply the same idea to models.

Learning Objectives

  • Understand bagging as parallel model training with random data subsets
  • Build a simplified random forest from multiple decision trees
  • Distinguish bagging (variance reduction) from boosting (bias reduction)
  • Know when ensembles outperform deep learning (spoiler: tabular data)

The Promise.allSettled Pattern

You already use the ensemble pattern in frontend code. When you need reliability, you don't trust a single source.

Frontend

Promise.allSettled + vote
const results = await Promise.allSettled(models); majorityVote(results)

Machine Learning

Random forest
forest.predict(x) // each tree votes, majority wins
Structural Bridge
⚠ Where this breaks
Promise.allSettled waits for a set of known async tasks running in parallel; you then vote over the results yourself. A random forest trains many decorrelated decision trees on bootstrapped samples and has them vote. The diversity is the entire point, and it comes from the training procedure, not the call site.
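To see why diversity matters, here is an idealized calculation (not a model of real trees): the probability that a majority vote is correct when each of n voters is independently right with probability p. Real trees make correlated errors, which typically shrinks this gain; if every tree were identical, the vote would be exactly as accurate as one tree. The helper names `binomial` and `majorityAccuracy` are illustrative.

```typescript
// Binomial coefficient C(n, k), computed iteratively to avoid large factorials
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Probability that a strict majority of n independent voters is correct,
// when each voter is correct with probability p (use odd n to avoid ties)
function majorityAccuracy(n: number, p: number): number {
  let total = 0;
  for (let k = Math.floor(n / 2) + 1; k <= n; k++) {
    total += binomial(n, k) * p ** k * (1 - p) ** (n - k);
  }
  return total;
}

console.log(majorityAccuracy(1, 0.7).toFixed(3)); // 0.700
console.log(majorityAccuracy(5, 0.7).toFixed(3)); // 0.837
console.log(majorityAccuracy(25, 0.7).toFixed(3));
```

Five independent 70%-accurate voters already push the majority to about 84%; the assumption doing all the work is independence.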
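ensemble-pattern.ts (TypeScript)
// Frontend: multiple API calls, take the consensus.
// fetchFromServiceA/B/C are placeholders for your own service clients.
async function reliablePrice(productId: string): Promise<number> {
  const results = await Promise.allSettled([
    fetchFromServiceA(productId),
    fetchFromServiceB(productId),
    fetchFromServiceC(productId),
  ]);

  // Keep only the calls that succeeded
  const prices = results
    .filter((r): r is PromiseFulfilledResult<number> => r.status === 'fulfilled')
    .map(r => r.value);

  if (prices.length === 0) throw new Error('All price services failed');

  // Take the median: robust to one bad response out of three
  return median(prices);
}

// Median of a non-empty array (mean of the two middle values for even lengths)
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// ML: multiple models, take majority vote
interface DecisionTree {
  predict(features: number[]): number;
}

function randomForestPredict(
  trees: DecisionTree[],
  features: number[]
): number {
  if (trees.length === 0) throw new Error('Forest has no trees');
  const predictions = trees.map(tree => tree.predict(features));

  // Count votes per predicted class
  const votes = new Map<number, number>();
  for (const pred of predictions) {
    votes.set(pred, (votes.get(pred) ?? 0) + 1);
  }

  // Return the class with the most votes
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}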
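The vote above is for classification. For regression, a random forest averages the trees' numeric predictions instead of voting. A minimal sketch, assuming a hypothetical `RegressionTree` interface whose `predict` returns a numeric estimate:

```typescript
// Hypothetical tree interface: predict returns a numeric estimate
interface RegressionTree {
  predict(features: number[]): number;
}

// Regression forest: average the trees' predictions instead of voting
function randomForestRegress(trees: RegressionTree[], features: number[]): number {
  if (trees.length === 0) throw new Error("forest has no trees");
  const sum = trees.reduce((acc, tree) => acc + tree.predict(features), 0);
  return sum / trees.length;
}

// Usage with three stand-in "trees" that ignore their input
const mockTrees: RegressionTree[] = [
  { predict: () => 10 },
  { predict: () => 12 },
  { predict: () => 14 },
];
console.log(randomForestRegress(mockTrees, [0])); // 12
```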

Bagging: Random Subsets, Parallel Training

Bagging (Bootstrap AGGregating) trains each model on a bootstrap sample: a dataset the same size as the original, drawn from it at random with replacement. Some points appear several times and others not at all, so each tree sees a slightly different version of the data.
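How different are those versions? Each of the n draws misses a given point with probability 1 − 1/n, so the expected fraction of distinct points in a bootstrap sample is 1 − (1 − 1/n)^n, which approaches 1 − 1/e ≈ 63.2% as n grows. This sketch compares the formula with a quick simulation; `expectedUniqueFraction` and `uniqueFraction` are illustrative names, not part of the course code.

```typescript
// Expected fraction of distinct original points in one bootstrap sample of size n
function expectedUniqueFraction(n: number): number {
  return 1 - (1 - 1 / n) ** n;
}

// Draw one bootstrap sample of indices and measure the fraction that are distinct
function uniqueFraction(n: number): number {
  const seen = new Set<number>();
  for (let i = 0; i < n; i++) {
    seen.add(Math.floor(Math.random() * n));
  }
  return seen.size / n;
}

console.log(expectedUniqueFraction(1000).toFixed(4)); // 0.6323
console.log(uniqueFraction(10000).toFixed(3)); // random, close to 0.632
```

So each tree trains on roughly two thirds of the distinct points, with the rest of its sample made up of duplicates.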

bagging.ts (TypeScript)
// Bootstrap sampling: draw n items with replacement
function bootstrapSample<T>(data: T[]): T[] {
  const sample: T[] = [];
  for (let i = 0; i < data.length; i++) {
    const idx = Math.floor(Math.random() * data.length);
    sample.push(data[idx]);
  }
  return sample; // same size as the original, usually with duplicates
}

// Random feature subset (the "random" in random forest).
// Simplification: Breiman's random forest draws a fresh subset at every split;
// this sketch draws one subset per tree.
function randomFeatures(allFeatures: string[], maxFeatures: number): string[] {
  // Fisher-Yates shuffle: unbiased, unlike sort(() => Math.random() - 0.5)
  const shuffled = [...allFeatures];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled.slice(0, maxFeatures);
}

// Build a random forest.
// DataPoint, DecisionTree and trainDecisionTree are assumed to be defined elsewhere.
function buildRandomForest(
  data: DataPoint[],
  allFeatureNames: string[],
  numTrees: number,
  maxFeatures: number
): DecisionTree[] {
  const trees: DecisionTree[] = [];

  for (let i = 0; i < numTrees; i++) {
    // Each tree gets its own bootstrap sample of the data
    const sample = bootstrapSample(data);
    // Each tree only sees a random subset of features
    const features = randomFeatures(allFeatureNames, maxFeatures);
    // Train the tree on this unique view of the data
    trees.push(trainDecisionTree(sample, features, { maxDepth: 10 }));
  }

  return trees;
}
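Because `trainDecisionTree` is not defined here, a fully self-contained toy version may help. This sketch swaps the full decision tree for a decision stump (a tree with a single threshold) on one numeric feature. With only one feature there is nothing to subsample, so it demonstrates just the bagging half: bootstrap samples plus a majority vote. All names (`Point`, `trainStump`, `baggedStumps`, `voteStumps`) are hypothetical.

```typescript
interface Point { x: number; label: 0 | 1; }
type Stump = (x: number) => 0 | 1;

// Decision stump: predict 1 when x >= threshold; pick the threshold with fewest errors
function trainStump(sample: Point[]): Stump {
  const candidates = [...sample.map(p => p.x), Infinity]; // Infinity = "always predict 0"
  let best = Infinity;
  let bestErrors = Infinity;
  for (const t of candidates) {
    const errors = sample.filter(p => (p.x >= t ? 1 : 0) !== p.label).length;
    if (errors < bestErrors) {
      bestErrors = errors;
      best = t;
    }
  }
  return x => (x >= best ? 1 : 0);
}

// Bootstrap sample: n draws with replacement
function bootstrap<T>(data: T[]): T[] {
  return data.map(() => data[Math.floor(Math.random() * data.length)]);
}

// Train numStumps stumps, each on its own bootstrap sample
function baggedStumps(data: Point[], numStumps: number): Stump[] {
  return Array.from({ length: numStumps }, () => trainStump(bootstrap(data)));
}

// Majority vote over binary predictions
function voteStumps(stumps: Stump[], x: number): 0 | 1 {
  const ones = stumps.filter(s => s(x) === 1).length;
  return ones * 2 > stumps.length ? 1 : 0;
}

// Toy data: label is 1 when x >= 5
const data: Point[] = Array.from({ length: 10 }, (_, x): Point => ({ x, label: x >= 5 ? 1 : 0 }));
const stumps = baggedStumps(data, 25);
console.log(voteStumps(stumps, 0), voteStumps(stumps, 9)); // 0 1
```

Individual stumps land on different thresholds depending on which points their bootstrap sample happened to include; the vote smooths that out.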

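Why does this work? Each individual tree is mediocre: it sees only part of the data and part of the features. But because every tree gets a different view, their errors are only partly correlated, and when you average (or vote over) partly correlated errors, much of the error cancels out. The more decorrelated the trees, the larger the benefit.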
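The role of correlation can be made precise. For B identically distributed predictors, each with variance σ² and pairwise correlation ρ, a standard result is that the variance of their average is ρσ² + (1 − ρ)σ²/B. Adding trees shrinks only the second term; the first is a floor set by correlation, which is why random forests go out of their way to decorrelate trees. A tiny calculator (the name `averagedVariance` is illustrative):

```typescript
// Variance of the average of B identically distributed predictors,
// each with variance sigma2 and pairwise correlation rho
function averagedVariance(sigma2: number, rho: number, B: number): number {
  return rho * sigma2 + ((1 - rho) * sigma2) / B;
}

console.log(averagedVariance(1, 0, 100)); // 0.01 (independent trees)
console.log(averagedVariance(1, 0.3, 100).toFixed(3)); // 0.307 (floor of 0.3 from correlation)
console.log(averagedVariance(1, 1, 100)); // 1 (identical trees: no gain)
```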

Boosting: Sequential Error Correction

While bagging trains models in parallel, boosting trains them sequentially. Each new model concentrates on the examples the earlier models got wrong. It's like successive rounds of code review, where each reviewer pays extra attention to the bugs that slipped past the previous one.
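To make "focus on the mistakes" concrete, here is one round of AdaBoost-style reweighting (the scheme this section's conceptual boosting code follows) on five equally weighted points, one of which the current model gets wrong. With weighted error ε = 0.2, the model weight is α = ½ ln((1 − ε)/ε) = ln 2, so the wrong point's weight doubles and the others halve. After normalization, the single misclassified point holds half of the total weight. The helper name `reweight` is illustrative.

```typescript
// One AdaBoost-style reweighting step.
// weights: current sample weights (summing to 1); wrong[i]: was point i misclassified?
function reweight(weights: number[], wrong: boolean[]): { alpha: number; next: number[] } {
  const errorRate = weights.reduce((sum, w, i) => sum + (wrong[i] ? w : 0), 0);
  const alpha = 0.5 * Math.log((1 - errorRate) / errorRate);
  const scaled = weights.map((w, i) => w * Math.exp(wrong[i] ? alpha : -alpha));
  const total = scaled.reduce((a, b) => a + b, 0);
  return { alpha, next: scaled.map(w => w / total) };
}

const { alpha, next } = reweight(
  [0.2, 0.2, 0.2, 0.2, 0.2],
  [true, false, false, false, false]
);
console.log(alpha.toFixed(4)); // 0.6931 (= ln 2)
console.log(next.map(w => w.toFixed(3)).join(" ")); // 0.500 0.125 0.125 0.125 0.125
```

This is no coincidence: after AdaBoost's update, the misclassified and correctly classified points always carry equal total weight, so the previous model looks no better than chance on the new weights and the next model has to find something new.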

boosting-concept.ts (TypeScript)
// Conceptual boosting (AdaBoost-style): each model focuses on previous errors.
// DataPoint, DecisionTree and trainWeightedTree are assumed to be defined elsewhere.
function boostingTrain(data: DataPoint[], numRounds: number) {
  // Start with equal weights for all data points
  let weights: number[] = new Array(data.length).fill(1 / data.length);
  const models: { model: DecisionTree; weight: number }[] = [];

  for (let round = 0; round < numRounds; round++) {
    // Train a weak model (shallow tree) on weighted data
    const model = trainWeightedTree(data, weights, { maxDepth: 3 });

    // Find misclassified points
    const wrong = data.map(d => model.predict(d.features) !== d.label);

    // Weighted error rate: total weight of the misclassified points
    const errorRate = wrong.reduce((sum, isWrong, i) => sum + (isWrong ? weights[i] : 0), 0);

    // A weak learner must beat chance (0.5 for two classes); otherwise stop
    if (errorRate >= 0.5) break;

    // Model weight: better models get more say.
    // Clamp the error so a perfect model doesn't produce log(1 / 0) = Infinity.
    const eps = Math.max(errorRate, 1e-10);
    const modelWeight = 0.5 * Math.log((1 - eps) / eps);
    models.push({ model, weight: modelWeight });

    // Increase weights on misclassified points and decrease them on correct ones,
    // so the next model focuses on these harder examples
    weights = weights.map((w, i) =>
      w * Math.exp(wrong[i] ? modelWeight : -modelWeight)
    );

    // Normalize weights so they sum to 1
    const totalWeight = weights.reduce((a, b) => a + b, 0);
    weights = weights.map(w => w / totalWeight);
  }

  return models;
}
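`boostingTrain` returns weighted models, but the ensemble still needs a way to combine them. In AdaBoost with labels encoded as −1 and +1, the ensemble predicts the sign of the weighted sum of the models' votes. A minimal sketch under that ±1 assumption, with a hypothetical `WeakModel` interface standing in for `DecisionTree`:

```typescript
// Hypothetical weak model: predicts -1 or +1
interface WeakModel {
  predict(features: number[]): number;
}

// AdaBoost-style prediction: sign of the weight-scaled sum of votes (labels in {-1, +1})
function boostingPredict(
  models: { model: WeakModel; weight: number }[],
  features: number[]
): number {
  const score = models.reduce((sum, m) => sum + m.weight * m.model.predict(features), 0);
  return score >= 0 ? 1 : -1; // ties broken toward +1
}

// Usage: two weaker models vote -1, one stronger model votes +1
const ensemble = [
  { model: { predict: () => -1 }, weight: 0.3 },
  { model: { predict: () => -1 }, weight: 0.3 },
  { model: { predict: () => 1 }, weight: 0.9 },
];
console.log(boostingPredict(ensemble, [])); // 1
```

Unlike the random forest's one-tree-one-vote rule, a single accurate model can outvote several weak ones here.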

Bagging vs. Boosting: When to Use Each

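| | Bagging (Random Forest) | Boosting (AdaBoost, XGBoost) |
|---|---|---|
| Training | Parallel | Sequential |
| Mainly reduces | Variance (overfitting) | Bias (underfitting) |
| Risk | Less prone to overfitting | Can overfit if run for too many rounds |
| Speed | Fast (trees train in parallel) | Slower (each round depends on the previous one) |
| When to use | Your base model overfits | Your base model is too simple |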

Challenge

Build a simplified random forest with bootstrap sampling and majority voting.

Exercise

Intermediate · Arithmetic · ~15 min

Build a Random Forest

Implement a simplified random forest. First, write bootstrapSample, which draws a random sample of the same size as the input array, with replacement. Then implement majorityVote, which takes an array of predictions and returns the most common one. Finally, implement randomForestPredict, which runs each tree's predict function on the features and returns the majority vote. Use the 5 mock tree predictors provided in the starter code.


Key Takeaways

  • Ensembles combine weak models into strong ones, like Promise.allSettled with voting
  • Bagging trains models in parallel on bootstrap samples of the data, which reduces variance
  • Boosting trains models sequentially, each correcting earlier errors, which reduces bias
  • On tabular data, random forests and XGBoost often beat neural networks
