Fairness Metrics: Defining 'Fair'

Fairness metrics quantify how equitably a model treats different groups — just like Lighthouse scores quantify accessibility compliance.

You've audited your dataset and found imbalances. But how do you know if your model's predictions are actually unfair? You need numbers — fairness metrics give you a concrete score, just like Lighthouse gives you an accessibility score.

As a frontend developer, you've probably run a Lighthouse audit and stared at a score trying to decide if 72 is "good enough." Fairness metrics put you in a similar position — except the stakes aren't page performance, they're whether real people get loans, jobs, or medical care.

Learning Objectives

  • Define and compute demographic parity for a binary classifier
  • Define and compute equalized odds (true positive rate and false positive rate parity)
  • Understand the impossibility theorem: in general, you cannot satisfy all fairness metrics simultaneously
  • Choose the appropriate fairness metric based on application context

Lighthouse, but for Fairness

Frontend: Lighthouse Accessibility Score
// Lighthouse: 72/100 accessibility. Which rules are failing?

Machine Learning: Fairness Metric
// Disparate impact ratio: 0.65. Which groups are disadvantaged?

⚠ Where this breaks
Lighthouse accessibility produces a single composite score against fixed rules. Fairness metrics are a family of incompatible measures (demographic parity, equalized odds, calibration) and you cannot satisfy all of them simultaneously — the choice is value-laden.

Lighthouse tells you "your color contrast ratio is 3.2:1, but WCAG AA requires 4.5:1." Fairness metrics tell you "your loan approval rate for Group A is 80%, but for Group B it's only 45%." Both give you a number. Both force you to decide what "good enough" looks like.

Demographic Parity

The simplest fairness metric: does each group receive positive predictions at the same rate?

demographic-parity.ts (typescript)
interface Prediction {
  group: string;     // e.g., "A" or "B"
  predicted: number; // 0 or 1
  actual: number;    // 0 or 1 (ground truth)
}

// Returns the positive prediction rate for each group.
// If the rates are equal across groups, demographic parity is satisfied.
function demographicParity(predictions: Prediction[]): Record<string, number> {
  const groupStats: Record<string, { positive: number; total: number }> = {};

  for (const p of predictions) {
    if (!groupStats[p.group]) groupStats[p.group] = { positive: 0, total: 0 };
    groupStats[p.group].total++;
    if (p.predicted === 1) groupStats[p.group].positive++;
  }

  // Compute positive prediction rate per group
  const rates: Record<string, number> = {};
  for (const [group, stats] of Object.entries(groupStats)) {
    rates[group] = stats.positive / stats.total;
  }

  return rates;
}

// Disparate impact ratio: min(rate) / max(rate)
// The 4/5ths rule: ratio should be >= 0.8
function disparateImpactRatio(rates: Record<string, number>): number {
  const values = Object.values(rates);
  if (values.length === 0) return 1; // no groups: nothing to compare
  const max = Math.max(...values);
  if (max === 0) return 1; // every rate is 0, so the rates are equal
  return Math.min(...values) / max;
}
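
As a rough worked example, the sketch below plugs in the 80% versus 45% approval rates mentioned earlier. The sample size of 20 applicants per group and the helper names `makeGroup`, `positiveRates`, and `impactRatio` are assumptions made for illustration; they restate the same idea in a self-contained form rather than reuse the lesson's functions.

```typescript
// Hypothetical data: 20 applicants per group, 16 approved in A, 9 approved in B.
type Row = { group: string; predicted: number };

function makeGroup(group: string, approved: number, total: number): Row[] {
  const rows: Row[] = [];
  for (let i = 0; i < total; i++) {
    rows.push({ group, predicted: i < approved ? 1 : 0 });
  }
  return rows;
}

// Positive prediction rate per group (the quantity demographic parity compares).
function positiveRates(rows: Row[]): Record<string, number> {
  const counts: Record<string, { positive: number; total: number }> = {};
  for (const r of rows) {
    if (!counts[r.group]) counts[r.group] = { positive: 0, total: 0 };
    counts[r.group].total++;
    if (r.predicted === 1) counts[r.group].positive++;
  }
  const rates: Record<string, number> = {};
  for (const g of Object.keys(counts)) {
    rates[g] = counts[g].positive / counts[g].total;
  }
  return rates;
}

// Ratio of the lowest rate to the highest rate.
function impactRatio(rates: Record<string, number>): number {
  const values = Object.keys(rates).map((g) => rates[g]);
  return Math.min(...values) / Math.max(...values);
}

const sampleRows = makeGroup("A", 16, 20).concat(makeGroup("B", 9, 20));
const sampleRates = positiveRates(sampleRows); // A: 16/20 = 0.8, B: 9/20 = 0.45
const sampleRatio = impactRatio(sampleRates);  // 0.45 / 0.8, about 0.56

console.log(
  sampleRates,
  sampleRatio.toFixed(2),
  sampleRatio >= 0.8 ? "passes 4/5ths" : "fails 4/5ths",
);
```

Here the ratio is about 0.56, well under 0.8, so this hypothetical classifier would be flagged by the 4/5ths rule.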

Equalized Odds

A metric that also takes the ground truth into account: does each group have the same true positive rate AND false positive rate?

equalized-odds.ts (typescript)
// Uses the Prediction interface from demographic-parity.ts.
// Equalized odds: TPR and FPR should be equal across groups.
function equalizedOdds(predictions: Prediction[]) {
  const groupStats: Record<string, {
    truePositives: number;
    falsePositives: number;
    actualPositives: number;
    actualNegatives: number;
  }> = {};

  for (const p of predictions) {
    if (!groupStats[p.group]) {
      groupStats[p.group] = {
        truePositives: 0, falsePositives: 0,
        actualPositives: 0, actualNegatives: 0
      };
    }
    const s = groupStats[p.group];
    if (p.actual === 1) {
      s.actualPositives++;
      if (p.predicted === 1) s.truePositives++;
    } else {
      s.actualNegatives++;
      if (p.predicted === 1) s.falsePositives++;
    }
  }

  // A rate falls back to 0 when a group has no actual positives (or negatives),
  // so a 0 here can mean "undefined" rather than "never correct".
  const result: Record<string, { tpr: number; fpr: number }> = {};
  for (const [group, s] of Object.entries(groupStats)) {
    result[group] = {
      tpr: s.actualPositives > 0 ? s.truePositives / s.actualPositives : 0,
      fpr: s.actualNegatives > 0 ? s.falsePositives / s.actualNegatives : 0,
    };
  }

  return result;
}
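
The function above returns per-group rates but leaves the comparison to the reader. One possible way to summarize them, sketched here, is the largest TPR gap and the largest FPR gap between any two groups. The `GroupRates` type, the `maxGap` helper, and the example numbers are hypothetical.

```typescript
type GroupRates = Record<string, { tpr: number; fpr: number }>;

// Largest difference between any two groups for one rate ("tpr" or "fpr").
function maxGap(rates: GroupRates, key: "tpr" | "fpr"): number {
  const values = Object.keys(rates).map((g) => rates[g][key]);
  if (values.length === 0) return 0;
  return Math.max(...values) - Math.min(...values);
}

// Hypothetical per-group rates, shaped like the result of an equalized-odds computation.
const exampleRates: GroupRates = {
  A: { tpr: 0.9, fpr: 0.2 },
  B: { tpr: 0.6, fpr: 0.1 },
};

const tprGap = maxGap(exampleRates, "tpr"); // 0.9 - 0.6, about 0.3
const fprGap = maxGap(exampleRates, "fpr"); // 0.2 - 0.1, about 0.1

// Equalized odds holds exactly only when both gaps are 0.
console.log(`TPR gap: ${tprGap.toFixed(2)}, FPR gap: ${fprGap.toFixed(2)}`);
```

In this made-up case, actual positives in group B are recognized far less often (TPR 0.6 versus 0.9), even though group B also sees fewer false positives. How large a gap is acceptable is a judgment call that depends on the application.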

The Impossibility Theorem

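Here's the uncomfortable truth: you mathematically cannot satisfy all common fairness definitions at the same time (unless your model is perfect or the base rates are equal across groups). For example, when base rates differ, a classifier that is calibrated within each group cannot also have equal false positive and false negative rates across groups. This result is often called the impossibility theorem.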

This is like trying to optimize for Lighthouse performance, accessibility, SEO, and best practices all at 100 simultaneously — sometimes improving one metric degrades another. You have to make a judgment call about which metric matters most for your specific application.
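
To see one instance of this tension in numbers, the sketch below uses the identity P(predicted = 1) = TPR × base rate + FPR × (1 − base rate). It holds TPR and FPR fixed across two groups, so equalized odds is satisfied, and shows that the positive prediction rates still differ when the base rates differ. All of the numbers and the `positiveRate` helper are illustrative assumptions.

```typescript
// Positive prediction rate implied by a group's base rate and the classifier's TPR/FPR.
function positiveRate(baseRate: number, tpr: number, fpr: number): number {
  return tpr * baseRate + fpr * (1 - baseRate);
}

// Same TPR and FPR for both groups, so equalized odds holds by construction.
const sharedTpr = 0.8;
const sharedFpr = 0.1;

// Hypothetical base rates: 50% of group A and 20% of group B are actual positives.
const rateA = positiveRate(0.5, sharedTpr, sharedFpr); // 0.8*0.5 + 0.1*0.5, about 0.45
const rateB = positiveRate(0.2, sharedTpr, sharedFpr); // 0.8*0.2 + 0.1*0.8, about 0.24

// Demographic parity would need rateA to equal rateB; here the ratio is about 0.53.
const parityRatio = Math.min(rateA, rateB) / Math.max(rateA, rateB);
console.log(rateA.toFixed(2), rateB.toFixed(2), parityRatio.toFixed(2));
```

If both base rates were 0.5, both calls would return the same rate and the two metrics would agree, which matches the "equal base rates" exception above.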

When to use demographic parity: When equal access to a resource matters more than accuracy (e.g., seeing job ads).

When to use equalized odds: When you need the model to make errors at the same rates across groups, so that missed cases and false alarms are equally likely for everyone (e.g., medical diagnosis).

Challenge

Compute fairness metrics for a classifier and determine if disparate impact exists.

Exercise

Intermediate · Arithmetic · ~15 min

Compute Fairness Metrics

Write two functions. `demographicParity` takes an array of predictions (each with a string `group` and numeric `predicted` and `actual` fields that are 0 or 1) and returns a Record mapping each group to its positive prediction rate. `disparateImpactRatio` takes that rates object and returns the ratio of the minimum rate to the maximum rate. A ratio below 0.8 indicates disparate impact under the 4/5ths rule.

Key Takeaways

  • Demographic parity: equal positive prediction rates across groups
  • Equalized odds: equal true positive and false positive rates across groups
  • The impossibility theorem means you must choose which fairness metric matters most
  • The 4/5ths rule: a disparate impact ratio below 0.8 is a common warning sign of disparate impact, not proof of unfairness on its own
