Responsible Deployment: Who Gets Hurt If You're Wrong?

Responsible deployment means analyzing failure modes, documenting limitations, and deciding when human oversight is required — like conducting an accessibility impact assessment before a major launch.

Instructor

Your model scores well on every metric. Your fairness numbers look good. You've verified it learned the right patterns. But before you deploy, there's one more question — the most important one: who gets hurt if you're wrong?

Every frontend developer has done a pre-launch review. Does it work on mobile? Is it accessible? What happens if the API is down? Responsible ML deployment follows the same logic, but the failure cases involve people's livelihoods, health, and civil rights.

Learning Objectives

  • Conduct a failure mode analysis for an ML system
  • Distinguish between high-stakes and low-stakes ML applications
  • Understand when human-in-the-loop oversight is required
  • Create a model card documenting a model's capabilities and limitations

The Accessibility Review, but for ML

Frontend

Accessibility Impact Assessment
// a11y review: Who can't use this feature? What's the fallback?

Machine Learning

Model Impact Assessment
// ML review: Who is harmed by errors? What's the recourse?

Intuition Bridge

⚠ Where this breaks
An accessibility impact assessment can be checked against published WCAG criteria. A model impact assessment has no equivalent spec: it must consider distributional harms across populations the developer does not belong to, which requires consulting the affected stakeholders. Skipping that consultation and declaring 'we considered ethics' is the dominant failure mode.

Before launching a major feature, responsible frontend teams ask: "Who can't use this? What's the fallback experience?" An accessibility impact assessment identifies users who might be excluded and builds alternatives.

A model impact assessment asks the same questions: "Who is harmed by errors? What recourse do they have? Is there a human fallback?"

Failure Mode Analysis

Every model will be wrong sometimes. The question is: what happens when it's wrong?

impact-assessment.ts (typescript)
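// One way the model can fail, and what that failure does to people.
interface FailureMode {
  description: string;
  probability: 'low' | 'medium' | 'high';
  severity: 'low' | 'medium' | 'high' | 'critical';
  affectedGroups: string[]; // who bears the cost of this error
  mitigation: string;
  humanFallback: boolean; // true if a person can catch or reverse the error
}

// The full pre-deployment review for a single model.
interface ModelImpactAssessment {
  modelName: string;
  purpose: string;
  stakeholders: string[];
  failureModes: FailureMode[];
  deploymentDecision: 'deploy' | 'deploy-with-oversight' | 'do-not-deploy';
  rationale: string; // why the decision above was reached
}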

// Example: loan approval model
const loanModelAssessment: ModelImpactAssessment = {
  modelName: 'loan-approval-v2',
  purpose: 'Pre-screen loan applications for manual review',
  stakeholders: ['applicants', 'loan officers', 'bank', 'regulators'],
  failureModes: [
    {
      description: 'False negative: qualified applicant denied',
      probability: 'medium',
      severity: 'high',
      affectedGroups: ['applicants, especially those from underrepresented groups'],
      mitigation: 'Human review of all denials; applicant appeal process',
      humanFallback: true,
    },
    {
      description: 'False positive: unqualified applicant approved',
      probability: 'low',
      severity: 'medium',
      affectedGroups: ['bank', 'applicant (may take on unaffordable debt)'],
      mitigation: 'Secondary manual review before final approval',
      humanFallback: true,
    },
    {
      description: 'Systematic bias against a demographic group',
      probability: 'medium',
      severity: 'critical',
      affectedGroups: ['affected demographic group', 'regulators'],
      mitigation: 'Monthly fairness audits; regulatory reporting',
      humanFallback: true,
    },
  ],
  deploymentDecision: 'deploy-with-oversight',
  rationale: 'Model assists but does not replace human loan officers. All decisions subject to human review.',
};

function assessRisk(assessment: ModelImpactAssessment): string {
  // Check fallbacks on the critical modes themselves, so a missing fallback
  // on a low-severity mode does not block deployment by accident.
  const criticalModes = assessment.failureModes.filter(f => f.severity === 'critical');
  const hasCritical = criticalModes.length > 0;
  const criticalHaveFallbacks = criticalModes.every(f => f.humanFallback);

  if (hasCritical && !criticalHaveFallbacks) {
    return 'DO NOT DEPLOY: critical failure modes without human fallbacks';
  }
  if (hasCritical) {
    return 'DEPLOY WITH OVERSIGHT: critical risks mitigated by human review';
  }
  return 'DEPLOY: risks are manageable with standard monitoring';
}
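
Once failure modes are listed, it helps to decide which ones to address first. The sketch below is one possible way to do that: it multiplies an ordinal weight for probability by one for severity and sorts by the result. The numeric weights, the `RankedFailureMode` type, and the `riskScore` and `rankByRisk` helpers are assumptions made for illustration; the lesson itself does not assign numbers to these labels.

```typescript
type Probability = 'low' | 'medium' | 'high';
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Trimmed-down failure mode: only the fields needed for ranking.
interface RankedFailureMode {
  description: string;
  probability: Probability;
  severity: Severity;
}

// Hypothetical ordinal weights; tune these for your own risk policy.
const PROBABILITY_WEIGHT: Record<Probability, number> = { low: 1, medium: 2, high: 3 };
const SEVERITY_WEIGHT: Record<Severity, number> = { low: 1, medium: 2, high: 3, critical: 4 };

// Simple risk-matrix score: likelihood times impact.
function riskScore(mode: RankedFailureMode): number {
  return PROBABILITY_WEIGHT[mode.probability] * SEVERITY_WEIGHT[mode.severity];
}

// Highest-risk modes first. Copies the array so the caller's list is not mutated.
function rankByRisk(modes: RankedFailureMode[]): RankedFailureMode[] {
  return [...modes].sort((a, b) => riskScore(b) - riskScore(a));
}

const ranked = rankByRisk([
  { description: 'Qualified applicant denied', probability: 'medium', severity: 'high' },
  { description: 'Unqualified applicant approved', probability: 'low', severity: 'medium' },
  { description: 'Systematic bias against a demographic group', probability: 'medium', severity: 'critical' },
]);

console.log(ranked.map(m => `${riskScore(m)}: ${m.description}`));
```

With these weights the systematic-bias mode ranks first (score 8), ahead of the false negative (6) and the false positive (2). A score like this only orders the review queue; it does not replace the judgment about whether a human fallback exists.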

High-Stakes vs. Low-Stakes

Not all ML applications carry equal risk. A music recommendation that suggests a bad song is annoying. A medical diagnosis model that misses cancer is catastrophic.

High-stakes (require human oversight): healthcare, criminal justice, hiring, lending, child welfare.

Lower-stakes (can tolerate errors): content recommendations, spam filtering, autocomplete, image tagging.

The stakes determine how much oversight you need — not whether you need it at all. Every deployed model needs monitoring.
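
The distinction above could be encoded as a small lookup, sketched here under some assumptions. The domain names come from the two lists in this section, but the string identifiers, the `OversightLevel` type, and the `requiredOversight` function are hypothetical. Treating an unlisted domain as high-stakes is also an assumption, chosen so that an unclassified application defaults to more oversight rather than less.

```typescript
type OversightLevel = 'human-in-the-loop' | 'monitoring-only';

// Domains named in the lesson; the kebab-case keys are illustrative identifiers.
const HIGH_STAKES_DOMAINS = new Set<string>([
  'healthcare', 'criminal-justice', 'hiring', 'lending', 'child-welfare',
]);
const LOWER_STAKES_DOMAINS = new Set<string>([
  'content-recommendations', 'spam-filtering', 'autocomplete', 'image-tagging',
]);

function requiredOversight(domain: string): OversightLevel {
  if (HIGH_STAKES_DOMAINS.has(domain)) {
    return 'human-in-the-loop';
  }
  if (LOWER_STAKES_DOMAINS.has(domain)) {
    // Lower stakes still means monitoring, never "no oversight".
    return 'monitoring-only';
  }
  // Assumption: a domain nobody has classified yet gets the stricter level.
  return 'human-in-the-loop';
}

console.log(requiredOversight('lending'));      // human-in-the-loop
console.log(requiredOversight('autocomplete')); // monitoring-only
```

Note that neither branch returns "none": both levels include monitoring, matching the point that stakes change how much oversight is needed, not whether it is needed.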

Model Cards

Model cards are like package.json for ML models — structured documentation that tells anyone who encounters your model what it does, what it was trained on, how it performs, and where it fails.

A model card should include:

  • Intended use: what the model is designed to do
  • Out-of-scope uses: what it should not be used for
  • Training data: what it was trained on, and known gaps
  • Performance metrics: accuracy, fairness metrics, broken down by group
  • Limitations: known failure modes and biases
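
The sections listed above map naturally onto a typed structure. The following is a minimal sketch, not a standard schema: the `ModelCard` and `GroupMetric` shapes and the `missingSections` helper are assumptions for illustration. The example card reuses the loan model from earlier and deliberately leaves two sections empty rather than inventing training data or metrics.

```typescript
// One metric value for one subgroup, so performance can be broken down by group.
interface GroupMetric {
  group: string;
  metric: string;
  value: number;
}

interface ModelCard {
  modelName: string;
  intendedUse: string;
  outOfScopeUses: string[];
  trainingData: { description: string; knownGaps: string[] };
  performance: GroupMetric[];
  limitations: string[];
}

// Returns the names of sections that are still empty, for use as a release gate.
function missingSections(card: ModelCard): string[] {
  const missing: string[] = [];
  if (card.intendedUse.trim() === '') missing.push('intendedUse');
  if (card.outOfScopeUses.length === 0) missing.push('outOfScopeUses');
  if (card.trainingData.description.trim() === '') missing.push('trainingData');
  if (card.performance.length === 0) missing.push('performance');
  if (card.limitations.length === 0) missing.push('limitations');
  return missing;
}

// Draft card: training data and per-group metrics have not been documented yet.
const draftCard: ModelCard = {
  modelName: 'loan-approval-v2',
  intendedUse: 'Pre-screen loan applications for manual review',
  outOfScopeUses: ['Final approval or denial without human review'],
  trainingData: { description: '', knownGaps: [] },
  performance: [],
  limitations: ['May be systematically biased against a demographic group'],
};

const gaps = missingSections(draftCard);
console.log(gaps); // [ 'trainingData', 'performance' ]
```

A check like this only confirms that each section has been filled in. Whether the content is honest and complete still needs a human reviewer.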

Challenge

Build a model impact assessment for a given ML application scenario.

Exercise

Intermediate · Deployment · ~15 min

Assess Model Impact

Write a function `assessDeploymentRisk` that takes an array of failure modes (each with `severity`: 'low' | 'medium' | 'high' | 'critical' and `hasHumanFallback`: boolean) and returns a deployment decision: 'do-not-deploy' if any critical failure mode lacks a human fallback, 'deploy-with-oversight' if there are critical or high severity modes but all have fallbacks, or 'deploy' if there are no critical or high severity modes.

Key Takeaways

  • Always ask 'who gets hurt if this is wrong?' before deploying
  • Failure mode analysis maps errors to their real-world consequences
  • High-stakes applications require human-in-the-loop oversight
  • Model cards document capabilities and limitations — like package.json for ML
  • Sometimes the responsible decision is to not deploy at all
