
The Calculus of Neurons

Partial derivatives and the chain rule are the engine behind every gradient update in neural networks.

In the Neural Networks module, you used backpropagation to train models. The framework handled the math. But understanding what's actually happening — partial derivatives flowing backward through a computational graph — gives you the intuition to debug vanishing gradients, pick better architectures, and understand why certain tricks work.

Learning Objectives

  • Compute partial derivatives for multi-variable functions
  • Apply the chain rule to composite functions step by step
  • Trace gradient flow through a simple neural network by hand
  • Use TensorFlow.js automatic differentiation to verify manual gradients
  • Connect the concept of derivatives to rates of change in animation code

Derivatives as Rates of Change

You already think in derivatives when you write animation code. In a requestAnimationFrame loop, velocity is the derivative of position with respect to time: how fast the position changes per unit of time, estimated from one frame to the next.

Structural Bridge

Frontend: requestAnimationFrame
    const velocity = (pos - lastPos) / deltaTime

Machine Learning: Gradient
    const grad = tf.grad(loss)(weights)

⚠ Where this breaks
rAF velocity is a finite difference of positions you have already observed, frame by frame. tf.grad uses reverse-mode automatic differentiation: it walks the computational graph backward, applying the chain rule to compute partial derivatives that are exact up to floating-point rounding. Velocity tells you what already happened; the negative gradient tells you which direction to step to reduce the loss on the next update.

A gradient in ML is the same idea: how fast does the loss change when you nudge a weight? If the gradient is large, a small weight change causes a big loss change. If it's near zero, nudging that weight barely changes the loss at the current point.

derivative-intuition.ts (TypeScript)
import * as tf from '@tensorflow/tfjs';

// In frontend: velocity ≈ rate of change of position
function animationDerivative(lastPos: number, currentPos: number, dt: number) {
  return (currentPos - lastPos) / dt;  // Finite-difference approximation of the derivative
}

// In ML: gradient = rate of change of loss w.r.t. weight
// For f(x) = x^2, the derivative is f'(x) = 2x
const f = (x: tf.Tensor) => x.square();
const df = tf.grad(f);

const x = tf.scalar(3);
const gradient = df(x);
console.log(await gradient.array()); // 6 — because 2 * 3 = 6
// At x=3, increasing x by a tiny amount increases x^2 by ~6 times that amount
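
The claim that a large gradient means a small nudge moves the loss a lot can be checked without any framework. The following is a minimal sketch in plain TypeScript, assuming f(x) = x^2 stands in for a loss; the names `lossLike`, `lossLikePrime`, `nudge`, and the learning rate of 0.1 are illustrative choices, not part of the lesson's code.

```typescript
// f(x) = x^2 stands in for a loss; its derivative is 2x.
const lossLike = (x: number): number => x * x;
const lossLikePrime = (x: number): number => 2 * x;

// First-order approximation: f(x + dx) - f(x) ≈ f'(x) * dx
function nudge(x: number, dx: number): { actual: number; predicted: number } {
  return {
    actual: lossLike(x + dx) - lossLike(x),
    predicted: lossLikePrime(x) * dx,
  };
}

const steep = nudge(3, 0.001);    // gradient is 6 here: the same nudge moves f a lot
const flat = nudge(0.01, 0.001);  // gradient is 0.02 here: the same nudge barely moves f
console.log('steep:', steep, 'flat:', flat);

// One gradient-descent step: move against the gradient to reduce f.
const learningRate = 0.1; // assumed value, for illustration only
const before = 3;
const after = before - learningRate * lossLikePrime(before);
console.log('f before:', lossLike(before), 'f after:', lossLike(after));
```

The predicted and actual changes agree closely because the nudge is small; the approximation degrades as `dx` grows.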

Partial Derivatives

Neural networks have many weights. A partial derivative tells you how the loss changes when you nudge one weight while holding all others fixed.

partial-derivatives.ts (TypeScript)
import * as tf from '@tensorflow/tfjs';

// f(x, y) = x^2 * y + y^3
// Partial with respect to x: df/dx = 2xy (treat y as constant)
// Partial with respect to y: df/dy = x^2 + 3y^2 (treat x as constant)

// Manual computation at point (2, 3):
// df/dx = 2 * 2 * 3 = 12
// df/dy = 2^2 + 3 * 3^2 = 4 + 27 = 31

// Verify with TensorFlow.js
const f = (x: tf.Tensor, y: tf.Tensor) =>
  x.square().mul(y).add(y.pow(3));

// Gradient with respect to x
const dfdx = tf.grad((x) => f(x, tf.scalar(3)));
console.log(await dfdx(tf.scalar(2)).array()); // 12

// Gradient with respect to y
const dfdy = tf.grad((y) => f(tf.scalar(2), y));
console.log(await dfdy(tf.scalar(3)).array()); // 31
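
"Nudge one input while holding the others fixed" can also be done literally. The sketch below estimates both partials of the same function with central differences in plain TypeScript; `surface`, `numericPartials`, and the step size `h = 1e-5` are assumptions made for this illustration.

```typescript
type Fn2 = (x: number, y: number) => number;

// Same function as above: f(x, y) = x^2 * y + y^3
const surface: Fn2 = (x, y) => x * x * y + y ** 3;

// Central differences: move one input by ±h and keep the other fixed.
function numericPartials(fn: Fn2, x: number, y: number, h = 1e-5): [number, number] {
  const dfdx = (fn(x + h, y) - fn(x - h, y)) / (2 * h); // y held constant
  const dfdy = (fn(x, y + h) - fn(x, y - h)) / (2 * h); // x held constant
  return [dfdx, dfdy];
}

const [px, py] = numericPartials(surface, 2, 3);
console.log('df/dx ≈', px, 'df/dy ≈', py); // close to the hand-computed 12 and 31
```

Numerical estimates like this are only approximate and cost two function evaluations per input, which is why frameworks use automatic differentiation instead; they remain useful as an independent check.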

The Chain Rule

The chain rule is the core idea behind backpropagation. If you have a composite function (and every neural network is a deeply nested composite function), the chain rule tells you how to compute the derivative of the whole thing: multiply the local derivative of each stage, evaluated at that stage's input.
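
To see what "deeply nested" does to a derivative, here is a small sketch that applies the chain rule to a sigmoid composed with itself several times. It is an illustration only: the choice of sigmoid, the helper names `nestedSigmoid` and `nestedSigmoidGrad`, and the input 0.5 are assumptions, not part of the network used in this lesson.

```typescript
const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// Value of sigmoid applied `depth` times to x.
function nestedSigmoid(x: number, depth: number): number {
  let u = x;
  for (let i = 0; i < depth; i++) u = sigmoid(u);
  return u;
}

// Derivative of the nested function via the chain rule:
// multiply the local derivative s'(u) = s(u) * (1 - s(u)) at every stage.
function nestedSigmoidGrad(x: number, depth: number): number {
  let u = x;     // input to the current stage
  let grad = 1;  // running product of local derivatives
  for (let i = 0; i < depth; i++) {
    const s = sigmoid(u);
    grad *= s * (1 - s); // each factor is at most 0.25
    u = s;
  }
  return grad;
}

for (const depth of [1, 3, 6]) {
  console.log(`depth ${depth}: derivative =`, nestedSigmoidGrad(0.5, depth));
}
```

Because every sigmoid factor is at most 0.25, the product shrinks quickly as depth grows. This is one way the vanishing-gradient problem mentioned in the introduction shows up.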

chain-rule.ts (TypeScript)
import * as tf from '@tensorflow/tfjs';

// Consider a 2-layer network (no bias, for simplicity):
//   z1 = w1 * x         (layer 1: linear)
//   a1 = relu(z1)       (activation)
//   z2 = w2 * a1        (layer 2: linear)
//   loss = (z2 - y)^2   (MSE loss)
//
// Chain rule for dLoss/dw1:
//   dLoss/dw1 = dLoss/dz2 * dz2/da1 * da1/dz1 * dz1/dw1
//
// Each factor is a simple local derivative. Let's compute them:

const x = 2.0;
const y = 1.0;   // target
const w1 = 0.5;
const w2 = -0.3;

// Forward pass
const z1 = w1 * x;           // 1.0
const a1 = Math.max(0, z1);  // 1.0 (ReLU)
const z2 = w2 * a1;          // -0.3
const loss = (z2 - y) ** 2;  // 1.69

// Backward pass (chain rule, right to left)
const dLoss_dz2 = 2 * (z2 - y);         // 2 * (-1.3) = -2.6
const dz2_da1 = w2;                      // -0.3
const da1_dz1 = z1 > 0 ? 1 : 0;         // 1 (ReLU derivative)
const dz1_dw1 = x;                       // 2

// Chain them together:
const dLoss_dw1 = dLoss_dz2 * dz2_da1 * da1_dz1 * dz1_dw1;
console.log('Manual gradient dL/dw1:', dLoss_dw1); // 1.56

// Verify with TensorFlow.js auto-diff
const computeLoss = (w1t: tf.Tensor) => {
  const z1t = w1t.mul(x);
  const a1t = z1t.relu();
  const z2t = tf.scalar(w2).mul(a1t);
  return z2t.sub(y).square();
};

const autoGrad = tf.grad(computeLoss);
console.log('Auto gradient dL/dw1:', await autoGrad(tf.scalar(w1)).array());
// ≈ 1.56 (tensors default to float32, so the last digits may differ from the manual result)
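
The same backward pass also yields the gradient for w2, and it reuses the first factor dLoss/dz2 rather than recomputing it. The sketch below works this out in plain TypeScript for the same numbers and checks it against a central difference; the `forward` helper and the `input`/`target` names are introduced here for illustration.

```typescript
const input = 2.0;
const target = 1.0;
const w1 = 0.5;
const w2 = -0.3;

// Forward pass of the same 2-layer network, returning the intermediates.
function forward(w1v: number, w2v: number) {
  const z1 = w1v * input;
  const a1 = Math.max(0, z1); // ReLU
  const z2 = w2v * a1;
  return { z1, a1, z2, loss: (z2 - target) ** 2 };
}

const { z1, a1, z2 } = forward(w1, w2);

// Upstream factor shared by every weight gradient.
const dLoss_dz2 = 2 * (z2 - target);

// w2 sits one step back from z2: dz2/dw2 = a1
const dLoss_dw2 = dLoss_dz2 * a1;

// w1 sits three steps back: dz2/da1 = w2, da1/dz1 = ReLU', dz1/dw1 = input
const dLoss_dw1 = dLoss_dz2 * w2 * (z1 > 0 ? 1 : 0) * input;

// Independent check for w2 using a central difference.
const h = 1e-6;
const numeric_dw2 = (forward(w1, w2 + h).loss - forward(w1, w2 - h).loss) / (2 * h);

console.log('dL/dw2 (chain rule):', dLoss_dw2, '(finite difference):', numeric_dw2);
console.log('dL/dw1 (chain rule):', dLoss_dw1);
```

Reusing shared upstream factors is what keeps backpropagation cheap: one backward sweep produces the gradient for every weight.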

The Computational Graph

During the forward pass, the framework records each operation and its inputs, which together form a computational graph. Backpropagation walks this graph in reverse, applying the chain rule at each node. This is where the name TensorFlow comes from: tensors flow through a graph of operations.

computational-graph.ts (TypeScript)
import * as tf from '@tensorflow/tfjs';

// Visualize a computational graph as a pipeline:
//
//   x ──┐
//       ├─ [multiply] ─ z1 ─ [relu] ─ a1 ──┐
//  w1 ──┘                                    ├─ [multiply] ─ z2 ─ [sub y] ─ [square] ─ loss
//                                       w2 ──┘
//
// Forward: left to right (compute values)
// Backward: right to left (compute gradients via chain rule)

// With multiple weights, tf.grads returns all gradients at once.
// The loss function receives each tensor as its own argument;
// the returned gradient function takes them as an array.
const networkLoss = (w1: tf.Tensor, w2: tf.Tensor) => {
  const x = tf.scalar(2);
  const y = tf.scalar(1);

  const z1 = w1.mul(x);
  const a1 = z1.relu();
  const z2 = w2.mul(a1);
  return z2.sub(y).square();
};

const gradFn = tf.grads(networkLoss);
const grads = gradFn([tf.scalar(0.5), tf.scalar(-0.3)]);

console.log('dL/dw1:', await grads[0].array()); // ≈ 1.56
console.log('dL/dw2:', await grads[1].array()); // ≈ -2.6
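
To make "record the graph forward, walk it backward" concrete, here is a toy scalar version of reverse-mode autodiff. It is a teaching sketch only: the `Value` class and its methods are hypothetical and are not how TensorFlow.js is implemented internally.

```typescript
// Hypothetical minimal graph node: stores a value, its parents, and a
// function that pushes an upstream gradient back to those parents.
class Value {
  data: number;
  grad = 0; // dLoss/dThis, filled in by backward()
  parents: Value[];
  pushGrad: (upstream: number) => void;

  constructor(data: number, parents: Value[] = [], pushGrad: (upstream: number) => void = () => {}) {
    this.data = data;
    this.parents = parents;
    this.pushGrad = pushGrad;
  }

  mul(other: Value): Value {
    return new Value(this.data * other.data, [this, other], (g) => {
      this.grad += other.data * g; // d(a*b)/da = b
      other.grad += this.data * g; // d(a*b)/db = a
    });
  }

  sub(other: Value): Value {
    return new Value(this.data - other.data, [this, other], (g) => {
      this.grad += g;  // d(a-b)/da = 1
      other.grad -= g; // d(a-b)/db = -1
    });
  }

  relu(): Value {
    return new Value(Math.max(0, this.data), [this], (g) => {
      this.grad += (this.data > 0 ? 1 : 0) * g;
    });
  }

  square(): Value {
    return new Value(this.data ** 2, [this], (g) => {
      this.grad += 2 * this.data * g;
    });
  }

  // Walk the recorded graph from the output back to the inputs.
  backward(): void {
    const order: Value[] = [];
    const seen = new Set<Value>();
    const visit = (v: Value): void => {
      if (seen.has(v)) return;
      seen.add(v);
      v.parents.forEach((p) => visit(p));
      order.push(v); // parents are pushed before the nodes that use them
    };
    visit(this);
    this.grad = 1; // dLoss/dLoss
    for (const v of order.reverse()) v.pushGrad(v.grad);
  }
}

// Same network and numbers as the diagram above.
const xIn = new Value(2);
const yTarget = new Value(1);
const wOne = new Value(0.5);
const wTwo = new Value(-0.3);

const lossNode = wTwo.mul(wOne.mul(xIn).relu()).sub(yTarget).square();
lossNode.backward();

console.log('dL/dw1:', wOne.grad, 'dL/dw2:', wTwo.grad); // approximately 1.56 and -2.6
```

Each operation only knows its own local derivative; the reverse topological walk is what chains them together. Gradients accumulate with `+=` so that a value used in more than one place receives the sum of its contributions.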

Challenge

Compute gradients by hand for a simple network, then verify with TensorFlow.js.

Exercise

Intermediate · Training · ~20 min

Compute Gradients

Implement two functions: (1) `manualGradient` that computes the gradient of a simple 2-layer network by hand using the chain rule. The network computes: z1 = w1 * x, a1 = relu(z1), z2 = w2 * a1, loss = (z2 - y)^2. Return dLoss/dw1. (2) `autoGradient` that uses tf.grad() to compute the same gradient automatically, verifying your manual computation.

Key Takeaways

  • A derivative measures rate of change — the same concept behind velocity in animation code
  • Partial derivatives measure how loss changes when you nudge one weight at a time
  • The chain rule decomposes complex derivatives into products of simple local derivatives
  • Backpropagation is the chain rule applied backward through a computational graph
  • TensorFlow.js tf.grad() and tf.grads() automate this — but understanding the math helps you debug gradient problems
