A4.3.3 HL · evaluating models

Overfitting, and the metrics that catch it

A model that fits the training data perfectly can still be useless. These two tools show why: the first shows how model complexity trades training fit against error on unseen data, and the second shows how a single threshold turns one classifier into very different scoreboards.

The fit: too simple, just right, too eager

polynomial degree: 3 · train error: 0.003 · test error: 0.007

This looks like a good fit.

Filled dots are training data, orange rings are unseen test data. Low degree misses the shape (underfit); very high degree wiggles through every training dot but misses the test points (overfit). The best degree is where the test error is lowest.
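The sweep over degrees can be sketched in plain Python. This is a minimal illustration, not the code behind the plot above: the data (a sine curve with small fixed offsets standing in for noise), the choice of mean squared error, and the helper names fit_polynomial, predict, and mse are all assumptions made for the example.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the powers 1, x, x^2, ...
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(rows[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (rows[i][n] - known) / rows[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial on the given points."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: 8 training points and 7 test points between them.
TRAIN_NOISE = [0.05, -0.08, 0.03, 0.07, -0.04, 0.06, -0.05, 0.02]
TEST_NOISE = [-0.03, 0.06, -0.05, 0.04, 0.02, -0.07, 0.05]

train_x = [-1 + 2 * i / 7 for i in range(8)]
train_y = [math.sin(3 * x) + e for x, e in zip(train_x, TRAIN_NOISE)]
test_x = [-1 + 2 * (i + 0.5) / 7 for i in range(7)]
test_y = [math.sin(3 * x) + e for x, e in zip(test_x, TEST_NOISE)]

results = []  # one (degree, train error, test error) row per degree
for degree in range(8):
    coeffs = fit_polynomial(train_x, train_y, degree)
    results.append(
        (degree, mse(coeffs, train_x, train_y), mse(coeffs, test_x, test_y))
    )

# The best degree is the one with the lowest error on the unseen test points.
best_degree = min(results, key=lambda row: row[2])[0]

for degree, train_err, test_err in results:
    print(f"degree {degree}: train {train_err:.4f}  test {test_err:.4f}")
print("lowest test error at degree", best_degree)
```

Training error can only shrink as the degree grows, because every higher-degree polynomial contains the lower-degree ones as special cases; with 8 training points, degree 7 passes through all of them. Only the test error shows whether the extra flexibility helped.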

The scoreboard: one threshold, four numbers

Chart labels: predict 0 (left), predict 1 (right); points are marked actually 1 or actually 0.

              predicted 1          predicted 0
actually 1    TP · caught          FN · missed
actually 0    FP · false alarm     TN · correct pass

decision threshold: 0.50

accuracy: 96%
precision: 100%
recall: 92%
F1: 96%

Slide the threshold right and the model gets pickier: precision usually rises, but recall falls as it misses more real positives. Neither can be pushed up without cost to the other; F1, the harmonic mean of precision and recall, balances the two.
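The four counts and the four scores can be computed in a few lines of Python. This is a sketch under stated assumptions: the scores and labels are made up for illustration (they are not the data behind the readout above), a score at or above the threshold counts as predict 1, and the names confusion_counts and metrics are hypothetical helpers.

```python
def confusion_counts(scores, labels, threshold):
    """Return (TP, FP, FN, TN) when score >= threshold means predict 1."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # caught
        elif predicted == 1 and label == 0:
            fp += 1  # false alarm
        elif predicted == 0 and label == 1:
            fn += 1  # missed
        else:
            tn += 1  # correct pass
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted or actually 1.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical classifier scores and true labels (5 positives, 5 negatives).
scores = [0.95, 0.85, 0.75, 0.60, 0.40, 0.65, 0.45, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for threshold in (0.3, 0.5, 0.7):
    counts = confusion_counts(scores, labels, threshold)
    summary = metrics(*counts)
    print(threshold, counts, {k: round(v, 2) for k, v in summary.items()})
```

On this made-up data, raising the threshold from 0.3 to 0.7 removes every false alarm while two real positives slip below the cut, which is the precision-for-recall trade described above.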
