A4.3.3 HL · evaluating models
A model that fits its training data perfectly can still be useless on new data. The two tools below show why. The first shows how model complexity trades off against performance on unseen data. The second shows how moving a single decision threshold turns one classifier into very different scoreboards.
Interactive figure: polynomial fit to training and test points, with train error and test error readouts.
Filled dots are training data; orange rings are unseen test data. A low-degree polynomial misses the shape of the data (underfitting). A very high-degree polynomial wiggles through every training dot but misses the test points (overfitting). The best degree is the one where the test error is lowest.
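A minimal sketch of the same experiment in plain Python, under assumed conditions. The dataset is hypothetical: eight training points on the line y = 2x with a fixed alternating ±0.1 of noise, and noise-free test points halfway between them. Errors are mean squared errors, which is an assumption; the tool's readouts may use a different measure. The helpers `fit_poly`, `predict` and `mse` are illustrative names. The sketch fits polynomials of degree 0 to 7 by least squares and reports both errors.

```python
# Underfitting vs overfitting: fit polynomials of increasing degree by least
# squares and compare mean squared error on training and test points.
# The dataset below is hypothetical, chosen only for illustration.

def fit_poly(xs, ys, degree):
    """Least-squares fit via the normal equations (A^T A) c = A^T y.
    Returns coefficients c where c[k] multiplies x**k."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] built from power sums.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - rest) / m[r][r]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Eight training points on y = 2x, with a fixed alternating +/-0.1 "noise".
train_x = [-1 + 2 * i / 7 for i in range(8)]
train_y = [2 * x + 0.1 * (-1) ** i for i, x in enumerate(train_x)]
# Noise-free test points halfway between neighbouring training points.
test_x = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
test_y = [2 * x for x in test_x]

results = {}
for degree in range(8):
    c = fit_poly(train_x, train_y, degree)
    results[degree] = (mse(c, train_x, train_y), mse(c, test_x, test_y))
    print(f"degree {degree}: train {results[degree][0]:.5f}  test {results[degree][1]:.5f}")

best = min(results, key=lambda d: results[d][1])
print("best degree by test error:", best)
```

Degree 0 underfits. Degree 7 passes through all eight training points, so its train error is essentially zero, yet its test error is far larger than that of the straight-line fit. The normal equations are adequate at this size but become ill-conditioned at higher degrees, where `numpy.polyfit` or a QR-based solver is the usual choice.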
Interactive figure: confusion matrix at the chosen threshold, with cells TP · caught, FN · missed, FP · false alarm, TN · correct pass.
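The standard definitions, which are not specific to this tool, can be written in terms of the four cells:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

TN appears in none of these three metrics.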
Slide the threshold right and the model gets pickier. Precision usually rises, but recall falls as the model misses more real positives. There is no free lunch; F1, the harmonic mean of precision and recall, balances the two.
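Here is a minimal sketch of the threshold slider in plain Python. The scores, labels and the three thresholds are hypothetical. `confusion` and `precision_recall_f1` are illustrative helper names, not functions from any library. The sketch assumes the convention that a score at or above the threshold counts as a positive prediction.

```python
# Sweep a decision threshold over hypothetical classifier scores and watch
# precision, recall and F1 move. Scores and labels are made up for illustration.

def confusion(scores, labels, threshold):
    """Return (TP, FN, FP, TN), predicting positive when score >= threshold."""
    tp = fn = fp = tn = 0
    for score, actual in zip(scores, labels):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1          # caught
        elif actual:
            fn += 1          # missed
        elif predicted:
            fp += 1          # false alarm
        else:
            tn += 1          # correct pass
    return tp, fn, fp, tn


def precision_recall_f1(tp, fn, fp, tn):
    # tn is not used by any of the three metrics.
    precision = tp / (tp + fp) if tp + fp else float("nan")  # undefined if nothing is flagged
    recall = tp / (tp + fn) if tp + fn else float("nan")     # undefined if there are no real positives
    # 2TP / (2TP + FP + FN) equals the harmonic mean 2PR / (P + R) whenever TP > 0.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else float("nan")
    return precision, recall, f1


SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.35, 0.20, 0.10]
LABELS = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

for threshold in (0.3, 0.5, 0.75):
    counts = confusion(SCORES, LABELS, threshold)
    p, r, f1 = precision_recall_f1(*counts)
    print(f"t={threshold}: TP/FN/FP/TN={counts}  precision={p:.3f}  recall={r:.3f}  F1={f1:.3f}")
```

On this toy data, precision climbs from 0.625 to 1.0 while recall drops from 1.0 to 0.6. Raising the threshold can never increase recall, because TP can only shrink. Precision, however, is not guaranteed to rise at every step on every dataset.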