Testing and Training

Generally split the data randomly into

  • 80% for training
  • 20% for testing
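
A minimal sketch of a random 80/20 split, assuming a NumPy feature matrix X and label vector y (the names and data here are made up for illustration); scikit-learn's train_test_split does the same thing in one call:

```python
import numpy as np

# Hypothetical data, just for illustration.
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

# Shuffle the indices so the split is random rather than dependent on data order.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(X))

split = int(0.8 * len(X))                      # 80% for training
train_idx, test_idx = indices[:split], indices[split:]

X_train, y_train = X[train_idx], y[train_idx]  # training set
X_test, y_test = X[test_idx], y[test_idx]      # testing set
```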

K-Fold Cross Validation

Warning

Don’t use this if you only have a little data

  • Cut the data into chunks
  • Make splits. In each, use one chunk for testing and the other chunks for training
  • Train on those and report the average test accuracy across the folds (see the sketch below)
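
A sketch of the procedure using scikit-learn's KFold (the decision tree and the iris dataset are arbitrary stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # cut the data into 5 chunks
accuracies = []

for train_idx, test_idx in kf.split(X):
    # In each split, one chunk is held out for testing and the rest is used for training.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))

print(sum(accuracies) / len(accuracies))               # average test accuracy across folds
```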

Leave One Out Cross Validation

Warning

This is computationally expensive, so prefer K-Fold Cross Validation

  • Only use when every ounce of training data counts
  • Will let you see exactly where your algorithm falls down
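
A minimal sketch with scikit-learn's LeaveOneOut; it trains one model per example, which is why it gets so expensive (the dataset and classifier are arbitrary stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

misses = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on everything except one held-out example.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    if model.predict(X[test_idx])[0] != y[test_idx][0]:
        misses.append(test_idx[0])          # record which examples the model got wrong

print(1 - len(misses) / len(X))             # accuracy
print(misses)                               # exactly where the algorithm falls down
```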

Validation Sets

One possible problem is that you, the person tuning the model, overfit instead of the model itself overfitting

  • You might tune the hyperparameters to fit the test data too well
  • Now information about the test set has leaked back into the algorithm

Solution: Split the data into training, validation, and test sets

Validation data is used to tune your model's hyperparameters; the test data is used only at the very end to evaluate accuracy

Most people don’t actually do this because it’s too much work
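
A sketch of the three-way split using two calls to scikit-learn's train_test_split (the 60/20/20 proportions are just an example):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data, just for illustration.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# Carve off the test set first; it is only touched at the very end.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Split what remains into training and validation sets (0.25 of the remaining 80% = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Tune hyperparameters against (X_val, y_val); report final accuracy on (X_test, y_test).
```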

Training Failures

Main reasons for model not working:

  • Overfitting
    • Unable to generalize
    • Keys in on patterns in the training data that don’t actually exist in the real world
    • Can think of a decision tree with way too many nodes/rules
    • Kinda like superstitions
  • Underfitting
    • Too general
    • Can think of a decision tree with only a couple levels
  • Lack of model power
    • Some patterns are just too complicated for, say, decision trees
  • Lack of signal in data
    • It really is impossible to learn anything from the data

Combating Training Failures

Overfitting

Decision trees have a really bad overfitting problem; we can deal with it by:

  • Applying a depth limit where we chop everything off after 4 levels
    • Depth limit is a hyperparameter for decision trees
  • Trying different depth limits and seeing how they affect our accuracy on the test set

Play around with a bunch of hyperparameters and see what works best for the test set
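
A sketch of that sweep for a decision tree's depth limit, scoring each setting on held-out data (per the validation-set section above, that held-out data would ideally be a validation set; the dataset choice is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_held, y_train, y_held = train_test_split(X, y, test_size=0.2, random_state=0)

# Try different depth limits and see how each one affects held-out accuracy.
for depth in [2, 4, 8, 16, None]:            # None = no limit, the overfitting-prone default
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    print(depth, model.score(X_held, y_held))
```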

Evaluation

Accuracy

The easiest way to see how a model is doing is accuracy: the fraction of predictions it gets right
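
A minimal sketch of the computation:

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
```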

Class Imbalance

But class imbalance is a problem

Happens when an overwhelming majority of your data is a single class

Examples of class imbalance:

  • Fraud - Most transactions aren’t fraudulent
  • Disease - Most people won’t have the disease
  • Product - Most people won’t buy any given product

It’s very common to have a rare positive signal surrounded by negatives
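
A toy illustration of why accuracy misleads here: a "model" that always predicts the majority class scores 99% on data where only 1% of examples are positive, while missing every positive case (the numbers are made up):

```python
# 1000 transactions, only 10 of which are fraudulent.
y_true = [1] * 10 + [0] * 990

# A useless model that predicts "not fraud" every single time.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)   # 0.99 -- looks great, yet every fraudulent transaction was missed
```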

Confusion Matrix

Shows you true positives, false positives, true negatives, and false negatives

                        Positive             Negative
Classified Positive     True positive %      False positive %
Classified Negative     False negative %     True negative %
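
A sketch of counting the four cells by hand (scikit-learn's confusion_matrix gives the same counts):

```python
def confusion_counts(y_true, y_pred):
    # Returns (true positives, false positives, false negatives, true negatives).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))   # (2, 1, 1, 2)
```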

Precision

Precision is how many of the things that we classified as positive were actually positive

Use precision when we only care about being correct about the things we identify as positive, e.g., Google doesn’t care if it turns away 1000 good engineers; it only cares that the ones it does hire are good
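
In terms of the confusion-matrix counts above, precision = TP / (TP + FP); a quick sketch with made-up hiring numbers:

```python
def precision(tp, fp):
    # Of everything classified positive, the fraction that really was positive.
    return tp / (tp + fp)

# Hypothetical numbers: 90 good hires (true positives), 10 bad hires (false positives).
print(precision(90, 10))   # 0.9 -- the rejected good candidates don't affect this at all
```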

Recall

Recall is how many of our positive class we didn’t miss

Use recall when we want to make sure we don’t miss anything, e.g., identifying people with disease
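
In the same terms, recall = TP / (TP + FN); a quick sketch with made-up screening numbers:

```python
def recall(tp, fn):
    # Of all the actual positives, the fraction we caught.
    return tp / (tp + fn)

# Hypothetical numbers: 45 sick patients flagged, 5 sick patients missed.
print(recall(45, 5))   # 0.9 -- every missed case drags this down
```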

Confidence

Most ML algorithms can tell you how confident they are in an answer

Depending on use case, you may want to take the confidence into account, e.g., launching nukes

Log loss: A measure of accuracy that heavily penalizes confident wrong answers
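
A sketch of log loss for a single binary prediction; a confident wrong answer costs far more than an unsure one (the probabilities are made up):

```python
import math

def log_loss_single(y_true, p_pred):
    # Negative log-likelihood of the true label under the predicted probability.
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

print(log_loss_single(1, 0.6))    # ~0.51  right answer, modest confidence
print(log_loss_single(1, 0.99))   # ~0.01  confident and correct: tiny loss
print(log_loss_single(1, 0.01))   # ~4.61  confident and wrong: huge penalty
```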

F1 score

Harmonic mean of precision and recall, used when both are important
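
A quick sketch of the harmonic mean:

```python
def f1(precision, recall):
    # Harmonic mean: low if either precision or recall is low.
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.5))   # ~0.64 -- dragged toward the weaker of the two
```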

Warning

Max thinks this is an awful metric