Very simple, not very good, outdated (but a building block of one of our strongest algorithms: Random Forests)
Warning
Don’t actually use decision trees on their own; they’re only good for demonstration purposes.
Use decision trees when you only want:
- A pretty picture to show to someone
- A general sense of your data that you can look at
Training
Smaller is better:
- Fewer rules: more generalization
- Many rules: more specificity
We want to use as few rules as possible to classify our data
- Deep trees are bad (deep = lots of levels, rules)
- Bushy trees are fine (bushy = lots of splits at a single level)
Entropy
See sub-page: Entropy
Algorithm for picking a split
- We want to get as close as possible to cutting the data in half with every split, to get the smallest possible tree
- So we want the splits with the most difference in information before and after the split
- If you had 1 bit of entropy before the split and have 0.97 bits of entropy after the split, your information gain is only 0.03
- Ideally you want an information gain of 1 (start with 1 bit of entropy, end with 0 bits after the split)
- A split can produce children of different sizes and entropies, e.g. you go from 1 bit of entropy before the split to 0 bits for one child node but 0.99 bits for the other
- So information gain uses the weighted sum of the children's entropies, weighted by the fraction of examples that land in each branch (worked through in the sketch below)
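A quick numeric sketch of that weighted gain, using the illustrative numbers from the bullets above (the 50/50 split into one pure child and one 0.99-bit child is an assumption, not real data):

```python
# Quick check of the example numbers above: a split that sends half the
# examples to a pure child (0 bits) and half to a 0.99-bit child.
parent_entropy = 1.0                      # 1 bit before the split

children = [                              # (examples in child, child entropy in bits)
    (50, 0.0),                            # assumed child sizes; purely illustrative
    (50, 0.99),
]

total = sum(n for n, _ in children)
weighted_entropy = sum((n / total) * h for n, h in children)

print(parent_entropy - weighted_entropy)  # information gain: 1.0 - 0.495 = 0.505
```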
Use a greedy heuristic:
- Start with all the data and no split
- For each column you can split on, compute the information gain from splitting on that column (using the weighted child entropies)
- Split on the column with the highest information gain
- Repeat on each child node until it has no entropy left (all elements in the node have the same class) — sketched below
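A minimal sketch of the greedy step in Python, assuming the data is a list of plain dicts with a separate list of labels; the `entropy`, `information_gain`, and `best_split` helpers and the toy columns are made up for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, labels, column):
    """Parent entropy minus the weighted entropy of the children
    produced by splitting on `column` (one child per distinct value)."""
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[column], []).append(label)
    weighted = sum(len(child) / len(labels) * entropy(child)
                   for child in children.values())
    return entropy(labels) - weighted

def best_split(rows, labels, columns):
    """Greedy step: pick the column with the highest information gain."""
    return max(columns, key=lambda col: information_gain(rows, labels, col))

# Tiny made-up dataset: two candidate columns, four examples.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

print(best_split(rows, labels, ["outlook", "windy"]))  # -> "outlook" (gain 1.0 vs 0.0)
```

A full trainer would recurse: split the rows by the winning column's values and call `best_split` again on each child until its labels are pure.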
Math for the algorithm
Weighted entropy after splitting $S$ into children $S_1, \dots, S_k$:
$$H_{\text{split}} = \sum_{i=1}^{k} \frac{|S_i|}{|S|}\, H(S_i)$$
Information gain is the drop in entropy from the split:
$$IG = H(S) - H_{\text{split}} = H(S) - \sum_{i=1}^{k} \frac{|S_i|}{|S|}\, H(S_i)$$
The Classification Algorithm
Steps to use a decision tree once you’ve trained it:
- Get a piece of data with no label
- Start at the top of the tree
- At each node, follow the branch that matches the example's value for that node's feature
- When you reach a leaf, output that leaf's class
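A minimal sketch of that walk, assuming the trained tree is stored as nested dicts (internal nodes hold a feature name and a `children` map; leaves are class labels). That layout is an assumption, not a standard format:

```python
def classify(tree, example):
    """Walk a trained tree from the root to a leaf for one unlabeled example.
    Internal nodes are assumed to look like
        {"feature": <column name>, "children": {<value>: <subtree>, ...}}
    and leaves are just the predicted class label."""
    node = tree
    while isinstance(node, dict):            # keep descending until we hit a leaf
        value = example[node["feature"]]     # look up this node's feature in the example
        node = node["children"][value]       # follow the matching branch
    return node                              # a leaf: the predicted class

# Hand-built toy tree: split on "outlook", then on "windy" under "sunny".
tree = {
    "feature": "outlook",
    "children": {
        "rainy": "no",
        "sunny": {"feature": "windy",
                  "children": {True: "no", False: "yes"}},
    },
}

print(classify(tree, {"outlook": "sunny", "windy": False}))  # -> "yes"
```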