Some ML models can't take categorical data directly, which is where one-hot encoding comes in.

Make a separate column for each category, then put a 1 in the column that matches the row's category and a 0 in every other category's column.
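
To make the mechanics concrete, here is a minimal pure-Python sketch of the idea. The `one_hot` helper and the `colors` data are made up for illustration, not part of any library:

```python
def one_hot(values):
    """Return (categories, rows); each row has a 1 in its own category's column."""
    categories = sorted(set(values))  # one column per distinct category
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "green", "red", "blue"]  # hypothetical example data
columns, encoded = one_hot(colors)

print(columns)
for row in encoded:
    print(row)
```

Each row ends up with exactly one 1, in the column for its own category.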

Pros:

  • You get to use your categorical variables

Cons:

  • One column with 50 categories becomes 50 columns, so your dataset explodes in width

In pandas:

import pandas as pd
pd.get_dummies(series)  # series: a pandas Series of category labels
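
A rough sketch of the width blow-up, assuming a made-up column with 50 distinct labels (the `cat_0` ... `cat_49` names and the 1,000-row size are invented for the example):

```python
import pandas as pd

# Hypothetical data: 1,000 rows drawn from 50 distinct categories
series = pd.Series([f"cat_{i % 50}" for i in range(1000)], name="category")

dummies = pd.get_dummies(series)  # one indicator column per category
n_rows, n_cols = dummies.shape

print(f"1 column in, {n_cols} columns out ({n_rows} rows)")
```

The row count stays the same; only the column count grows, one new column per category.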