A time series is a dataset whose values are arranged in a sequence (the index doesn’t actually have to be time)
We are concerned with predicting the next value in the sequence
Filling in Missing Data
Missing data can be filled in with:
- Forward/backward fill, where you just fill in missing values by copying over the previous or next values
- If your time series has trend or seasonality, this will distort it over long stretches of missing data
- Mean/median/mode imputation - bad, can underestimate the variance
- Moving average
- Takes into account temporal structure of data
- But window size matters: if too large, it smooths too much; if too small, it won’t capture the trend
- todo when would you want to use this
- Linear interpolation - Draw a straight line between the nearest non-missing values on either side of each gap
- Works with linear trends, but not other trends
- Doesn’t work well for seasonal data
- Spline interpolation: Uses splines (piecewise polynomials)
- More computationally expensive than previous approaches
- Better than polynomial interpolation because can fit data well while still using low-degree polynomials
- Smoother fit than linear interpolation because can curve
- But needs data to be smooth
- KNN imputation
- Computationally expensive, especially if you have lots of features
- STL imputation - break apart into seasonality, trend, and residuals (noise), impute the residuals, then reassemble
- Can capture complex seasonal patterns that other methods miss
- Can handle any type of seasonality, not just monthly or quarterly
Aside from forward/backward fill and moving average, the rest were from this article:
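Several of these are one-liners in pandas; a minimal sketch (the series values and gap positions are made up):

```python
import numpy as np
import pandas as pd

# Toy series with gaps (made-up values)
y = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, 6.0, np.nan, 8.0])

filled_ffill = y.ffill()                        # forward fill
filled_bfill = y.bfill()                        # backward fill
filled_mean = y.fillna(y.mean())                # mean imputation
filled_linear = y.interpolate(method="linear")  # line between neighboring points
filled_spline = y.interpolate(method="spline", order=3)  # cubic spline (needs scipy)
# Moving-average fill: mean of a 3-point centered window around each gap
filled_ma = y.fillna(y.rolling(3, min_periods=1, center=True).mean())
```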
Validation
Warning: Can’t do normal (shuffled) cross-validation with time series, because the model would train on the future and test on the past. Instead:
- Cut the time series off at a given point and train only on the data before it
- Predict the next period and score the predictions with mean squared error
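A minimal walk-forward sketch using scikit-learn’s TimeSeriesSplit (the series and the mean “model” are just stand-ins):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100, dtype=float)  # stand-in for a real series

tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(y):
    # Train only on the past; test on the period immediately after the cutoff
    y_train, y_test = y[train_idx], y[test_idx]
    y_pred = np.full_like(y_test, y_train.mean())  # placeholder "model"
    mse = np.mean((y_test - y_pred) ** 2)
    print(f"cutoff at t={train_idx[-1]}, test MSE={mse:.2f}")
```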
Time Series Components
- Noise: random jitters from things we can’t see
- Noise can come from any distribution, but is usually modeled as Gaussian
- Seasonality: Cyclic behavior
- Trend: Whether it’s going up or down
- A time series is stationary if it has a constant mean and variance (in which case it has no trend and no seasonality)
To check whether a time series is stationary, use the (augmented) Dickey-Fuller hypothesis test
- Dickey-Fuller checks if the time series has a unit root
- A unit root means shocks to the series don’t die out, so the data drifts away from its mean instead of reverting to it (e.g., because of a trend); the null hypothesis is that a unit root exists, so a small p-value suggests stationarity
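A quick check with statsmodels’ adfuller (the two synthetic series are just for illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
stationary = rng.normal(size=200)     # white noise: no unit root
random_walk = np.cumsum(stationary)   # random walk: has a unit root

for name, series in [("white noise", stationary), ("random walk", random_walk)]:
    stat, pvalue = adfuller(series)[:2]
    # Null hypothesis: a unit root exists. Small p-value => likely stationary.
    print(f"{name}: ADF statistic={stat:.2f}, p-value={pvalue:.3f}")
```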
Separating the components
These 3 components can be combined in either an additive or multiplicative way:
- Additive: Model the target variable as the sum of the three components
- i.e., $y_t = T_t + S_t + N_t$, where $T_t$, $S_t$, and $N_t$ are the trend, seasonality, and noise at each time step $t$
- Multiplicative: Model the target variable as the product of the three components
- i.e., $y_t = T_t \times S_t \times N_t$
Steps for separating components according to https://timeseriesreasoning.com/contents/time-series-decomposition/ and https://machinelearningmastery.com/time-series-seasonality-with-python/:
- Use smoothing to get rid of seasonality and noise and get the trend component
- Can use centered moving average for this
- Then, decide whether the composition is additive or multiplicative (see above for notes)
- If additive, subtract trend from original time series
- If multiplicative, divide original time series by trend
- Now what you have left is either $S_t + N_t$ or $S_t \times N_t$, depending on whether you assumed additive or multiplicative composition
- Guess the season length? Here, suppose it’s a year
- todo is there a way to do it without guessing?
- Calculate the average for every January month, every February month, etc.
- This assumes that the data across all of January is more or less the same (works for things like temperature)
- Otherwise, you could get the average of every first day in a year, every second day in a year, etc.
- This gives you the pure seasonality component
- Again, remove the seasonality component from the combined $S_t + N_t$ or $S_t \times N_t$ data
- What you have left is the noise component
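statsmodels can run this whole procedure in one call; a sketch assuming an additive composition and monthly data with yearly (period=12) seasonality (the synthetic series is made up):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: trend + yearly seasonality + noise (assumed data)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series(10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(0).normal(size=96), index=idx)

# model="additive" assumes y_t = T_t + S_t + N_t; use "multiplicative" otherwise
result = seasonal_decompose(y, model="additive", period=12)
trend, seasonal, resid = result.trend, result.seasonal, result.resid
```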
Smoothing
- We want to remove the noise to find the “true” series
- The smoothed series may be a better basis for prediction than the raw data
Modeling
TODO: Take notes on this
Moving Average
Also known as a rolling average
TODO: Take notes on this
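A minimal pandas sketch (the noisy sine series is made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = pd.Series(np.sin(np.linspace(0, 6, 120)) + rng.normal(scale=0.3, size=120))

# 7-point centered window: a larger window smooths more but can blur the trend
smoothed = y.rolling(window=7, center=True).mean()
```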
Moving Average with Exponential Smoothing
- Exponential smoothing applies exponentially decreasing weights to previous observations
- Because we don’t want older observations to contribute too much
- $\alpha$ (alpha) is a smoothing factor that takes values between 0 and 1
- It determines how fast the weight decreases for previous observations
- The lower the $\alpha$, the smoother it is
- Used when data has no seasonality and no trend
- Only applies level smoothing, no trend smoothing or seasonal smoothing
Recursive formula (don’t need to memorize): $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$, where $x_t$ is the actual value at time $t$
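A sketch with statsmodels’ SimpleExpSmoothing (the $\alpha$ of 0.2 is an arbitrary choice, not tuned):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

y = pd.Series(np.random.default_rng(0).normal(loc=10, scale=2, size=100))

# smoothing_level is alpha; fixing it turns off the automatic optimization
fit = SimpleExpSmoothing(y).fit(smoothing_level=0.2, optimized=False)
forecast = fit.forecast(5)  # flat forecast, since no trend or seasonality is modeled
```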
Double Exponential Smoothing
- $\beta$ (beta) is the trend smoothing factor and takes values between 0 and 1
- The lower the $\beta$, the smoother it is
- Used when data has a trend but no seasonality
- Applies level smoothing and trend smoothing but no seasonal smoothing
Recursive formula (don’t need to memorize): $s_t = \alpha x_t + (1 - \alpha)(s_{t-1} + b_{t-1})$ for the level, and $b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1}$ for the trend
Triple Exponential Smoothing
- Used when data has seasonality and trend
- Applies level smoothing, trend smoothing, and seasonal smoothing
- Uses a seasonal smoothing factor called $\gamma$ (gamma)
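All three levels of smoothing are available via statsmodels’ ExponentialSmoothing (Holt-Winters); a sketch assuming additive trend and seasonality with a 12-month period (the synthetic series is made up):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with trend + yearly seasonality (assumed data)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series(10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12)
              + np.random.default_rng(1).normal(size=96), index=idx)

# trend="add" alone would be double exponential smoothing;
# adding seasonal="add" with a 12-month period makes it triple (Holt-Winters)
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = fit.forecast(12)  # next year of predictions
```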
Modeling
Auto-Regressive Model
The AR model depends on its own past values (lags) to estimate future values
Moving Average Model
The moving-average (MA) model depends on past forecast errors to make predictions (not the same thing as the rolling-average smoother above)
ARIMA Model
The ARIMA model combines the AR and MA models and adds differencing to remove trend
Can handle trend but not seasonality
Takes 3 hyperparameters:
- p (lag order): number of lag observations in the model
- d (degree of differencing): number of times the raw observations are differenced
- q (order of the moving average): size of the moving average window
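A sketch fitting this with statsmodels (the order=(2, 1, 1) values are arbitrary picks, not tuned for the data):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Random walk: trending, non-stationary data (made up)
y = pd.Series(np.cumsum(np.random.default_rng(2).normal(size=200)))

# order=(p, d, q): 2 lags, first-order differencing for the trend, 1 error term
fit = ARIMA(y, order=(2, 1, 1)).fit()
forecast = fit.forecast(10)
```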