Mid-to-long term electricity demand forecasting using machine learning (2/2) — Dynamic realignment with Kalman filters

Heka.ai
Oct 27, 2023

Time series forecasting is a discipline that draws on different statistical tools, with applications across many fields: forecasting asset prices in finance, tracking planes and estimating missile trajectories in aerospace and defense, and forecasting demand fluctuations in supply chain management. Staying up to date on forecasting techniques helps businesses anticipate future trends and outcomes accurately. By extrapolating past patterns, we can allocate resources optimally and align with upcoming market shifts.

A New Approach

Our team at Sia Partners remains dedicated to staying up to date with the latest developments in time series forecasting techniques. Among the different fields of application, we are particularly interested in electricity load forecasting. In our previous article, we showed a fast and easy-to-implement machine learning method to predict electricity load. While using an independent model per hour improved performance, this method was not able to maintain the same performance during the COVID-19 crisis, due to an abrupt change in consumption behavior. In an effort to improve this, we looked into a new method that uses Kalman filters (de Vilmarest et al., 2023). Kalman filters present two key advantages:

— Adaptability: They do not rely on a rigid batch model that requires frequent retraining if we want to change one parameter or accommodate new data. Instead, they leverage an online approach that uses the latest observations to adapt our model to emerging patterns.

— Probabilistic Forecasts: They quantify two types of uncertainties:

  • Epistemic Uncertainty: The uncertainty we can reduce as new measurements are obtained.
  • Aleatory Uncertainty: The inherent randomness and variability in a system. This type of uncertainty is not reducible with additional information, as it’s a fundamental characteristic of the system. The Kalman filter can model this uncertainty through its state and measurement noise covariances.

What are Kalman Filters?

Model basis:

A Kalman filter is a continuous-state Markov model built on a state-space representation. A state-space representation is a framework in which we represent dynamic systems by two sets of equations:

1 — State equations: They describe how the system’s internal state evolves through time, modelling the relationship between the next state and the current one, along with any external control input:
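Xₜ₊₁ = Fₜ Xₜ + Bₜ uₜ + εₜˣ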

where uₜ is the external control input, Bₜ the control input matrix, Fₜ the state transition matrix and εₜˣ the process/state noise.

2 — Observation equations: They describe the relationship between the system’s internal state and the observations it generates. These measurements or observations are affected by noise, imperfections in sensors, etc. The idea is that in general, the internal state in a Kalman filter is a hidden variable that can’t be observed directly. The observed variable is therefore the link enabling us to access the internal state.
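In linear form, writing Zₜ for the measured observation, the observation equation reads:

Zₜ = Hₜ Xₜ + εₜʸ

where Hₜ is the observation matrix and εₜʸ the measurement noise.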

A Kalman filter’s primary purpose is to estimate the state distribution at time t not only based on the estimated states up to instant t-1, as a simple Markov model would, but also on a series of past measurements observed over time. With these observations, the model corrects its previous estimations of the system’s state and propagates this corrective term to update the estimations. This ability to adjust itself according to the received observations allows the filter to match a new pattern in the data.

Overview of Kalman Filters

Model’s assumptions

A Kalman filter is based on the following assumptions:

1 — The state representation is linear (this requirement can be relaxed using extended or unscented Kalman filters).

2 — The initial state is normally distributed with a covariance matrix that will be denoted P₀:
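X₀ ∼ N(μ₀, P₀), where μ₀ denotes the initial state mean.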

3 — The modelled sources of noise both follow a Gaussian distribution:

  • State noise distribution: εₜˣ ∼ N(0, Qₜ)
  • Measurement noise distribution: εₜʸ ∼ N(0, Rₜ)

How the Algorithm Works

The algorithm operates recursively and is carried out in two steps: the prediction step and the correction step. There are two notations to keep in mind: X(t ∣ t-1), the prediction of the Kalman filter before receiving the observation, and X(t ∣ t), the corrected prediction after the observation has been received.

Prediction Step

The prediction step is executed at each time step. It computes the next state using this equation:
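X(t ∣ t-1) = Fₜ X(t-1 ∣ t-1) + Bₜ uₜ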

and updates the prediction covariance by increasing uncertainty:
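P(t ∣ t-1) = Fₜ P(t-1 ∣ t-1) Fₜᵀ + Qₜ

where Qₜ is the state noise covariance.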

Correction Step

The correction step is executed only when we receive new observations. This means that the algorithm won’t terminate due to a lack of observations and will handle new measurements as they arrive. This step calculates the observation we should have measured if our predicted state were true, using the following equation:
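Writing Ẑ(t ∣ t-1) for this predicted observation:

Ẑ(t ∣ t-1) = Hₜ X(t ∣ t-1)

where Hₜ is the observation matrix.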

and updates the observation variance matrix:
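S(t ∣ t-1) = Hₜ P(t ∣ t-1) Hₜᵀ + Rₜ

where Rₜ is the measurement noise covariance.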

Then we compute the innovation term Yₜ, which is the difference between the measured observation and the predicted one. The Kalman filter gain Kₜ is also computed. We can interpret this gain as a compromise between uncertainties: it balances the uncertainty on the predicted state P(t ∣ t-1), given by the state transition model, against the uncertainty on the predicted observation S(t ∣ t-1). The more confident we are in our measurement compared to our prediction, the more we trust the observation and the innovation:
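Yₜ = Zₜ − Ẑ(t ∣ t-1),  Kₜ = P(t ∣ t-1) Hₜᵀ S(t ∣ t-1)⁻¹

where Zₜ is the measured observation and Ẑ(t ∣ t-1) the predicted one.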

Our posterior state is then corrected proportionally to the innovation term
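X(t ∣ t) = X(t ∣ t-1) + Kₜ Yₜ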

and our reduced posterior covariance is
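P(t ∣ t) = (I − Kₜ Hₜ) P(t ∣ t-1)

To make the two steps concrete, here is a minimal, self-contained sketch of the cycle for a one-dimensional state. It is illustrative only: the transition, noise and observation values are made up, and a real application would use vector states and matrices.

```python
# Minimal 1-D Kalman filter: prediction step + correction step.
# All numeric values (F, Q, H, R) are illustrative, not from the article.

def kalman_step(x, P, y=None, F=1.0, B=0.0, u=0.0, Q=0.01, H=1.0, R=0.1):
    """One cycle. x, P: state mean/variance. y: observation (None if missing)."""
    # Prediction: propagate the state and inflate the uncertainty
    x_pred = F * x + B * u            # X(t | t-1)
    P_pred = F * P * F + Q            # P(t | t-1)
    if y is None:                     # no observation received: skip correction
        return x_pred, P_pred
    # Correction: blend prediction and observation via the Kalman gain
    z_pred = H * x_pred               # predicted observation
    S = H * P_pred * H + R            # innovation covariance S(t | t-1)
    innovation = y - z_pred           # Y_t
    K = P_pred * H / S                # Kalman gain K_t
    x_corr = x_pred + K * innovation  # X(t | t)
    P_corr = (1.0 - K * H) * P_pred   # P(t | t)
    return x_corr, P_corr

# Track a constant signal (true value 5.0) from noisy readings:
x, P = 0.0, 1.0
for y in [4.8, 5.2, 5.1, 4.9, 5.0]:
    x, P = kalman_step(x, P, y)
# x is now close to 5.0 and P has shrunk well below its initial value
```

Note how the gain K arbitrates between P(t ∣ t-1) and R: a small measurement noise R pulls the estimate towards the observation, while a large one leaves it near the prediction.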

Application of Kalman filters in time series forecasting

Sixty years after their introduction, Kalman filters are still widely used, mainly in navigation, robotics, and control systems. They were even used in the Apollo program to give precise estimates of the spacecraft’s position. In this article, we are interested in their use in time series forecasting. But what are our observation and state equations in this case?

The idea implemented by de Vilmarest et al. is to fit a batch mathematical model on the training data to capture the complex relationship between explanatory variables and the target. The effect of each feature on our target is then corrected with a clever use of Kalman filters to adapt to changes in consumption behavior.

Step 1: Training the Base Model

In our case, we went with one of the models discussed in our first article: a Generalized Additive Model (GAM) which is a statistical model that allows for non-linear relationships. We followed the same approach as in the previous article: we trained one GAM per hour to adjust to the intrinsic variation in energy demand between times of day. Training the GAM on our historical data will allow us to establish the following relationship:
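Load = f1(x1) + f2(x2) + ⋯ + fd(xd) + ε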

where the effects f1 … fd are either linear or nonlinear. In our case, the effects are built using linear combinations of spline bases.

Step 2: Correcting the Base Model with Kalman Filters:

First, we freeze the effects f1 … fd learned on the training set and correct the model by applying a multiplicative factor to these effects. By doing so, we consider that a new pattern in the data will only represent a change in the magnitude of these effects rather than a change in the relationship between the features and the target. This simplification reduces the problem dimensionality from ∑ᵢ nᵢ(mᵢ+1) + 1, where i ∈ [1, d] (if we modified the coefficients of each spline inside f1 … fd), to d+1, where mᵢ is the degree of the splines of fᵢ and nᵢ the number of splines inside fᵢ.

To summarize, our new covariate vector is:
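f(xₜ) = (f1(x1), …, fd(xd), 1)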

where the effects are standardized, and we are estimating a vector θₜ such that:
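Loadₜ ≈ θₜᵀ f(xₜ)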

In Kalman filter “vocabulary”, the internal state variable will then be the multiplicative factor θₜ, representing the true underlying relationship between the target and the explanatory features, and the observation matrix will be our covariate vector f(xₜ) (the state transition matrix is simply the identity). By modifying its state when a new pattern appears, the Kalman filter modifies the relationship between the explanatory variables and our target, and thereby the prediction. In other words, the Kalman filter is added on top of a base model like the GAM to correct it once it starts becoming irrelevant for the new pattern.
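To illustrate the mechanics (not the authors’ exact implementation), here is a hypothetical pure-Python sketch: the effects fᵢ are frozen, the state θ holds the multiplicative factors plus an intercept, the transition matrix is the identity, and, since the observation is a scalar, the gain needs no matrix inversion. The effect values, the noise levels q and r, and the simulated “new pattern” are all made up.

```python
# Hypothetical sketch of the multiplicative correction of frozen GAM effects.
# State theta = multiplicative factors + intercept; transition = identity.

def kalman_correct(theta, P, f, y, q=1e-3, r=0.1):
    """One step: theta, P = state mean/covariance; f = covariate vector
    (f1(x), ..., fd(x), 1); y = observed load. Returns updated (theta, P)."""
    d = len(theta)
    # Prediction: identity transition; process noise q inflates the covariance
    P = [[P[i][j] + (q if i == j else 0.0) for j in range(d)] for i in range(d)]
    # Scalar innovation covariance S = f' P f + r (the observation is a scalar)
    Pf = [sum(P[i][j] * f[j] for j in range(d)) for i in range(d)]
    S = sum(f[i] * Pf[i] for i in range(d)) + r
    innovation = y - sum(theta[i] * f[i] for i in range(d))
    K = [Pf[i] / S for i in range(d)]                      # Kalman gain
    theta = [theta[i] + K[i] * innovation for i in range(d)]
    P = [[P[i][j] - K[i] * Pf[j] for j in range(d)] for i in range(d)]
    return theta, P

import random
random.seed(0)
theta = [1.0, 1.0, 1.0]                   # start by trusting the GAM as-is
P = [[float(i == j) for j in range(3)] for i in range(3)]
for _ in range(300):
    # frozen effects evaluated at random inputs, plus the intercept term
    f = [random.uniform(-1, 1), random.uniform(-1, 1), 1.0]
    y = 2.0 * f[0] + 0.5 * f[1] + 1.0     # "new pattern" to adapt to
    theta, P = kalman_correct(theta, P, f, y)
# theta has drifted close to the new factors (2, 0.5, 1)
```

With an identity transition this behaves like adaptive recursive least squares: the process noise q keeps the covariance from collapsing, so θ can keep drifting towards new consumption patterns.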

Example of application to net load forecasting:

Methodology

We implemented and applied this method to forecasting the net electricity load in France before and after the COVID-19 crisis. The spread of this virus significantly disrupted energy consumption, making the model extremely inaccurate and thereby irrelevant. The idea was to leverage the proposed algorithm to detect new patterns in electricity demand without adding new exogenous variables and with limited available data.

We divided the dataset into two parts: the historical data (before the COVID break point) and the source data (the crisis data). We fitted our GAM on the historical data. The effects of the explanatory variables were extracted from the fitted GAM and put together to form the covariate vector.

Let’s illustrate this by an extremely simplified yet clear example:

Suppose we have two explanatory variables (or features): temperature and day of the week (between 1 and 7), and that before Covid, the GAM model had learned the following relationship:
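load = 2·f₁(temperature) + 0.5·f₂(day of the week) + 1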

where f₁, f₂ are spline functions whose coefficients were also learnt by the GAM.

The state of the Kalman filter in this case will be θt₀ = (2, 0.5, 1). With the appearance of a new pattern, the Kalman filter will adapt θt₀ to the new relationship between the electricity load and the features and correct the prediction of the GAM.

Why Not Just Retrain?

Simply re-training the new GAM model would not be a good solution as:

  • Training a GAM only on crisis data would require waiting until enough data points are available, therefore losing the ability to adapt quickly to changing behaviors.
  • Training a GAM on all historical data as well as the source data would lead the model to consider the crisis data points as outliers because of their scarcity.

Results

Let’s dive into the final part of this article: results. The application of this correction method to load forecasting reduced the MAPE by more than 50% and the RMSE by around 20% when applied to the crisis data. When it comes to computational cost, adding a Kalman filter correction leads to a sixfold increase in computation time compared to our base model alone. To give a clearer idea, applying the correction to 5,136 data points required approximately 96 seconds versus 15 seconds for the GAM alone, making its use in real-time applications entirely feasible.

The obtained results during the first days of the Covid crisis are given in this plot:

At first, there is an adaptation phase of around 3 days (72 data points) before the GAM corrected with Kalman filters catches up and then outperforms the base GAM. However, as mentioned above, we have one GAM per hour and thus one Kalman filter per hour to correct each GAM. Each filter therefore only needs 3 data points after initialization to align with the observed curve, so the adaptation phase could be significantly reduced if a single GAM were used.

Furthermore, we are currently using a random initialization for the Kalman filter parameters and believe that an improvement of the initialization method could further reduce the duration of the adaptation phase (for example, fitting a Bayesian regression on the training set to find a better θt₀).

Once we reach the end of the curfew period, our initial one-GAM-per-hour model becomes relevant again. Yet the predictions of the Kalman-corrected model remain more accurate.

Conclusion

The application of Kalman filters to time series forecasting is a promising approach. Their ability to adapt to changes in the data and to quantify uncertainty leads to an increase in performance that could have a significant impact on industries such as energy or finance.

Kalman filters are a good compromise between computational cost, uncertainty modelling and the desired level of accuracy. This makes them suitable for real-time applications while staying conceptually straightforward and easy to implement.

Other approaches to dynamic realignment exist, such as unscented Kalman filters, which relax the assumption of system linearity; particle filters, which accommodate non-Gaussian state distributions; or even LSTMs and RNNs. It is important to assess the trade-off between the accuracy improvements achieved by these methods and the associated computational costs to determine the best choice for the application at hand.
