Innovating Energy Forecasting in Distribution Networks with GAMLSS.
An extension of GAM models for superior peak forecasting
After years of delays, decision-makers from all over the world have recently agreed to take proactive measures to limit climate change. In Europe, energy & climate policies made climate neutrality legally binding in 2021, with the aim of reaching carbon neutrality by 2050. One of the top priorities to achieve this goal is to switch from fossil fuel-based electricity generation to low & no-carbon electricity generation. This ambitious policy will intensify mid-to-long term demands on Europe’s electricity networks. In addition to this structural trend, short-term variations, which are known to be heavily influenced by weather & seasonal factors, are expected to become increasingly volatile. The accuracy of energy forecasts must therefore be continually improved to avoid disruption to the service provided to end users.
The development of well-known time series methods such as ARIMA, and of machine learning models, has brought a significant leap in the quality of predictions. For example, a previous article implemented one model per time instance, fitting a separate GAM (Generalized Additive Model) for each hour, and showed improved forecast accuracy. However, when it comes to Low Voltage Networks, these temporal patterns can be significantly influenced by individual consumer behaviours, which are harder to model accurately. Newer techniques such as probabilistic forecasting, which can capture the uncertain nature of end-user consumption, have therefore seen a surge in their application to grid loads [1].
One of the key challenges in load (i.e. consumption) prediction is forecasting the amplitude of the peaks, which conventional forecasting methods often capture poorly. In this article we tackle this problem in a probabilistic framework without sacrificing interpretability, which is commonly essential to ensure the adoption of the method by forecasting teams.
Recently, C. Gilbert et al. [2] have proposed a new probabilistic method with promising results on such distribution networks. This novel approach focuses on predicting daily peak demands in both amplitude and hour of the day through the application of the GAMLSS (Generalized Additive Model for Location Scale and Shape) distributional regression model. Building on Heka’s dedication to delivering state-of-the-art methods to clients, our team evaluated this method in the context of one of the UK’s electricity distribution operators.
I. GAMLSS: a probabilistic & flexible extension of GAM
a. Probabilistic framework
In the probabilistic framework, models use distributions to represent both explanatory variables (i.e. features) x and variables of interest (i.e. target variables) y.
While several models like Hidden Markov Models (HMM) or Bayesian networks present interesting results, they often lack interpretability and require excessive computing resources. For this reason, we investigated Generalized Additive Models (GAMs) instead.
b. Generalized Additive Models
GAMs are probabilistic models that extend generalized linear models to capture non-linear relationships between the features x = (x₁, x₂, …, xₙ) and the target variable y. GAMs model the target variable as a linear combination of n non-linear elementary functions fᵢ of the features xᵢ:
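In the usual textbook notation, which we assume is what Eq. 1 refers to, this reads as follows. The link function g and the intercept β₀ are standard GAM ingredients rather than symbols defined in this article.

```latex
g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + \sum_{i=1}^{n} f_i(x_i)
```

With the identity link, the left-hand side is simply the conditional mean of y, which is why only the mean of the distribution depends on the features.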
One of the key advantages of GAMs is that non-linear behavior is modeled inside each elementary function fᵢ. In practice, instead of using a single linear coefficient for the iᵗʰ feature xᵢ, the model uses smooth basis functions bⱼ, such as splines. The function fᵢ is then the weighted sum of the basis functions bⱼ applied to xᵢ.
The iᵗʰ function fᵢ, applied to the iᵗʰ feature xᵢ, is therefore defined as:

$$f_i(x_i) = \sum_{j=1}^{k} \beta_j \, b_j(x_i)$$
Here, bⱼ represents a smoothing function, βⱼ the coefficients estimated by the Rigby and Stasinopoulos backfitting algorithm, k a hyperparameter controlling the model complexity, and xᵢ the iᵗʰ explanatory variable.
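As a minimal sketch of this weighted sum, the snippet below uses piecewise-linear "hat" functions as the basis bⱼ. The basis choice, the knot positions and the coefficient values are illustrative assumptions, not the splines or the fitted values used in this study.

```python
def hat_basis(x, knots):
    """Evaluate the k piecewise-linear 'hat' basis functions b_j at x.

    b_j equals 1 at knots[j], falls linearly to 0 at the neighbouring
    knots, and is 0 elsewhere. `knots` must be sorted in increasing order.
    """
    values = []
    last = len(knots) - 1
    for j, center in enumerate(knots):
        if x == center:
            b = 1.0
        elif j > 0 and knots[j - 1] < x < center:
            b = (x - knots[j - 1]) / (center - knots[j - 1])
        elif j < last and center < x < knots[j + 1]:
            b = (knots[j + 1] - x) / (knots[j + 1] - center)
        else:
            b = 0.0
        values.append(b)
    return values


def smooth_term(x, betas, knots):
    """f_i(x) = sum_j beta_j * b_j(x)."""
    return sum(beta * b for beta, b in zip(betas, hat_basis(x, knots)))


# Hypothetical knots (hours of the day) and coefficients, for illustration only.
knots = [0, 6, 12, 18, 24]
betas = [1.0, 0.8, 1.5, 2.2, 1.0]
effect_at_9am = smooth_term(9, betas, knots)  # halfway between the knots at 6 and 12
```

With this basis, increasing k (more knots) lets fᵢ bend more often, which is the sense in which k controls the model complexity.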
GAMs are based on the strong assumption that the target variable follows a distribution from the exponential family (Normal, Poisson, Beta, Gamma…). As such, it can be read in (Eq. 1) that the only effect captured by the GAM is the mean of the distribution. All other effects that do not impact this parameter are implicitly subsumed in the error term.
Given this observation, one could naturally expect that taking other parameters into account, such as variance or higher order moments like skewness or kurtosis for example, would bring significant improvement to the method, allowing for modeling of other distributions than the exponential family ones.
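To make this intuition concrete, here is a small sketch comparing the Gaussian negative log-likelihood of the same residuals under a single shared standard deviation and under a standard deviation that depends on the feature. The data and the choice σ(x) = x are invented for illustration and are not taken from the study.

```python
import math


def gaussian_nll(residuals, sigmas):
    """Negative log-likelihood of residuals y - mu under Normal(0, sigma_i)."""
    return sum(
        0.5 * math.log(2 * math.pi * s ** 2) + r ** 2 / (2 * s ** 2)
        for r, s in zip(residuals, sigmas)
    )


# Toy data, invented for illustration: the spread of the residuals grows with x.
x = [1.0, 2.0, 3.0, 4.0]
residuals = [1.0, -2.0, 3.0, -4.0]

# Mean-only view: one sigma shared by all observations (its maximum-likelihood value).
sigma_const = math.sqrt(sum(r ** 2 for r in residuals) / len(residuals))
nll_constant_scale = gaussian_nll(residuals, [sigma_const] * len(x))

# Location-and-scale view: sigma is itself a function of the feature, here sigma(x) = x.
nll_varying_scale = gaussian_nll(residuals, x)
```

On this toy sample the feature-dependent scale gives the lower negative log-likelihood, i.e. the better probabilistic fit, even though the mean is identical in both cases.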
c. Generalized Additive Models for Location Scale and Shape
GAMLSS (Generalized Additive Models for Location Scale and Shape), initially proposed by M.Stasinopoulos et al. in [4] along with an R package, offer the flexibility to choose from over a hundred distributions for the target variable.
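Following the formulation of Rigby and Stasinopoulos [5], rewritten here in the additive notation used for GAMs above, each parameter of the chosen distribution D gets its own link function and its own set of smooth terms. The superscript indexing is our own notation for this sketch.

```latex
y \sim \mathcal{D}(\mu, \sigma, \nu, \tau), \qquad
g_{\theta}(\theta) = \beta_{0}^{(\theta)} + \sum_{i=1}^{n} f_{i}^{(\theta)}(x_i)
\quad \text{for each } \theta \in \{\mu, \sigma, \nu, \tau\}
```

Here μ and σ are the location and scale parameters, while ν and τ are shape parameters, typically related to skewness and kurtosis. A GAM corresponds to the special case where only μ depends on the features.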
We implemented the GAMLSS in Python. Instead of rewriting the R code in Python, we used the rpy2 library, which exposes R functionality from within a Python environment.
In the rest of the study, this aggregation of models is referred to as the aggregated model.
The equation for the aggregated model is:
In order to predict the peak timing probability, we adopt a ‘time-to-event’ approach inspired by the work of A.Bender et al. [3], treating the day as a sequence of ‘non-peak’ times until a peak occurs. Each time step is thus a binary outcome (peak or no peak), which suggests using a Binomial distribution. The aggregated model will be tested and compared with a simple GAM for demand prediction. But first, let’s delve into the data.
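The time-to-event idea can be sketched as follows, assuming a discrete-time formulation in which each period has a conditional probability (a hazard) that the peak occurs then, given that it has not occurred earlier. The function name and the hazard values are hypothetical and chosen only for illustration.

```python
def peak_time_probabilities(hazards):
    """Turn conditional 'peak now' probabilities into a distribution over periods.

    hazards[t] is P(peak in period t | no peak before t). The returned list
    holds P(peak in period t) = hazards[t] * prod_{s<t} (1 - hazards[s]).
    """
    probabilities = []
    no_peak_yet = 1.0  # probability that the peak has not occurred before the current period
    for h in hazards:
        probabilities.append(no_peak_yet * h)
        no_peak_yet *= 1.0 - h
    return probabilities


# Hypothetical hazards for a day split into three periods. The last hazard is 1
# because every day has a peak, so it must fall in the final period if not earlier.
hazards = [0.1, 0.5, 1.0]
peak_probs = peak_time_probabilities(hazards)
```

In such a setup, a Binomial model would supply the hazards as a function of the features, and the loop above converts them into the probability that the daily peak falls in each period.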
II. Presentation of the Case-Study
Feeder systems are essential components of the power distribution network. They deliver energy from substations to end-users like homes, businesses, and industries. For the rest of this article, we focus on the use-case of a set of five feeder systems in the United Kingdom that have been chosen based on the availability of their data. Additional information on such feeders cannot be provided for compliance reasons. The target series to be forecasted is the hourly electricity demand in MWh for each feeder, excluding electricity generation. Net-load forecasts (i.e. energy consumption after subtraction of the generation) have not been investigated in this article.
III. Choice of the features
The choice of features is key to building the forecasting model.
As electricity consumption varies throughout the day and across seasons, features such as hour of the day, day of the week, and day of the year are critical. For instance, consumption patterns differ between 4 AM and 7 PM. Seasons have a significant impact on energy consumption as well, as the use of heating or air conditioning devices varies greatly between summer and winter.
Weather features such as temperature, wind speed or wind direction play a significant role in predicting electricity consumption. In order to retrieve such weather data, we used a proprietary tool developed by the Heka core team called Weather & Climate. This platform provides access to a comprehensive database of weather data, for more than 30 variables and 5 different weather providers across the globe.
Finally, we used the peak value of the previous day and the peak value of the same day in the previous week. These variables ensure that fluctuations and trends in consumption patterns over time are accurately accounted for.
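The calendar and lagged-peak features described in this section can be sketched as below. The function names, the dictionary layout and the peak values are hypothetical, and weather features are left out because they come from a proprietary tool.

```python
from datetime import date, datetime, timedelta


def calendar_features(timestamp):
    """Calendar features for one hourly observation."""
    return {
        "hour": timestamp.hour,                      # 0-23
        "day_of_week": timestamp.weekday(),          # 0 = Monday
        "day_of_year": timestamp.timetuple().tm_yday,
    }


def lagged_peak_features(daily_peaks, day):
    """Peak of the previous day and of the same weekday one week earlier.

    `daily_peaks` maps a date to that day's peak demand; missing days give None.
    """
    return {
        "peak_prev_day": daily_peaks.get(day - timedelta(days=1)),
        "peak_prev_week": daily_peaks.get(day - timedelta(days=7)),
    }


# Hypothetical daily peaks in MWh, for illustration only.
daily_peaks = {date(2022, 7, 25): 3.1, date(2022, 7, 31): 2.7}
features = {
    **calendar_features(datetime(2022, 8, 1, 19)),
    **lagged_peak_features(daily_peaks, date(2022, 8, 1)),
}
```

Both lagged values are known by the end of the previous day, so they remain available in a one-day-ahead forecasting setting.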
IV. Results
As stated in Eq. 3, the aggregated model requires a distribution for the target variable to be specified. While a normal distribution is often used, it may not always be the best fit: it is symmetric and hence does not account for data skewness. The same argument applies to extreme values: because the normal distribution has thin tails, it underestimates the likelihood of extreme events. To address this, the distribution that best fits the target variable according to the Akaike Information Criterion (AIC) has been chosen from the distributions available in the R package. This method is named fitDist.
A more thorough alternative is the chooseDist method which, unlike fitDist, compares parametric conditional distributions using a previously fitted GAMLSS model that includes explanatory variables. It refits the existing model with different distribution families and then computes several GAICs (Generalized AIC) for each distribution (e.g., AIC, BIC, Chi-square). This helps with model selection but, as it comes with a high computational cost, this method remains out of the scope of this article.
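The idea behind the AIC-based selection without explanatory variables can be illustrated with a pure-Python sketch that compares only two candidate distributions. It is not the fitDist implementation, which searches a much larger set of families, and the sample values are invented.

```python
import math


def normal_loglik(values):
    """Maximised log-likelihood of a Normal distribution (2 parameters)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # maximum-likelihood variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(values):
    """Maximised log-likelihood of a log-normal distribution (2 parameters)."""
    logs = [math.log(v) for v in values]
    # Normal likelihood of log(y) plus the Jacobian term -sum(log y).
    return normal_loglik(logs) - sum(logs)


def aic(loglik, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * loglik


# Hypothetical right-skewed daily peaks, for illustration only.
peaks = [1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 5.0, 9.0, 20.0]
scores = {
    "normal": aic(normal_loglik(peaks), 2),
    "lognormal": aic(lognormal_loglik(peaks), 2),
}
best = min(scores, key=scores.get)
```

On this skewed sample the log-normal candidate obtains the lower AIC, which mirrors the argument that a symmetric Normal distribution is not always the best fit.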
Two versions of the aggregated model will be considered and compared with a GAM: one with a fitted Normal distribution and the other with the distribution given by the fitDist method.
The general context for both scenarios is the following:
- Training Samples: 01/08/2021–31/07/2022 at hourly resolution
- Testing Samples: 01/08/2022–31/12/2022 at hourly resolution
- Forecast Strategy: One day ahead
- Benchmark model: Generalized Additive Model (GAM)
- We limit the scope to four typical feeders.
Below is an illustration of the demand for the four feeders during a randomly selected period:
Table 1 gives the Mean Absolute Percentage Error (MAPE) on the 4 different feeders for the 3 considered models, i.e. the GAM model, the Aggregated Model with a fitted Normal distribution (AMN) and the Aggregated Model with a distribution given by the fitDist method (AMF). The Delta column gives the MAPE reduction (in %) with respect to the reference GAM model.
As presented in Table 1, the AMN outperforms the GAM by approximately 14.7% on average for the four selected feeders. Such a clear improvement is encouraging but remains to be confirmed on a larger number of feeders and networks.
Optimizing the distribution for the target variable with the fitDist method yields a notable average performance improvement of 16.8%. However, this observation needs to be nuanced: the performance on one specific feeder declines once the distribution is optimized. This is a case of overfitting: fitDist identifies the optimal distribution for the target variable using the training samples only. If the distribution of the target variable remains consistent across the testing samples, adopting this new distribution enhances the GAMLSS performance. Conversely, if there is a shift in behavior, the trained model might struggle to adapt, resulting in diminished performance compared to a simpler distribution like the Normal.
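For reference, the two metrics used in this section can be computed as in the sketch below. We assume that the Delta column is the relative MAPE reduction with respect to the GAM, and the numbers are invented and unrelated to the reported results.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def relative_reduction(mape_reference, mape_model):
    """MAPE reduction of a model relative to a reference model, in percent."""
    return 100.0 * (mape_reference - mape_model) / mape_reference


# Invented numbers, unrelated to the results reported in Table 1.
actual = [100.0, 200.0]
example_mape = mape(actual, [110.0, 180.0])
example_delta = relative_reduction(10.0, 8.5)
```

A positive reduction means the model has a lower MAPE than the reference, and a negative value would indicate a degradation, as observed on one feeder with the fitDist distribution.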
V. Limitations of the method
a. Chaotic feeders
Some feeders exhibit chaotic patterns that are hard to discern. Let’s consider feeder 110202:
In Fig. 4, the model probabilities for consumption peak timing throughout the day lack discernible patterns, suggesting that peaks could occur at any time in the aggregated model. This issue arises from the limitations of the linear distributional regression model, which fails to capture the intricate relationships between variables. In contrast, more sophisticated yet less interpretable models like neural networks could identify these complex interactions. As a result, the aggregated model tends to predict the average of the target variable (Fig. 5) as it progresses through training, leading to poor performance.
b. Computational cost
Lastly, the GAMLSS used to create the aggregated model (Eq. 2) currently employs the backfitting algorithm initially developed by Rigby et al. [5] nearly 20 years ago. This algorithm’s convergence rate is rather slow compared to current solvers, which leads to significant training time. Depending on the chosen complexity of the distribution of the target variable, GAMLSS models can be up to 60 times slower to train than Generalized Additive Models (GAMs). Despite its utility, there is significant potential for efficiency improvement.
Conclusion
In this article, a new electricity demand forecast model with a special focus on daily peaks has been presented. Being an extension of the GAM, the model retains its adaptability to complex data thanks to its wide choice of distributions, while delivering improved performance, especially on the intensity and timing of consumption peaks. Finally, the presented model maintains a high level of interpretability by preserving the linear aspect of GAM family models.
Such a model could have practical applications in time series forecasting, such as predicting energy demand, forecasting the influx of website visitors, or anticipating foot traffic on premises.
Probabilistic modelling is an active research field and multiple further improvements are to be expected in the near future, like the Validation Global Deviance (VGD) method to fine-tune the degrees of freedom of the smoothing terms.
Probabilistic modeling is set to evolve in the coming years, and we invite you to follow us as we keep you updated on its potential breakthroughs.
Written by Vincent Tchoumba
References:
[1] J.De Vilmarest et al., Adaptive Probabilistic Forecasting of Electricity, 2023
https://arxiv.org/pdf/2301.10090
[2] C.Gilbert, J.Browell, B.Stephen, Probabilistic load forecasting for the low voltage network, 2023
https://arxiv.org/pdf/2206.11745
[3] A.Bender, A.Groll, F.Scheipl, A generalized additive model approach to time-to-event analysis, 2018
https://epub.ub.uni-muenchen.de/66323/1/1471082x17748083.pdf
[4] M.Stasinopoulos, R.Rigby, Generalized Additive Models for Location Scale and Shape (GAMLSS) in R, 2007
https://www.jstatsoft.org/index.php/jss/article/view/v023i07/207
[5] R.A.Rigby, D.M.Stasinopoulos, Generalized additive models for location, scale and shape, 2005
https://doi.org/10.1111/j.1467-9876.2005.00510.x