Mid- to long-term electricity demand forecasting using machine learning (1/2)
In the current context of very high energy prices in Europe on the one hand and electricity production issues on the other, energy supply is a very sensitive topic. In today’s article, we’ll focus on a key aspect of the electricity value chain: the prediction of consumption. In our case, the scope will be the aggregate consumption of all consumers in France.
To do so, we’ll present a pure machine learning approach, which has the advantage of being fast and relatively easy to implement; numerous other approaches could be considered. In this first article of a two-part series, we will present models that can be implemented easily.
Presentation of the use case and data fetching
For this study, we will train and test a machine learning model that predicts the hourly electricity consumption of France.
It should be noted that France is Europe’s second-largest electricity producer and consumer after Germany, with an annual consumption of around 470 TWh.
Also, France has the largest share of electric heating in Europe, both in volume and in percentage. For this reason, it is the most thermosensitive country, meaning that a change in temperature has a larger impact on demand there than anywhere else. Typically, at 7 PM in winter, if the temperature is 1°C below seasonal norms, consumption increases by about 2.3 GW, or approximately 5%. Thus, in the modelling, we’ll try to capture this behaviour accurately.
The first step of our analysis will be to fetch the appropriate data for our use case.
Target
For this study, we need the consumption data of France, our “target” data. The ODRE platform centralises numerous types of energy-related data; among other things, the historical hourly electricity and gas consumption can be found here.
Features
We also need features that are relatively easy to find and that convey information helping us deduce the consumption. Naively, we think of time features first: the hour of the day, the day of the week and the month of the year, among others. These features are immediately available. We can also think of holidays, which may have an impact on consumption. We will see later which set of features suits our model best, as many combinations are possible, especially when trying to avoid overfitting.
It is also well known that the weather plays a key role in energy demand: when it gets cold, people turn on their heating systems, some of which run on electricity (electric radiators, heat pumps), and when it gets hot they tend to turn on their air conditioning. Enedis, France’s main electricity distributor, provides historical temperature and irradiance data here that it uses internally for load forecasting.
More precisely, Enedis provides averaged and smoothed temperatures to make sure that the temperature used to predict the consumption is a good proxy of the true impact that temperature has on the network. Indeed, using the raw temperature measured by weather stations may not be a good solution in our case for the following reasons:
- Heating systems may not be turned on if the temperature only drops by a few degrees, but they will be if the temperature stays a few degrees lower for several days.
- Heating systems themselves have a non-zero reaction time and cannot be turned on instantly.
- Smoothing the temperature helps preserve the underlying signal while filtering out noise, which leads to more robust models.
Different types of smoothing methodology are possible. The simplest one is probably exponential smoothing.
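As an illustration, here is a minimal sketch of exponential smoothing applied to an hourly temperature series with pandas; the series name and the smoothing factor are placeholders, not the values actually used by Enedis.

```python
import pandas as pd

def smooth_temperature(raw_temp: pd.Series, alpha: float = 0.05) -> pd.Series:
    # Exponentially weighted moving average:
    # smoothed_t = alpha * raw_t + (1 - alpha) * smoothed_{t-1}
    # `raw_temp` is assumed to be an hourly temperature Series indexed by timestamp;
    # `alpha` is purely illustrative.
    return raw_temp.ewm(alpha=alpha, adjust=False).mean()
```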
Here is a table that summarises all the features that will be used for the modelling:
Preliminary data analysis
The overall analysis and modelling is carried out in a Jupyter Notebook. The piece of code below is used to format the data. We choose to work with UTC timestamps, as pandas does not easily handle data spanning multiple timezones.
Now that we have our data ready, we plot the different time series to get a first look at the patterns we may observe.
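A minimal version of this formatting step could look like the following; the file names and column names (consommation_horaire.csv, consumption_mw, smoothed_temperature, …) are hypothetical.

```python
import pandas as pd

# Illustrative loading/formatting step; file paths and column names are hypothetical.
load = pd.read_csv("consommation_horaire.csv", sep=";")
weather = pd.read_csv("temperature_lissee.csv", sep=";")

# Parse timestamps and convert everything to UTC so that the two sources align.
for df in (load, weather):
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    df.set_index("timestamp", inplace=True)

# Join target and features on the hourly UTC index and derive calendar features.
data = load[["consumption_mw"]].join(weather[["smoothed_temperature"]], how="inner")
data["hour"] = data.index.hour
data["day_of_week"] = data.index.dayofweek
data["month"] = data.index.month
```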
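For instance, a quick way to plot both signals from the hypothetical data frame built above is:

```python
import matplotlib.pyplot as plt

# Quick visual check of the two main signals (hypothetical `data` frame from above).
fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)
data["consumption_mw"].plot(ax=axes[0], title="Hourly electricity consumption (MW)")
data["smoothed_temperature"].plot(ax=axes[1], title="Smoothed temperature (°C)")
plt.tight_layout()
plt.show()
```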
This confirms our intuition of the strong relationship between temperature and electricity consumption. We also see a strong seasonality pattern in the load signal.
The plot below will help us understand how the historical load evolves with the temperature. The data has been aggregated at a daily level (mean value) and we can see that, regardless of the day of the week, the dependency is quite similar: below 17°C, the consumption increases when the temperature decreases.
The only difference between days of the week is the consumption level: it is lower on weekends than on weekdays. This makes sense, as many companies are closed during the weekend and thus consume less energy. We will make sure to capture this behaviour by adding the day of the week to the model’s feature list.
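The daily aggregation behind this plot can be sketched as follows, reusing the hypothetical data frame from above:

```python
import matplotlib.pyplot as plt

# Daily mean aggregation of load and temperature, grouped by day of the week.
daily = data[["consumption_mw", "smoothed_temperature"]].resample("D").mean()
daily["day_of_week"] = daily.index.dayofweek

fig, ax = plt.subplots(figsize=(10, 6))
for dow, group in daily.groupby("day_of_week"):
    ax.scatter(group["smoothed_temperature"], group["consumption_mw"], s=8, label=f"day {dow}")
ax.set_xlabel("Daily mean smoothed temperature (°C)")
ax.set_ylabel("Daily mean consumption (MW)")
ax.legend(title="Day of week (0 = Monday)")
plt.show()
```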
Modelling
Introduction
Now that our dataset is ready, let’s get to the modelling part.
Time series forecasting is a very active research topic, and Heka maintains a strong technology watch in this field.
For this article, we will mainly focus on machine learning models, although many other modelling techniques could be used, such as statistical models (ARIMA/SARIMA/SARIMAX), deep learning models, Bayesian models, and so on.
Methodology
Sia Partners has been heavily investing into its tech ecosystem Heka. Among other topics, a strong expertise in time series has been developed. Here is a typical workflow that will be implemented to perform time series forecasting:
Anyone who has ever worked with time series knows that it takes a lot of time to compare models or sets of features, iterate, build metrics functions, filter data, etc.
At Heka, we developed a customised package named sia_ts_modelling, which is now used internally to automate the modelling part. This package offers a number of advantages:
- Make it way faster to build a new model from scratch
- Avoid careless mistakes when rebuilding the same functions
- Include some Heka-specific knowledge on time series that is not easy to deploy when not packaged (for instance, models per hour, described below)
- Expose a benchmarking table, allowing users to easily compare each model’s performance and then choose and deploy the best one
- Provide an easy way to retrieve results
- Easily include and compare any user defined metrics
Parameterisation
Let’s now build our models and test different methods and feature sets using the library mentioned above, which exposes a function named pipeline that creates several models from the input features and the chosen model types. Here is how we run it in Python (a rough sketch is given after the lists below).
The prerequisite parameters are the following:
- Different sets of features to be tested
- All the models to be tested
- Train / test period split
- Metrics to be used
Several models are tested:
- Lasso Regression
- Random Forest
- A generalized additive model (GAM), provided by the pyGAM library
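As the pipeline function belongs to an internal package, the snippet below is only a rough, hand-rolled equivalent of what it does, built with open-source libraries (scikit-learn and pyGAM); the feature sets, split dates and column names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error
from pygam import LinearGAM, s, f

# Hypothetical configuration: feature sets, models, split dates and metric.
feature_sets = {
    "calendar_only": ["hour", "day_of_week", "month"],
    "calendar_temp": ["hour", "day_of_week", "month", "smoothed_temperature"],
}
models = {
    "lasso": lambda: Lasso(alpha=0.1),
    "random_forest": lambda: RandomForestRegressor(n_estimators=200, n_jobs=-1),
    "gam": lambda: LinearGAM(f(0) + f(1) + f(2) + s(3)),  # factor terms + temperature spline
}
train, test = data.loc[:"2020-12-31"], data.loc["2021-01-01":]

rows = []
for fs_name, cols in feature_sets.items():
    for model_name, build in models.items():
        if model_name == "gam" and len(cols) < 4:
            continue  # the GAM terms above assume 4 features
        model = build()
        model.fit(train[cols], train["consumption_mw"])
        pred = model.predict(test[cols])
        rows.append({
            "features": fs_name,
            "model": model_name,
            "test_mape": mean_absolute_percentage_error(test["consumption_mw"], pred),
        })

# Simple benchmarking table comparing every (feature set, model) configuration.
benchmark = pd.DataFrame(rows)
```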
The declaration of all models and features is done in a single place, which illustrates two key qualities of our library, simplicity and readability, making it more attractive for users to rely on this customised package in future projects.
Another technique that we will use in this benchmark is the concept of model per instant: instead of having one big model valid for every single observation, we build one model per instant, which translates to one model per hour in our case. The graph below explains how this methodology is implemented:
There is a very significant performance advantage to using models per hour instead of a single big model: in our case, the MAPE of a GAM model drops from 5–8% to 3–4%. This is explained by the decorrelation obtained by fitting each hour independently. The gain is even larger for linear models, which are otherwise constrained to respect a certain continuity from one value of each temporal feature to the next; this is not the case for a Random Forest, where feature splitting is already built into the model.
The only limitation to watch for when building one model per instant is to make sure there is a sufficient amount of data available for each instant to avoid underfitting. In our case, for instance, it would not be reasonable to have one model per hour of the week, as each model would be built with around 52 * 5 observations.
Moving to the last part, we will now discuss the interpretation of the results.
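A minimal sketch of the models-per-hour idea, reusing the hypothetical train/test split and columns from the previous snippet, could look like this:

```python
import pandas as pd
from pygam import LinearGAM, f, s
from sklearn.metrics import mean_absolute_percentage_error

# One model per hour: fit 24 independent GAMs, each on the subset of rows
# sharing the same hour of the day (hypothetical columns from the frame above).
features = ["day_of_week", "month", "smoothed_temperature"]
hourly_models, hourly_preds = {}, []

for hour in range(24):
    train_h = train[train["hour"] == hour]
    test_h = test[test["hour"] == hour]
    gam = LinearGAM(f(0) + f(1) + s(2)).fit(train_h[features], train_h["consumption_mw"])
    hourly_models[hour] = gam
    hourly_preds.append(pd.Series(gam.predict(test_h[features]), index=test_h.index))

# Reassemble the 24 hourly forecasts into one chronological series and score it.
pred_per_hour = pd.concat(hourly_preds).sort_index()
print(mean_absolute_percentage_error(test.loc[pred_per_hour.index, "consumption_mw"], pred_per_hour))
```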
Results
Model selection
The pipeline function returns three outputs:
- The benchmarking dataframe (displayed in the graph below)
- The fitted models as a list
- The prediction dataframe. Its columns are the forecasts of all configurations
Here is what the benchmarking dataframe returns:
With this comparative table, we can see that the best model is a generalised additive model (GAM), with a MAPE on test data of ~2.03%, which is already quite satisfactory. We can also plot the best prediction versus the historical signal to get a better visualisation of the outputs:
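Using the hourly predictions from the sketch above (hypothetical objects and column names), such a plot can be produced as follows:

```python
import matplotlib.pyplot as plt

# Plot the best configuration's forecast against the observed load over a sample window.
window = slice("2021-01-01", "2021-01-31")
fig, ax = plt.subplots(figsize=(12, 5))
test.loc[window, "consumption_mw"].plot(ax=ax, label="Observed load")
pred_per_hour.loc[window].plot(ax=ax, label="GAM forecast")
ax.set_ylabel("Consumption (MW)")
ax.legend()
plt.show()
```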
Model analysis
Having a model with good performance is a very good first step, but our analysis cannot be limited to this. We need to do some deep dives, starting with the residuals, to check that we did not miss any important phenomenon and that our model is not overfitted. Then, we will take a closer look at partial dependence plots.
Residuals analysis
Residuals are the difference at each timestamp between the predictions and the observations. This analysis gives us relevant insight into the model’s accuracy. In our example, residuals are expected to:
- Be uncorrelated. If a correlation exists within the residuals, it means some information has not been captured by our model.
- Have a mean of zero. If the mean is not null, then the forecast is biased.
More details can be found here.
For any time series analysis, checking that the residuals satisfy the two conditions above is mandatory before going forward with industrialisation. Let’s plot the residuals of our model using the data frame containing the output prediction:
Even though the residuals do not follow a perfect Gaussian distribution, the result is acceptable. Indeed, our model has a limited set of features and is missing more elaborate ones, such as holidays, daylight-saving time changes and potentially other weather features like wind speed or irradiance, which can have a non-negligible effect on demand.
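A minimal residual check, based on the hypothetical predictions from the earlier sketches, could be:

```python
import matplotlib.pyplot as plt

# Residuals = observed minus predicted (the sign convention does not affect the checks below).
residuals = test.loc[pred_per_hour.index, "consumption_mw"] - pred_per_hour

print("Residual mean (MW):", residuals.mean())               # should be close to zero
print("Lag-1 autocorrelation:", residuals.autocorr(lag=1))   # should be small

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
residuals.plot(ax=axes[0], title="Residuals over time")
residuals.hist(bins=50, ax=axes[1])
axes[1].set_title("Residual distribution")
plt.show()
```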
Partial dependence plots
It is necessary to study the model a bit more to understand precisely how each feature influences it. The partial dependence plot is a relevant tool for that.
Our analysis focuses here on the temperature, even though we could also look at the plots for the calendar features (day of the week, month, hour of the day). Let’s take a look at the partial dependence plot for the temperature:
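A sketch using pyGAM’s built-in partial dependence utilities, applied to one of the hypothetical hourly GAMs from the earlier snippet (the term index depends on the model specification), could be:

```python
import matplotlib.pyplot as plt

# Partial dependence of the temperature term of one hourly GAM (here the 7 PM model);
# term index 2 matches the position of `smoothed_temperature` in that hypothetical spec.
gam = hourly_models[19]
term_idx = 2

XX = gam.generate_X_grid(term=term_idx)
pdep, conf_int = gam.partial_dependence(term=term_idx, X=XX, width=0.95)

plt.figure(figsize=(8, 5))
plt.plot(XX[:, term_idx], pdep)
plt.plot(XX[:, term_idx], conf_int, color="grey", ls="--")
plt.xlabel("Smoothed temperature (°C)")
plt.ylabel("Partial effect on consumption (MW)")
plt.show()
```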
From this plot, we can easily validate several prior assumptions:
- The effect of space heating on the network starts to be visible below 15°C.
- The effect of space cooling is visible above 20°C, but is much harder to see.
- The model catches the fact that the impact of space heating on the electricity demand in France is much higher than the impact of space cooling.
Shap values
Shapley values, which originate in game theory, capture the marginal contribution of each player (here, a feature) to the game’s result (here, the target). They are becoming a very popular way to understand how each feature impacts the overall model prediction.
In our use case, they will help us understand, in one plot, which features impact the output the most. The code snippet to build Shapley values can be found below:
The average Shapley value over the entire dataset for each feature gives the following results:
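Here is a sketch using the model-agnostic Kernel explainer from the shap library on one of the hypothetical hourly GAMs from the earlier snippets:

```python
import shap

# Model-agnostic Shapley value estimation with the Kernel explainer
# (the hourly GAM and feature list come from the hypothetical sketches above).
gam = hourly_models[19]
background = shap.sample(train[train["hour"] == 19][features], 100)
sample = test[test["hour"] == 19][features].iloc[:200]

explainer = shap.KernelExplainer(gam.predict, background)
shap_values = explainer.shap_values(sample)

# Mean absolute Shapley value per feature = global feature importance.
shap.summary_plot(shap_values, sample, feature_names=features, plot_type="bar")
```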
This confirms our intuition that the temperature is the feature with the biggest impact on the forecast. Calendar features such as the period of the year and the day of the week come second and third, whereas the irradiance and the holiday period have only a marginal effect on the prediction.
Conclusion
In this first article, we saw how to easily build a machine learning model for time series forecasting with an overall good performance of about 2% MAPE. Heka’s expertise was highlighted throughout this presentation, as it provided a very useful internally developed package with relevant features that can easily be reused in other projects.
The second article of this series will focus on the modelling of uncertainty, which is becoming a very hot topic in time series forecasting. We will also address more elaborate modelling techniques to complete our overview of the main models used in this field.