Accurately predicting a building’s future energy usage by learning from its history is useful for managing building energy use, preparing budgets, and measuring and verifying the results of an energy efficiency measure.

One common approach to predicting building energy use is to train a linear function that relates current building variables to future energy usage. The linear function predictor takes in $k$ numerical building variables $v_i$ ($1 \le i \le k$), multiplies each variable by a corresponding numerical weight $w_i$, and outputs the sum of these products as a prediction $p$. Mathematically:

$$p = \sum_{i=1}^{k} w_i v_i$$

To learn the weights of the linear function, one can cycle through past building data and apply the least mean squares (LMS) update after each observation. The least mean squares update is:

$$w_i \leftarrow w_i + \alpha \, (T - p) \, v_i$$

for each $1 \le i \le k$, where the learning rate $0 < \alpha < 1$ sets how much to adjust the weights, and $T$ is the actual energy usage that $p$ should match. Training against observed historical values of $T$ is what makes the model “supervised.” The following is an example of a supervised training data set that uses weather data (inputs) to predict next-day building energy use (output).

Date        Cooling Degree Days*   Heating Degree Days*   Actual Building Use
Tuesday     50                     0                      200 kWh
Wednesday   54                     0                      210 kWh
Today       46                     0                      ?

*Heating and cooling degree days are units of measure used in the building energy efficiency community to quantify how much heating or cooling a building needs in a day, based on how far the day’s outdoor temperature falls below or rises above a base temperature.
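As a rough illustration of the footnote, degree days for a single day are commonly computed from the day’s mean outdoor temperature relative to a base temperature. The sketch below assumes the simple-average method (mean of the daily high and low) and a 65 °F base, a common U.S. convention; the function name `degree_days` and both assumptions are illustrative and are not taken from the article’s example data.

```python
def degree_days(high_f: float, low_f: float, base_f: float = 65.0) -> tuple[float, float]:
    """Return (cooling_degree_days, heating_degree_days) for one day.

    Assumption: simple-average method, mean = (high + low) / 2, compared
    against a base temperature (65 °F is a common U.S. default).
    """
    mean_f = (high_f + low_f) / 2.0
    cooling = max(0.0, mean_f - base_f)  # degrees the day runs above the base
    heating = max(0.0, base_f - mean_f)  # degrees the day runs below the base
    return cooling, heating


# Example: a hot day with a high of 95 °F and a low of 75 °F (mean 85 °F).
print(degree_days(95.0, 75.0))  # (20.0, 0.0)
```

Because at most one of the two values is nonzero on any given day, the heating column in the example table stays at 0 on warm days.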

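In the end, the trained predictor has a simple structure: the $k$ building variables go in, each is multiplied by its learned weight $w_i$, and the weighted sum comes out as the prediction $p$.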
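The LMS training procedure described above can be sketched in plain Python on the two known rows of the example table. This is a minimal sketch under stated assumptions: the weights start at zero, the learning rate α = 0.0001 and the 200 passes over the data are illustrative choices, and the helper names `predict` and `lms_train` are hypothetical.

```python
def predict(weights: list[float], variables: list[float]) -> float:
    """Linear predictor: p = sum over i of w_i * v_i."""
    return sum(w * v for w, v in zip(weights, variables))


def lms_train(rows: list[tuple[list[float], float]], alpha: float, epochs: int) -> list[float]:
    """Cycle through (variables, actual) pairs, applying the LMS update
    w_i <- w_i + alpha * (T - p) * v_i after every row."""
    k = len(rows[0][0])
    weights = [0.0] * k  # assumption: all weights start at zero
    for _ in range(epochs):
        for variables, actual in rows:
            error = actual - predict(weights, variables)  # T - p
            weights = [w + alpha * error * v for w, v in zip(weights, variables)]
    return weights


# Rows from the example table: ([cooling DD, heating DD], actual kWh).
history = [([50.0, 0.0], 200.0), ([54.0, 0.0], 210.0)]

# With inputs around 50, alpha must be far below 1: one update on a row
# multiplies that row's error by (1 - alpha * sum(v_i ** 2)), and that
# factor needs a magnitude below 1 for the error to shrink.
weights = lms_train(history, alpha=1e-4, epochs=200)
print(weights)                        # roughly [3.93, 0.0]
print(predict(weights, [46.0, 0.0]))  # roughly 181 kWh for "Today"
```

Because the heating degree day input is always 0 in this example, its weight never moves from its starting value; with so little data, the prediction for "Today" only illustrates the mechanics.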

Jin Yang, Hugues Rivard, and Radu Zmeureanu, in the paper “On-line Building Energy Prediction Using Adaptive Artificial Neural Networks”, found that the performance of linear functions for energy prediction depends on the choice of input variables. Historical building information can yield significantly more accurate predictions, but only when the variables reflect specific points in time.

Determining which points in time to use for prediction is tricky and not always intuitive. Therefore, it makes sense to generate new variables that encode useful historical information rather than to handpick them. These new variables can also be combinations of other variables. A new intermediate stage of the predictor, known as the variable constructor, builds these variables and passes them to the linear function.

One well-established variable constructor is the neural network. Neural networks can learn complex and long-term relationships between variables. They have been used successfully on ambitious tasks such as autonomous helicopter flight, chess playing, protein folding, and image recognition. Unfortunately, neural networks have a reputation for being complex to use.

In this article, we review simple (yet powerful) versions of neural networks. Specifically, we review the echo state network, a neural network that can capture time-related relationships between variables in energy prediction applications.

About Neural Networks

Echo state networks arose from reservoir computing, a field known for using randomly generated models to solve problems. The random dynamics of these models can yield new variables that encode relevant aspects of a building’s history and the relationships between variables.

Let’s start from the beginning. A neural network consists of a set number of nodes and can look like this:

An example neural network

A node takes input signals and outputs a single number.

Each of a node’s $n$ input signals $x_i$ is multiplied by a corresponding numerical weight $w_i$. The products $x_i w_i$ are summed and passed through a nonlinear function to compute the node’s output $y$. We will assume the nonlinear function is the tanh function, a common choice for echo state networks:

$$y = \tanh\left(\sum_{i=1}^{n} w_i x_i\right)$$

tanh function

The input signals into a node can be external to the network (e.g., building variables), the outputs of other nodes, or the output of any node at a previous time step.

When an input to a node comes from another node’s output at a previous time step, the two nodes have a recurrent connection between them. A node can also have a recurrent connection to itself.
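A single tanh node, plus a recurrent connection from that node to itself, fits in a few lines of Python. The weights below are hypothetical values chosen only for illustration, and the helper name `node_output` is likewise hypothetical. The loop feeds a one-time input pulse into the node, and its own previous output carries an "echo" of that pulse into later time steps.

```python
import math


def node_output(inputs: list[float], weights: list[float]) -> float:
    """One node: multiply each input by its weight, sum, and apply tanh."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)))


# A node with one external input and one recurrent connection to itself:
# at each time step its own previous output is fed back in as an input.
external_weight = 0.5  # hypothetical weight on the building variable
self_weight = 0.8      # hypothetical weight on the node's previous output

previous = 0.0
for building_value in [1.0, 0.0, 0.0, 0.0]:  # a single pulse, then nothing
    previous = node_output([building_value, previous], [external_weight, self_weight])
    print(round(previous, 3))  # the pulse lingers and fades over time
```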

About Echo State Neural Networks

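When a neural network has at least one recurrent connection, it is called a recurrent neural network. An echo state network is a recurrent neural network whose node connections are set once and never changed. In an energy consumption predictor, the echo state network typically serves as the variable constructor: it sits between the building variables and the linear function, and its node outputs become the variables that the linear function weights to produce the prediction.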

One should feed data into the predictor chronologically to take advantage of the recurrent connections.

Various echo state network structures have been proposed. A classical echo state network has two kinds of connections: connections from the input variables to the nodes, and recurrent connections between nodes. A standard way to connect the input variables to the nodes is to connect each input variable to every node with numerical weights of equal magnitude but random sign.

Randomly assigned weights from inputs v1 and v2 to nodes
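The equal-magnitude, random-sign input connections described above can be sketched with Python’s standard `random` module. The magnitude of 0.1, the seed, and the name `make_input_weights` are assumptions for illustration; in practice the magnitude is a tuning choice.

```python
import random
from typing import Optional


def make_input_weights(n_nodes: int, n_inputs: int, magnitude: float = 0.1,
                       seed: Optional[int] = None) -> list[list[float]]:
    """Connect every input variable to every node with weight +magnitude or
    -magnitude, the sign chosen at random. Row j holds node j's input weights."""
    rng = random.Random(seed)  # seeded generator so the network is reproducible
    return [[rng.choice((magnitude, -magnitude)) for _ in range(n_inputs)]
            for _ in range(n_nodes)]


# Example: 5 nodes fed by two building variables (v1 and v2).
w_in = make_input_weights(n_nodes=5, n_inputs=2, magnitude=0.1, seed=42)
for row in w_in:
    print(row)
```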

Determining how the nodes should connect is key.

One established way to connect the network nodes recurrently is to do so at random. Each recurrent connection is given a numerical weight of either 1 or 0 (with 0 equivalent to having no connection).

Then, the recurrent connection weights are divided by the spectral radius of the matrix that holds them, that is, the matrix representation of the network’s recurrent weights. The spectral radius of a square matrix is the largest absolute value among its eigenvalues; roughly, it is the per-step factor by which repeatedly multiplying a vector by the matrix grows or shrinks that vector in the long run. After the division, the recurrent weight matrix has a spectral radius of exactly 1. See the following diagram for details.

Matrix representation of an echo state network
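The random 0/1 recurrent connections and the spectral-radius division can be sketched with the standard library alone. With NumPy available, `max(abs(numpy.linalg.eigvals(W)))` computes the spectral radius directly; this sketch instead swaps in power iteration on W + I, which is valid here only because the random weights are nonnegative. The connection density of 0.2, the optional `target` factor, and all function names are assumptions for illustration; `target=1.0` reproduces the plain division described above.

```python
import random
from typing import Optional


def random_recurrent_weights(n_nodes: int, density: float,
                             seed: Optional[int] = None) -> list[list[float]]:
    """Give each node-to-node connection weight 1 with probability `density`,
    otherwise 0 (no connection). Entry [i][j] is the weight from node j to node i."""
    rng = random.Random(seed)
    return [[1.0 if rng.random() < density else 0.0 for _ in range(n_nodes)]
            for _ in range(n_nodes)]


def spectral_radius_nonnegative(matrix: list[list[float]], iterations: int = 1000) -> float:
    """Estimate the spectral radius of a square matrix with nonnegative entries.

    Power iteration on (A + I): for nonnegative A, rho(A) + 1 is the unique
    eigenvalue of A + I with the largest magnitude, so the growth factor of a
    positive vector approaches it.
    """
    n = len(matrix)
    x = [1.0] * n  # positive starting vector whose largest entry is 1
    growth = 1.0
    for _ in range(iterations):
        # y = (A + I) x
        y = [x[i] + sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = max(y)  # at least 1, because y >= x and max(x) == 1
        x = [value / growth for value in y]
    return growth - 1.0


def scale_by_spectral_radius(matrix: list[list[float]], target: float = 1.0) -> list[list[float]]:
    """Divide every recurrent weight by the spectral radius, then multiply by `target`."""
    radius = spectral_radius_nonnegative(matrix)
    if radius <= 0.0:
        raise ValueError("spectral radius is 0; the weights cannot be rescaled")
    return [[target * w / radius for w in row] for row in matrix]


w = random_recurrent_weights(n_nodes=20, density=0.2, seed=0)
w_scaled = scale_by_spectral_radius(w)
print(spectral_radius_nonnegative(w_scaled))  # close to 1.0
```

The `target` parameter is there because practitioners often rescale to a spectral radius somewhat below 1 instead of exactly 1; the right value depends on the data.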

Other Echo State Networks

In a Delay Line Echo State Network, information from external inputs is passed down a line of nodes, decaying at each step until it leaves the end of the line.

Delay Line Echo State Network

In a Cycle Echo State Network, information from external inputs is passed around the nodes, while decaying with each pass.

Cycle Echo State Network

In a Self-Feedback Echo State Network, each node passes information back to itself rather than to other nodes, and the information decays with each pass.

The network types described above can also be combined to form more complex networks.
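The three structured networks above can be written as recurrent weight matrices, and one simple way to combine them is to add their matrices entry by entry. This is a sketch under assumptions: a single decay factor `r` is shared by all connections, entry [i][j] is the weight from node j to node i, and the `step` helper propagates the nodes linearly (no tanh, no new input) purely so the decay is easy to see. All names are hypothetical.

```python
def delay_line(n_nodes: int, r: float) -> list[list[float]]:
    """Node i receives node i-1's output with weight r; the last node passes nothing on."""
    w = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(1, n_nodes):
        w[i][i - 1] = r
    return w


def cycle(n_nodes: int, r: float) -> list[list[float]]:
    """A delay line whose last node also feeds the first node, closing the loop."""
    w = delay_line(n_nodes, r)
    w[0][n_nodes - 1] = r
    return w


def self_feedback(n_nodes: int, r: float) -> list[list[float]]:
    """Each node feeds only its own previous output back to itself."""
    return [[r if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]


def combine(*matrices: list[list[float]]) -> list[list[float]]:
    """Combine structures by adding their weights entry by entry."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(n)] for i in range(n)]


def step(w: list[list[float]], state: list[float]) -> list[float]:
    """One linear time step with no new input, to make the decay visible."""
    n = len(state)
    return [sum(w[i][j] * state[j] for j in range(n)) for i in range(n)]


# A pulse placed in node 0 travels around a 3-node cycle, halving at each hop.
state = [1.0, 0.0, 0.0]
ring = cycle(3, r=0.5)
for _ in range(4):
    state = step(ring, state)
    print(state)
```

In the delay line, the same pulse would vanish after it passes the last node; in the cycle, it keeps circulating while it shrinks.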

Echo State Neural Networks to Predict Building Consumption

Guang Shi, Derong Liu, and Qinglai Wei, in the paper “Energy Consumption Prediction of Office Buildings Based on Echo State Networks”, tested these echo state network variants to predict hourly building energy usage due to lights, sockets, and air conditioning.

They found that the energy usage was cyclic. Of all the networks tested, the classical echo state network had the lowest prediction error. However, the simpler, explicitly structured echo state networks also performed well, particularly in combination.

When the network had no recurrent connections, the predictions were poor, suggesting that recurrent connections are worth including to improve model accuracy.


Echo state networks are relatively simple to train compared to other neural networks promoted in the machine learning literature. Their ease of training comes from the fact that the recurrent connections are never adjusted. Even simple versions of echo state networks can lead to accurate consumption predictions and should be considered for commercial building energy consumption prediction models.
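To make the training point concrete, here is a toy end-to-end sketch in which the reservoir’s input and recurrent weights stay fixed and only the linear readout is learned, using the least mean squares update from earlier. Everything here is an assumption for illustration: a 10-node cycle reservoir with decay 0.9, input weights of ±0.5 with random sign, α = 0.01, 50 passes, hypothetical function names, and a made-up sine wave with a 24-step period standing in for hourly usage. It is not the setup used by Shi, Liu, and Wei, and it makes no claim about accuracy.

```python
import math
import random


def run_reservoir(w_in: list[list[float]], w: list[list[float]],
                  inputs: list[list[float]]) -> list[list[float]]:
    """Feed inputs chronologically and record the node outputs at every step:
    state(t) = tanh(W_in u(t) + W state(t-1)). W_in and W are never changed."""
    n = len(w)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [math.tanh(sum(w_in[i][k] * u[k] for k in range(len(u)))
                           + sum(w[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
        states.append(state)
    return states


def train_readout(states: list[list[float]], targets: list[float],
                  alpha: float = 0.01, epochs: int = 50) -> list[float]:
    """LMS on the reservoir outputs: the readout is the only trained part."""
    readout = [0.0] * len(states[0])
    for _ in range(epochs):
        for x, target in zip(states, targets):
            error = target - sum(r * xi for r, xi in zip(readout, x))
            readout = [r + alpha * error * xi for r, xi in zip(readout, x)]
    return readout


# Hypothetical fixed reservoir: 10 nodes in a cycle (node j feeds node j + 1,
# the last node feeds node 0) with decay 0.9, and one input whose weights have
# equal magnitude and random sign.
n_nodes = 10
rng = random.Random(0)
w_in = [[rng.choice((0.5, -0.5))] for _ in range(n_nodes)]
w = [[0.9 if i == (j + 1) % n_nodes else 0.0 for j in range(n_nodes)]
     for i in range(n_nodes)]

# Synthetic, made-up daily cycle (24 steps per "day") instead of real data.
signal = [math.sin(2 * math.pi * t / 24) for t in range(24 * 10)]
states = run_reservoir(w_in, w, [[v] for v in signal[:-1]])
readout = train_readout(states, signal[1:])  # learn to predict the next step
print(readout)
```

In practice the readout is often fit in a single step with linear (for example, ridge) regression instead of LMS; either way, the reservoir itself needs no training.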