Estimating energy use of existing dwellings – initial data analysis

The Zero Carbon 2050 challenge has been set… We need to reduce energy consumption from our buildings through energy efficiency, improving heating and cooling and by generating renewable energy. No small task…

The power industry has achieved significant carbon reductions in the last 10 years and is one of the success stories of climate change mitigation in the UK. Low carbon heating and cooling is on the verge of a similar revolution with the Spring statement’s announcement that no gas boilers should be installed in new homes from 2025.

Unfortunately, we have a problem with energy efficiency: we know we have to drastically reduce how much energy buildings use, but data on current performance is almost non-existent. Energy information in Energy Performance Certificates (EPCs) is useful, but it does not provide an accurate estimate of actual energy use in a basic unit that everyone understands, such as kWh as measured at the meter.

Time is running out

We think it is unrealistic to expect accurate energy data to be available for every building in time to inform an urgent response to climate change. But what if there was a way of estimating the energy used in every building by using the small amount of robust in-use energy and carbon data that we already have? It might then be possible to translate the London-wide Zero Carbon target to a numerical target for each and every building in kWh.

Simply (or not, as is the case) that is what we are trying to do: guessing (or estimating should we say) the approximate energy use of every building in London by developing a predictive energy use model.

The journey we have embarked on is an enormous technical challenge, and also a social challenge with huge barriers. In this data-driven economy there are justified concerns about privacy and third-party ownership of information, meaning we have already come across major stumbling blocks… But we are undeterred.

We are at the beginning of the project, and are taking small steps to help get to the summit.

Our first 6 weeks of data exploration focusing on existing dwellings

The first phase has been to identify robust in-use energy data for domestic buildings, to use this to identify trends and then to train a calibrated model to estimate energy use for existing dwellings.

We focused on two sets of public data: NEED (the National Energy Efficiency Data-Framework, 2014) and BEIS’s postcode-level gas and electricity meter estimates (for dwellings with EPCs). We found that consumption correlates statistically with some typical household parameters, such as floor area, building age and EPC band. Using these attributes, we fitted a linear regression model to predict consumption on both datasets. The results suggest that a model with an R² score of 50-60% is achievable, depending on how well we can set up the inputs by cleaning the data and engineering new proxies.

Charts mapping the actual energy consumption for each of the dwellings we have data for against the ‘predicted’ energy we have estimated through one of our calibrated models.

A summary of the analysis is detailed below.

National Energy Efficiency Data-framework (NEED) Database

First we looked at the National Energy Efficiency Data-framework (NEED). This dataset provides the electricity and gas consumption for ~4,800 households in London, with attributes associated with each dwelling.

Correlations with household characteristics

We started by examining correlations between consumption and typical building parameters.

The graphs below show the statistical distribution of each category as box plots. Each box represents the central 50% of the population in the group, with the median as the middle line. The dots represent outliers, which fall more than 3 standard deviations from the median.
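As a rough illustration of how the box plot statistics described above could be computed for one group, here is a minimal sketch using only the Python standard library. The function name `box_summary` and the sample figures are hypothetical, and the outlier rule simply follows the description in the text (more than 3 standard deviations from the median).

```python
import statistics

def box_summary(values, k=3.0):
    """Summarise one group in the way the box plots are described.

    The box spans the first to third quartile (the central 50%), the
    middle line is the median, and outliers are points lying more than
    k standard deviations away from the median.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    spread = statistics.pstdev(values)
    outliers = [v for v in values if abs(v - median) > k * spread]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical annual gas consumption (kWh) for one floor-area band.
gas_kwh = [8000, 9500, 10000, 11000, 12000, 12500, 13000, 14000, 15000, 60000]
summary = box_summary(gas_kwh)
print(summary)
```

Running the same function over every floor-area band, age band or property type gives the numbers behind each box.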

Box plots showing that as the size of dwelling increases the majority of the population’s energy demand increases, but that there is significant overlap.

Unsurprisingly, there is a clear positive correlation between floor area and both gas and electricity consumption.

Box plots showing that energy consumption reduces for younger buildings, but that there is little improvement post 1950.

We identified some inverse correlation between building age and consumption (newer buildings tend to consume less), though the clearer distinction is perhaps between buildings built before and after the 1950s. Interestingly, regulation of energy efficiency in homes was introduced in 1972, with a major overhaul in 2006 that moved away from an elemental method to a whole-building assessment. The resolution of the age bands in our dataset is not sufficient to show whether this has been effective.

Box plots showing that energy consumption reduces with house type, however this has not been normalised against floor area so may be reproducing the correlation shown above.

Correlations between consumption and property type (detached house, bungalow, flat, etc.) can be observed if the types are ordered accordingly. There is some ambiguity about where bungalows should sit among the detached, semi-detached, end-terrace and mid-terrace houses.

Box plots showing measured annual energy consumption against EPC band and a strong correlation, especially for lower bands.

EPC bands also showed strong correlation with total annual consumption up to band F.

Box plots showing that deprivation is an indicator of energy consumption, the more deprived a local area the less energy it is likely to consume.

The NEED dataset provides the index of multiple deprivation, a measure of different deprivation domains such as income, employment, health and disability, education and training, barriers to housing and services, living environment and crime associated with each household. The dataset includes households in the bottom five groups out of ten, with rank 1 being the most deprived group of the population in the country. The data suggests that the more deprived a household is, the less energy it consumes. This parameter can be a powerful indicator of actual consumption but does not reflect the energy efficiency of the building stock.

In general, all correlations observed appear to be stronger in gas than electricity. This would make sense as gas is used as the main fuel for space heating and hot water in most homes.

Findings from postcode level meter estimates on domestic properties with an EPC

Another dataset we investigated was the postcode-level estimates of gas and electricity consumption available from BEIS. Along with the sub-national annual statistics on gas and electricity consumption, BEIS published two sets, in 2013 and 2015, of total consumption according to the meters associated with each postcode. Whilst it’s great to have metered data at a postcode level, there is no description of what type of dwelling the data comes from, which is a big problem. Without being able to relate the energy data to a dwelling and its floor area, age and so on, we were at a dead end. To make the metered data useful for our analysis, we had to find it a home.

To do this, we used publicly available EPCs to match the postcodes of buildings with these aggregated meter estimates. This allowed us to get a better picture of what type of dwelling the metered energy data was likely to have come from.

By no means is this the equivalent of matching buildings with single meter readings, but the dataset provides a median value along with the number of meters associated with each postcode, which could be representative of the real consumption if all the properties in that postcode are similar. That is an assumption we have had to make in the absence of a complete dataset.
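The matching step can be sketched roughly as follows. This is an illustration only: the field names, postcodes and figures are made up, and the real EPC and BEIS files have their own column names. The idea is to normalise the postcode on both sides and attach the postcode median and meter count to every EPC record that shares it.

```python
# Hypothetical postcode-level estimates: postcode -> meter count and median kWh.
postcode_gas = {
    "N1 1AA": {"meters": 6, "median_kwh": 11500.0},
    "E2 2BB": {"meters": 23, "median_kwh": 9800.0},
}

# Hypothetical EPC records (field names are assumptions, not the real schema).
epc_records = [
    {"postcode": "n1 1aa", "property_type": "House", "floor_area_m2": 92.0},
    {"postcode": "E2  2BB", "property_type": "Flat", "floor_area_m2": 54.0},
    {"postcode": "SW1 3CC", "property_type": "Flat", "floor_area_m2": 61.0},
]

def normalise_postcode(raw):
    """Upper-case and collapse whitespace so both sources use the same key."""
    return " ".join(raw.upper().split())

def attach_meter_estimates(epcs, estimates):
    """Return EPC records enriched with the estimate for their postcode."""
    matched = []
    for epc in epcs:
        key = normalise_postcode(epc["postcode"])
        estimate = estimates.get(key)
        if estimate is None:
            continue  # no meter estimate available for this postcode
        matched.append({**epc, "postcode": key, **estimate})
    return matched

matched = attach_meter_estimates(epc_records, postcode_gas)
for row in matched:
    print(row)
```

Every dwelling in a postcode inherits the same median, which is exactly the similarity assumption described above.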

To sense-check the approach of mapping postcode data to EPC dwelling information, we looked at the gas consumption for different property types: houses, bungalows, maisonettes and flats, across which we expect to see total energy consumption decrease. When we limited the consumption data to postcode estimates with the smallest number of meters (6), we got a distribution that seems to agree with building physics.
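A minimal sketch of this sense check, assuming matched rows that carry a property type, a meter count and a postcode median (all values below are invented for illustration): keep only the postcodes with 6 meters, then compare the median gas consumption per property type.

```python
from collections import defaultdict
import statistics

SMALLEST_METER_COUNT = 6  # the smallest meter count quoted in the text

# Hypothetical matched rows; none of these figures come from the real data.
rows = [
    {"property_type": "House", "meters": 6, "median_kwh": 14200.0},
    {"property_type": "House", "meters": 6, "median_kwh": 15100.0},
    {"property_type": "House", "meters": 40, "median_kwh": 9000.0},
    {"property_type": "Bungalow", "meters": 6, "median_kwh": 12800.0},
    {"property_type": "Maisonette", "meters": 6, "median_kwh": 10400.0},
    {"property_type": "Flat", "meters": 6, "median_kwh": 7600.0},
    {"property_type": "Flat", "meters": 6, "median_kwh": 8200.0},
    {"property_type": "Flat", "meters": 31, "median_kwh": 11900.0},
]

def median_gas_by_type(rows, meters=SMALLEST_METER_COUNT):
    """Median gas consumption per property type, for one meter count only."""
    groups = defaultdict(list)
    for row in rows:
        if row["meters"] == meters:
            groups[row["property_type"]].append(row["median_kwh"])
    return {ptype: statistics.median(values) for ptype, values in groups.items()}

by_type = median_gas_by_type(rows)
print(by_type)
```

Postcodes with many meters are more likely to mix dwelling types, so restricting to the smallest count keeps the postcode median closer to a single kind of property.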

Box plots for actual gas consumption for homes broken down by dwelling type.

We decided to focus on only a handful of building parameters from the 85 columns available in the EPC database. From our initial research on the available datasets, we chose the key attributes based on how readily available they may be from other datasets, so that any models we build can eventually be applied to the entire building stock.

These characteristics are also very likely to correlate with energy consumption. These include:

  • Floor area
  • Number of rooms
  • Property category
  • Building age

The floor area and number of rooms were pulled directly from the EPC database.

For the building age, we mapped the properties based on keywords from the walls description on the EPCs, with 1 being the oldest and 4 representing new dwellings.
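A keyword mapping of this kind might look like the sketch below. The keyword list and the band assigned to each keyword are illustrative assumptions, not the mapping actually used; only the 1 (oldest) to 4 (new) scale comes from the text.

```python
# Illustrative keyword rules, checked in order; the first match wins.
# Both the keywords and the bands are assumptions made for this example.
AGE_RULES = [
    ("solid brick", 1),
    ("sandstone", 1),
    ("granite", 1),
    ("no insulation", 2),
    ("partial insulation", 3),
    ("insulated", 4),
]

def age_band(walls_description):
    """Map a walls description to an age band: 1 = oldest, 4 = new."""
    text = walls_description.lower()
    for keyword, band in AGE_RULES:
        if keyword in text:
            return band
    return None  # no keyword matched; leave the age unknown

examples = [
    "Solid brick, as built, no insulation (assumed)",
    "Cavity wall, as built, no insulation (assumed)",
    "Cavity wall, as built, partial insulation (assumed)",
    "Cavity wall, as built, insulated (assumed)",
]
bands = [age_band(description) for description in examples]
print(bands)
```

Ordering matters here: "solid brick" is tested before "no insulation" so that an uninsulated solid wall lands in the oldest band rather than the second.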

Using the construction type from the EPC database to roughly estimate building age

To assist with the analysis, we created a new proxy called “Exposed Sides” to consolidate the property type and the built form of the dwelling.

Mapping the property type and built form to a single proxy parameter called ‘Exposed sides’

We were then able to better represent different property categories, combining each property’s type (e.g. house, flat) and its form (e.g. detached, end-terrace). The default value for a dwelling is 6, to represent the sides of a cube (or a typical detached house), and the value decreases based on the property’s type and form. For example, we deducted 2 sides for a flat, as this typically indicates there are dwellings above and below the property.
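In code, the proxy could be expressed along these lines. The default of 6 and the deduction of 2 for a flat come from the description above; the deductions for each built form are assumptions added for illustration.

```python
DEFAULT_SIDES = 6  # a cube, or a typical detached house

# Deduction by property type: the flat value is taken from the text.
TYPE_DEDUCTION = {"flat": 2}

# Deduction by built form: assumed values, one side per shared party wall.
FORM_DEDUCTION = {
    "detached": 0,
    "semi-detached": 1,
    "end-terrace": 1,
    "mid-terrace": 2,
}

def exposed_sides(property_type, built_form):
    """Combine property type and built form into one numeric proxy."""
    sides = DEFAULT_SIDES
    sides -= TYPE_DEDUCTION.get(property_type.lower(), 0)
    sides -= FORM_DEDUCTION.get(built_form.lower(), 0)
    return sides

print(exposed_sides("House", "Detached"))       # detached house
print(exposed_sides("House", "Semi-Detached"))  # one shared wall
print(exposed_sides("Flat", "Mid-Terrace"))     # neighbours on four sides
```

Turning two categorical fields into one number this way gives the regression a single ordered input instead of many separate categories.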

Analysis – Floor area, rooms, exposed sides and building age

Now for the numbers. Unsurprisingly, the floor area correlates strongly with annual consumption of both gas and electricity. The relationship is even more apparent than the one observed in the NEED dataset, as the floor areas are not grouped in bands.
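For readers who want to quantify a relationship like this, one common measure is the Pearson correlation coefficient. The sketch below is a plain Python version with made-up sample values; the text does not state which correlation measure was used for the charts.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical dwellings: floor area (m2) and annual gas consumption (kWh).
floor_area = [45, 60, 75, 90, 120]
gas_kwh = [7000, 9500, 10500, 13000, 16500]

r = pearson_r(floor_area, gas_kwh)
print(round(r, 3))
```

A value near +1 means consumption rises steadily with the input, near -1 means it falls, and near 0 means there is no linear relationship.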

Correlation between actual gas and electrical energy consumption against floor area for each dwelling in the EPC database

There was also a positive correlation observed between the number of rooms and energy consumption, likely because this is both an indicator of occupancy and floor area.

Correlation between actual gas and electrical energy consumption against the number of rooms for each dwelling in the EPC database

We can also see a reasonable correlation between consumption and the newly constructed proxy – ‘Exposed Sides’.

Correlation between actual gas and electrical energy consumption against the estimated exposed sides for each dwelling in the EPC database

A small inverse correlation is observed in the building age, in agreement with the NEED dataset. The weak correlation could be due to bad mapping of the building age from the wall type data we used from the EPCs. Alternatively, it might be the case that building age simply does not correlate strongly with consumption, for example if newer properties are operated at a higher temperature.

Correlation between actual gas and electrical energy consumption against the estimated age for each dwelling in the EPC database

Building our predictive model

Now the fun part.

After developing our database for dwellings based on the energy consumption, and identifying the key parameters which correlate to energy consumption, we built our predictive energy model and took it for a run out.

We built the predictive models for domestic gas and electricity consumption by fitting a linear regression to each dataset. Both models used floor area (bands), building age and property typologies as inputs, with the addition of rooms and exposed sides in the model based on postcode meter estimates and EPC buildings.
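To make the technique concrete, here is a dependency-free sketch of a linear regression fitted by ordinary least squares. It is not the model from the project: the text does not say which library was used, the function names are hypothetical, and the sample targets are generated from a made-up linear rule purely so that the fit can be checked.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination. Fine for a handful of inputs, not tuned for large data.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1.0 = intercept
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot_row = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

def predict(coefs, row):
    """Predicted consumption for one dwelling."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical dwellings: (floor area m2, number of rooms, exposed sides).
dwellings = [(50, 2, 2), (70, 3, 4), (90, 4, 5), (120, 5, 6), (60, 3, 3), (100, 3, 6)]
# Made-up targets from a known rule, so the recovered coefficients can be checked.
gas_kwh = [2000 + 100 * a + 500 * r + 800 * s for a, r, s in dwellings]

coefs = fit_ols(dwellings, gas_kwh)
print([round(c, 2) for c in coefs])
print(round(predict(coefs, (80, 3, 4)), 1))
```

With real data the targets are noisy, so the fitted line will not pass through every point; how closely it does is what the R² score measures.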

The charts below map the actual energy consumption for each of the dwellings we have against the ‘predicted’ energy we have estimated through our calibrated model.

Predictive domestic energy model compared to actual NEED data

Model from postcode level meter estimates and EPC buildings

Predictive domestic energy model compared to actual EPC data

The R² score, which indicates how well a model fits the data, is higher for the model based on postcode-level estimates. This is probably because we have exact floor areas in the data and an additional input variable (number of rooms) that correlates strongly with consumption.
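For reference, the R² score (the coefficient of determination) compares the model's squared errors with those of simply predicting the mean every time. A minimal sketch, with invented figures:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Hypothetical actual and predicted annual consumption (kWh).
actual = [10000, 12000, 14000, 16000]
predicted = [11000, 12000, 14000, 15000]

score = r_squared(actual, predicted)
print(round(score, 2))
```

A score of 1 means the predictions match the data exactly, and a score of 0 means the model does no better than always predicting the average consumption.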

We suspect that a score in the region of 50-60% is the limit for predicting consumption using building parameters. We also suspect that the score of the model predicting electricity from the NEED dataset would improve greatly if we limited the outliers to a similar range as the postcode meter estimates.


So how have we done? Early signs show that we are on the right track. We have taken existing energy consumption information to calibrate a statistical model which estimates the consumption of dwellings based on a series of inputs. We have some tweaking to do, but by mapping the predicted energy consumption figures against real in-use energy consumption the model shows a strong correlation…

This should in theory allow us to take any building in London, estimate the current consumption, and then show the reduction required in order to meet carbon reduction targets.

Let’s not forget that this is just the start. We still have to tackle the non-domestic sector, which will undoubtedly be an even bigger challenge.

Put the kettle on…
