Machine Learning Algorithms for Fine-Scale Prediction of US Building Energy Use

Hengfang Deng

Northeastern University

Hengfang (Alex) Deng is a PhD candidate at Northeastern University working on "Machine Learning Algorithms for Fine-Scale Prediction of US Office Building Energy Use".

Abstract


Introduction

With the rapid development of data-driven computational algorithms and the growing trove of publicly available building attribute and energy data, there are now ample opportunities to apply machine learning techniques to the assessment of building energy performance. This research compares machine learning methods such as Artificial Neural Networks and Random Forest in terms of their predictive performance for lighting, heating, and cooling energy use. It also aims to identify the most influential variables and to quantify their potential for building energy management. Combined with the growing volume of reported data and advancing sensing technology, these methods could greatly enhance the accuracy and robustness of building energy models.

Data

The microdata used in this study come from the 2012 Commercial Buildings Energy Consumption Survey (CBECS), published by the U.S. Energy Information Administration (EIA). The survey contains records for 6,720 commercial buildings in the United States and is designed to be a statistically representative sample. The input variables include building physical characteristics (such as location, area, and construction material), operational characteristics, and occupancy patterns. The output variables are the energy use intensities (EUI) of electricity and fuels for different building end uses (such as lighting).
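As a minimal illustration of the output variables, an end-use EUI can be computed by dividing a building's annual energy use for that end use by its floor area. The sketch below assumes energy in thousand Btu (kBtu) and floor area in square feet, and it uses hypothetical field names (`sqft`, `lighting_kbtu`) rather than the actual CBECS variable names. Records with missing or invalid values are returned as `None` so they can be dropped during pre-processing.

```python
from typing import Dict, List, Optional

# ASSUMPTION: field names such as "sqft" and "<end_use>_kbtu" are hypothetical
# placeholders, not the real CBECS microdata variable names.
Record = Dict[str, Optional[float]]


def end_use_eui(record: Record, end_use: str) -> Optional[float]:
    """Return EUI in kBtu per square foot for one end use, or None if unavailable."""
    area = record.get("sqft")
    energy = record.get(f"{end_use}_kbtu")
    # Missing values or a non-positive floor area cannot yield a valid EUI.
    if area is None or energy is None or area <= 0:
        return None
    return energy / area


def eui_column(records: List[Record], end_use: str) -> List[Optional[float]]:
    """Compute the EUI target for every record in the sample."""
    return [end_use_eui(r, end_use) for r in records]


if __name__ == "__main__":
    buildings = [
        {"sqft": 10000.0, "lighting_kbtu": 150000.0},
        {"sqft": 2500.0, "lighting_kbtu": None},  # missing end-use value
        {"sqft": 0.0, "lighting_kbtu": 500.0},    # invalid floor area
    ]
    print(eui_column(buildings, "lighting"))
```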

Methods

CBECS data are first pre-processed to remove records with missing or extreme values and to eliminate correlation among predictors. A k-fold cross-validation method is used to compare the predictive performance of the different algorithms: the dataset is randomly divided into k subsets of equal size, a model is trained on k-1 subsets and tested on the remaining one, and this process is repeated until every subset has served once as the test set. To provide a thorough comparison, ordinary linear regression and Lasso (least absolute shrinkage and selection operator) regression models are constructed first, followed by a multilayer perceptron (Artificial Neural Network) and a tree-bagging ensemble (Random Forest). The study also aims to perform feature selection among all input variables based on both correlation and information gain, as well as by training models on different subsets of features to find the subset that minimizes prediction error.
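The k-fold procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the study's implementation: the helper names are hypothetical, the two stand-in models (a mean-only baseline and a single-predictor least-squares fit) replace the linear, Lasso, neural network, and Random Forest models used in the study, and mean squared error is assumed as the comparison metric.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# A fitted model maps one predictor value to a predicted target (e.g. an EUI).
Model = Callable[[float], float]
# A fitting routine takes training predictors and targets and returns a Model.
Fitter = Callable[[List[float], List[float]], Model]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly shuffle row indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit: Fitter, x: Sequence[float], y: Sequence[float],
                   k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, repeat k times; return mean MSE."""
    fold_mse = []
    for test in kfold_indices(len(x), k, seed):
        held_out = set(test)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        fold_mse.append(mean([(model(x[i]) - y[i]) ** 2 for i in test]))
    return mean(fold_mse)


def fit_mean(xs: List[float], ys: List[float]) -> Model:
    """Baseline stand-in: always predict the training-set mean."""
    m = mean(ys)
    return lambda _v: m


def fit_linear(xs: List[float], ys: List[float]) -> Model:
    """Stand-in model: ordinary least squares with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


if __name__ == "__main__":
    # Synthetic, noise-free data (y = 2x + 1) used only to exercise the procedure.
    xs = [float(v) for v in range(20)]
    ys = [2.0 * v + 1.0 for v in xs]
    for name, fitter in [("mean baseline", fit_mean), ("linear", fit_linear)]:
        print(name, cross_validate(fitter, xs, ys, k=5))
```

Because every candidate is scored on the same folds (a fixed `seed`), differences in cross-validated error reflect the models rather than the particular random split.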

Findings

The main objective of the study is to compare the predictive performance of multiple statistical machine learning methods tested on building energy survey data, and thereby to recommend which model should be adopted when constructing an energy estimation model. The best Random Forest model has a prediction rate of 60%, exceeding the adjusted R² values of multiple regression models run on the same set of explanatory variables. A further outcome concerns the importance of the input variables: both individual and clustered feature importance will be quantified and compared to shed light on the relationships between physical and operational parameters and the energy use intensity of commercial buildings. Further work will apply these algorithms to data from opportunistic lighting, temperature, and occupancy sensors in the City of Boston, USA, as well as to energy data reported to the City for all medium and large commercial buildings under the Building Energy Reporting and Disclosure Ordinance (BERDO).
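For reference, the adjusted R² used in the regression comparison penalizes R² for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n observations and p predictors. The sketch below computes both quantities in plain Python. The function names are hypothetical, and the sketch covers only the regression metric, not the Random Forest prediction rate.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalize R^2 for using p predictors with n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Illustrative numbers only; not results from the study.
    y_obs = [3.0, 5.0, 7.0, 9.0, 11.0]
    y_hat = [3.5, 4.5, 7.0, 9.5, 10.5]
    r2 = r_squared(y_obs, y_hat)
    print(round(r2, 3), round(adjusted_r_squared(r2, n=len(y_obs), p=2), 3))
```

Adding predictors can only raise plain R² on the training data, whereas the adjusted form falls unless a new predictor improves the fit enough to offset the lost degree of freedom.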

Authors

Hengfang Deng (Northeastern University)
Matthew Eckelman (Northeastern University)

Topic Areas

• Open source data, big data, data mining and industrial ecology
• Infrastructure systems, the built environment, and smart and connected infrastructure

Session

MS-9 » Urban metabolism and infrastructure systems (11:45 - Monday, 26th June, Room F)

