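# helpers is the author's local module; import_all() presumably loads the
# common libraries (pd is used below without an explicit import)
from helpers import *
import_all()

import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
from xgboost import XGBRegressor

# Render matplotlib figures inline in the notebook
%matplotlib inline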


Sections: The Data | Feature Engineering | Investigating Correlation | Lag Features | Splitting | The Model | Results with Traditional Split | Using Cross-Validation | Making Future Predictions


The Data

  • This data is an excerpt from a regularly updated dataset collection maintained on Kaggle
  • The dataset reflects the energy consumption as reported by the National Grid ESO, Great Britain's electricity system operator
  • Consumption is recorded every half hour (two readings per hour)
  • The data covers January 1, 2009 to December 31, 2022
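
As a rough sanity check on the figures above, the sketch below counts how many readings a gap-free half-hourly series over the stated period would contain. It assumes a perfectly regular 30-minute grid with no missing rows and no clock-change adjustments, so the row count of the actual file may differ.

```python
import pandas as pd

# Hypothetical regular half-hourly grid spanning the stated coverage period.
# Assumption: no missing readings and no daylight-saving adjustments.
expected_index = pd.date_range(
    start="2009-01-01 00:00",
    end="2022-12-31 23:30",
    freq="30min",
)

# Number of calendar days in the period, inclusive of both endpoints
n_days = (pd.Timestamp("2022-12-31") - pd.Timestamp("2009-01-01")).days + 1
readings_per_day = 2 * 24  # two readings per hour

print(f"Days covered: {n_days}")
print(f"Expected readings: {len(expected_index)}")
```

Comparing this number with `len(data)` after loading is a quick way to spot missing or duplicated timestamps.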

Importing Data

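# Load the CSV, parsing the timestamp column as datetimes
data = pd.read_csv('uk_power_consumption.csv', parse_dates=['settlement_date'])
# Keep the timestamp, demand (tsd: transmission system demand) and holiday flag
data = data[['settlement_date', 'tsd', 'is_holiday']]
# Rename to shorter, more descriptive column names
data.columns = ['datetime', 'consumption', 'holiday']
# Index by timestamp so the frame behaves as a time series
data = data.set_index('datetime', drop=True)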


head_tail_horz(data, 5, "UK Power Consumption Data", intraday = True)
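
Before going further, it can be worth confirming that the loaded index matches the description of the data (half-hourly readings over the stated date range). The helper below is a minimal sketch and not part of the original notebook: `describe_time_index` is a hypothetical name, and the demo runs on a small synthetic frame rather than the real CSV so that it runs on its own.

```python
import pandas as pd

def describe_time_index(df: pd.DataFrame) -> dict:
    """Summarise a DatetimeIndex: span, most common step, duplicate count."""
    idx = df.index
    # Differences between consecutive timestamps
    steps = idx.to_series().diff().dropna()
    return {
        "start": idx.min(),
        "end": idx.max(),
        "most_common_step": steps.mode().iloc[0],
        "duplicate_timestamps": int(idx.duplicated().sum()),
    }

# Synthetic stand-in with the same column layout as `data` (two days of
# half-hourly rows); the values are placeholders, not real consumption.
demo = pd.DataFrame(
    {"consumption": list(range(96)), "holiday": [0] * 96},
    index=pd.date_range("2009-01-01", periods=96, freq="30min", name="datetime"),
)

summary = describe_time_index(demo)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Calling `describe_time_index(data)` on the real frame should, if the description above holds, report a 30-minute step; a non-zero duplicate count would be worth investigating before building lag features.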