Project Timeseries Xgboost - Notebook by Evan Marie Carr (evanmarie)


Created a year ago


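# Author's helper module; import_all() presumably pulls in the common libraries
from helpers import *
import_all()
# Explicit import, since pd is used below
import pandas as pd
from xgboost import XGBRegressor
%matplotlib inline
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit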

The Data

This data is an excerpt from a Kaggle-maintained and regularly updated dataset collection.
The dataset reflects energy consumption as reported by National Grid ESO, Great Britain's electricity system operator.
Consumption is recorded every half hour (twice an hour).
The data covers January 1, 2009 to December 31, 2022.
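
As a rough sanity check on the points above, the sketch below builds an idealized half-hourly index for the stated date range and counts its entries. It assumes exactly one reading every 30 minutes with no gaps or duplicates; the real file may contain a different number of rows (for example around clock changes or missing readings), so treat the count as an upper-level estimate rather than a property of the dataset. The names `expected_index` and `n_days` are illustrative only.

```python
import pandas as pd

# Assumption: one reading every 30 minutes, no gaps, duplicates, or
# clock-change adjustments, over the period stated above.
expected_index = pd.date_range(
    start="2009-01-01 00:00", end="2022-12-31 23:30", freq="30min"
)

readings_per_day = 48  # twice an hour * 24 hours
n_days = expected_index.normalize().nunique()

print("days covered:", n_days)
print("idealized number of readings:", len(expected_index))
```

Comparing `len(expected_index)` with `len(data)` after loading would be one quick way to spot missing or extra rows.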

Importing Data

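# Load the CSV and parse the timestamp column as datetimes
data = pd.read_csv('uk_power_consumption.csv', parse_dates=['settlement_date'])
# Keep only the timestamp, the consumption column (tsd), and the holiday flag
data = data[['settlement_date', 'tsd', 'is_holiday']]
data.columns = ['datetime', 'consumption', 'holiday']
# Index by timestamp so later steps can work on a DatetimeIndex
data = data.set_index('datetime', drop=True)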
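
To see what the loading steps above produce without needing the CSV file, here is a minimal sketch that runs the same select, rename, and set-index sequence on a tiny in-memory sample. The rows are made-up placeholder values, not real readings, and the `extra` column is a hypothetical stand-in for whatever other columns the real file contains.

```python
import io

import pandas as pd

# Made-up placeholder rows for illustration only (not real readings).
# 'extra' is a hypothetical column that the selection step drops.
csv_text = """settlement_date,extra,tsd,is_holiday
2009-01-01 00:00:00,0,100,1
2009-01-01 00:30:00,0,110,1
2009-01-01 01:00:00,0,120,1
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["settlement_date"])
sample = sample[["settlement_date", "tsd", "is_holiday"]]
sample.columns = ["datetime", "consumption", "holiday"]
sample = sample.set_index("datetime", drop=True)

print(sample.index.dtype)       # a datetime64 dtype if parsing succeeded
print(sample.columns.tolist())
```

If the index dtype printed for the real `data` is `object` instead of a datetime type, the date column was probably not parsed and time-based features would not work as expected.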


head_tail_horz(data, 5, "UK Power Consumption Data", intraday = True)
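
`head_tail_horz` comes from the author's `helpers` module, whose implementation is not shown here. For readers without that module, a rough plain-pandas substitute (an assumption about the intent, not the helper's actual behavior) is to stack the first and last few rows. The frame below is synthetic dummy data used only to keep the sketch runnable.

```python
import pandas as pd

# Synthetic stand-in for the real frame: 20 half-hourly rows of dummy values.
index = pd.date_range("2009-01-01", periods=20, freq="30min", name="datetime")
demo = pd.DataFrame({"consumption": range(20), "holiday": 1}, index=index)

# Show the first and last n rows together, similar in spirit to a
# head/tail preview.
n = 5
preview = pd.concat([demo.head(n), demo.tail(n)])
print(preview)
```

Replacing `demo` with `data` gives a quick look at both ends of the real series.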