Hourly energy demand generation and weather

Project link: https://www.kaggle.com/nicholasjhana/energy-consumption-generation-prices-and-weather
This dataset contains 4 years of electrical consumption, generation, pricing, and weather data for Spain.
There are 2 files:

  1. energy_dataset : Time series of electricity generation in MW by source (biomass, coal, solar, etc.)
  2. weather_features : Time series of weather data (humidity, temperature, pressure, etc.) for cities in Spain
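
As a minimal sketch of how the two files might be read and combined into one hourly table, the example below uses tiny in-memory stand-ins for the CSV files. The column names (`time`, `dt_iso`, `city_name`, `temp`, `humidity`, `generation solar`, `total load actual`) and the sample values are assumptions for illustration; check them against the actual files before reusing this.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the two CSV files. Column names and values
# are assumptions for illustration, not taken from the real dataset.
ENERGY_CSV = """time,generation solar,total load actual
2015-01-01 00:00:00+01:00,50.0,25000.0
2015-01-01 01:00:00+01:00,48.0,24000.0
"""

WEATHER_CSV = """dt_iso,city_name,temp,humidity
2015-01-01 00:00:00+01:00,Madrid,270.0,70
2015-01-01 00:00:00+01:00,Valencia,272.0,80
2015-01-01 01:00:00+01:00,Madrid,269.0,72
2015-01-01 01:00:00+01:00,Valencia,271.0,82
"""


def load_and_combine(energy_src, weather_src):
    """Read both sources and return one row per hour with per-city weather columns."""
    energy = pd.read_csv(energy_src)
    weather = pd.read_csv(weather_src)

    # Parse timestamps as timezone-aware UTC so both tables share one index type.
    energy["time"] = pd.to_datetime(energy["time"], utc=True)
    weather["dt_iso"] = pd.to_datetime(weather["dt_iso"], utc=True)
    energy = energy.set_index("time")

    # The weather table has one row per (hour, city); pivot it so each
    # (measurement, city) pair becomes its own column.
    wide = weather.pivot_table(
        index="dt_iso",
        columns="city_name",
        values=["temp", "humidity"],
        aggfunc="mean",
    )
    wide.columns = [f"{measure}_{city}" for measure, city in wide.columns]

    # Keep only hours present in both tables.
    return energy.join(wide, how="inner")


combined = load_and_combine(io.StringIO(ENERGY_CSV), io.StringIO(WEATHER_CSV))
print(combined.shape)
```

With the real files, the `io.StringIO` objects would be replaced by the paths to the downloaded CSVs. Pivoting by city is only one way to combine the tables; averaging the weather across cities is a simpler alternative.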

Objective: Predict Spain's total energy demand
Data type: Time series data
Data shape:
energy_dataset : 35000 rows x 29 columns
weather_features : 178k rows x 17 columns

Columns understanding (from Kaggle)

Inspiration, from Kaggle:

Visualize the load and marginal supply curves.
Which weather measurements and cities most influence electrical demand, prices, and generation capacity?
Can we forecast 24 hours in advance better than the TSO?
Can we predict electrical price by time of day better than the TSO?
Forecast intraday price or electrical demand hour-by-hour.
What is the next generation source to be activated on the load curve?


Official steps:

  1. Pick a large real-world dataset from Kaggle (see the "Recommended Datasets" section below) and download it using opendatasets. Your training set should contain at least 50,000 rows and 5 columns of data.

  2. Read the dataset description, understand the problem statement and describe the modeling objective clearly. You can also browse through existing notebooks created by others for inspiration.

  3. Perform exploratory data analysis, gather insights about the data, perform feature engineering, create a training-validation split, and prepare the data for modeling.

  4. Train & evaluate different machine learning models, tune hyperparameters and reduce overfitting to improve the model.

  5. Report the final performance of your best model(s), show sample predictions, and save model weights. Summarize your work, share links to references, and suggest ideas for future work.

  6. Publish your Jupyter notebook to Jovian, make a submission below and share your project with the community. Optionally, you may also write a blog post and contribute to the Jovian official blog.
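
Step 1 of the list above could be sketched as follows. This is an illustrative helper, not part of the official guidelines: the function names are hypothetical, and the assumption that `opendatasets` saves the files into a folder named after the last segment of the dataset URL should be verified on your machine. The download itself needs network access and Kaggle API credentials, so it is wrapped in a function and not called here.

```python
import os

DATASET_URL = (
    "https://www.kaggle.com/nicholasjhana/"
    "energy-consumption-generation-prices-and-weather"
)


def expected_data_dir(url, data_dir="."):
    """Folder where the files are assumed to land: data_dir/<last URL segment>."""
    dataset_id = url.rstrip("/").split("/")[-1]
    return os.path.join(data_dir, dataset_id)


def download_dataset(url=DATASET_URL, data_dir="."):
    """Download the dataset with opendatasets and return the expected folder."""
    # Imported lazily so the helper above works without the package installed.
    import opendatasets as od

    od.download(url, data_dir=data_dir)  # prompts for Kaggle credentials if needed
    return expected_data_dir(url, data_dir)


print(expected_data_dir(DATASET_URL))
```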

Evaluation Criteria
Your submission must satisfy the following criteria:

  1. Training set should contain at least 50,000 rows of data and 5 columns
  2. Notebook must include all the steps listed in the project guidelines above
  3. Notebook must be executed end-to-end with error-free outputs for all cells
  4. You must train at least 2 different types of machine learning models
  5. You must tune at least 2 different hyperparameters for your chosen model
  6. Your model's performance on the validation set must be reasonably good
  7. Your project must be documented extensively using markdown cells
  8. Notebook must include references to relevant notebooks/tutorials/documentation sites
  9. Your notebook must not be plagiarized (i.e., directly copied) from another project

Plan for this project:

  1. Load the dataset from Kaggle using opendatasets.
  2. Analyze the files energy_dataset and weather_features and figure out a way to combine their features. Identify the target and input columns.
  3. Perform EDA on the data (correlations, graphs, missing data, inf values, NaNs, means, outliers, etc.).
  4. Feature engineering (create new columns from Date, use uncertainty in a different way. Target column: Price_day or price_actual or total_load_forecast, total_load_actual).
  5. Impute missing values in numerical columns. There are no categorical columns, so no one-hot encoding is required.
  6. Train-val-test split (70:10:20).
  7. Use linear regression (fit ax1+bx2+cx3+... = y), random forest, gradient boosting, and LightGBM.
  8. Tune the hyperparameters of random forest, GBM, and LightGBM.
  9. Evaluate each model on the validation set and pick the best one.
  10. Show sample predictions on a small subset of the data, along with feature importances (weights).
  11. Summarize the findings.
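
The date features, the 70:10:20 split, and the model comparison from the list above can be sketched as below. This is a minimal illustration under stated assumptions, not the project's final pipeline: the data is synthetic (a daily demand cycle plus noise), the helper names and hyperparameter values are hypothetical, and only two of the listed models are shown. The split is done in time order rather than at random, which is an assumption that suits forecasting because the model is always validated on data later than what it was trained on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic hourly data standing in for the merged dataset (an assumption,
# not the real Kaggle data): demand follows a daily cycle plus noise.
rng = np.random.default_rng(0)
index = pd.date_range("2015-01-01", periods=2000, freq=pd.Timedelta(hours=1))
hours = index.hour.to_numpy()
temp = 285 + 10 * rng.random(len(index))
load = (
    25000
    + 4000 * np.sin(2 * np.pi * hours / 24)
    + 50 * (temp - 285)
    + rng.normal(0, 200, len(index))
)
df = pd.DataFrame({"temp": temp, "total_load_actual": load}, index=index)


def add_time_features(frame):
    """Derive calendar columns from the datetime index."""
    out = frame.copy()
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    out["month"] = out.index.month
    return out


def chronological_split(frame, train=0.7, val=0.1):
    """Split rows in time order into train, validation, and test parts."""
    n_train = round(len(frame) * train)
    n_val = round(len(frame) * val)
    return (
        frame.iloc[:n_train],
        frame.iloc[n_train:n_train + n_val],
        frame.iloc[n_train + n_val:],
    )


features = ["temp", "hour", "dayofweek", "month"]
target = "total_load_actual"
train_df, val_df, test_df = chronological_split(add_time_features(df))

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(
        n_estimators=50, max_depth=8, random_state=0
    ),
}

# Fit each model on the training part and score it on the validation part.
val_rmse = {}
for name, model in models.items():
    model.fit(train_df[features], train_df[target])
    preds = model.predict(val_df[features])
    val_rmse[name] = float(np.sqrt(mean_squared_error(val_df[target], preds)))

print(val_rmse)
```

The test part is left untouched here so that it can be used once, at the end, to report the performance of whichever model is chosen on the validation set.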