ASHRAE Energy Prediction
ASHRAE Great Energy Predictor III: How much energy will a building consume?
Assessing the value of energy efficiency improvements can be challenging, as there's no way to truly know how much energy a building would have used without the improvements. The best we can do is to build counterfactual models. Once a building is overhauled, the new (lower) energy consumption is compared against modeled values for the original building to calculate the savings from the retrofit. More accurate models could support better market incentives and enable lower-cost financing.
This Kaggle competition challenges you to build these counterfactual models across four energy types based on historic usage rates and observed weather. The dataset includes three years of hourly meter readings from over one thousand buildings at several different sites around the world.
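The evaluation metric given by the competition is the Root Mean Squared Logarithmic Error (RMSLE):

$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2}$$

where $n$ is the number of observations, $p_i$ is the predicted meter reading, and $a_i$ is the actual meter reading.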
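As a minimal sketch, RMSLE (the competition's metric) can be computed in plain Python as below. The function name `rmsle` and the sample readings are made up for illustration; they are not part of the competition's tooling.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # log1p(x) = log(1 + x), so zero meter readings stay well-defined.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Made-up meter readings and predictions, purely for illustration.
actual = [0.0, 10.0, 100.0]
predicted = [0.0, 12.0, 90.0]
score = rmsle(actual, predicted)
print(score)
```

Because the error is taken on the log scale, the metric penalizes relative differences rather than absolute ones, so a small building and a large building contribute on comparable terms.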
In this tutorial, we'll follow a step-by-step process for building the ML model:
 Download the dataset
 Explore the dataset
 Prepare the dataset for training
 Train a baseline model
 Train and evaluate different models
This competition provides an interesting grab bag of subproblems to solve:
 The data is messy and needs cleaning (site 0);
 The data sources are inconsistent (meter timestamps vs. weather timestamps);
 It asks you to predict several different, vaguely related sets of values (meter type);
 It has hugely important categorical features (building_id); and
 We can find lots of external data to supplement the provided data.
This makes it hard to know where to start and which areas are most important to tackle.
Let's start with machine learning theory:
In this section, I will take you through different types and techniques of ML models to get a blueprint of the models we are going to use.
Machine learning is a "field of study that gives computers the ability to learn without being explicitly programmed".
The process starts with feeding in good-quality data and then training our machines (computers) by building machine learning models from that data with different algorithms. The choice of algorithm depends on what type of data we have and what kind of task we are trying to automate.
Supervised vs. Unsupervised
In supervised learning, the training set you feed to the algorithm includes the desired solutions, called labels, whereas in unsupervised learning, the training data is unlabeled.
Let us look at the supervised ML models:
Regression vs. Classification
Problems where a continuous numeric value must be predicted for each input are known as regression problems, while problems where each input must be assigned a discrete category (also called a label or class) are known as classification problems.
Supervised models can be further divided into linear and tree-based models.
Linear models
Linear models work on the principle of a linear relationship between the input and output variables, so they can be expressed in the form of linear equations.
Linear regression gives way to more advanced models such as Lasso, Ridge and ElasticNet, which introduce penalty terms. The idea is that by shrinking (or penalizing) the coefficients, the overall accuracy on unseen data can be improved. In the case of Lasso, the penalty term is the sum of the absolute values of the weights; Ridge uses the sum of the squared weights. ElasticNet, on the other hand, combines both the absolute and the squared penalties.
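To make the three penalty terms concrete, here is a minimal sketch in plain Python. The function names, the `alpha` strength parameter and the `l1_ratio` mixing parameter are assumptions chosen for illustration, and the plain weighted mix used for the ElasticNet penalty is a simplification; libraries may scale the terms differently.

```python
def l1_penalty(weights):
    """Lasso-style penalty: sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Ridge-style penalty: sum of squared weights."""
    return sum(w * w for w in weights)


def elastic_net_penalty(weights, l1_ratio=0.5):
    """ElasticNet-style penalty: a mix of the L1 and L2 penalties (simplified)."""
    return l1_ratio * l1_penalty(weights) + (1 - l1_ratio) * l2_penalty(weights)


def penalized_mse(y_true, y_pred, weights, penalty, alpha=1.0):
    """Mean squared error plus alpha times the chosen penalty on the weights."""
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    return mse + alpha * penalty(weights)


weights = [3.0, -0.5, 0.0]  # made-up coefficients
print(l1_penalty(weights))           # 3.5
print(l2_penalty(weights))           # 9.25
print(elastic_net_penalty(weights))  # 6.375
```

In practice, scikit-learn provides these models as `Lasso`, `Ridge` and `ElasticNet` in `sklearn.linear_model`, so the penalties rarely need to be written by hand.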
Logistic regression is a classifier. It is a linear algorithm that models the probability of a discrete output given the inputs. It is a statistical method for binary classification that can be generalized to multiclass classification.
Tree models
Tree-based models, on the other hand, work on the principle of reducing impurity. A node is divided into subnodes according to how much an impurity measure, such as the Gini index or entropy, decreases when the data is split at some point on a variable. When entropy is used, this decrease is called information gain.
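The split criterion can be sketched in a few lines of plain Python. The helper names (`gini`, `entropy`, `impurity_decrease`) and the toy labels are assumptions for illustration. For regression trees, a reduction in variance or squared error typically plays the same role.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - children


# A toy node with two classes, split perfectly into pure children.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]

gini_drop = impurity_decrease(parent, left, right)           # uses Gini
info_gain = impurity_decrease(parent, left, right, entropy)  # information gain
print(gini_drop, info_gain)
```

A tree-building algorithm evaluates many candidate splits this way and keeps the one with the largest decrease.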
Tree-based algorithms:
The decision tree is the basic building block of tree-based algorithms. Unlike linear models, decision trees capture nonlinear relations quite well. In this method, the data is split into two or more homogeneous subsets on the basis of the most significant splitters among the input variables. Decision trees can be applied to both categorical and continuous variables.
Node
 a point in the tree where a decision is made on a variable
Root node
 the first node to be split
Split
 the process of dividing a node's data into subnodes
Decision node
 a subnode that is divided into further subnodes
Parent and child node
 a node that is split is called the parent node, and the nodes it is split into are called child nodes
Branch
 a subsection of the entire tree
Leaf node
 a final node, which is not split further and on which the final decision is made
Pruning
 removing subnodes from a decision node
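The terms above can be seen on a tiny hand-built tree. This is only an illustrative sketch: the feature names, thresholds and leaf values are made up, and the `prune` helper uses a simple depth cut-off rather than any particular library's pruning method.

```python
# A hand-built tree; features, thresholds and values are made up.
tree = {                                  # root node (also the first parent)
    "feature": "air_temperature", "threshold": 20.0,
    "left": {"value": 50.0},              # leaf node
    "right": {                            # decision node, child of the root
        "feature": "square_feet", "threshold": 10000.0,
        "left": {"value": 120.0},         # leaf node
        "right": {"value": 300.0},        # leaf node
    },
}


def is_leaf(node):
    return "value" in node


def predict(node, row):
    """Follow one branch from the root down to a leaf."""
    while not is_leaf(node):
        side = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["value"]


def leaf_values(node):
    if is_leaf(node):
        return [node["value"]]
    return leaf_values(node["left"]) + leaf_values(node["right"])


def prune(node, max_depth, depth=0):
    """Return a copy in which decision nodes at max_depth become leaves."""
    if is_leaf(node):
        return dict(node)
    if depth >= max_depth:
        # Unweighted mean of the removed leaves (this toy tree stores no counts).
        values = leaf_values(node)
        return {"value": sum(values) / len(values)}
    return {
        "feature": node["feature"], "threshold": node["threshold"],
        "left": prune(node["left"], max_depth, depth + 1),
        "right": prune(node["right"], max_depth, depth + 1),
    }


row = {"air_temperature": 25.0, "square_feet": 5000.0}
pruned = prune(tree, max_depth=1)
print(predict(tree, row), predict(pruned, row))  # 120.0 210.0
```

After pruning, the right-hand decision node has been collapsed into a single leaf, so the tree is simpler but its predictions are coarser.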
Ensembling Techniques
Ensemble learning helps improve machine learning results by combining several models. This approach gives better predictive performance than a single model. The basic idea is to learn a set of models (experts) and to allow them to vote.
The main causes of error in ML models are variance and bias. Ensembling methods help to minimize these errors.
Bias
The inability of an ML model to capture the true relationship in the training dataset is called bias.
Variance
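The difference in how well a model fits different datasets (for example, the training set versus the test set) is called variance.
There is a basic trade-off between bias and variance. A model flexible enough to capture the true relationship in the training dataset (low bias) tends to show high variability in its sums of squared errors from one dataset to another (high variance).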
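The trade-off can be illustrated with two deliberately extreme toy models on synthetic data. Everything here is an assumption for illustration: the curved relation `y = x^2 + noise`, the "always predict the mean" model (high bias) and the 1-nearest-neighbour model (high variance) are not part of the competition workflow.

```python
import random

random.seed(0)


def make_data(n):
    """Noisy samples from a made-up curved relation y = x^2 + noise."""
    xs = [random.uniform(-3, 3) for _ in range(n)]
    ys = [x * x + random.gauss(0, 1) for x in xs]
    return xs, ys


def sum_of_squares(model, xs, ys):
    """Sum of squared errors of a model on a dataset."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

# High bias: always predict the training mean, ignoring x entirely.
mean_y = sum(train_y) / len(train_y)


def mean_model(x):
    return mean_y


# High variance: copy the target of the single nearest training point.
def nearest_neighbour_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


results = {}
for name, model in [("mean", mean_model), ("1-NN", nearest_neighbour_model)]:
    results[name] = (
        sum_of_squares(model, train_x, train_y),
        sum_of_squares(model, test_x, test_y),
    )
    print(name, "train:", results[name][0], "test:", results[name][1])
```

The mean model fits both sets about equally badly, while the nearest-neighbour model fits the training set perfectly and does worse on the test set; that gap between the two fits is what the variance definition describes.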
Bagging and boosting are two types of ensemble learning. Both decrease the variance of a single estimate by combining several estimates from different models; boosting additionally tries to reduce bias.
Bagging:
Bootstrap aggregating, also known as bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It decreases the variance and helps to avoid overfitting. It is usually applied to decision tree methods. Bagging is a special case of the model averaging approach.
Implementation steps of Bagging:
 Multiple subsets of equal size are created from the original dataset by selecting observations with replacement.
 A base model is created on each of these subsets.
 Each model is trained in parallel on its own subset, independently of the others.
 The final predictions are determined by combining the predictions from all the models.
Random Forest is one of the most widely used bagging models.
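The bagging steps above can be sketched in plain Python. This is a simplified illustration, not a Random Forest: the base model is a one-split regression tree (a "stump") on one-dimensional data, and the names `fit_stump`, `bagging_fit` and `bagging_predict` as well as the toy data are assumptions.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data; returns a predict function."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue  # nothing on the right side: not a real split
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base models by averaging their predictions."""
    return sum(model(x) for model in models) / len(models)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]  # made-up step-shaped data
models = bagging_fit(xs, ys)
low, high = bagging_predict(models, 2), bagging_predict(models, 7)
print(low, high)
```

The base models never see each other, so the loop in `bagging_fit` could run in parallel. In practice, scikit-learn offers `BaggingRegressor` and `RandomForestRegressor` in `sklearn.ensemble` for this.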
Boosting
Boosting is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers. It does this by chaining weak models in series. First, a model is built from the training data. Then a second model is built, which tries to correct the errors of the first model. This procedure continues, and models are added until either the complete training dataset is predicted correctly or the maximum number of models has been added.
Algorithm:

1. Initialise the dataset and assign equal weight to each data point.

2. Provide this as input to the model and identify the wrongly classified data points.

3. Increase the weights of the wrongly classified data points and decrease the weights of the correctly classified data points. Then normalize the weights of all data points.

4. If the required results are obtained, go to step 5; otherwise, go to step 2.

5. End.
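One concrete instance of this weight-update loop is AdaBoost. The sketch below uses AdaBoost with decision stumps on one-dimensional data and labels in {-1, +1}; the function names and the toy data are assumptions for illustration, and the source does not say which boosting variant it has in mind.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict +polarity at or below the threshold, -polarity above it."""
    return polarity if x <= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, t, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps added so far."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def adaboost_fit(xs, ys, max_models=10):
    n = len(xs)
    weights = [1.0 / n] * n                       # step 1: equal weights
    ensemble = []
    for _ in range(max_models):
        err, t, polarity = fit_stump(xs, ys, weights)   # step 2: fit, find errors
        err = min(max(err, 1e-10), 1 - 1e-10)     # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # this model's say in the vote
        ensemble.append((alpha, t, polarity))
        # Step 3: raise weights of misclassified points, lower the rest, normalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, t, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Step 4: stop once the training data is classified correctly.
        if all(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)):
            break
    return ensemble


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # made-up labels that no single stump can separate
ensemble = adaboost_fit(xs, ys)
print([adaboost_predict(ensemble, x) for x in xs])
```

No single stump can classify this toy data, but the weighted vote of a few stumps can, which is the sense in which weak classifiers are combined into a strong one.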
!pip install jovian --upgrade --quiet
import jovian