Earthquake Damage Prediction ML Model
This is a modeling competition hosted by DrivenData. The goal is to predict the level of damage to buildings caused by the 2015 earthquake in Nepal.
The data was collected through surveys by Kathmandu Living Labs and the Central Bureau of Statistics, which works under the National Planning Commission Secretariat of Nepal. This survey is one of the largest post-disaster datasets ever collected, containing valuable information on earthquake impacts, household conditions, and socio-economic-demographic statistics.
This is a classification problem: we will use classical ML techniques to predict the damage class for each building in the given test dataset.
Steps to follow:
- Import the necessary libraries
- Download the dataset
- Load all the datasets into dataframes
- Get the basic info about the columns
- Get the statistical description of the variables
- Compute the correlation matrix to view the relationships between the explanatory variables and the explained (target) variable, and among the explanatory variables themselves
- Data preprocessing
  - Get rid of missing values
  - Encode the categorical variables
  - Remove useless variables
- Select the evaluation metric, as required by the competition problem statement
- Split the data into training and validation sets
- Run different classification models to see which could work best
- Take the best model and train it on the whole training set
- Generate predictions for the test set and write them into the submission file
- Make the first submission and view the scores
- Tune the hyperparameters of the best models to further improve the scores
- Do feature selection and feature engineering (PCA, LDA, etc.)
- Train the best model with the best hyperparameters on the whole training set
- Make another submission to see the improvement
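The first half of these steps (preprocessing, splitting, comparing models) can be sketched roughly as follows. This is a minimal illustration on a small synthetic table, not the competition data. The column names (`building_id`, `age`, `area_percentage`, `roof_type`, `damage_grade`), the two candidate models, the median imputation, and the micro-averaged F1 metric are all assumptions made for the example; the real schema and the required metric come from the competition statement.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data (hypothetical columns, not the real schema).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "building_id": np.arange(n),                     # identifier, carries no signal
    "age": rng.integers(0, 100, n).astype(float),
    "area_percentage": rng.integers(1, 30, n).astype(float),
    "roof_type": rng.choice(["n", "q", "x"], n),     # categorical feature
})
# Three damage levels, loosely driven by building age plus noise.
noisy_age = df["age"] + rng.normal(0, 15, n)
df["damage_grade"] = pd.cut(
    noisy_age, bins=[-np.inf, 30, 70, np.inf], labels=[1, 2, 3]
).astype(int)
# Inject a few missing values so the preprocessing step has work to do.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan

# Preprocessing: fill missing values (median imputation is one option),
# one-hot encode the categorical column, and drop the identifier.
df["age"] = df["age"].fillna(df["age"].median())
X = pd.get_dummies(
    df.drop(columns=["building_id", "damage_grade"]), columns=["roof_type"]
)
y = df["damage_grade"]

# Hold out a stratified validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Compare a couple of candidate classifiers on the validation set.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_val, model.predict(X_val), average="micro")

# Refit the best-scoring model on the whole training set.
best_name = max(scores, key=scores.get)
best_model = models[best_name].fit(X, y)
print(scores, "best:", best_name)
```

With real data, the same `best_model.predict(...)` call on the preprocessed test features would produce the values to write into the submission file.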
Machine Learning Theory
A machine learning model is a system that has been trained on features to recognize patterns and output a label. During training, the model tends to learn the general structure of the data and, depending on the kind of model chosen, adjusts the weights assigned to the features so as to predict the target variable.
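As a small illustration of weights being fitted to features, the sketch below trains a logistic regression on synthetic data in which only the first feature determines the label. The choice of logistic regression and the generated data are assumptions made for the example; other model families represent what they learn differently (for example, as tree splits rather than weights).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))      # two features: one informative, one pure noise
y = (X[:, 0] > 0).astype(int)      # the label depends only on feature 0

model = LogisticRegression().fit(X, y)

# After training, the informative feature should carry most of the weight.
w_informative, w_noise = model.coef_[0]
print(f"weight on informative feature: {w_informative:.2f}")
print(f"weight on noise feature:       {w_noise:.2f}")
```

The trained model assigns a much larger weight to the feature that actually drives the label, which is the sense in which training "aligns" weights with the pattern in the data.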