Richter's Predictor: Modeling Earthquake Damage

Photo by Carl Campbell on Unsplash

This is a modeling competition hosted by DrivenData. The goal is to predict the level of damage to buildings caused by the 2015 Gorkha earthquake in Nepal.

The data was collected through surveys by Kathmandu Living Labs and the Central Bureau of Statistics, which works under the National Planning Commission Secretariat of Nepal. This survey is one of the largest post-disaster datasets ever collected, containing valuable information on earthquake impacts, household conditions, and socio-economic-demographic statistics.

This is a classification problem: we will use classical ML techniques to predict the damage class of each building in the given test dataset.

Steps to follow:

Import the necessary libraries
Download the dataset
Load all the datasets into dataframes

Get the basic info about the columns
Get the statistical description about the variables
Get the correlation matrix to view the relationships between the explanatory variables and the target variable, and among the explanatory variables themselves

Data preprocessing

Get rid of missing values
Encode the categorical variables
Remove uninformative variables

Select the evaluation score - as required by the competition problem statement
Split data into training and validation set
Run different classification models to see which could work best

Get the best model and train on the whole training set
Get the predictions from the test set and replace the values in the submission file
Make the first submission and view the scores

Tune hyperparameters on the best models to further improve the scores
Do feature selection and feature engineering (PCA, LDA, etc.)
Train the best model with the best hyperparameters on the whole training set
Make the submission again to see the improvement
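
The middle steps above (hold out a validation set, pick an evaluation score, compare against something simple) can be sketched in plain Python. This is only an illustration under assumptions: the toy features and labels are made up, the helper names `train_validation_split`, `micro_f1` and `majority_class_baseline` are hypothetical, and the score is assumed to be the micro-averaged F1, which should be confirmed against the competition problem statement. In a real notebook a library routine such as scikit-learn's `f1_score(..., average="micro")` would normally be used instead.

```python
import random
from collections import Counter


def train_validation_split(features, labels, validation_fraction=0.2, seed=42):
    """Shuffle row indices once and hold out a fraction for validation."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(indices) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return (take(features, train_idx), take(features, val_idx),
            take(labels, train_idx), take(labels, val_idx))


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda rows: [most_common for _ in rows]


# Hypothetical toy data: one numeric feature per building and an
# assumed damage level in {1, 2, 3}. Not taken from the real dataset.
features = [[value] for value in range(20)]
labels = [1, 2, 2, 3, 2, 2, 1, 3, 2, 2, 3, 2, 1, 2, 2, 3, 2, 2, 3, 2]

X_train, X_val, y_train, y_val = train_validation_split(features, labels)
baseline = majority_class_baseline(y_train)
score = micro_f1(y_val, baseline(X_val))
print(f"validation rows: {len(y_val)}, baseline micro-F1: {score:.3f}")
```

A majority-class baseline like this gives a floor: any model worth submitting should beat it on the validation set. For single-label multiclass problems the micro-averaged F1 equals plain accuracy, which is a quick sanity check on the implementation.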

Machine Learning Theory

A machine learning model is a system that has been trained on features to recognize patterns and output a label. During training, the model learns the general structure of the data and, depending on the kind of model chosen, adjusts the weights it assigns to the features so that it can predict the target variable.
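
To make the idea of weights being adjusted to features concrete, here is a minimal sketch of one simple model, binary logistic regression trained by batch gradient descent, in plain Python. Everything in it is an assumption for illustration: the two features, the six rows of toy data, the learning rate and the number of epochs are hypothetical, and other model families (trees, for example) do not learn per-feature weights in this way.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(row, weights, bias):
    """Weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, row)) + bias


def train_logistic_regression(X, y, learning_rate=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            # Prediction error for this row drives the gradient.
            error = sigmoid(linear_score(row, weights, bias)) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        # Move each weight against its average gradient.
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Label a row 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(linear_score(row, weights, bias)) >= 0.5 else 0
            for row in X]


# Hypothetical toy data: two scaled features per row. The label depends
# mostly on the first feature, so its weight should come out positive.
X = [[0.1, 0.9], [0.2, 0.4], [0.3, 0.7], [0.8, 0.5], [0.9, 0.1], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X, y)
print("learned weights:", weights, "bias:", bias)
print("predictions:", predict(X, weights, bias))
```

Printing the weights after training shows which feature the model leans on: the first feature separates the two classes in this toy data, so it ends up with the larger, positive weight.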