Risk Models Using Tree-based Models
Welcome to the second assignment of Course 2!
In this assignment, you'll gain experience with tree-based models by predicting the 10-year risk of death of individuals from the NHANES I epidemiology dataset (for a detailed description of this dataset, see the CDC website). This is a challenging task and a great test bed for the machine learning methods we learned this week.
As you go through the assignment, you'll learn about:
- Dealing with Missing Data
  - Complete Case Analysis
  - Imputation
- Decision Trees
  - Evaluation
  - Regularization
- Random Forests
  - Hyperparameter Tuning
1. Import Packages
We'll first import all the common packages that we need for this assignment.
- shap is a library that explains predictions made by machine learning models.
- sklearn is one of the most popular machine learning libraries.
- itertools allows us to conveniently manipulate iterable objects such as lists.
- pydotplus is used together with IPython.display.Image to visualize graph structures such as decision trees.
- numpy is a fundamental package for scientific computing in Python.
- pandas is what we'll use to manipulate our data.
- seaborn is a plotting library which has some convenient functions for visualizing missing data.
- matplotlib is a plotting library.
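As a small illustration of why itertools is handy in an assignment that ends with hyperparameter tuning, here is a minimal sketch that enumerates every combination in a grid. The grid values are hypothetical and chosen only for the example; max_depth and min_samples_leaf are ordinary decision tree hyperparameters.

```python
import itertools

# Hypothetical grid of tree hyperparameters (values chosen for illustration only).
grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5, 10]}

# itertools.product yields one tuple per combination of the value lists.
names = list(grid)
combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]

print(len(combos))  # 6
print(combos[0])    # {'max_depth': 2, 'min_samples_leaf': 1}
```

Each dictionary in combos could then be unpacked into a model constructor, one candidate at a time.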
import shap
import sklearn
import itertools
import pydotplus
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import Image
from sklearn.tree import export_graphviz
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn releases
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer, SimpleImputer
# We'll also import some helper functions that will be useful later on.
from util import load_data, cindex
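The util module is not shown in this section, so what cindex computes is an assumption here. If it is a concordance index for a binary outcome, the idea can be sketched as follows: among all pairs made of one individual who had the event and one who did not, count how often the model gives the higher risk score to the one who had the event, with ties counting as half. The function name cindex_sketch and the toy data are hypothetical and are not the course's implementation.

```python
import numpy as np

def cindex_sketch(y_true, scores):
    """Illustrative concordance index for binary outcomes (not util.cindex)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]  # risk scores of individuals who had the event
    neg = scores[y_true == 0]  # risk scores of individuals who did not
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    concordant = np.sum(diff > 0)  # positive scored strictly higher
    ties = np.sum(diff == 0)       # equal scores count as half
    return (concordant + 0.5 * ties) / diff.size

y = [1, 0, 1, 0]
risk = [0.9, 0.2, 0.4, 0.4]
print(cindex_sketch(y, risk))  # 0.875
```

A value of 1.0 means every such pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 means every pair is ranked in the wrong order.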