Cox Proportional Hazards and Random Survival Forests
Welcome to the final assignment in Course 2! In this assignment you'll develop risk models using survival data and a combination of linear and non-linear techniques. We'll be using a dataset of survival data from patients with Primary Biliary Cirrhosis (PBC). PBC is a progressive disease of the liver caused by a buildup of bile within the liver (cholestasis) that damages the small bile ducts that drain bile from the liver. Our goal is to understand how different factors affect the survival times of these patients. Along the way you'll learn about the following topics:
- Cox Proportional Hazards
- Data Preprocessing for Cox Models
- Random Survival Forests
- Permutation Methods for Interpretation
1. Import Packages
We'll first import all the packages that we need for this assignment.
- sklearn is one of the most popular machine learning libraries.
- numpy is the fundamental package for scientific computing in Python.
- pandas is what we'll use to manipulate our data.
- matplotlib is a plotting library.
- lifelines is an open-source survival analysis library.
import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index as cindex
from sklearn.model_selection import train_test_split
from util import load_data
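The import above aliases lifelines' concordance_index as cindex. To give a feel for what that kind of score measures, here is a minimal pure-Python sketch of the usual definition of Harrell's concordance index. It is an illustration under stated assumptions, not the lifelines implementation: the function name harrell_c_index and the toy data are hypothetical, tied event times are simply skipped, and scores are treated as risk scores (higher means the event is expected sooner).

```python
from itertools import combinations


def harrell_c_index(times, risk_scores, events):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times: observed time for each patient (event or censoring time)
    risk_scores: model output, where higher means higher risk (assumed convention)
    events: 1 if the event was observed, 0 if the patient was censored
    """
    concordant = 0.0
    permissible = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs whose shorter time
        # is censored, since we cannot tell who had the event first.
        if times[a] == times[b] or not events[a]:
            continue
        permissible += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the higher-risk patient had the event first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if permissible == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / permissible


# Hypothetical toy data: four patients, the second one is censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.1, 0.6]

toy_c = harrell_c_index(toy_times, toy_risk, toy_events)
print(toy_c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Note that lifelines' concordance_index is generally called with scores where larger values mean longer survival, so risk scores are typically negated before being passed to it; check the lifelines documentation for the exact convention in your installed version.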