
Cox Proportional Hazards and Random Survival Forests

Welcome to the final assignment in Course 2! In this assignment, you'll develop risk models using survival data and a combination of linear and non-linear techniques. We'll be using a dataset with survival data of patients with Primary Biliary Cirrhosis (PBC). PBC is a progressive disease of the liver in which the small bile ducts that drain bile from the liver are gradually damaged, leading to a buildup of bile within the liver (cholestasis). Our goal will be to understand the effects of different factors on the survival times of the patients. Along the way, you'll learn about the following topics:

  • Cox Proportional Hazards
    • Data Preprocessing for Cox Models
  • Random Survival Forests
    • Permutation Methods for Interpretation

1. Import Packages

We'll first import all the packages that we need for this assignment.

  • sklearn is one of the most popular machine learning libraries.
  • numpy is the fundamental package for scientific computing in Python.
  • pandas is what we'll use to manipulate our data.
  • matplotlib is a plotting library.
  • lifelines is an open-source survival analysis library.
import sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from lifelines import CoxPHFitter
from lifelines.utils import concordance_index as cindex
from sklearn.model_selection import train_test_split

from util import load_data
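
The concordance index imported above as cindex is the metric typically used to evaluate survival models, so it may help to see what it measures before using it. The following is a minimal pure-Python sketch, not the lifelines implementation: the function name naive_cindex and the toy values are hypothetical, the argument order mirrors lifelines' concordance_index(event_times, predicted_scores, event_observed), and pairs with tied event times are simply skipped as a simplifying assumption.

```python
def naive_cindex(event_times, predicted_scores, event_observed):
    """Illustrative concordance index (hypothetical helper, not from lifelines).

    A higher predicted score is taken to mean longer predicted survival.
    Pairs with tied event times are skipped for simplicity.
    """
    concordant = 0.0
    permissible = 0
    n = len(event_times)
    for i in range(n):
        for j in range(i + 1, n):
            if event_times[i] == event_times[j]:
                continue  # simplifying assumption: ignore tied times
            # Let `a` be the patient with the shorter follow-up time.
            a, b = (i, j) if event_times[i] < event_times[j] else (j, i)
            # The pair is only comparable if the shorter time is an
            # observed event; if it is censored, the true order is unknown.
            if not event_observed[a]:
                continue
            permissible += 1
            if predicted_scores[a] < predicted_scores[b]:
                concordant += 1.0   # model ranks the pair correctly
            elif predicted_scores[a] == predicted_scores[b]:
                concordant += 0.5   # tied predictions count as half
    return concordant / permissible


# Toy data (made up for illustration): four patients, the third is censored.
times = [2, 4, 6, 8]
scores = [1.0, 3.0, 4.0, 2.0]
observed = [1, 1, 0, 1]
toy_cindex = naive_cindex(times, scores, observed)
print(toy_cindex)
```

A value of 1.0 would mean every comparable pair is ranked correctly, while 0.5 corresponds to random ranking. In the assignment itself, the library function cindex is the one to use.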

2. Load the Dataset

Run the next cell to load the data.