Build and Evaluate a Linear Risk Model

Welcome to the first assignment in Course 2!

Overview of the Assignment

In this assignment, you'll build a risk score model for retinopathy in diabetes patients using logistic regression.

As we develop the model, we will learn about the following topics:

  • Data preprocessing
    • Log transformations
    • Standardization
  • Basic Risk Models
    • Logistic Regression
    • C-index
    • Interaction Terms

Diabetic Retinopathy

Retinopathy is an eye condition that causes changes to the blood vessels in the part of the eye called the retina.
This often leads to vision changes or blindness.
Diabetic patients are known to be at high risk for retinopathy.

Logistic Regression

Logistic regression is an appropriate analysis to use for predicting the probability of a binary outcome. In our case, this would be the probability of having or not having diabetic retinopathy.
Logistic regression is one of the most commonly used algorithms for binary classification. It finds the best-fitting model to describe the relationship between a set of features (also referred to as input, independent, predictor, or explanatory variables) and a binary outcome label (also referred to as an output, dependent, or response variable). Logistic regression has the property that the output prediction is always in the range [0, 1]. Sometimes this output is used to represent a probability from 0% to 100%, but for straight binary classification, the output is converted to either 0 or 1 depending on whether it is below or above a certain threshold, usually 0.5.
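The mapping from features to a bounded output, and from that output to a class label, can be sketched in a few lines. This is a minimal illustration rather than the model built later in the assignment: the function names, the weights, the bias, and the three feature rows are all made-up assumptions, and it uses only the standard library so that it runs on its own.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into the range between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Linear score (bias + weighted sum of features) passed through the sigmoid."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict_label(probability, threshold=0.5):
    """Convert a predicted probability to 0 or 1 using a threshold."""
    return 1 if probability > threshold else 0


# Hypothetical parameters and data, chosen only for illustration.
weights = [1.0, -1.0]
bias = 0.0
rows = [[0.5, 0.0], [2.0, 1.0], [-3.0, 0.5]]

probabilities = [predict_probability(r, weights, bias) for r in rows]
labels = [predict_label(p) for p in probabilities]
print(labels)  # [1, 1, 0]
```

Lowering the threshold below 0.5 would label more patients as positive, which is one reason the raw probability is often more useful as a risk score than the thresholded label.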

It may be confusing that the term regression appears in the name even though logistic regression is actually a classification algorithm, but that's just a name it was given for historical reasons.

1. Import Packages

We'll first import all the packages that we need for this assignment.

  • numpy is the fundamental package for scientific computing in Python.
  • pandas is what we'll use to manipulate our data.
  • matplotlib is a plotting library.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt