Build and Evaluate a Linear Risk model
Welcome to the first assignment in Course 2!
Overview of the Assignment
In this assignment, you'll build a risk score model for retinopathy in diabetes patients using logistic regression.
As we develop the model, we will learn about the following topics:
- Data preprocessing
- Log transformations
- Standardization
- Basic Risk Models
- Logistic Regression
- C-index
- Interactions Terms
Diabetic Retinopathy
Retinopathy is an eye condition that causes changes to the blood vessels in the part of the eye called the retina.
This often leads to vision changes or blindness.
Diabetic patients are known to be at high risk for retinopathy.
Logistic Regression
Logistic regression is an appropriate analysis to use for predicting the probability of a binary outcome. In our case, this would be the probability of having or not having diabetic retinopathy.
Logistic Regression is one of the most commonly used algorithms for binary classification. It is used to find the best fitting model to describe the relationship between a set of features (also referred to as input, independent, predictor, or explanatory variables) and a binary outcome label (also referred to as an output, dependent, or response variable). Logistic regression has the property that the output prediction is always in the range . Sometimes this output is used to represent a probability from 0%-100%, but for straight binary classification, the output is converted to either or depending on whether it is below or above a certain threshold, usually .
It may be confusing that the term regression appears in the name even though logistic regression is actually a classification algorithm, but that's just a name it was given for historical reasons.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt