AI for Medicine Course 1 Week 1 lecture exercises
Data Exploration
In the first assignment of this course, you will work with chest x-ray images taken from the public ChestX-ray8 dataset. In this notebook, you'll get a chance to explore this dataset and familiarize yourself with some of the techniques you'll use in the first graded assignment.
The first step before jumping into writing code for any machine learning project is to explore your data. A standard Python package for analyzing and manipulating data is pandas.
With the next two code cells, you'll import pandas and a package called numpy for numerical manipulation, then use pandas to read a csv file into a dataframe and print out the first few rows of data.
# Import necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import seaborn as sns
sns.set()
# Read csv file containing training data
train_df = pd.read_csv("nih/train-small.csv")
# Print the number of rows and columns, then display the first 5 rows
print(f'There are {train_df.shape[0]} rows and {train_df.shape[1]} columns in this data frame')
train_df.head()
There are 1000 rows and 16 columns in this data frame
Have a look at the various columns in this csv file. The file contains the names of the chest x-ray images (the "Image" column), and the columns filled with ones and zeros indicate which diagnoses were given for each x-ray image (1 means the diagnosis was given, 0 means it was not).
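Because each diagnosis is stored as a column of ones and zeros, summing a column counts the images with that diagnosis, and summing across a row counts the diagnoses given for one image. The sketch below illustrates this on a small invented dataframe rather than the real `nih/train-small.csv`: the image names, the choice of diagnosis columns, the label values, and the `PatientId` identifier column are all hypothetical. It treats as label columns every non-identifier column whose values are only 0 or 1.

```python
import pandas as pd

# Hypothetical miniature stand-in for the training csv; every value here is
# invented for illustration and does not come from ChestX-ray8.
df = pd.DataFrame({
    "Image": ["img_0.png", "img_1.png", "img_2.png", "img_3.png"],
    "Cardiomegaly": [1, 0, 0, 1],
    "Effusion": [0, 1, 0, 1],
    "Pneumonia": [0, 0, 0, 1],
    "PatientId": [101, 102, 103, 104],  # assumed identifier column
})

# Identifier columns are excluded; every remaining column whose values are
# all 0 or 1 is treated as a diagnosis label.
id_columns = ["Image", "PatientId"]
label_columns = [
    col for col in df.columns
    if col not in id_columns and df[col].isin([0, 1]).all()
]

# Column-wise sum: number of images with each diagnosis
positive_counts = df[label_columns].sum()

# Row-wise sum: number of diagnoses given for each image
diagnoses_per_image = df[label_columns].sum(axis=1)

print(positive_counts)
print(diagnoses_per_image.tolist())
```

On the real dataframe, the same two sums give a quick first look at how common each diagnosis is and how often a single image carries several diagnoses.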