Explore the medical records of 299 heart failure patients with 13 features including clinical, body & lifestyle info. Analyze the dataset with Python libraries for EDA.
Cardiovascular diseases kill approximately 17 million people globally every year, and they mainly manifest as myocardial infarctions and heart failures. Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body. In this project, we analyze a dataset containing the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) from April to December 2015. The patients are 105 women and 194 men, with ages ranging from 40 to 95 years. All 299 patients had left ventricular systolic dysfunction and had previous heart failures. The dataset contains 13 features that report clinical, body and lifestyle information about each patient, such as Age, Anaemia, High Blood Pressure, Creatinine Phosphokinase (CPK), Diabetes, Ejection Fraction, Sex, Platelets, Serum Creatinine, Serum Sodium and Smoking Habit.
This Exploratory Data Analysis project is part of the "Data Analysis with Python: Zero to Pandas" course structured and provided by Jovian. In this project, we'll analyse the relationships between the different features of the heart failure patients included in this dataset, namely the distribution of age among the patients, the death rate, the percentage of male and female patients, and the variation in platelet count and in creatinine and sodium levels in the blood. Graphical representation and visualisation of the data using the matplotlib and seaborn libraries in Python helps us understand the dataset much more easily.
The dataset is obtained from Kaggle.
Please click here to learn more about the dataset.
The dataset's column names (attributes) do not fully describe the recorded data, so we need to refer to another table or website for complete information about each attribute, including measurement units and normal levels, if required.
Please click the link below to view the table containing information about the column names.
There are several options for getting the dataset into Jupyter:
Download the CSV manually and upload it via Jupyter's GUI.
Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL.
Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
Initially, I used the opendatasets helper library to download the files from Kaggle using my username and API key. Later, for convenience, I uploaded the same dataset to my GitHub profile so that it can be fetched directly with just a few lines of code (using the urllib.request.urlretrieve function) without any username or API key.
Let's assign the GitHub raw URL of the dataset, which was originally retrieved using the opendatasets helper library, to a variable named url.
# assign the dataset (.csv) file url to a variable
url = "https://raw.githubusercontent.com/lafirm/datasets/main/heart_failure_clinical_records_dataset.csv"
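As a rough illustration of the urlretrieve option described above, the download step could look something like the sketch below. The helper name download_dataset and the default local file name are assumptions made for this example, not part of the original notebook; only urllib.request.urlretrieve and the raw URL come from the text.

```python
import os
from urllib.request import urlretrieve

# Raw GitHub URL of the dataset, as given in the text.
url = "https://raw.githubusercontent.com/lafirm/datasets/main/heart_failure_clinical_records_dataset.csv"


def download_dataset(source_url, filename="heart_failure_clinical_records_dataset.csv"):
    """Hypothetical helper: fetch source_url into filename and return the local path.

    The download is skipped when the file already exists, so re-running
    a notebook cell does not fetch the CSV again.
    """
    if not os.path.exists(filename):
        urlretrieve(source_url, filename)
    return filename


# Example call (needs network access, so it is left commented out here):
# csv_path = download_dataset(url)
```

The returned path can then be handed to whichever CSV reader the analysis uses. Skipping the download when the file is already present is a design choice for this sketch; delete the local file to force a fresh copy.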