

Exploratory Data Analysis on BRFSS - 2013 Dataset


About the BRFSS Dataset

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project that measures behavioral risk factors among non-institutionalized adults aged 18 years and older residing in the US. It is run jointly by the states and territories of the United States (US) and the Centers for Disease Control and Prevention (CDC). One objective of the BRFSS project is to collect standard, state-specific data on preventive health behaviors and practices connected to chronic illnesses, accidents, and preventable infectious diseases affecting the adult population.

The BRFSS evaluated several variables in 2013, including smoking, HIV/AIDS awareness, immunization, exercise, health status, access to healthcare, insufficient sleep, and knowledge of hypertension and cholesterol, among others. The data were collected through both landline and cellular telephone surveys targeting adults residing in a private residence or college housing. All variable details can be found in the codebook [BRFSS-2013COOKBOOK]. For this analysis, a few variables were selected from the record identification, health status of the population, demographics, tobacco use, alcohol consumption, and exercise (physical activity) sections and analyzed.

What is Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the process of exploring, investigating, and gathering insights from data. The objective of EDA is to discover trends and patterns in a given dataset using graphical representations (such as scatter plots, bar charts, histograms, and pie charts) and summary statistics that describe numeric variables, such as count, mean, median, and standard deviation.
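As a minimal sketch of the summary-statistics side of EDA, the snippet below uses pandas on a tiny made-up table. The column names (`sleep_hours`, `general_health`) and all values are hypothetical and are not taken from the BRFSS data; they only illustrate the kind of output `describe()` and `value_counts()` give.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative only,
# not drawn from the BRFSS dataset.
df = pd.DataFrame({
    "sleep_hours": [6, 7, 8, 5, 7, 9, 6, 7],
    "general_health": ["Good", "Fair", "Good", "Poor",
                       "Excellent", "Good", "Fair", "Good"],
})

# Summary statistics for a numeric variable:
# count, mean, standard deviation, min, quartiles, and max.
numeric_summary = df["sleep_hours"].describe()

# Median is reported as the 50% quantile in describe(),
# but it can also be computed directly.
median_sleep = df["sleep_hours"].median()

# Frequency table for a categorical variable
# (the numbers that a bar chart would display).
health_counts = df["general_health"].value_counts()

print(numeric_summary)
print("Median:", median_sleep)
print(health_counts)
```

The frequency table is the tabular counterpart of a bar chart, so it is often a useful first check before drawing one.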

I downloaded the dataset from Kaggle and performed a thorough exploratory analysis of it by asking several relevant research questions. I used Python libraries including pandas, matplotlib, seaborn, and plotly for the analysis.

The steps taken in the analysis include:

  1. Downloading the dataset from Kaggle
  2. Data preparation and cleaning with Pandas
  3. Open-ended exploratory analysis and visualization
  4. Asking and answering interesting questions
  5. Summarizing inferences and drawing conclusions
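The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the analysis itself: the download step is left as a comment because the Kaggle URL is not given here, and the small table with its column names (`state`, `smoker`, `sleep_hours`) is a hypothetical stand-in for the real survey data.

```python
import numpy as np
import pandas as pd

# Step 1 (not run here): the opendatasets library can fetch a Kaggle
# dataset with od.download("<kaggle-dataset-url>"). The URL is not
# stated in the text, so it stays a placeholder.

# Hypothetical stand-in for the loaded survey data.
raw = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Texas", "Texas", "Texas", None],
    "smoker": ["Yes", "No", "No", None, "Yes", "No"],
    "sleep_hours": [6.0, 8.0, np.nan, 7.0, 5.0, 7.0],
})

# Step 2: data preparation and cleaning.
clean = (
    raw.dropna(subset=["state"])   # drop rows missing the grouping key
       .drop_duplicates()          # remove exact duplicate records
       .reset_index(drop=True)
)

# Step 3: open-ended exploration, e.g. how much is still missing?
missing_per_column = clean.isna().sum()

# Step 4: answer a concrete question, here
# "How does mean reported sleep differ between states?"
# (mean() skips missing values by default).
mean_sleep_by_state = clean.groupby("state")["sleep_hours"].mean()

# Step 5: summarize the finding.
print(missing_per_column)
print(mean_sleep_by_state)
```

Only the row without a grouping key is dropped, and other missing values are kept, so that each question can decide how to handle its own gaps.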

Installing required packages and loading their libraries

In this analysis, I used Python libraries such as NumPy and pandas, along with the data visualization tools matplotlib, seaborn, and plotly. The two cells below install the packages and load the required libraries.

# Downloading the required packages
!pip install numpy opendatasets matplotlib seaborn plotly folium --use-deprecated=legacy-resolver --user --upgrade --quiet
# Loading the necessary libraries
from warnings import filterwarnings
import os
# Numerical computing library
import numpy as np
# Data analysis library in Python
import pandas as pd
# Helper for downloading datasets (e.g. from Kaggle)
import opendatasets as od
# Creating static visualizations in Python
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import rcParams
# High-level interface for drawing informative statistical graphics
import seaborn as sns
# Creating interactive visualizations
import plotly.express as px
# Turns on "inline plotting", where plot graphics appear in the notebook
%matplotlib inline
Samuel Adjei, 6 months ago