
Predicting a student's score based on their study hours

Study Hours

Author: Lafir


In this project, we will train a machine learning model to predict the percentage of marks scored by a student based on their study hours.

This is a simple linear regression problem since it involves only two variables: Hours, the input, and Scores, the target. The Hours column represents the number of hours a student studied, and the Scores column represents the percentage of marks the student scored.
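To make the idea concrete, here is a minimal sketch of what fitting a simple linear regression means: finding the slope and intercept of the straight line that minimizes the squared error between predicted and actual scores. The data values and the helper name `fit_simple_linear_regression` are made up for illustration and are not part of the project's dataset or code.

```python
# Hypothetical (hours, score) pairs, invented purely for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [20.0, 30.0, 40.0, 50.0, 60.0]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(hours, scores)

# Predict the score for a student who studies 6 hours.
predicted = intercept + slope * 6.0
print(f"slope={slope}, intercept={intercept}, prediction for 6 hours={predicted}")
```

In the notebook itself, scikit-learn performs this fitting for us; the sketch only shows the kind of relationship the model learns from the Hours and Scores columns.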

In this notebook, we will use the LinearRegression class from scikit-learn's linear_model module to train our model. We will also use Pandas, NumPy, Matplotlib, and Seaborn to perform exploratory data analysis and gather insights before training. The project involves the following activities:

  1. Download the Dataset
  • Install and import required libraries
  • Download data from GitHub
  • Load dataset with Pandas

  2. Explore the Dataset
  • Basic info about dataset
  • Exploratory data analysis & visualization

  3. Prepare Dataset for Training
  • Split into training and validation sets
  • Extract inputs and outputs (targets)

  4. Model Training

  5. Model Validation

  6. Model Application

  7. Inferences and Conclusion

  8. References
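As a rough preview of how the preparation, training, validation, and application steps fit together, the sketch below runs them end to end on synthetic data. It is only an illustration under stated assumptions: the data is randomly generated rather than the real dataset, the 80/20 split and the random seed are arbitrary choices, and mean absolute error is one possible validation metric, not necessarily the one used later in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: a roughly linear trend plus noise).
rng = np.random.default_rng(42)
hours = rng.uniform(1, 10, size=50)
scores = 10 * hours + 5 + rng.normal(0, 2, size=50)
df = pd.DataFrame({"Hours": hours, "Scores": scores})

# Prepare: split into training and validation sets (80/20 is an arbitrary choice).
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Prepare: extract inputs (a 2-D frame with one column) and targets (a 1-D series).
X_train, y_train = train_df[["Hours"]], train_df["Scores"]
X_val, y_val = val_df[["Hours"]], val_df["Scores"]

# Train: fit a straight line to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Validate: measure the error on data the model has not seen.
val_mae = mean_absolute_error(y_val, model.predict(X_val))

# Apply: predict the score of a student who studies 6 hours.
new_student = pd.DataFrame({"Hours": [6.0]})
predicted_score = model.predict(new_student)[0]

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"validation MAE={val_mae:.2f}, predicted score for 6 hours={predicted_score:.2f}")
```

Splitting before extracting inputs and targets mirrors the order of the outline above; extracting first and then splitting the arrays would work equally well.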

1. Download the Dataset

Install and import required libraries

# Install all required libraries
!pip install jovian pandas matplotlib seaborn scikit-learn --upgrade --quiet