
New York City Taxi Fare Prediction


In this project, we will use Kaggle's New York City Taxi Fare Prediction data, which comes from an archived Google Cloud competition. Our objective is to use machine learning techniques to predict the fare for a taxi ride given its pickup and dropoff locations.

This is a very large dataset consisting of three CSV files:

train.csv - contains the input features and the target fare_amount values for the training set (about 55M rows).

test.csv - contains the same input features as train.csv but without the target fare_amount values; it has only 10k rows.

sample_submission.csv - a sample submission file in the correct format (columns key and fare_amount). This file 'predicts' fare_amount to be $11.35 for all rows, which is the mean fare_amount from the training set.
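Because train.csv has about 55M rows, it is often more practical to work with a random sample first. The following is a minimal sketch of one way to do that with pandas, reading the file in chunks and keeping a fraction of each chunk. The helper name `load_sample`, the sampling fraction, and the chunk size are illustrative assumptions, and the demo uses a small in-memory CSV rather than the real file.

```python
import io

import pandas as pd


def load_sample(source, sample_frac=0.01, chunksize=1_000_000, seed=42):
    """Read a large CSV in chunks and keep a random fraction of each chunk.

    `source` can be a file path (e.g. "train.csv") or a file-like object.
    This is a hypothetical helper, not part of the competition materials.
    """
    pieces = []
    for i, chunk in enumerate(pd.read_csv(source, chunksize=chunksize)):
        # Use a different seed per chunk so the chunks are sampled independently.
        pieces.append(chunk.sample(frac=sample_frac, random_state=seed + i))
    return pd.concat(pieces, ignore_index=True)


# Demo on a tiny in-memory CSV with the two columns named in the text.
demo_csv = io.StringIO(
    "key,fare_amount\n" + "\n".join(f"k{i},{5 + i % 10}" for i in range(1000))
)
sample_df = load_sample(demo_csv, sample_frac=0.1, chunksize=200)
print(sample_df.shape)
```

For the real data, the same call would take the path to train.csv in place of `demo_csv`; a smaller `sample_frac` keeps memory use manageable at the cost of discarding most rows.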

Our goal is to predict fare_amount for each row in test.csv and then submit the results to Kaggle to obtain a score.
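To make the submission format concrete, here is a rough sketch of a constant-mean baseline, the same idea that sample_submission.csv uses. The function name `make_constant_submission` and the toy data are assumptions for illustration only; a real run would use the key column from test.csv and the mean fare_amount computed from train.csv.

```python
import pandas as pd


def make_constant_submission(test_df, fare):
    """Build a submission frame that predicts the same fare for every test row."""
    # The scalar `fare` is broadcast across all rows of the key column.
    return pd.DataFrame({"key": test_df["key"], "fare_amount": fare})


# Toy stand-ins for train.csv and test.csv (values are made up).
train_demo = pd.DataFrame({"fare_amount": [4.5, 16.9, 5.7, 7.7]})
test_demo = pd.DataFrame({"key": ["a", "b", "c"]})

mean_fare = train_demo["fare_amount"].mean()
submission = make_constant_submission(test_demo, mean_fare)

# With no path argument, to_csv returns the CSV text; passing a path such as
# "submission.csv" would write the file to upload to Kaggle instead.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Any trained model can replace the constant: only the fare_amount column changes, while the key column and the two-column layout stay the same.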

Packages and Libraries

Let's get started by installing all of the necessary packages and libraries for this project.