
NYC Taxi Ride Time Prediction V3


Predicting the Duration of a Taxicab Ride in NYC


Taxicabs are an important means of transportation in many cities, and in New York City in particular. This notebook analyzes data gathered in 2016 and originally published by the NYC Taxi and Limousine Commission (TLC). The data was later made into a Kaggle competition that ran from 7/2017 to 9/2017. In this notebook I provide various models which come close to the highest scoring entries. The training dataset contains the following fields:

  • id - a unique identifier for each trip
  • vendor_id - a code indicating the provider associated with the trip record
  • pickup_datetime - date and time when the meter was started
  • dropoff_datetime - date and time when the meter was stopped
  • passenger_count - the number of passengers in the vehicle (driver entered value)
  • pickup_longitude - the longitude where the meter was started
  • pickup_latitude - the latitude where the meter was started
  • dropoff_longitude - the longitude where the meter was stopped
  • dropoff_latitude - the latitude where the meter was stopped
  • store_and_fwd_flag - indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server: Y indicates a store and forward trip; N indicates a trip which was not a store and forward trip
  • trip_duration - duration of the trip in seconds

The test dataset provided by Kaggle contains the same fields except dropoff_datetime and trip_duration.
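
As a small illustration of how these fields relate, the sketch below builds one made-up training record and checks that trip_duration matches the gap between the two timestamps in seconds. The record values, the timestamp format, and the helper name duration_seconds are assumptions for illustration; none of them are taken from the dataset or from the notebook's own code.

```python
from datetime import datetime

# Hypothetical record using the training-set field names listed above.
# All values are invented for illustration.
record = {
    "id": "id0000001",
    "vendor_id": 1,
    "pickup_datetime": "2016-03-14 17:24:55",
    "dropoff_datetime": "2016-03-14 17:32:30",
    "passenger_count": 1,
    "pickup_longitude": -73.9821,
    "pickup_latitude": 40.7679,
    "dropoff_longitude": -73.9646,
    "dropoff_latitude": 40.7656,
    "store_and_fwd_flag": "N",
    "trip_duration": 455,
}

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def duration_seconds(row):
    """Return the seconds elapsed between meter start and meter stop."""
    start = datetime.strptime(row["pickup_datetime"], TIME_FORMAT)
    stop = datetime.strptime(row["dropoff_datetime"], TIME_FORMAT)
    return int((stop - start).total_seconds())


# A test-set record has the same fields minus the two that reveal the target.
test_record = {
    key: value
    for key, value in record.items()
    if key not in ("dropoff_datetime", "trip_duration")
}
```

Because dropoff_datetime would let anyone recompute trip_duration directly, it makes sense that both fields are withheld from the test data.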

The notebook follows this outline:

  1. Download the datasets from Kaggle
  2. Prepare and divide the dataset for training and validation
  3. Create and establish a baseline model
  4. Introduce new features
  5. Create linear models, decision trees, and random forests with various hyperparameters
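
Steps 2 and 3 of the outline can be sketched without any external libraries. The example below is a minimal illustration, not the notebook's actual code: the function names train_val_split and rmse, the 80/20 split, the choice of RMSE as the error measure, and the sample durations are all assumptions. The baseline simply predicts the mean training duration for every trip, which gives later models a number to beat.

```python
import math
import random


def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up trip durations in seconds, standing in for the real training data.
durations = [300, 455, 620, 900, 1200, 240, 780, 510, 1500, 360]
rows = [{"trip_duration": d} for d in durations]

train, val = train_val_split(rows)

# Baseline model: always predict the mean duration seen in training.
baseline_prediction = sum(r["trip_duration"] for r in train) / len(train)

val_actual = [r["trip_duration"] for r in val]
baseline_rmse = rmse(val_actual, [baseline_prediction] * len(val_actual))
```

Any model built in the later steps should score better than baseline_rmse on the same validation rows; if it does not, the added features or hyperparameters are not helping.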

Download the datasets

Ari Blinder, 6 months ago