ML Project: Predict If a Car Purchased at Auction Is a Lemon
Don't Get Kicked!
Predicting whether a car purchased at an auction is a good or bad buy.
Introduction
One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The auto community calls these unfortunate purchases "kicks".
Kicked cars often result from tampered odometers, mechanical issues the dealer is not able to address, issues with getting the vehicle title from the seller, or some other unforeseen problem. Kicked cars can be very costly to dealers after transportation costs, throw-away repair work, and market losses in reselling the vehicle.
In this project, we will figure out which cars have a higher risk of being a kick, which can provide real value to dealerships trying to offer the best inventory selection possible to their customers.
We will take a look at real-world auction purchase data and use machine learning techniques to build models that could be deployed at a dealership to help buyers decide whether a vehicle is worth purchasing.
Data Description
All the variables in the data set are defined as follows:
- RefID: Unique (sequential) number assigned to vehicles
- IsBadBuy: Identifies if the kicked vehicle was an avoidable purchase
- PurchDate: The Date the vehicle was Purchased at Auction
- Auction: Auction provider at which the vehicle was purchased
- VehYear: The manufacturer's year of the vehicle
- VehicleAge: The Years elapsed since the manufacturer's year
- Make: Vehicle Manufacturer
- Model: Vehicle Model
- Trim: Vehicle Trim Level
- SubModel: Vehicle Submodel
- Color: Vehicle Color
- Transmission: Vehicle's transmission type (Automatic, Manual)
- WheelTypeID: The type id of the vehicle wheel
- WheelType: The vehicle wheel type description (Alloy, Covers)
- VehOdo: The vehicle's odometer reading
- Nationality: The Manufacturer's country
- Size: The size category of the vehicle (Compact, SUV, etc.)
- TopThreeAmericanName: Identifies if the manufacturer is one of the top three American manufacturers
- MMRAcquisitionAuctionAveragePrice: Acquisition price for this vehicle in average condition at time of purchase
- MMRAcquisitionAuctionCleanPrice: Acquisition price for this vehicle in above average condition at time of purchase
- MMRAcquisitionRetailAveragePrice: Acquisition price for this vehicle in the retail market in average condition at time of purchase
- MMRAcquisitonRetailCleanPrice: Acquisition price for this vehicle in the retail market in above average condition at time of purchase
- MMRCurrentAuctionAveragePrice: Acquisition price for this vehicle in average condition as of current day
- MMRCurrentAuctionCleanPrice: Acquisition price for this vehicle in above average condition as of current day
- MMRCurrentRetailAveragePrice: Acquisition price for this vehicle in the retail market in average condition as of current day
- MMRCurrentRetailCleanPrice: Acquisition price for this vehicle in the retail market in above average condition as of current day
- PRIMEUNIT: Identifies if the vehicle would have a higher demand than a standard purchase
- AcquisitionType: Identifies how the vehicle was acquired (Auction buy, trade in, etc.)
- AUCGUART: The level of guarantee provided by the auction for the vehicle (Green light - guaranteed/arbitrable, Yellow light - caution/issue, Red light - sold as is)
- KickDate: Date the vehicle was kicked back to the auction
- BYRNO: Unique number assigned to the buyer that purchased the vehicle
- VNZIP: Zipcode where the car was purchased
- VNST: State where the car was purchased
- VehBCost: Acquisition cost paid for the vehicle at time of purchase
- IsOnlineSale: Identifies if the vehicle was originally purchased online
- WarrantyCost: Warranty price (term = 36 months and mileage = 36K)
There are 32 independent variables, and the dependent variable that we need to predict is IsBadBuy.
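As a minimal sketch of how the input columns and the target might be separated, assume the data has been loaded into a pandas DataFrame. The three rows below are made-up placeholder values rather than real auction records, only a handful of columns are shown, and the date format and the choice to measure VehicleAge at the time of purchase are assumptions made for illustration.

```python
import pandas as pd

# Made-up placeholder rows for illustration only (not real auction records).
sample_df = pd.DataFrame({
    "RefID": [1, 2, 3],
    "IsBadBuy": [0, 1, 0],
    "PurchDate": ["12/7/2009", "12/7/2009", "5/19/2010"],  # assumed month/day/year format
    "VehYear": [2006, 2004, 2005],
    "Make": ["MAZDA", "DODGE", "FORD"],
    "VehOdo": [89046, 93593, 73807],
})

# Parse the purchase date so its year can be used in calculations.
sample_df["PurchDate"] = pd.to_datetime(sample_df["PurchDate"], format="%m/%d/%Y")

# Assumption: vehicle age is the purchase year minus the manufacturer's year.
sample_df["VehicleAge"] = sample_df["PurchDate"].dt.year - sample_df["VehYear"]

# The identifier carries no predictive signal, and IsBadBuy is what we predict.
id_col = "RefID"
target_col = "IsBadBuy"
input_cols = [c for c in sample_df.columns if c not in (id_col, target_col)]

inputs = sample_df[input_cols]
target = sample_df[target_col]

print(input_cols)
print(target.value_counts())
```

Excluding RefID from the inputs is a design choice: a sequential identifier can only let a model memorize rows, not learn anything about the vehicle.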
Project Outline
These are the steps that we will follow:
- Installing and importing all the libraries
- Downloading the data
- Exploratory Data Analysis
- Feature Engineering
- Data Preprocessing
- Identifying Numeric and Categorical columns
- Imputing Missing Values
- Scaling Numerical Data
- Encoding Categorical Columns
- Training, Validation and Test sets
- Training dumb models
- Training a Logistic Regression model
- Training other models
- Training the best model with tuned hyperparameters and making predictions
- Summary
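The preprocessing and first modelling steps in this outline can be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions: the data is randomly generated stand-in data rather than the auction data set, and the specific choices (mean and most-frequent imputation, MinMaxScaler, one-hot encoding, a 75/25 split, a most-frequent-class "dumb" baseline) are assumptions, not necessarily the ones used later in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in data (random values, not real auction records).
rng = np.random.default_rng(42)
n_rows = 200
toy_df = pd.DataFrame({
    "VehOdo": rng.integers(30_000, 110_000, n_rows).astype(float),
    "VehBCost": rng.uniform(3_000, 12_000, n_rows),
    "Make": rng.choice(["FORD", "DODGE", "MAZDA"], n_rows),
    "Transmission": rng.choice(["AUTO", "MANUAL"], n_rows),
    "IsBadBuy": rng.integers(0, 2, n_rows),
})
# Blank out a few values so the imputation step has something to do.
toy_df.loc[::25, "VehOdo"] = np.nan
toy_df.loc[::40, "Transmission"] = np.nan

# Identify numeric and categorical input columns.
input_cols = [c for c in toy_df.columns if c != "IsBadBuy"]
numeric_cols = toy_df[input_cols].select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in input_cols if c not in numeric_cols]

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Hold out a validation set (a separate test set would be split off the same way).
X_train, X_val, y_train, y_val = train_test_split(
    toy_df[input_cols], toy_df["IsBadBuy"], test_size=0.25, random_state=42
)

# Compare a "dumb" baseline with logistic regression on the validation set.
models = {
    "dumb (most frequent class)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in models.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", model)])
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_val, y_val)
    print(f"{name}: validation accuracy = {scores[name]:.3f}")
```

Wrapping the preprocessing and the model in a single Pipeline means the imputer, scaler, and encoder are fitted on the training rows only, which avoids leaking validation data into the preprocessing. Because the labels here are random, neither model should be expected to beat the baseline; the point is the structure, not the scores.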