Used Cars Eda Organised Final
Exploratory Data Analysis of Used Car Listings on Craigslist
Craigslist is an American classified advertisements website with sections devoted to jobs, housing, for sale, items wanted, services, community service, gigs, résumés, and discussion forums. It is also the world's largest collection of used vehicles for sale.
- Data pre-processing
- Answer questions and gain insights on the dataset obtained
- Future Work
In this project, we will ask and answer interesting questions about the listings and create interactive visualisations to showcase our findings. We will try to uncover trends in the data and explain what drives them.
We will use libraries like pandas, plotly and folium amongst others.
About the data
The dataset contains one CSV file named vehicles.csv with a size of 1.45 GB. It contains relevant information that Craigslist provides on car sales including columns like price, condition, manufacturer, latitude/longitude, and 18 other categories.
The dataset contains a total of 26 columns. Listed below are the 17 columns relevant to our analysis.
region: Region from where the listing is made.
price: Asking price for the vehicle in the listing.
year: Year of registration of the vehicle listed.
manufacturer: Make of the vehicle listed.
model: Model name of the vehicle listed.
condition: Condition of the vehicle listed.
cylinders: Engine size, based on the number of cylinders it has.
odometer: The number of miles on the odometer of the vehicle.
title_status: Contains the title status of the vehicle. Vehicle titles are certificates for legal ownership of a vehicle.
transmission: The type of transmission on the vehicle.
drive: Contains information about how the drivetrain delivers its power, e.g. AWD, FWD, RWD etc.
size: Which size category the vehicle falls in.
type: Separates the vehicles on the basis of their type, e.g. Hatchback, Pickup, Sedan etc.
lat: Latitude of the location from where the listing is made.
long: Longitude of the location from where the listing is made.
posting_date: Date on which the listing was made.
state: State code of the location from where the listing is made.
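Several of the columns above hold numbers or dates, and raw CSV data often stores them as text. The sketch below shows one way we might coerce them after loading. The helper name coerce_listing_types is hypothetical, and the choice of which columns are numeric (price, year, odometer, lat, long) is our assumption based on the descriptions above, not something verified against the file.

```python
import pandas as pd

# Assumption: these columns are meant to be numeric, going by their descriptions.
NUMERIC_COLUMNS = ["price", "year", "odometer", "lat", "long"]


def coerce_listing_types(df):
    """Return a copy with numeric and date columns converted (hypothetical helper)."""
    out = df.copy()
    for col in NUMERIC_COLUMNS:
        if col in out.columns:
            # Values that cannot be parsed become NaN instead of raising.
            out[col] = pd.to_numeric(out[col], errors="coerce")
    if "posting_date" in out.columns:
        # Unparseable dates become NaT; utc=True gives one consistent timezone.
        out["posting_date"] = pd.to_datetime(
            out["posting_date"], errors="coerce", utc=True
        )
    return out
```

Coercing with errors="coerce" keeps malformed rows visible as missing values, so they can be counted and handled during pre-processing rather than silently breaking the load.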
In this section, the chosen dataset is downloaded from Kaggle using the opendatasets library. Once the dataset is downloaded, we read it with pandas and study the resulting dataframe. We keep the columns that we plan to use in our analysis and drop the others so that execution times are reduced.
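As a minimal sketch of this step, the snippet below reads only the columns we need instead of loading everything and dropping columns afterwards. The helper load_listings is hypothetical, the column names are taken from the list above, and the Kaggle URL and download folder are left as placeholders because they are not given here.

```python
import pandas as pd

# The 17 columns described above; names follow the column list.
RELEVANT_COLUMNS = [
    "region", "price", "year", "manufacturer", "model", "condition",
    "cylinders", "odometer", "title_status", "transmission", "drive",
    "size", "type", "lat", "long", "posting_date", "state",
]


def load_listings(source, columns=RELEVANT_COLUMNS):
    """Read only the requested columns from the CSV (hypothetical helper)."""
    # usecols makes pandas skip the unused columns while parsing,
    # which lowers memory use and load time for a large file.
    df = pd.read_csv(source, usecols=list(columns))
    # Return the columns in the requested order rather than file order.
    return df[list(columns)]


# Downloading needs Kaggle credentials, so it is left commented out.
# Both the URL and the folder below are placeholders.
# import opendatasets as od
# od.download("<kaggle-dataset-url>")
# df = load_listings("<download-folder>/vehicles.csv")
# df.info()
```

Selecting columns at read time has the same end result as reading the full file and dropping columns, but it avoids holding the unused columns in memory at all.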
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="used-cars-eda-organised")
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/udaysidhu1/used-cars-eda-organised