Zillow Prize Prediction
Zillow Prize: Zillow’s Home Value Prediction (Zestimate)
Zillow’s Zestimate home valuation has shaken up the U.S. real estate industry since it was first released 11 years ago.
A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.
“Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.
Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home.
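To make the target concrete, here is a minimal sketch of how the log-error could be computed for a single home. It assumes the competition's definition, logerror = log(Zestimate) - log(SalePrice), with the natural logarithm. The function name `log_error` and the example prices are hypothetical and are not part of the dataset.

```python
import math


def log_error(zestimate: float, sale_price: float) -> float:
    """Log-error between an estimated value and the actual sale price.

    Assumes the definition log(Zestimate) - log(SalePrice) with the
    natural logarithm; both inputs must be positive.
    """
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical example: the estimate is 10% above the sale price.
overestimate = log_error(110_000, 100_000)

# Hypothetical example: the estimate matches the sale price exactly.
exact = log_error(250_000, 250_000)
```

Under this definition, a positive value means the Zestimate was above the sale price, a negative value means it was below, and zero means the two matched.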
Overview
Downloading the data
- Loading it into a dataframe
- Describing dataset
Processing and Feature Engineering
- Duplicate values
- Merging of property and training datasets
- Columns with more than 35% missing values dropped
- Date column
Exploratory Data Analysis and Visualisation
- Date
- Parcel Location
- Target Variable
- Correlation Heatmap
Identifying Input and Target Columns
Categorical and Numeric Columns
- Encoding categorical columns
- Imputing missing numeric values
- Scaling numeric values
Splitting the data for training
- Training data (X_train)
- Validation data (X_val)
Training and Tuning Different Models
- Random Forest Regression
- XGBRegressor
- Gradient Boosting Regression
Training Final Model
- Gradient Boosting Regression
Saving The Model
- Using joblib
Test Prediction
- Conclusion
Conclusion
- Summary
- Downside
- Limitations
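As an illustration of the preprocessing step listed in the overview, the sketch below drops every column whose share of missing values exceeds 35%. The helper name `drop_sparse_columns` and the toy DataFrame are hypothetical; the notebook's own implementation may differ.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.35) -> pd.DataFrame:
    """Return a copy of df without columns whose missing share exceeds threshold."""
    # isna().mean() gives the fraction of missing values per column.
    missing_share = df.isna().mean()
    keep = missing_share[missing_share <= threshold].index
    return df[keep].copy()


# Toy data (not from the Zillow dataset): "b" is 50% missing, "c" is 25% missing.
example = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [1, None, None, 4],
        "c": [1, None, 3, 4],
    }
)
cleaned = drop_sparse_columns(example)
```

Here column "b" is removed because half of its values are missing, while "c" stays because its missing share is below the threshold.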
Import libraries to be used
import os
import opendatasets as od
import pandas as pd
pd.set_option("display.max_columns", 120)
pd.set_option("display.max_rows", 120)