Zillow Prize Prediction

Zillow Prize: Zillow’s Home Value Prediction (Zestimate)

Zillow’s Zestimate home valuation has shaken up the U.S. real estate industry since it was first released 11 years ago.

A home is often the largest and most expensive purchase a person makes in his or her lifetime. Ensuring homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost.

“Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

Zillow is asking you to predict the log-error between their Zestimate and the actual sale price, given all the features of a home.
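The target can be written as logerror = log(Zestimate) − log(SalePrice). The snippet below is a minimal sketch of that definition, assuming the natural logarithm; the function name `log_error` and the example prices are hypothetical and are not taken from the dataset.

```python
import math


def log_error(zestimate: float, sale_price: float) -> float:
    """Return log(Zestimate) - log(SalePrice), using the natural log (assumed)."""
    if zestimate <= 0 or sale_price <= 0:
        raise ValueError("prices must be positive")
    return math.log(zestimate) - math.log(sale_price)


# Hypothetical prices, for illustration only.
overestimate = log_error(110_000, 100_000)   # Zestimate above the sale price
underestimate = log_error(90_000, 100_000)   # Zestimate below the sale price
exact = log_error(100_000, 100_000)          # Zestimate equal to the sale price

print(overestimate, underestimate, exact)
```

A positive log-error means the Zestimate was higher than the sale price, a negative one means it was lower, and zero means the two matched.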

Overview

Downloading the data

  • Loading it into a dataframe
  • Describing the dataset

Processing and Feature Engineering

  • Duplicate values
  • Merging of property and training dataset
  • Missing values above 35% dropped
  • Date column

Exploratory Data Analysis and Visualisation

  • Date
  • Parcel Location
  • Target Variable
  • Correlation Heatmap

Identifying Input and Target Columns

Categorical and Numeric Columns

  • Encoded categorical columns
  • Imputing missing numerical columns
  • Scale numeric values

Splitting the data for training

  • Training data (X_train)
  • Validation data (X_val)

Training and Tuning Different Models

  • Random Forest Regression
  • XGBRegressor
  • Gradient Boosting Regression

Training Final Model

  • Gradient Boosting Regression

Saving The Model

  • Using joblib

Test Prediction

  • Conclusion

Conclusion

  • Summary
  • Downside
  • Limitations

Import libraries to be used

import os
import opendatasets as od
import pandas as pd
pd.set_option("display.max_columns", 120)
pd.set_option("display.max_rows", 120)