14 days ago

Learn how to predict USA housing prices using different machine learning methods in this project. Explore regression models and ensemble models such as RandomForest, XGBoost, and LightGBM. Follow the step-by-step guide to download data, process and clean it, perform exploratory analysis and visualization, train different models, and use hyper parameter tuning to find the best model. Run the code using free online resources or on your computer locally.

# Machine Learning - Predict USA Housing Price

#### Introduction

Accurately estimating the value of real estate is an important problem for many stakeholders including house owners, house buyers, agents, creditors, and investors. It is also a difficult one. Though it is common knowledge that factors such as the size, number of rooms and location affect the price, there are many other things at play. Additionally, prices are sensitive to changes in market demand and the peculiarities of each situation, such as when a property needs to be urgently sold.

The sales price of a property can be predicted in various ways, but is often based on regression techniques. All regression techniques essentially involve one or more predictor variables as input and a single target variable as output.

In this project, we will use different machine learning methods performance in predicting the selling price of houses based on a number of features such as the area, the number of bed- and bathrooms and the geographical position etc.

#### What is Machine learning?

Machine Learning (ML) is the method of applying programmatic and statistical computer technology to analyse large datasets, and as a result, uncover new understandings. It is a focussed sub-section of Artificial Intelligence (AI), where a more extensive set of predictive modelling tools enable a computer program to learn by generalising from examples.

Generally speaking, machine learning projects follow the same process. Data ingestion, data cleaning, exploratory data analysis, feature engineering and finally machine learning. Just a reminder the process is not linear and you might find you have to jump back and forth between different stages.

Machine learning tasks are usually split into three categories; supervised, unsupervised and reinforcement. For this competition, our task is supervised learning.

Supervised learning uses examples and labels to find patterns in data

It’s easy to recognise the type of machine learning task in front of you from the data you have and your objective. We’ve been given housing data consisting of features and labels, and we’re tasked with predicting the price based on various features defined.

#### How to Run the Code

The best way to learn the material is to execute the code and experiment with it yourself. This tutorial is an executable Jupyter notebook. You can run this tutorial and experiment with the code examples in a couple of ways: using free online resources (recommended) or on your computer.

##### Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Binder. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms.

##### Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.