PREDICTION OF RAIN TOMORROW IN AUSTRALIA

My project for Machine Learning with Python: Zero to GBMs uses the Rain in Australia dataset, which I obtained from Kaggle, to predict whether it will rain tomorrow in Australia.


Context:
Predict next-day rain by training classification models on the target variable RainTomorrow.

Content
This dataset contains about 10 years of daily weather observations from many locations across Australia.

RainTomorrow is the target variable to predict. It answers the question: did it rain the next day, Yes or No? This column is Yes if the rain for that day was 1mm or more.

Input: There are 23 features, excluding the target.

  • Date : The date of observation
  • Location : The common name of the location of the weather station
  • MinTemp: The minimum temperature in degrees Celsius
  • MaxTemp: The maximum temperature in degrees Celsius
  • Rainfall: The amount of rainfall recorded for the day in mm
  • Evaporation: The so-called Class A pan evaporation (mm) in the 24 hours to 9am
  • Sunshine: The number of hours of bright sunshine in the day.
  • WindGustDir: The direction of the strongest wind gust in the 24 hours to midnight
  • WindGustSpeed: The speed (km/h) of the strongest wind gust in the 24 hours to midnight
  • WindDir9am: Direction of the wind at 9am
  • WindSpeed3pm: Wind speed (km/h) averaged over 10 minutes prior to 3pm
  • Humidity9am: Humidity (percent) at 9am
  • Humidity3pm: Humidity (percent) at 3pm
  • Pressure9am: Atmospheric pressure (hPa) reduced to mean sea level at 9am
  • Pressure3pm: Atmospheric pressure (hPa) reduced to mean sea level at 3pm
  • Cloud9am: Fraction of sky obscured by cloud at 9am. This is measured in "oktas", a unit of eighths: it records how many eighths of the sky are obscured by cloud. A value of 0 indicates a completely clear sky, while 8 indicates that it is completely overcast.
  • Cloud3pm: Fraction of sky obscured by cloud (in "oktas": eighths) at 3pm. See Cloud9am for a description of the values
  • Temp9am: Temperature (degrees C) at 9am
  • Temp3pm: Temperature (degrees C) at 3pm
  • RainToday: Boolean: 1 if precipitation (mm) in the 24 hours to 9am exceeds 1mm, otherwise 0

Target Class: RainTomorrow

  • RISK_MM: The amount of next-day rain in mm. It is used to create the response variable RainTomorrow and acts as a kind of measure of the "risk".
  • Yes(1) - No(0)
    • 1 - tomorrow is rainy
    • 0 - tomorrow isn't rainy
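The link between the next-day rainfall amount and the Yes/No label can be sketched on a toy frame. This is only an illustration: the rainfall values are invented, the column name RISK_MM is borrowed from the column dropped in the preprocessing code, and the threshold assumes the "1mm or more" rule stated in the description of RainTomorrow.

```python
import pandas as pd

# Invented next-day rainfall amounts in mm (not taken from the dataset)
toy = pd.DataFrame({"RISK_MM": [0.0, 0.4, 1.0, 12.6]})

# Assumed rule: the label is Yes when next-day rain is 1 mm or more
toy["RainTomorrow"] = (toy["RISK_MM"] >= 1.0).map({True: "Yes", False: "No"})

print(toy)
```

Because the label is computed directly from this amount, the amount itself cannot be used as an input feature without leaking the answer.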

How many instances do we have?:

  • Number of Instances: 142,193
  • Target Class (RainTomorrow): 110,316 No - 31,877 Yes
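The class counts above imply an imbalanced target. A quick calculation, using only the two counts quoted in this section, shows the share of each class:

```python
# Class counts quoted above
no_count, yes_count = 110316, 31877
total = no_count + yes_count

# Share of each class in the target column
no_share = no_count / total
yes_share = yes_count / total

print(f"total={total}, No={no_share:.1%}, Yes={yes_share:.1%}")
```

Roughly 78% of the days are No, so a model that always predicts No would already reach that accuracy. Accuracy alone is therefore a weak yardstick for this target.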

Data Source:

import warnings
warnings.filterwarnings("ignore")

DATA DROPPING

import pandas as pd
import numpy as np
import seaborn as sns
data = pd.read_csv("weatherAUS.csv")

def SplittingDate(df):
    # Parse the Date column once, then add Year, Month and Day columns in place
    dates = pd.to_datetime(df['Date'])
    df['Year'] = dates.dt.year
    df['Month'] = dates.dt.month
    df['Day'] = dates.dt.day
SplittingDate(data)
# RISK_MM is the next-day rainfall amount, so keeping it would leak the target
data = data.drop(columns=['Date', 'RISK_MM'])
# Encode Yes/No as 1/0
data['RainTomorrow'] = data['RainTomorrow'].str.lower().replace({"yes": 1, "no": 0})
data['RainToday'] = data['RainToday'].str.lower().replace({"yes": 1, "no": 0})
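To see what these steps produce, the same transformations can be run on a small hand-made frame. The rows below are invented and contain only a few of the columns in weatherAUS.csv. The sketch uses `map` instead of `replace` for the Yes/No encoding, which is a substitution made here so that any unexpected value becomes NaN instead of being kept as text.

```python
import pandas as pd

# Invented rows that mimic a few columns of weatherAUS.csv
toy = pd.DataFrame({
    "Date": ["2008-12-01", "2009-01-15"],
    "RISK_MM": [0.0, 3.2],
    "RainToday": ["No", "Yes"],
    "RainTomorrow": ["No", "Yes"],
})

# Split the date into numeric parts
dates = pd.to_datetime(toy["Date"])
toy["Year"] = dates.dt.year
toy["Month"] = dates.dt.month
toy["Day"] = dates.dt.day

# Drop the raw date and the leaky rainfall amount
toy = toy.drop(columns=["Date", "RISK_MM"])

# Encode Yes/No as 1/0 (values other than yes/no become NaN with map)
for col in ["RainToday", "RainTomorrow"]:
    toy[col] = toy[col].str.lower().map({"yes": 1, "no": 0})

print(toy)
```

After this step, every remaining column in the toy frame is numeric, which is the form most classifiers expect.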