Updated 4 years ago
Movies Example - Using a StratifiedSample
Author: Andreas Traut
In this example I used a dataset which contains the following columns:
Rank | Title | Year | Score | Metascore | Genre | Vote | Director | Runtime | Revenue | Description | RevCat
My aim was to predict the Revenue based on the other information. As there are some "NaN"-values in the column "Revenue" I did the following steps:
- I filled the "NaN"-value in column "Revenue" with Median-values,
- then I drew a stratified sample (based on "Revenue") on the whole dataset,
- created a pipeline to fill the "NaN"-value in other columns (e.g. "Metascore", "Score").
- split the dataset into a training dataset ("movies_training") and a testing dataset ("movies_test"):