Movies Example - Using a StratifiedSample

Author: Andreas Traut

In this example I used a dataset which contains the following columns:

My aim was to predict the Revenue based on the other information. As there are some "NaN"-values in the column "Revenue" I did the following steps:

I filled the "NaN"-value in column "Revenue" with Median-values,
then I drew a stratified sample (based on "Revenue") on the whole dataset,
created a pipeline to fill the "NaN"-value in other columns (e.g. "Metascore", "Score").
split the dataset into a training dataset ("movies_training") and a testing dataset ("movies_test"):