machinehack-ode-to-code-tredence


!pip install jovian --upgrade --quiet
import jovian

# Execute this to save a new version of the notebook
jovian.commit(project="machinehack-ode-to-code-tredence",
              git_message="first commit from jovian",
              git_commit=True)
# Note: this works only when the notebook is executed on Binder.

Overview

Tredence is a data science and AI engineering company focused on solving the last-mile problem in analytics, where the ‘last mile’ is the gap between insight creation and value realization. Tredence has more than 1,500 employees and offices in Foster City, Chicago, London, Toronto and Bangalore, and its clients include the largest companies in retail, CPG, hi-tech, telecom, travel and industrials.

Problem Statement:

The year is 2050. A team of astronauts from all over the world went on a mission to an exoplanet and discovered abundant life and wonderful weather. The scientists began collecting data samples of the fruits found at their landing site, curious about their shape and size. They collected data for more than one solar year of the planet to understand the fruit-growing conditions under different weather.

To analyze the data and grow similar fruits on Earth, they began transmitting the data back to Earth. However, due to solar radiation, some of the data was corrupted or lost in transmission. Back on Earth, the scientists realized they needed to identify the type of climate the exoplanet has from the properties of the fruit, despite the missing data. Help the scientists identify the Earth-like season in which each fruit must have grown, using the data collected.

Evaluation

What is the metric in this competition? How is the leaderboard calculated?

Submissions are evaluated using the accuracy metric. You can use sklearn.metrics.accuracy_score to compute a valid score.
This hackathon has both a public and a private leaderboard.
The public leaderboard is evaluated on 30% of the test data.
The private leaderboard, made available at the end of the hackathon, is evaluated on 100% of the test data.
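Accuracy is simply the fraction of rows whose predicted season matches the true season. A minimal sketch with made-up labels (the y_true and y_pred values are hypothetical, not competition data):

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted season codes (s=spring, u=summer, a=autumn, w=winter)
y_true = ["s", "u", "a", "w", "a"]
y_pred = ["s", "u", "w", "w", "a"]

# accuracy = number of exact matches / number of rows
score = accuracy_score(y_true, y_pred)

# The same value computed by hand: 4 of the 5 predictions match
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(score)  # 0.8
```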

How to Generate a Valid Submission File

Scikit-learn models provide a predict() method that generates the predicted values.

You should submit a .csv file with exactly 18,321 rows and 1 column (season). Your submission will receive an Invalid Score if it has any extra columns or rows.
Note: Do not shuffle the test rows; predictions must stay in the original order of the test set.

Using Pandas:

submission_df.to_csv('my_submission_file.csv', index=False)
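A sketch of building and sanity-checking the submission file. It assumes the predictions come from a fitted model's predict() on the test set in its original order. The four-element predictions list is a hypothetical stand-in; for the real test set it would hold 18,321 values. It also assumes the leaderboard expects the single-letter season codes, which the text above does not state explicitly.

```python
import pandas as pd

# Hypothetical predictions, in the same order as the test rows (never shuffled).
# In practice: predictions = model.predict(X_test), which yields 18,321 values.
predictions = ["a", "s", "w", "u"]

# Exactly one column, named 'season'
submission_df = pd.DataFrame({"season": predictions})

# index=False stops pandas from writing the row index as an extra column
submission_df.to_csv("my_submission_file.csv", index=False)

# Sanity checks before uploading
check = pd.read_csv("my_submission_file.csv")
assert list(check.columns) == ["season"]
assert len(check) == len(predictions)  # should be 18,321 for the real test set
```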

About Data

Columns: ['edible-poisonous', 'cap-diameter', 'cap-shape', 'cap-color', 'does-bruise-or-bleed', 'gill-attachment', 'gill-color', 'stem-height', 'stem-width', 'stem-color', 'has-ring', 'ring-type', 'habitat', 'season']

Train: 42,748 rows x 14 columns

Test: 18,321 rows x 14 columns
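Because some values were lost in transmission, a useful first step is to check the shape and count the missing values per column. The sketch below uses a tiny hypothetical DataFrame with a few of the competition's column names. With the real data you would load the training CSV with pd.read_csv instead; its file name is not given here.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample; NaN marks values lost in transmission
train = pd.DataFrame({
    "cap-diameter": [4.2, np.nan, 11.0, 6.5],
    "cap-shape": ["x", "f", np.nan, "x"],
    "habitat": ["d", "d", "g", np.nan],
    "season": ["a", "u", "w", "a"],
})

print(train.shape)  # (4, 4)

# Number of missing values in each column
missing = train.isna().sum()
print(missing)

# Class balance of the target column
print(train["season"].value_counts())
```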

Data Dictionary:
Independent Variables

edible-poisonous: edible=e, poisonous=p
cap-diameter: float number in cm
cap-shape: bell=b, conical=c, convex=x, flat=f, sunken=s, spherical=p, others=o
cap-color: brown=n, buff=b, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y, blue=l, orange=o, black=k
does-bruise-or-bleed: bruises-or-bleeding=t, no=f
gill-attachment: adnate=a, adnexed=x, decurrent=d, free=e, sinuate=s, pores=p, none=f
gill-color: see cap-color + none=f
stem-height: float number in cm
stem-width: float number in mm
stem-color: see cap-color + none=f
has-ring: ring=t, none=f
ring-type: cobwebby=c, evanescent=e, flaring=r, grooved=g, large=l, pendant=p, sheathing=s, zone=z, scaly=y, movable=m, none=f
habitat: grasses=g, leaves=l, meadows=m, paths=p, heaths=h, urban=u, waste=w, woods=d
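One way to handle both the missing values and the single-letter categorical codes is a scikit-learn pipeline that imputes and one-hot encodes the features before fitting a classifier. This is a sketch under stated assumptions. The toy rows are hypothetical, only a subset of the categorical columns is used, and RandomForestClassifier is a swapped-in choice rather than a model prescribed by the competition.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Note: stem-width is in mm while the other two are in cm. Tree-based models are
# insensitive to this scale difference; scale-sensitive models would need a scaler.
NUMERIC = ["cap-diameter", "stem-height", "stem-width"]
# Subset for brevity; the other coded columns would be handled the same way
CATEGORICAL = ["cap-shape", "habitat"]

# Hypothetical toy rows; NaN stands in for values lost in transmission
X = pd.DataFrame({
    "cap-diameter": [4.2, np.nan, 11.0, 6.5, 3.1, 8.0],
    "stem-height": [5.0, 7.1, np.nan, 4.4, 3.9, 6.2],
    "stem-width": [8.0, 12.5, 20.1, np.nan, 5.5, 14.0],
    "cap-shape": ["x", "f", np.nan, "x", "b", "f"],
    "habitat": ["d", "d", "g", np.nan, "m", "g"],
})
y = ["a", "u", "w", "a", "s", "u"]

preprocess = ColumnTransformer([
    # Numeric columns: fill gaps with the column median
    ("num", SimpleImputer(strategy="median"), NUMERIC),
    # Coded columns: fill gaps with the most frequent code, then one-hot encode
    ("cat", make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    ), CATEGORICAL),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X, y)
preds = model.predict(X)  # one season code per row
```

Keeping imputation inside the pipeline means the same fill values learned on the training data are reused when predicting on the test set.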

Dependent variable

season: spring=s, summer=u, autumn=a, winter=w
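The season codes can be kept in a small lookup table, for example to turn model output into readable labels when inspecting predictions. SEASON_CODES and SEASON_LABELS are illustrative names, not part of the competition data.

```python
# Code -> season name, as listed in the data dictionary
SEASON_CODES = {"s": "spring", "u": "summer", "a": "autumn", "w": "winter"}
# Inverse mapping: season name -> code
SEASON_LABELS = {name: code for code, name in SEASON_CODES.items()}

predicted_codes = ["a", "w", "s"]  # hypothetical model output
readable = [SEASON_CODES[c] for c in predicted_codes]
print(readable)  # ['autumn', 'winter', 'spring']
```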