!git clone https://github.com/woldemarg/impulse_classifier
Cloning into 'impulse_classifier'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 28 (delta 8), reused 21 (delta 4), pack-reused 0
Unpacking objects: 100% (28/28), 178.24 KiB | 986.00 KiB/s, done.
%cd impulse_classifier
/content/impulse_classifier/impulse_classifier
!pip install --upgrade lightgbm
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: lightgbm in /usr/local/lib/python3.9/dist-packages (3.3.5)
Requirement already satisfied: wheel in /usr/local/lib/python3.9/dist-packages (from lightgbm) (0.40.0)
Requirement already satisfied: scikit-learn!=0.22.0 in /usr/local/lib/python3.9/dist-packages (from lightgbm) (1.2.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.9/dist-packages (from lightgbm) (1.10.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.9/dist-packages (from lightgbm) (1.22.4)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.9/dist-packages (from scikit-learn!=0.22.0->lightgbm) (1.1.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.9/dist-packages (from scikit-learn!=0.22.0->lightgbm) (3.1.0)
%matplotlib inline
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# random state for reproducibility
RND = 1234
# sample size
SMP = 5000
# share of positives to hide, to emulate a PU (positive-unlabeled) dataset
n_hidden_share = 0.5
# test-set share
EVL = 0.25
X, y = make_classification(
    n_samples=SMP,
    weights=[0.6],
    shuffle=True,
    random_state=RND)
# hide a random subset of positives by relabeling them as 0 (unlabeled)
y_pu = y.copy()
pos = np.nonzero(y)[0]
np.random.RandomState(RND).shuffle(pos)
n_hidden = int(y.sum() * n_hidden_share)
y_pu[pos[:n_hidden]] = 0
X_trn, X_tst, y_trn_pu, y_tst_pu, y_trn, y_tst = train_test_split(
    X, y_pu, y, test_size=EVL, random_state=RND, stratify=y_pu)
print(f'Positives in original target: {y.sum()} ({y.mean():.1%})')
print(f'Positives in modified target: {y_pu.sum()} ({y_pu.mean():.1%})')
Positives in original target: 2013 (40.3%)
Positives in modified target: 1007 (20.1%)
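As a quick sanity check (not part of the original notebook), one can verify that the relabeling hid exactly `n_hidden` positives, flipped no negatives, and produced a label frequency close to the chosen `n_hidden_share`. This sketch rebuilds the dataset with the same parameters as above:

```python
import numpy as np
from sklearn.datasets import make_classification

RND = 1234
SMP = 5000
n_hidden_share = 0.5

X, y = make_classification(
    n_samples=SMP, weights=[0.6], shuffle=True, random_state=RND)

# hide a random subset of positives, as in the cell above
y_pu = y.copy()
pos = np.nonzero(y)[0]
np.random.RandomState(RND).shuffle(pos)
n_hidden = int(y.sum() * n_hidden_share)
y_pu[pos[:n_hidden]] = 0

# hidden positives: truly 1 in y but relabeled 0 in y_pu
hidden = (y == 1) & (y_pu == 0)
print('hidden positives:', hidden.sum())

# no negative should have been flipped to positive
print('flipped negatives:', ((y == 0) & (y_pu == 1)).sum())

# label frequency c = P(labeled | positive), a key quantity in PU learning
print('label frequency:', y_pu.sum() / y.sum())
```

With `n_hidden_share = 0.5`, the label frequency should land very close to 0.5, matching the halving of positives seen in the printed counts (2013 → 1007).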