Learn practical skills, build real-world projects, and advance your career
Created 4 years ago
scikit-learn-pca
Credits: Forked from PyCon 2015 Scikit-learn Tutorial by Jake VanderPlas
Dimensionality Reduction: PCA
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn;
from sklearn import neighbors, datasets
import pylab as pl
seaborn.set()
iris = datasets.load_iris()
X, y = iris.data, iris.target
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(X)
X_reduced = pca.transform(X)
print("Reduced dataset shape:", X_reduced.shape)
import pylab as pl
pl.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y,
cmap='RdYlBu')
print("Meaning of the 2 components:")
for component in pca.components_:
print(" + ".join("%.3f x %s" % (value, name)
for value, name in zip(component,
iris.feature_names)))
('Reduced dataset shape:', (150, 2))
Meaning of the 2 components:
0.362 x sepal length (cm) + -0.082 x sepal width (cm) + 0.857 x petal length (cm) + 0.359 x petal width (cm)
-0.657 x sepal length (cm) + -0.730 x sepal width (cm) + 0.176 x petal length (cm) + 0.075 x petal width (cm)
Dimensionality Reduction: Principal Component Analysis in-depth
Here we'll explore Principal Component Analysis, which is an extremely useful linear dimensionality reduction technique. Principal Component Analysis is a very powerful unsupervised method for dimensionality reduction in data. Look for directions in the data with the most variance.
Useful to explore data, visualize data and relationships.
It's easiest to visualize by looking at a two-dimensional dataset: