Here I will use scikit-learn to perform PCA in a Jupyter Notebook.

First, I need an example to get familiar with it.

Get our data and analyze it.


In [ ]:
import numpy as np
from sklearn.decomposition import PCA
import pandas as pd
# Raw string so that '\s' is passed to the regex engine, not parsed as a string escape
df = pd.read_csv('Manhattan.txt', sep=r'\s+')
df.drop('id', axis=1, inplace=True)
df.tail()
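
To try the loading step without the data file, here is a minimal sketch that reads a whitespace-delimited table from an in-memory string. The column names and values are made up for illustration and are not the contents of Manhattan.txt.

```python
import io
import pandas as pd

# Made-up, whitespace-delimited sample; NOT the real Manhattan.txt contents.
sample = io.StringIO(
    "id  a    b\n"
    "1   0.5  1.5\n"
    "2   2.0  3.0\n"
)

# r'\s+' is a raw-string regex meaning "one or more whitespace characters".
demo = pd.read_csv(sample, sep=r'\s+')
demo = demo.drop('id', axis=1)  # non-inplace form of drop(..., inplace=True)
print(demo.shape)
```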

How to index a given part of a DataFrame has been a problem for me.

Refer to pandas/html/10min.html#selection-by-position to keep this in mind (links to files outside this directory do not work well): file:///C:/work/python/%E6%96%87%E6%A1%A3/pandas/html/10min.html#selection-by-position


In [ ]:
tdf = df.iloc[:, 0:-3]
tdf.tail()
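
As a reminder of what `iloc[:, 0:-3]` does, here is a small sketch on a toy frame. The frame and its column names (`f1`, `f2`, `x`, `y`, `z`) are placeholders I made up, not the real columns of the data set.

```python
import pandas as pd

# Hypothetical frame with five columns; the names are placeholders.
toy = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    columns=['f1', 'f2', 'x', 'y', 'z'],
)

# Position-based: all rows, every column except the last three.
by_position = toy.iloc[:, 0:-3]

# Label-based equivalent; note that .loc slices include the end label.
by_label = toy.loc[:, 'f1':'f2']

print(list(by_position.columns))
print(by_position.equals(by_label))
```

The position-based form is convenient here because it does not depend on knowing the column names, only on the last three columns being the ones to leave out.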

Take one principal component; explained variance: 0.917864


In [ ]:
pca = PCA(n_components=8)
pca.fit(tdf)
np.set_printoptions(precision=6, suppress=True)

print('Explained variance ratio of each principal component:', end=' ')
print(pca.explained_variance_ratio_)

# Project the data onto the principal components; columns are labeled 0..7 (integers)
emotion_score = pd.DataFrame(pca.transform(tdf), index=tdf.index)
# Keep only the first principal component and name it 'emotion_score'
first_pc = emotion_score.loc[:, 0].rename('emotion_score')
pd.concat([df, first_pc], axis=1, join='inner').to_csv('Manhattan_score_raw.txt', index=False, sep='\t')
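
To check my understanding of `explained_variance_ratio_` and of what `transform` returns, here is a small sketch on synthetic data. The data, the variable names (`X`, `demo_pca`, `manual_first`), and the feature scales are assumptions for illustration only and have nothing to do with the real data set.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 samples, 4 features,
# scaled so that most of the variance lies along the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([10.0, 2.0, 1.0, 0.5])

demo_pca = PCA(n_components=4).fit(X)
ratios = demo_pca.explained_variance_ratio_

# Ratios come sorted in decreasing order; with all components kept they sum to 1.
cumulative = np.cumsum(ratios)

# transform() centers the data and projects it onto the component vectors,
# so the first score column can be reproduced by hand.
scores = demo_pca.transform(X)
manual_first = (X - demo_pca.mean_) @ demo_pca.components_[0]

print(ratios)
print(np.allclose(scores[:, 0], manual_first))
```

Looking at the cumulative ratios is one common way to decide how many components to keep; keeping only the first one is reasonable when its ratio is already close to 1.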