note1



In [1]:

    
# Show the result of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Literature reading



In [3]:

    
import numpy as np
np.set_printoptions(precision=4)



In [4]:

    
# Embed the video
# from IPython.display import YouTubeVideo
# YouTubeVideo("8iGzBMboA0I")



In [5]:

    
# Build a slice (as a tuple)
dims = np.index_exp[10:28:1,3:13]
dims









    Out[5]:





(slice(10, 28, 1), slice(3, 13, None))
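The tuple of slices can be passed straight to an array as an index. A minimal sketch, assuming a made-up 30x15 array `a` that is just large enough for the slice to fit:

```python
import numpy as np

# Hypothetical 30x15 array, only here so the slice has something to cut
a = np.arange(30 * 15).reshape(30, 15)

dims = np.index_exp[10:28:1, 3:13]
sub = a[dims]                        # equivalent to a[10:28:1, 3:13]

print(sub.shape)                     # (18, 10)
print(np.array_equal(sub, a[10:28, 3:13]))
```

Storing the slice in a variable is handy when the same window has to be cut out of several arrays.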

The SVD algorithm factorizes a matrix into one matrix with orthogonal columns and one with orthogonal rows, along with a diagonal matrix that contains the relative importance of each factor.
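A small check of these properties with `np.linalg.svd`. The random 6x4 matrix `A` is an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))          # hypothetical data

# Reduced SVD: U is 6x4, s holds 4 singular values, Vt is 4x4
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U and rows of Vt are orthonormal
print(np.allclose(U.T @ U, np.eye(4)))
print(np.allclose(Vt @ Vt.T, np.eye(4)))

# The factorization reproduces A up to floating-point error
print(np.allclose(A, U @ np.diag(s) @ Vt))

# Singular values are returned from largest to smallest
print(np.all(s[:-1] >= s[1:]))
```

All four checks should print True. The singular values in `s` are the diagonal entries that rank the factors by importance.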

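np.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False) checks whether a and b are element-wise equal within a tolerance
np.diag() extracts the diagonal of a 2-D array, or builds a diagonal matrix from a 1-D array
np.argsort(a, axis=-1, kind='quicksort', order=None) returns the indices that would sort the array
np.where(condition, [x, y]) returns elements from x where condition is true and from y elsewhere; with condition alone, it returns the indices where condition is true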
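A quick tour of these helpers on a made-up three-element array (the values are arbitrary):

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])

# allclose: element-wise test of |a - b| <= atol + rtol * |b|
print(np.allclose(a, a + 1e-9))      # True

# diag: 1-D input builds a diagonal matrix, 2-D input extracts the diagonal
D = np.diag(a)
print(D.shape)                       # (3, 3)
print(np.diag(D))                    # [3. 1. 2.]

# argsort: indices that would sort the array
order = np.argsort(a)
print(order)                         # [1 2 0]
print(a[order])                      # [1. 2. 3.]

# where with only a condition returns a tuple of index arrays
idx = np.where(a > 1.5)[0]
print(idx.tolist())                  # [0, 2]
```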

Nonnegative matrix factorization (NMF) is a non-exact factorization that factors a matrix into one skinny nonnegative matrix and one short nonnegative matrix. NMF is NP-hard and non-unique. There are a number of variations on it, created by adding different constraints.

SVD is an exact factorization
NMF is a non-exact factorization; it is NP-hard and non-unique
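To make "non-exact" and "non-unique" concrete, here is a minimal sketch using the Lee-Seung multiplicative updates for the Frobenius-norm objective. This is one common NMF heuristic chosen for brevity, not necessarily what any particular library runs. The helper name `nmf_multiplicative`, the random 8x5 data, and the rank `k=2` are all assumptions for illustration.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate nonnegative V (m x n) as W @ H, with W (m x k) and H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps     # skinny nonnegative factor
    H = rng.random((k, n)) + eps     # short nonnegative factor
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(42)
V = rng.random((8, 5))               # hypothetical nonnegative data

W1, H1 = nmf_multiplicative(V, k=2, seed=0)
W2, H2 = nmf_multiplicative(V, k=2, seed=1)

err1 = np.linalg.norm(V - W1 @ H1)
err2 = np.linalg.norm(V - W2 @ H2)
print(err1, err2)                    # nonzero: W @ H only approximates V
print(np.allclose(W1, W2))           # different starts generally give different factors
```

The residual stays above zero because a rank-2 product cannot reproduce a generic 8x5 matrix, and the result depends on the random starting point.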

TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is a way to normalize term counts by taking into account how often they appear in a document, how long the document is, and how common/rare the term is.

TF = (# occurrences of term t in document) / (# of words in document)

IDF = log(# of documents / # documents with term t in it)
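The two formulas above can be written out directly in plain Python. The toy corpus and the helper names `tf`, `idf`, and `tfidf` are hypothetical. Library implementations often use smoothed variants, so their numbers may differ from this sketch.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already split into words
docs = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "cats and dogs".split(),
]

def tf(term, doc):
    # (# occurrences of term in document) / (# of words in document)
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    # log(# of documents / # documents containing the term)
    # Assumes the term occurs in at least one document
    n_containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(tf("the", docs[0]))            # 2 of 6 words
print(idf("the", docs))              # log(3 / 2)
print(tfidf("cat", docs[0], docs))   # (1 / 6) * log(3 / 1)
```

A word such as "the" that shows up in most documents gets a small IDF, so its weight shrinks even when its raw count is high.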

For NMF, the matrix needs to be at least as tall as it is wide (number of rows >= number of columns), or we get an error with fit_transform. We can use min_df in CountVectorizer to only look at words that appear in at least k of the split texts.
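What a minimum document frequency does to the shape can be sketched on a plain count matrix with NumPy alone. This only mimics the intent of `min_df=k` and is not scikit-learn's implementation. The 4x6 count matrix and `k = 2` are made up.

```python
import numpy as np

# Hypothetical document-term counts: 4 documents (rows), 6 terms (columns)
counts = np.array([
    [2, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 3, 1, 0, 1],
    [1, 0, 0, 0, 2, 1],
])

k = 2                                    # keep terms found in at least k documents
doc_freq = (counts > 0).sum(axis=0)      # number of documents containing each term
keep = doc_freq >= k
filtered = counts[:, keep]

print(doc_freq)                          # [3 1 2 1 1 4]
print(counts.shape, "->", filtered.shape)
```

Dropping the rare terms takes the matrix from 4x6 to 4x3, so it ends up at least as tall as it is wide.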



In [10]:

    
x = np.arange(9.).reshape(3, 3)
# Works as a fill operation:
# where the condition is true take the value from x, otherwise from y (here the scalar -1)
np.where(x < 5, x, -1)









    Out[10]:





array([[ 0.,  1.,  2.],
       [ 3.,  4., -1.],
       [-1., -1., -1.]])



In [ ]: