In practice, of course, you will want to use your own data. How do you do that? FukuML provides a simple way to load it:
your_training_data_file = '/path/to/your/training_data/file'
pla_bc.load_train_data(your_training_data_file)
your_testing_data_file = '/path/to/your/testing_data/file'
pla_bc.load_test_data(your_testing_data_file)
It really is that simple. Let's walk through it with a real example:
In [1]:
import FukuML.PLA as pla
pla_bc = pla.BinaryClassifier()
pla_bc.load_train_data('/Users/fukuball/Projects/fuku-ml/FukuML/dataset/linear_separable_train.dat')
Out[1]:
(array([[ 1. , -0.49475104, 1.60851023],
[ 1. , 0.99350955, 2.53942025],
[ 1. , 0.67365802, 2.41859411],
[ 1. , -1.91676615, 0.48923093],
[ 1. , -0.80964166, 1.26206511],
[ 1. , -0.45285374, 1.82885284],
[ 1. , 0.27463815, 2.08049683],
[ 1. , 0.89694355, 3.7834262 ],
[ 1. , -1.72520564, 0.87640485],
[ 1. , 0.7349451 , 3.39882197],
[ 1. , -1.02461018, 1.44258081],
[ 1. , -0.60392455, 0.98807458],
[ 1. , 0.08098387, 2.15878467],
[ 1. , 0.48213089, 2.18476304],
[ 1. , 0.74123261, 3.22706092],
[ 1. , -0.57649605, 0.27757466],
[ 1. , -1.60301663, 0.85311484],
[ 1. , -1.90040634, 1.14021401],
[ 1. , 0.7943513 , 2.68559323],
[ 1. , 0.15398661, 2.61447653],
[ 1. , 2.4192871 , 4.18943591],
[ 1. , 0.13016586, 2.53128795],
[ 1. , -1.00057111, 1.2998211 ],
[ 1. , -2.24935866, -0.51829791],
[ 1. , -0.11745011, 2.36365622],
[ 1. , -0.18131864, 1.90732415],
[ 1. , -1.0669876 , 1.84490598],
[ 1. , -0.41819858, 1.20384123],
[ 1. , -1.27557363, 1.58879675],
[ 1. , -0.48455613, 1.56688674],
[ 1. , 1.76857878, 2.70393626],
[ 1. , 0.6178306 , 1.41965757],
[ 1. , 0.24021005, 2.07796794],
[ 1. , -0.40745049, 1.12846498],
[ 1. , -0.5450063 , 1.64924578],
[ 1. , -0.89149772, 1.29851015],
[ 1. , 1.12855231, 1.96797717],
[ 1. , -0.62563244, 1.87988573],
[ 1. , -0.91508504, 1.62532636],
[ 1. , -1.21008395, 0.41751392],
[ 1. , 1.0369232 , 3.32224131],
[ 1. , 0.82678315, 3.36840655],
[ 1. , 0.00522133, 2.57820823],
[ 1. , -1.06147755, 1.06473163],
[ 1. , -0.45700467, 2.00276916],
[ 1. , 0.13487671, 0.65962212],
[ 1. , 1.22494293, 2.4905672 ],
[ 1. , 0.82587401, 1.6469229 ],
[ 1. , -0.46393125, 2.81795857],
[ 1. , -2.36851079, 1.01187775],
[ 1. , 0.12587105, 2.55995705],
[ 1. , -0.35712397, 1.88322814],
[ 1. , 0.68857731, 2.45378334],
[ 1. , -1.11846239, 1.7060288 ],
[ 1. , -1.73549484, 1.16778056],
[ 1. , 0.18491969, 1.6888773 ],
[ 1. , 1.10350087, 2.55392247],
[ 1. , -0.44246031, 1.49684599],
[ 1. , -0.22148107, 2.66175094],
[ 1. , 0.30829778, 2.25791677],
[ 1. , -0.29287034, 2.04485062],
[ 1. , -0.44357665, 1.58064718],
[ 1. , 0.01694366, 1.60172119],
[ 1. , -0.35169509, 2.20195385],
[ 1. , 0.55527319, 1.84184212],
[ 1. , -0.59067181, 1.19101348],
[ 1. , 0.21534601, 2.95975298],
[ 1. , 0.79769729, 2.79259136],
[ 1. , 0.41191044, 1.86899517],
[ 1. , -1.39417234, 1.15327164],
[ 1. , 0.71641377, 2.87832566],
[ 1. , 0.44264983, 3.19840287],
[ 1. , -1.11935978, 1.0214965 ],
[ 1. , 0.25788802, 3.2897688 ],
[ 1. , -0.33296609, 0.88930833],
[ 1. , 2.07653897, 3.77544829],
[ 1. , -0.35754265, 0.91029036],
[ 1. , 0.38998221, 2.7169355 ],
[ 1. , 0.88980695, 2.23294531],
[ 1. , -0.30374769, 2.16560662],
[ 1. , 0.46858362, 2.22595082],
[ 1. , 1.5953943 , 3.86456874],
[ 1. , 0.80516593, 1.14755445],
[ 1. , -1.07078848, 1.07630871],
[ 1. , -0.61184666, 1.08231727],
[ 1. , -1.22082978, 1.441157 ],
[ 1. , 0.42133054, 2.00527312],
[ 1. , -1.15371133, 0.39545553],
[ 1. , -1.16529981, 0.55726593],
[ 1. , -0.0753288 , 2.65117295],
[ 1. , 1.6801046 , -0.08598257],
[ 1. , 1.8497788 , -0.01692729],
[ 1. , 1.85056814, 0.08177583],
[ 1. , 2.80034537, 0.81539772],
[ 1. , 0.40956132, 0.55189673],
[ 1. , 1.31484126, -0.82314199],
[ 1. , 0.74578981, -0.63417363],
[ 1. , 1.52094099, 0.07809939],
[ 1. , 2.07568652, -0.1150392 ],
[ 1. , 0.99162865, -0.53539644],
[ 1. , 0.7038055 , -0.19221186],
[ 1. , 2.06231887, -0.03769865],
[ 1. , 2.5799191 , 0.07069567],
[ 1. , 0.87544838, -0.5637692 ],
[ 1. , 2.11936394, -0.12933603],
[ 1. , 1.72361776, -0.83946543],
[ 1. , 1.00174566, -0.38122767],
[ 1. , 2.67256573, 1.39825995],
[ 1. , 1.95670692, -0.76033255],
[ 1. , -0.12377975, -1.66555773],
[ 1. , 3.49754388, 0.86099572],
[ 1. , 2.30620253, 1.41859895],
[ 1. , 2.55608362, 0.42400762],
[ 1. , 2.48991027, 0.76553132],
[ 1. , 2.55940671, -0.57984346],
[ 1. , 2.32904612, -0.51945896],
[ 1. , 1.7353993 , -0.75519976],
[ 1. , 2.51829003, 0.37786517],
[ 1. , 1.8706277 , -0.93869733],
[ 1. , 0.22236542, -2.44483319],
[ 1. , -0.09038213, -1.79941358],
[ 1. , 2.70225343, 0.94731516],
[ 1. , 1.88566698, 0.2798723 ],
[ 1. , 1.58910203, -0.70947294],
[ 1. , 3.0973127 , 0.92156856],
[ 1. , 2.69809437, 0.38175539],
[ 1. , 3.3289721 , 0.41484516],
[ 1. , 1.87232143, -0.61040703],
[ 1. , 2.33296818, 0.02254511],
[ 1. , 1.81407758, -0.17053957],
[ 1. , 2.96737955, 1.44181063],
[ 1. , 2.9380551 , 0.47943516],
[ 1. , 0.73973305, -1.28050721],
[ 1. , 2.08422916, -1.39634791],
[ 1. , 1.66061566, -0.98458495],
[ 1. , 1.98635728, -0.28509211],
[ 1. , 2.56435931, -0.47953988],
[ 1. , 2.20241294, 0.39511807],
[ 1. , 3.87224268, 0.76225007],
[ 1. , 1.64152869, 0.42732398],
[ 1. , 2.01228488, 0.83947942],
[ 1. , 1.12248906, -0.86437473],
[ 1. , 0.92242964, 0.68317263],
[ 1. , 1.60673796, 0.18415559],
[ 1. , 1.50243849, 0.01270292],
[ 1. , 2.85000032, 0.26811154],
[ 1. , 1.11113213, -1.08291552],
[ 1. , 1.40913806, -0.73020544],
[ 1. , 1.73676161, 0.22610285],
[ 1. , 2.55634878, 0.73043033],
[ 1. , 1.68015105, -0.51196788],
[ 1. , 2.07339426, -0.69056912],
[ 1. , 2.51559063, 0.28962156],
[ 1. , 1.91898721, -0.38112982],
[ 1. , 2.84276804, 1.00034359],
[ 1. , 2.48044735, 0.97198591],
[ 1. , 1.00833589, -1.14853349],
[ 1. , 1.33359442, -0.28894411],
[ 1. , 1.77539826, -1.30014904],
[ 1. , 0.67101677, -1.91851934],
[ 1. , 3.17326187, 1.70334529],
[ 1. , 2.08009842, -0.74647001],
[ 1. , 2.29719679, -0.16319915],
[ 1. , 1.83769012, -0.19705278],
[ 1. , 1.80368554, -0.12629935],
[ 1. , 1.56859964, -0.2302815 ],
[ 1. , 1.47931541, -0.54655241],
[ 1. , 1.9768354 , -0.26445185],
[ 1. , 2.50410495, 1.52574828],
[ 1. , 2.35432514, -0.47573114],
[ 1. , 1.07167578, 0.15158119],
[ 1. , 1.11479351, -1.433885 ],
[ 1. , 1.12664849, 0.11512314],
[ 1. , 2.08937432, -0.48560459],
[ 1. , 0.9236267 , -0.89739827],
[ 1. , 2.22914995, 0.08347266],
[ 1. , 1.87811234, -0.22916618],
[ 1. , 3.32948946, 0.77274808],
[ 1. , 2.60176922, 0.08437287],
[ 1. , 1.98635292, 0.81549389]]),
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]))
In [2]:
pla_bc.load_test_data('/Users/fukuball/Projects/fuku-ml/FukuML/dataset/linear_separable_test.dat')
Out[2]:
(array([[ 1.00000000e+00, 3.12179041e-01, 3.26300582e+00],
[ 1.00000000e+00, -7.88545922e-01, 1.84177454e+00],
[ 1.00000000e+00, 3.45018856e-01, 2.02971487e+00],
[ 1.00000000e+00, -5.90936663e-02, 2.06095580e+00],
[ 1.00000000e+00, -6.04229133e-01, 1.89545186e+00],
[ 1.00000000e+00, 2.92639463e-01, 2.21847534e+00],
[ 1.00000000e+00, 1.37291076e+00, 3.10397301e+00],
[ 1.00000000e+00, -8.55850926e-01, 7.43968659e-01],
[ 1.00000000e+00, -2.33116362e-04, 1.45262917e+00],
[ 1.00000000e+00, 5.63747692e-01, 2.65759454e+00],
[ 1.00000000e+00, 3.27474170e+00, 5.16394190e-01],
[ 1.00000000e+00, 1.00982446e+00, -9.92472127e-01],
[ 1.00000000e+00, 1.63602318e+00, -7.66844250e-01],
[ 1.00000000e+00, 2.81507689e+00, 2.63441093e-01],
[ 1.00000000e+00, 1.83736479e+00, -8.03493918e-01],
[ 1.00000000e+00, 2.26025418e+00, 4.02606276e-01],
[ 1.00000000e+00, 2.18689341e+00, 7.86296427e-01],
[ 1.00000000e+00, 1.34179804e+00, 1.26613719e-04],
[ 1.00000000e+00, 2.75190511e+00, -3.67967235e-01],
[ 1.00000000e+00, 2.92122264e+00, 1.35934066e-01]]),
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., -1., -1., -1.,
-1., -1., -1., -1., -1., -1., -1.]))
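As you can see, both datasets loaded without trouble. The only remaining question is what format the dataset file should use. The quickest way to find out is to look at one of the datasets that ships with FukuML: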
https://github.com/fukuball/fuku-ml/blob/master/FukuML/dataset/pla_binary_train.dat
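The format is very simple: each record goes on its own line, with its feature values separated by spaces and the record's label appended at the end, also separated by a space. The label is 1 for the positive class and -1 for the negative class. That is all there is to it.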
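To make the layout concrete, here is a minimal sketch of how one line in this format can be split into features and a label. It is not FukuML's own loader; `parse_record` is a hypothetical helper and the sample values are made up for illustration.

```python
def parse_record(line):
    """Split one space-separated record into (features, label).

    Hypothetical helper for illustration only; not part of FukuML.
    """
    values = [float(token) for token in line.split()]
    if len(values) < 2:
        raise ValueError("a record needs at least one feature and a label")
    # Everything except the last value is a feature; the last value is the label.
    features, label = values[:-1], values[-1]
    if label not in (1.0, -1.0):
        raise ValueError("the label must be 1 or -1")
    return features, label


# Made-up sample line: two features followed by a positive label.
features, label = parse_record("0.5 -1.25 1")
```

Checking the label up front is a design choice of this sketch: it catches a misplaced column before the file ever reaches a classifier.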
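For example, suppose you want to predict credit card approval for a bank, and the features under review are annual salary, age, and gender. If Xiao Ming has an annual salary of 100W (1,000,000, written in units of 10,000), is 30 years old, is male (encoded as 1), and was approved, his record is: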
100 30 1 1
If Xiao Hua has an annual salary of 20W, is 25 years old, is male, and was not approved, his record is:
20 25 1 -1
If Xiao Mei has an annual salary of 30W, is 24 years old, is female (encoded as 0), and was approved, her record is:
30 24 0 1
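The three records above can be written to a dataset file with a few lines of standard-library Python. This is only a sketch: `format_record`, `write_dataset`, and the file name `credit_card_train.dat` are hypothetical names chosen for this example, and the file goes to a temporary directory so nothing on your machine is overwritten.

```python
import os
import tempfile

# The three example applicants: ([salary, age, gender], label).
records = [
    ([100, 30, 1], 1),   # Xiao Ming, approved
    ([20, 25, 1], -1),   # Xiao Hua, not approved
    ([30, 24, 0], 1),    # Xiao Mei, approved
]


def format_record(features, label):
    """Join the features and the trailing label with single spaces."""
    return " ".join(str(value) for value in list(features) + [label])


def write_dataset(path, rows):
    """Write one record per line in the space-separated format."""
    with open(path, "w") as handle:
        for features, label in rows:
            handle.write(format_record(features, label) + "\n")


# Hypothetical file name, created in a temporary directory.
dataset_path = os.path.join(tempfile.mkdtemp(), "credit_card_train.dat")
write_dataset(dataset_path, records)

# Read the file back to confirm what was written.
with open(dataset_path) as handle:
    lines = handle.read().splitlines()
```

Assuming the loader accepts any file laid out this way, as the format description suggests, the resulting path could then be passed to `pla_bc.load_train_data(dataset_path)` just like the paths shown earlier.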
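Follow the same pattern for the rest of your records. It is quick and painless, and you can start experimenting with machine learning on your own data.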