Using your own training and test datasets

In practice you will of course want to use your own data. How do you do that? FukuML provides a simple way to load it:

your_training_data_file = '/path/to/your/training_data/file'
pla_bc.load_train_data(your_training_data_file)

your_testing_data_file = '/path/to/your/testing_data/file'
pla_bc.load_test_data(your_testing_data_file)

It's that simple. Let's walk through it for real:


In [1]:
import FukuML.PLA as pla

pla_bc = pla.BinaryClassifier()
pla_bc.load_train_data('/Users/fukuball/Projects/fuku-ml/FukuML/dataset/linear_separable_train.dat')


Out[1]:
(array([[ 1.        , -0.49475104,  1.60851023],
        [ 1.        ,  0.99350955,  2.53942025],
        [ 1.        ,  0.67365802,  2.41859411],
        [ 1.        , -1.91676615,  0.48923093],
        [ 1.        , -0.80964166,  1.26206511],
        [ 1.        , -0.45285374,  1.82885284],
        [ 1.        ,  0.27463815,  2.08049683],
        [ 1.        ,  0.89694355,  3.7834262 ],
        [ 1.        , -1.72520564,  0.87640485],
        [ 1.        ,  0.7349451 ,  3.39882197],
        [ 1.        , -1.02461018,  1.44258081],
        [ 1.        , -0.60392455,  0.98807458],
        [ 1.        ,  0.08098387,  2.15878467],
        [ 1.        ,  0.48213089,  2.18476304],
        [ 1.        ,  0.74123261,  3.22706092],
        [ 1.        , -0.57649605,  0.27757466],
        [ 1.        , -1.60301663,  0.85311484],
        [ 1.        , -1.90040634,  1.14021401],
        [ 1.        ,  0.7943513 ,  2.68559323],
        [ 1.        ,  0.15398661,  2.61447653],
        [ 1.        ,  2.4192871 ,  4.18943591],
        [ 1.        ,  0.13016586,  2.53128795],
        [ 1.        , -1.00057111,  1.2998211 ],
        [ 1.        , -2.24935866, -0.51829791],
        [ 1.        , -0.11745011,  2.36365622],
        [ 1.        , -0.18131864,  1.90732415],
        [ 1.        , -1.0669876 ,  1.84490598],
        [ 1.        , -0.41819858,  1.20384123],
        [ 1.        , -1.27557363,  1.58879675],
        [ 1.        , -0.48455613,  1.56688674],
        [ 1.        ,  1.76857878,  2.70393626],
        [ 1.        ,  0.6178306 ,  1.41965757],
        [ 1.        ,  0.24021005,  2.07796794],
        [ 1.        , -0.40745049,  1.12846498],
        [ 1.        , -0.5450063 ,  1.64924578],
        [ 1.        , -0.89149772,  1.29851015],
        [ 1.        ,  1.12855231,  1.96797717],
        [ 1.        , -0.62563244,  1.87988573],
        [ 1.        , -0.91508504,  1.62532636],
        [ 1.        , -1.21008395,  0.41751392],
        [ 1.        ,  1.0369232 ,  3.32224131],
        [ 1.        ,  0.82678315,  3.36840655],
        [ 1.        ,  0.00522133,  2.57820823],
        [ 1.        , -1.06147755,  1.06473163],
        [ 1.        , -0.45700467,  2.00276916],
        [ 1.        ,  0.13487671,  0.65962212],
        [ 1.        ,  1.22494293,  2.4905672 ],
        [ 1.        ,  0.82587401,  1.6469229 ],
        [ 1.        , -0.46393125,  2.81795857],
        [ 1.        , -2.36851079,  1.01187775],
        [ 1.        ,  0.12587105,  2.55995705],
        [ 1.        , -0.35712397,  1.88322814],
        [ 1.        ,  0.68857731,  2.45378334],
        [ 1.        , -1.11846239,  1.7060288 ],
        [ 1.        , -1.73549484,  1.16778056],
        [ 1.        ,  0.18491969,  1.6888773 ],
        [ 1.        ,  1.10350087,  2.55392247],
        [ 1.        , -0.44246031,  1.49684599],
        [ 1.        , -0.22148107,  2.66175094],
        [ 1.        ,  0.30829778,  2.25791677],
        [ 1.        , -0.29287034,  2.04485062],
        [ 1.        , -0.44357665,  1.58064718],
        [ 1.        ,  0.01694366,  1.60172119],
        [ 1.        , -0.35169509,  2.20195385],
        [ 1.        ,  0.55527319,  1.84184212],
        [ 1.        , -0.59067181,  1.19101348],
        [ 1.        ,  0.21534601,  2.95975298],
        [ 1.        ,  0.79769729,  2.79259136],
        [ 1.        ,  0.41191044,  1.86899517],
        [ 1.        , -1.39417234,  1.15327164],
        [ 1.        ,  0.71641377,  2.87832566],
        [ 1.        ,  0.44264983,  3.19840287],
        [ 1.        , -1.11935978,  1.0214965 ],
        [ 1.        ,  0.25788802,  3.2897688 ],
        [ 1.        , -0.33296609,  0.88930833],
        [ 1.        ,  2.07653897,  3.77544829],
        [ 1.        , -0.35754265,  0.91029036],
        [ 1.        ,  0.38998221,  2.7169355 ],
        [ 1.        ,  0.88980695,  2.23294531],
        [ 1.        , -0.30374769,  2.16560662],
        [ 1.        ,  0.46858362,  2.22595082],
        [ 1.        ,  1.5953943 ,  3.86456874],
        [ 1.        ,  0.80516593,  1.14755445],
        [ 1.        , -1.07078848,  1.07630871],
        [ 1.        , -0.61184666,  1.08231727],
        [ 1.        , -1.22082978,  1.441157  ],
        [ 1.        ,  0.42133054,  2.00527312],
        [ 1.        , -1.15371133,  0.39545553],
        [ 1.        , -1.16529981,  0.55726593],
        [ 1.        , -0.0753288 ,  2.65117295],
        [ 1.        ,  1.6801046 , -0.08598257],
        [ 1.        ,  1.8497788 , -0.01692729],
        [ 1.        ,  1.85056814,  0.08177583],
        [ 1.        ,  2.80034537,  0.81539772],
        [ 1.        ,  0.40956132,  0.55189673],
        [ 1.        ,  1.31484126, -0.82314199],
        [ 1.        ,  0.74578981, -0.63417363],
        [ 1.        ,  1.52094099,  0.07809939],
        [ 1.        ,  2.07568652, -0.1150392 ],
        [ 1.        ,  0.99162865, -0.53539644],
        [ 1.        ,  0.7038055 , -0.19221186],
        [ 1.        ,  2.06231887, -0.03769865],
        [ 1.        ,  2.5799191 ,  0.07069567],
        [ 1.        ,  0.87544838, -0.5637692 ],
        [ 1.        ,  2.11936394, -0.12933603],
        [ 1.        ,  1.72361776, -0.83946543],
        [ 1.        ,  1.00174566, -0.38122767],
        [ 1.        ,  2.67256573,  1.39825995],
        [ 1.        ,  1.95670692, -0.76033255],
        [ 1.        , -0.12377975, -1.66555773],
        [ 1.        ,  3.49754388,  0.86099572],
        [ 1.        ,  2.30620253,  1.41859895],
        [ 1.        ,  2.55608362,  0.42400762],
        [ 1.        ,  2.48991027,  0.76553132],
        [ 1.        ,  2.55940671, -0.57984346],
        [ 1.        ,  2.32904612, -0.51945896],
        [ 1.        ,  1.7353993 , -0.75519976],
        [ 1.        ,  2.51829003,  0.37786517],
        [ 1.        ,  1.8706277 , -0.93869733],
        [ 1.        ,  0.22236542, -2.44483319],
        [ 1.        , -0.09038213, -1.79941358],
        [ 1.        ,  2.70225343,  0.94731516],
        [ 1.        ,  1.88566698,  0.2798723 ],
        [ 1.        ,  1.58910203, -0.70947294],
        [ 1.        ,  3.0973127 ,  0.92156856],
        [ 1.        ,  2.69809437,  0.38175539],
        [ 1.        ,  3.3289721 ,  0.41484516],
        [ 1.        ,  1.87232143, -0.61040703],
        [ 1.        ,  2.33296818,  0.02254511],
        [ 1.        ,  1.81407758, -0.17053957],
        [ 1.        ,  2.96737955,  1.44181063],
        [ 1.        ,  2.9380551 ,  0.47943516],
        [ 1.        ,  0.73973305, -1.28050721],
        [ 1.        ,  2.08422916, -1.39634791],
        [ 1.        ,  1.66061566, -0.98458495],
        [ 1.        ,  1.98635728, -0.28509211],
        [ 1.        ,  2.56435931, -0.47953988],
        [ 1.        ,  2.20241294,  0.39511807],
        [ 1.        ,  3.87224268,  0.76225007],
        [ 1.        ,  1.64152869,  0.42732398],
        [ 1.        ,  2.01228488,  0.83947942],
        [ 1.        ,  1.12248906, -0.86437473],
        [ 1.        ,  0.92242964,  0.68317263],
        [ 1.        ,  1.60673796,  0.18415559],
        [ 1.        ,  1.50243849,  0.01270292],
        [ 1.        ,  2.85000032,  0.26811154],
        [ 1.        ,  1.11113213, -1.08291552],
        [ 1.        ,  1.40913806, -0.73020544],
        [ 1.        ,  1.73676161,  0.22610285],
        [ 1.        ,  2.55634878,  0.73043033],
        [ 1.        ,  1.68015105, -0.51196788],
        [ 1.        ,  2.07339426, -0.69056912],
        [ 1.        ,  2.51559063,  0.28962156],
        [ 1.        ,  1.91898721, -0.38112982],
        [ 1.        ,  2.84276804,  1.00034359],
        [ 1.        ,  2.48044735,  0.97198591],
        [ 1.        ,  1.00833589, -1.14853349],
        [ 1.        ,  1.33359442, -0.28894411],
        [ 1.        ,  1.77539826, -1.30014904],
        [ 1.        ,  0.67101677, -1.91851934],
        [ 1.        ,  3.17326187,  1.70334529],
        [ 1.        ,  2.08009842, -0.74647001],
        [ 1.        ,  2.29719679, -0.16319915],
        [ 1.        ,  1.83769012, -0.19705278],
        [ 1.        ,  1.80368554, -0.12629935],
        [ 1.        ,  1.56859964, -0.2302815 ],
        [ 1.        ,  1.47931541, -0.54655241],
        [ 1.        ,  1.9768354 , -0.26445185],
        [ 1.        ,  2.50410495,  1.52574828],
        [ 1.        ,  2.35432514, -0.47573114],
        [ 1.        ,  1.07167578,  0.15158119],
        [ 1.        ,  1.11479351, -1.433885  ],
        [ 1.        ,  1.12664849,  0.11512314],
        [ 1.        ,  2.08937432, -0.48560459],
        [ 1.        ,  0.9236267 , -0.89739827],
        [ 1.        ,  2.22914995,  0.08347266],
        [ 1.        ,  1.87811234, -0.22916618],
        [ 1.        ,  3.32948946,  0.77274808],
        [ 1.        ,  2.60176922,  0.08437287],
        [ 1.        ,  1.98635292,  0.81549389]]),
 array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
         1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]))

In [2]:
pla_bc.load_test_data('/Users/fukuball/Projects/fuku-ml/FukuML/dataset/linear_separable_test.dat')


Out[2]:
(array([[  1.00000000e+00,   3.12179041e-01,   3.26300582e+00],
        [  1.00000000e+00,  -7.88545922e-01,   1.84177454e+00],
        [  1.00000000e+00,   3.45018856e-01,   2.02971487e+00],
        [  1.00000000e+00,  -5.90936663e-02,   2.06095580e+00],
        [  1.00000000e+00,  -6.04229133e-01,   1.89545186e+00],
        [  1.00000000e+00,   2.92639463e-01,   2.21847534e+00],
        [  1.00000000e+00,   1.37291076e+00,   3.10397301e+00],
        [  1.00000000e+00,  -8.55850926e-01,   7.43968659e-01],
        [  1.00000000e+00,  -2.33116362e-04,   1.45262917e+00],
        [  1.00000000e+00,   5.63747692e-01,   2.65759454e+00],
        [  1.00000000e+00,   3.27474170e+00,   5.16394190e-01],
        [  1.00000000e+00,   1.00982446e+00,  -9.92472127e-01],
        [  1.00000000e+00,   1.63602318e+00,  -7.66844250e-01],
        [  1.00000000e+00,   2.81507689e+00,   2.63441093e-01],
        [  1.00000000e+00,   1.83736479e+00,  -8.03493918e-01],
        [  1.00000000e+00,   2.26025418e+00,   4.02606276e-01],
        [  1.00000000e+00,   2.18689341e+00,   7.86296427e-01],
        [  1.00000000e+00,   1.34179804e+00,   1.26613719e-04],
        [  1.00000000e+00,   2.75190511e+00,  -3.67967235e-01],
        [  1.00000000e+00,   2.92122264e+00,   1.35934066e-01]]),
 array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1., -1., -1., -1.,
        -1., -1., -1., -1., -1., -1., -1.]))

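See, both files loaded without trouble. The only remaining question is what format the dataset file uses, and you can find out by looking at one of the datasets that ships with FukuML: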

https://github.com/fukuball/fuku-ml/blob/master/FukuML/dataset/pla_binary_train.dat

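The format really is simple: put each record on its own line with its feature values separated by spaces, then append that record's label at the end of the line, also separated by a space. The label is 1 for the positive class and -1 for the negative class. That's all there is to it.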
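To make the format concrete, here is a minimal sketch that parses text laid out this way using only NumPy. The function name `parse_dataset` and the sample values are made up for illustration, and this is not FukuML's own loader. The leading column of ones simply imitates what the Out[1] output above shows; how FukuML builds it internally is an assumption.

```python
import numpy as np


def parse_dataset(text):
    """Parse space-separated rows: feature values first, label (1 or -1) last.

    Hypothetical helper for illustration only, not part of FukuML.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("every row must have the same number of columns")

    data = np.array(rows, dtype=float)
    features, labels = data[:, :-1], data[:, -1]

    if not np.all(np.isin(labels, (1.0, -1.0))):
        raise ValueError("labels must be 1 or -1")

    # Prepend a constant 1 to every row, mirroring the first column in Out[1].
    features = np.hstack([np.ones((features.shape[0], 1)), features])
    return features, labels


# Two made-up records: two features each, followed by the label.
sample = "0.5 1.5 1\n2.0 -0.5 -1\n"
X, Y = parse_dataset(sample)
print(X)
print(Y)
```

Running a check like this on your own file before training can catch rows with a missing column or a label other than 1 or -1.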

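For example, suppose you want to predict bank credit card approval, and the features under review are annual salary, age, and gender. Salary is written in units of W (10,000), and gender is encoded as 1 for male and 0 for female. If Xiao Ming earns 100W a year, is 30 years old, is male, and was approved, his record is: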

100 30 1 1

If Xiao Hua earns 20W a year, is 25 years old, is male, and was not approved, his record is:

20 25 1 -1

If Xiao Mei earns 30W a year, is 24 years old, is female, and was approved, her record is:

30 24 0 1

And so on. It's simple and painless, so everyone can start playing with machine learning on their own data.
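The three records above could be generated and written to a data file with a short script like the following. It is only a sketch: `encode_applicant`, the file name `card_approval_train.dat`, and the tuple layout are hypothetical, and the encoding (salary in units of W, 1 for male, 0 for female, 1 for approved, -1 for rejected) is the one used in the examples.

```python
import os
import tempfile


def encode_applicant(salary_w, age, is_male, approved):
    """Turn one applicant into a line: features first, label last.

    Hypothetical helper for illustration only, not part of FukuML.
    """
    gender = 1 if is_male else 0
    label = 1 if approved else -1
    return f"{salary_w} {age} {gender} {label}"


# (salary in W, age, is_male, approved) for Xiao Ming, Xiao Hua, Xiao Mei.
applicants = [
    (100, 30, True, True),
    (20, 25, True, False),
    (30, 24, False, True),
]
lines = [encode_applicant(*applicant) for applicant in applicants]

# Write one record per line to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "card_approval_train.dat")
with open(path, "w") as data_file:
    data_file.write("\n".join(lines) + "\n")

print("\n".join(lines))
```

The resulting `path` is the kind of file you would then pass to `pla_bc.load_train_data(...)` as shown earlier.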