For every element $1 \dots n$, we multiply the value of that element in array (Series) $A$ by the value of the same element in array $B$ and then sum all of these products.
Implement a function below that takes as input two Series $A$ and $B$ and returns the value required for the numerator of the equation.
In [ ]:
import pandas as pd
def CS_num(A,B):
    #insert your code here
# Run the code below to check your code.
df = pd.read_pickle('test_LL.pickle')
# .ix has been removed from pandas; .iloc selects rows by position.
print(CS_num(df.iloc[0], df.iloc[1]))
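If you want something to check your own numerator against, here is a minimal sketch on two small made-up Series. The name `numerator_sketch` and the toy values are assumptions for illustration only; they are not part of the exercise or of `test_LL.pickle`, and this is just one possible approach.

```python
import pandas as pd

def numerator_sketch(A, B):
    """Sum of the element-wise products of two equally indexed Series."""
    # A * B multiplies values that share an index label; sum() adds the products.
    return (A * B).sum()

# Toy data, made up for illustration.
a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([4.0, 5.0, 6.0])
print(numerator_sketch(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```

The same value is what `a.dot(b)` returns, which is a handy cross-check.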
Similar to the numerator, we want to multiply each value in Series $A$ by itself, and then find the sum of all of these values.
Write a function below that will take as input a Series $A$ and will return the appropriate value for the first half of the denominator of our cosine similarity equation.
In [ ]:
def CS_den_part(A):
    #insert your code here
#The lines below are to check your code for errors.
print(CS_den_part(df.iloc[0]))
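As a rough check for this step, the sketch below computes one half of the denominator for a tiny made-up Series. It assumes that the helper includes the square root (so it returns the length of the vector); if you prefer to take the square root later, return only the sum of squares. The name `den_part_sketch` is hypothetical.

```python
import math
import pandas as pd

def den_part_sketch(A):
    """Square root of the sum of squared values of a Series."""
    # Assumption: the square root is taken here rather than in a later step.
    return math.sqrt((A * A).sum())

# Toy data, made up for illustration.
a = pd.Series([3.0, 4.0])
print(den_part_sketch(a))  # sqrt(9 + 16) = 5.0
```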
A brief look at the second square root in the denominator should demonstrate that we can use our previous function (CS_den_part) to calculate the second part of the denominator as well as the first.
The last pre-calculation for our cosine similarity equation is to bring the two parts of the denominator together. Define a function below that takes two Series $A$ and $B$, calls the CS_den_part function above on each of them, and returns the value of the denominator.
In [ ]:
def CS_den(A, B):
    #insert your code here
#The lines below are to check your code for errors.
print(CS_den(df.iloc[0], df.iloc[1]))
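Assuming the denominator is the product of the two square roots described above (the standard form of cosine similarity), a small worked example with the made-up vectors $A = (1, 2, 2)$ and $B = (3, 4, 0)$ looks like this:

$$\sqrt{\sum_{i=1}^{n} A_i^2} \cdot \sqrt{\sum_{i=1}^{n} B_i^2} = \sqrt{1 + 4 + 4} \cdot \sqrt{9 + 16 + 0} = 3 \cdot 5 = 15$$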
In order to do the final calculation of our cosine similarity score, we need to write a function that will take two Series $A$ and $B$ as input, call the appropriate functions to do the calculations for the parts of the CS equation, and return a single number that is the cosine similarity of the two Series.
In [ ]:
def cos_sim(A,B):
    #insert your code here
#The lines below are to check your code for errors.
print(cos_sim(df.iloc[0], df.iloc[1]))
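To see what values to expect, here is a self-contained sketch of the whole calculation on toy vectors. It does not call the exercise functions; `cos_sim_sketch` and the toy Series are hypothetical names and data, and the code assumes that neither Series is all zeros (otherwise the division is undefined).

```python
import math
import pandas as pd

def cos_sim_sketch(A, B):
    """Cosine similarity of two equally indexed Series."""
    numerator = (A * B).sum()
    # Product of the two vector lengths; assumed to be non-zero.
    denominator = math.sqrt((A * A).sum()) * math.sqrt((B * B).sum())
    return numerator / denominator

# Toy data, made up for illustration.
a = pd.Series([1.0, 0.0])
b = pd.Series([0.0, 1.0])
c = pd.Series([2.0, 0.0])
print(cos_sim_sketch(a, b))  # vectors at a right angle: 0.0
print(cos_sim_sketch(a, c))  # same direction, different length: 1.0
```

The second result shows that cosine similarity ignores vector length and only measures direction.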
Now, finally, we need to wrap everything we have done above in a function that takes a matrix of arrays (DataFrame) $DF\_LL$, performs the necessary calculations to calculate the CS of every row with every other row, and builds a new DataFrame $CS\_DF$ that has the same shape as $DF\_LL$ but contains the cosine similarity scores for the individual arrays.
In [ ]:
def CS_matrix(DF_LL):
    #insert your code here
#The lines below are meant to check your function for errors.
print(CS_matrix(df))
What do you notice about your matrix? Do these characteristics that you notice make sense? Why or why not?
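One way to explore these questions on a small scale is the sketch below. Note that it is a vectorized NumPy alternative, not the row-by-row loop over `cos_sim` that the exercise asks for. The name `cs_matrix_sketch` and the toy DataFrame are assumptions, and the code assumes that no row is all zeros.

```python
import numpy as np
import pandas as pd

def cs_matrix_sketch(DF):
    """Cosine similarity of every row of DF with every other row."""
    X = DF.to_numpy(dtype=float)
    # Length of each row vector; assumed to be non-zero.
    norms = np.sqrt((X * X).sum(axis=1))
    # Scale each row to length 1, so dot products become cosines.
    unit = X / norms[:, np.newaxis]
    sims = unit @ unit.T
    # Rows and columns are both labelled with the row index of DF.
    return pd.DataFrame(sims, index=DF.index, columns=DF.index)

# Toy data, made up for illustration.
toy = pd.DataFrame([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]], index=['x', 'y', 'z'])
cs = cs_matrix_sketch(toy)
print(cs)
```

On the toy data the diagonal is 1 (every row is identical to itself) and the matrix is symmetric (the similarity of x to y equals that of y to x).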
Our last step is simply to compare our answers with the cosine distance function in the scikit-learn package. This function is extremely easy to use. Because it returns cosine distances rather than similarities, we subtract its result from 1:
In [6]:
1-CS
Out[6]:
                 1        10        12        13        14        15        17        19        20        23        25        26        27        29         3        33        35        37        38        39  ...
1         1.000000  0.707380  0.682076  0.716077  0.680384  0.705289  0.696743  0.688928  0.697534  0.690243  0.710229  0.717261  0.711713  0.756331  0.788174  0.741112  0.755663  0.721572  0.706176  0.677446  ...
10        0.707380  1.000000  0.838920  0.765198  0.668240  0.695764  0.687894  0.677130  0.685574  0.678412  0.697515  0.704351  0.698922  0.700459  0.673914  0.686770  0.700627  0.712466  0.759010  0.758575  ...
12        0.682076  0.838920  1.000000  0.866618  0.708827  0.670517  0.662957  0.652455  0.660590  0.653690  0.672074  0.678657  0.673428  0.674959  0.649700  0.661906  0.675259  0.686640  0.752882  0.775229  ...
13        0.716077  0.765198  0.866618  1.000000  0.850439  0.818521  0.695073  0.686618  0.725198  0.717629  0.750373  0.757581  0.721528  0.710976  0.679807  0.694739  0.702439  0.713670  0.725200  0.718975  ...
14        0.680384  0.668240  0.708827  0.850439  1.000000  0.776405  0.722877  0.655169  0.716952  0.709473  0.739874  0.746893  0.687168  0.678894  0.644351  0.660899  0.662959  0.673136  0.669345  0.639555  ...
15        0.705289  0.695764  0.670517  0.818521  0.776405  1.000000  0.765844  0.777129  0.714802  0.707341  0.739669  0.746777  0.711161  0.700682  0.669658  0.684547  0.692118  0.703208  0.693972  0.666010  ...
17        0.696743  0.687894  0.662957  0.695073  0.722877  0.765844  1.000000  0.802772  0.840065  0.758365  0.689581  0.696270  0.690924  0.693625  0.661858  0.677195  0.684627  0.777243  0.766187  0.658497  ...
19        0.688928  0.677130  0.652455  0.686618  0.655169  0.777129  0.802772  1.000000  0.858612  0.782143  0.779884  0.690375  0.685091  0.688812  0.652723  0.670098  0.672080  0.757973  0.752407  0.648082  ...
20        0.697534  0.685574  0.660590  0.725198  0.716952  0.714802  0.840065  0.858612  1.000000  0.861461  0.860356  0.727163  0.693606  0.697377  0.660868  0.678443  0.680452  0.767392  0.761754  0.656163  ...
23        0.690243  0.678412  0.653690  0.717629  0.709473  0.707341  0.758365  0.782143  0.861461  1.000000  0.820920  0.810178  0.686368  0.690098  0.653963  0.671358  0.673346  0.683757  0.680032  0.649308  ...
25        0.710229  0.697515  0.672074  0.750373  0.739874  0.739669  0.689581  0.779884  0.860356  0.820920  1.000000  0.805390  0.802098  0.708571  0.672594  0.689822  0.691981  0.702597  0.698631  0.667572  ...
26        0.717261  0.704351  0.678657  0.757581  0.746893  0.746777  0.696270  0.690375  0.727163  0.810178  0.805390  1.000000  0.810060  0.800638  0.679215  0.696525  0.698719  0.709429  0.705406  0.674112  ...
27        0.711713  0.698922  0.673428  0.721528  0.687168  0.711161  0.690924  0.685091  0.693606  0.686368  0.802098  0.810060  1.000000  0.807069  0.673972  0.691173  0.693347  0.703977  0.699990  0.668918  ...
29        0.756331  0.700459  0.674959  0.710976  0.678894  0.700682  0.693625  0.688812  0.697377  0.690098  0.708571  0.800638  0.807069  1.000000  0.714955  0.821197  0.733866  0.706445  0.702754  0.670432  ...
3         0.788174  0.673914  0.649700  0.679807  0.644351  0.669658  0.661858  0.652723  0.660868  0.653963  0.672594  0.679215  0.673972  0.714955  1.000000  0.702267  0.716418  0.685477  0.670762  0.645303  ...
33        0.741112  0.686770  0.661906  0.694739  0.660899  0.684547  0.677195  0.670098  0.678443  0.671358  0.689822  0.696525  0.691173  0.821197  0.702267  1.000000  0.845733  0.695129  0.686191  0.657449  ...
35        0.755663  0.700627  0.675259  0.702439  0.662959  0.692118  0.684627  0.672080  0.680452  0.673346  0.691981  0.698719  0.693347  0.733866  0.716418  0.845733  1.000000  0.773404  0.693732  0.670713  ...
37        0.721572  0.712466  0.686640  0.713670  0.673136  0.703208  0.777243  0.757973  0.767392  0.683757  0.702597  0.709429  0.703977  0.706445  0.685477  0.695129  0.773404  1.000000  0.842367  0.682021  ...
38        0.706176  0.759010  0.752882  0.725200  0.669345  0.693972  0.766187  0.752407  0.761754  0.680032  0.698631  0.705406  0.699990  0.702754  0.670762  0.686191  0.693732  0.842367  1.000000  0.834475  ...
39        0.677446  0.758575  0.775229  0.718975  0.639555  0.666010  0.658497  0.648082  0.656163  0.649308  0.667572  0.674112  0.668918  0.670432  0.645303  0.657449  0.670713  0.682021  0.834475  1.000000  ...
4         0.719692  0.718501  0.692788  0.719800  0.677672  0.708942  0.700304  0.686065  0.694638  0.687377  0.707406  0.714427  0.708896  0.708955  0.790808  0.698316  0.719871  0.732696  0.709794  0.688087  ...
42        0.698748  0.827003  0.714201  0.719897  0.660142  0.687317  0.679540  0.668921  0.677263  0.670188  0.689062  0.695816  0.690453  0.691965  0.665704  0.678426  0.692114  0.703813  0.755690  0.855457  ...
45        0.682765  0.785266  0.653215  0.715780  0.705728  0.705260  0.664276  0.654004  0.741238  0.733501  0.753297  0.712026  0.675077  0.676512  0.650568  0.663160  0.676542  0.688002  0.673174  0.648803  ...
46        0.685460  0.728720  0.652278  0.715507  0.706514  0.705128  0.666773  0.659251  0.741418  0.733682  0.753003  0.714808  0.679894  0.682538  0.651173  0.666324  0.673631  0.684518  0.675615  0.647890  ...
47        0.679021  0.677930  0.653670  0.679181  0.639447  0.668935  0.660780  0.647363  0.655453  0.648601  0.667503  0.674129  0.668910  0.668957  0.649255  0.658899  0.679241  0.691347  0.669735  0.649234  ...
48        0.685860  0.684747  0.660243  0.686003  0.645865  0.675654  0.667419  0.653862  0.662032  0.655112  0.674203  0.680895  0.675623  0.675674  0.655788  0.665520  0.686066  0.698292  0.676463  0.655762  ...
50        0.698082  0.692510  0.667543  0.696696  0.659127  0.686366  0.678619  0.667910  0.676238  0.669174  0.688001  0.694742  0.689388  0.690939  0.664980  0.677533  0.691202  0.702861  0.687702  0.663036  ...
51        0.715119  0.705792  0.680196  0.712901  0.677446  0.702479  0.695061  0.686988  0.695540  0.688277  0.707085  0.713940  0.708459  0.711294  0.679177  0.694644  0.702293  0.713597  0.704272  0.675621  ...
53        0.708890  0.699653  0.674279  0.706709  0.671568  0.696378  0.689023  0.681026  0.689504  0.682305  0.700951  0.707746  0.702313  0.705120  0.673266  0.688608  0.696189  0.707396  0.698154  0.669745  ...
54        0.678542  0.673453  0.649188  0.677828  0.641477  0.667766  0.660189  0.649989  0.658095  0.651220  0.669581  0.676146  0.670935  0.672358  0.646551  0.659078  0.672378  0.683769  0.669033  0.644802  ...
55        0.712179  0.706417  0.680964  0.711016  0.672891  0.700461  0.692512  0.681819  0.690322  0.683110  0.702372  0.709259  0.703791  0.705637  0.678194  0.691716  0.705295  0.717245  0.701789  0.676365  ...
56        0.722744  0.741390  0.738292  0.744183  0.682415  0.710501  0.702461  0.691490  0.700113  0.692799  0.712311  0.719292  0.713749  0.715670  0.688150  0.701684  0.715459  0.727553  0.734828  0.733326  ...
57        0.689842  0.708035  0.705066  0.710682  0.651687  0.678525  0.670850  0.660356  0.668591  0.661607  0.680237  0.686904  0.681610  0.683107  0.657210  0.669753  0.683266  0.694813  0.701751  0.700325  ...
58        0.686563  0.699605  0.695161  0.705080  0.650778  0.674715  0.667572  0.659928  0.668144  0.661167  0.679252  0.685839  0.680573  0.683258  0.652137  0.667148  0.674479  0.685356  0.696752  0.690503  ...
59        0.703818  0.694300  0.669107  0.715233  0.678604  0.704831  0.683453  0.675255  0.683660  0.707953  0.740244  0.747354  0.744474  0.734957  0.668254  0.683099  0.690655  0.701720  0.692502  0.664609  ...
6         0.719336  0.778034  0.811573  0.778018  0.679444  0.707454  0.699457  0.688487  0.697072  0.689790  0.709210  0.716159  0.710640  0.712213  0.804255  0.698320  0.712410  0.724442  0.731658  0.730185  ...
61        0.697220  0.684787  0.659812  0.707075  0.673480  0.696912  0.677045  0.671413  0.679759  0.701843  0.732575  0.739554  0.736547  0.728930  0.660302  0.677273  0.679382  0.689813  0.685931  0.655392  ...
62        0.722445  0.800834  0.824240  0.795784  0.707761  0.733847  0.700177  0.693754  0.702374  0.695045  0.738669  0.745717  0.740024  0.718966  0.683803  0.700529  0.702858  0.713544  0.763213  0.753739  ...
63        0.708601  0.841683  0.882000  0.799135  0.693727  0.719412  0.686488  0.680070  0.688520  0.681336  0.724021  0.730927  0.725348  0.704803  0.670622  0.686856  0.689170  0.699626  0.839418  0.891797  ...
65        0.724006  0.768445  0.760456  0.765743  0.709326  0.812203  0.778414  0.730066  0.703921  0.696576  0.740302  0.747365  0.741660  0.720548  0.685286  0.702061  0.704393  0.715104  0.804469  0.836600  ...
67        0.708560  0.695385  0.670001  0.760391  0.784242  0.865132  0.762274  0.714950  0.716877  0.709398  0.752748  0.759900  0.726298  0.705534  0.670739  0.687315  0.689571  0.700075  0.695997  0.665516  ...
68        0.709741  0.699785  0.674377  0.752434  0.779479  0.784774  0.688541  0.680006  0.718111  0.710615  0.742971  0.750105  0.714506  0.704157  0.673677  0.688245  0.695894  0.706989  0.697649  0.669845  ...
69        0.688751  0.683142  0.658509  0.720937  0.710353  0.710356  0.669357  0.658719  0.698185  0.690894  0.709972  0.716881  0.679887  0.681447  0.656030  0.668304  0.681786  0.693269  0.678314  0.654063  ...
7         0.708005  0.862811  0.812678  0.771417  0.666231  0.697086  0.688616  0.674504  0.682932  0.675793  0.695461  0.702360  0.696923  0.697034  0.676839  0.686695  0.707867  0.720449  0.722210  0.726511  ...
71        0.713869  0.707685  0.682169  0.711911  0.673486  0.701357  0.693448  0.682467  0.690977  0.683759  0.702989  0.709876  0.704406  0.706360  0.679575  0.692718  0.706315  0.718221  0.702729  0.677563  ...
72        0.837623  0.714574  0.688743  0.723872  0.689293  0.713218  0.705440  0.698782  0.707487  0.700099  0.719466  0.726472  0.720886  0.804125  0.731076  0.789228  0.754587  0.724003  0.714833  0.684101  ...
9         0.683303  0.823862  0.726669  0.717921  0.645956  0.672435  0.664806  0.654529  0.662692  0.655769  0.674257  0.680868  0.675619  0.677055  0.651083  0.663688  0.677082  0.688551  0.673712  0.649318  ...
a         0.204840  0.250612  0.265683  0.279937  0.262745  0.277323  0.228571  0.185950  0.204236  0.191131  0.232960  0.282360  0.246332  0.182721  0.191139  0.165320  0.207915  0.242616  0.249375  0.262352  ...
about     0.734857  0.721060  0.694733  0.788154  0.775047  0.776968  0.712226  0.705701  0.743020  0.735268  0.780127  0.787536  0.752775  0.731346  0.695557  0.712583  0.714950  0.725821  0.721559  0.690083  ...
abroad    0.690153  0.677980  0.653260  0.687067  0.655313  0.677153  0.670448  0.664987  0.673253  0.666227  0.683951  0.690518  0.685236  0.689050  0.653685  0.670651  0.672711  0.683059  0.679250  0.648883  ...
abstract  0.563718  0.559387  0.539227  0.562924  0.532671  0.554572  0.548292  0.539751  0.546482  0.568092  0.583788  0.589470  0.587406  0.589427  0.537083  0.547386  0.558431  0.567876  0.555634  0.535585  ...
ache      0.707177  0.695044  0.669716  0.704759  0.672462  0.694577  0.687657  0.682352  0.690835  0.683625  0.701852  0.708596  0.703174  0.706998  0.670000  0.687809  0.689848  0.700512  0.696694  0.665227  ...
admired   0.765850  0.767027  0.758139  0.769235  0.711351  0.739823  0.733172  0.722450  0.731413  0.723785  0.742395  0.749438  0.743729  0.749309  0.722304  0.734312  0.752068  0.762642  0.761436  0.753104  ...
afar      0.708502  0.696309  0.670933  0.705998  0.673613  0.695799  0.688872  0.683525  0.692022  0.684800  0.703054  0.709809  0.704377  0.708218  0.671235  0.689030  0.691081  0.701758  0.697923  0.666436  ...
after     0.659784  0.647953  0.624333  0.651458  0.617015  0.642038  0.635612  0.626020  0.633804  0.627189  0.643987  0.650185  0.645207  0.648833  0.624673  0.636012  0.649968  0.659911  0.643970  0.620149  ...
again     0.562236  0.553439  0.533307  0.582247  0.572950  0.573859  0.543760  0.536322  0.565917  0.560011  0.574905  0.580424  0.552892  0.555490  0.533156  0.543679  0.549815  0.558443  0.550933  0.529728  ...
against   0.678144  0.666114  0.641823  0.674961  0.643710  0.665223  0.658645  0.653221  0.661340  0.654438  0.671841  0.678291  0.673102  0.676866  0.642272  0.658857  0.660895  0.671051  0.667291  0.637523  ...
age       0.683851  0.602333  0.580624  0.619497  0.585480  0.610347  0.590372  0.581164  0.588411  0.582264  0.611120  0.617092  0.612341  0.601786  0.578323  0.590032  0.601293  0.611460  0.598278  0.576703  ...
aged      0.697371  0.688231  0.663286  0.695528  0.661184  0.685348  0.678068  0.670458  0.678806  0.671718  0.690115  0.696810  0.691459  0.694459  0.662139  0.677942  0.685028  0.696107  0.687061  0.658824  ...
agree     0.734056  0.724013  0.697736  0.730814  0.694136  0.720147  0.712600  0.703963  0.712725  0.705284  0.724501  0.731518  0.725904  0.728932  0.696899  0.712251  0.720141  0.731662  0.722033  0.693047  ...
...            ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...  ...
1388 rows × 1388 columns
In [5]:
from sklearn.metrics.pairwise import pairwise_distances
import pandas as pd
df = pd.read_pickle('./Data/blake-songs.txt.cooc..LL.pickle')
CS = pd.DataFrame(pairwise_distances(df, metric = 'cosine'), index = df.index, columns = df.columns)
CS
Out[5]:
                      1             10             12             13             14             15             17             19             20             23             25             26             27             29              3             33             35             37             38             39  ...
1         -8.881784e-16   2.926204e-01   3.179237e-01   2.839232e-01   3.196156e-01   2.947107e-01   3.032572e-01   3.110717e-01   3.024663e-01   3.097569e-01   2.897714e-01   2.827389e-01   2.882873e-01   2.436688e-01   2.118263e-01       0.258888   2.443366e-01   2.784281e-01   2.938242e-01   3.225538e-01  ...
10         2.926204e-01  -6.661338e-16   1.610803e-01   2.348019e-01   3.317601e-01   3.042364e-01   3.121061e-01   3.228705e-01   3.144264e-01   3.215884e-01   3.024852e-01   2.956493e-01   3.010776e-01   2.995410e-01   3.260856e-01       0.313230   2.993731e-01   2.875343e-01   2.409896e-01   2.414255e-01  ...
12         3.179237e-01   1.610803e-01  -6.661338e-16   1.333816e-01   2.911730e-01   3.294827e-01   3.370433e-01   3.475454e-01   3.394096e-01   3.463104e-01   3.279261e-01   3.213425e-01   3.265720e-01   3.250407e-01   3.502997e-01       0.338094   3.247412e-01   3.133599e-01   2.471183e-01   2.247707e-01  ...
13         2.839232e-01   2.348019e-01   1.333816e-01  -2.220446e-16   1.495609e-01   1.814791e-01   3.049270e-01   3.133821e-01   2.748019e-01   2.823712e-01   2.496271e-01   2.424194e-01   2.784717e-01   2.890239e-01   3.201925e-01       0.305261   2.975607e-01   2.863296e-01   2.748002e-01   2.810252e-01  ...
14         3.196156e-01   3.317601e-01   2.911730e-01   1.495609e-01  -4.440892e-16   2.235953e-01   2.771228e-01   3.448314e-01   2.830485e-01   2.905270e-01   2.601264e-01   2.531072e-01   3.128320e-01   3.211057e-01   3.556486e-01       0.339101   3.370409e-01   3.268644e-01   3.306549e-01   3.604447e-01  ...
15         2.947107e-01   3.042364e-01   3.294827e-01   1.814791e-01   2.235953e-01  -2.220446e-16   2.341561e-01   2.228707e-01   2.851981e-01   2.926590e-01   2.603307e-01   2.532234e-01   2.888387e-01   2.993181e-01   3.303421e-01       0.315453   3.078825e-01   2.967917e-01   3.060284e-01   3.339901e-01  ...
17         3.032572e-01   3.121061e-01   3.370433e-01   3.049270e-01   2.771228e-01   2.341561e-01  -2.220446e-16   1.972283e-01   1.599349e-01   2.416355e-01   3.104186e-01   3.037301e-01   3.090763e-01   3.063749e-01   3.381415e-01       0.322805   3.153730e-01   2.227568e-01   2.338126e-01   3.415028e-01  ...
19         3.110717e-01   3.228705e-01   3.475454e-01   3.133821e-01   3.448314e-01   2.228707e-01   1.972283e-01   6.883383e-15   1.413875e-01   2.178575e-01   2.201157e-01   3.096253e-01   3.149087e-01   3.111879e-01   3.472774e-01       0.329902   3.279203e-01   2.420271e-01   2.475929e-01   3.519184e-01  ...
20         3.024663e-01   3.144264e-01   3.394096e-01   2.748019e-01   2.830485e-01   2.851981e-01   1.599349e-01   1.413875e-01  -6.661338e-16   1.385386e-01   1.396440e-01   2.728374e-01   3.063939e-01   3.026232e-01   3.391320e-01       0.321557   3.195477e-01   2.326078e-01   2.382460e-01   3.438371e-01  ...
23         3.097569e-01   3.215884e-01   3.463104e-01   2.823712e-01   2.905270e-01   2.926590e-01   2.416355e-01   2.178575e-01   1.385386e-01   6.550316e-15   1.790805e-01   1.898224e-01   3.136323e-01   3.099019e-01   3.460374e-01       0.328642   3.266539e-01   3.162435e-01   3.199678e-01   3.506917e-01  ...
25         2.897714e-01   3.024852e-01   3.279261e-01   2.496271e-01   2.601264e-01   2.603307e-01   3.104186e-01   2.201157e-01   1.396440e-01   1.790805e-01  -2.220446e-16   1.946096e-01   1.979017e-01   2.914290e-01   3.274055e-01       0.310178   3.080195e-01   2.974030e-01   3.013688e-01   3.324278e-01  ...
26         2.827389e-01   2.956493e-01   3.213425e-01   2.424194e-01   2.531072e-01   2.532234e-01   3.037301e-01   3.096253e-01   2.728374e-01   1.898224e-01   1.946096e-01   6.994405e-15   1.899397e-01   1.993622e-01   3.207854e-01       0.303475   3.012806e-01   2.905715e-01   2.945940e-01   3.258879e-01  ...
27         2.882873e-01   3.010776e-01   3.265720e-01   2.784717e-01   3.128320e-01   2.888387e-01   3.090763e-01   3.149087e-01   3.063939e-01   3.136323e-01   1.979017e-01   1.899397e-01  -4.440892e-16   1.929308e-01   3.260281e-01       0.308827   3.066533e-01   2.960235e-01   3.000099e-01   3.310824e-01  ...
29         2.436688e-01   2.995410e-01   3.250407e-01   2.890239e-01   3.211057e-01   2.993181e-01   3.063749e-01   3.111879e-01   3.026232e-01   3.099019e-01   2.914290e-01   1.993622e-01   1.929308e-01   7.216450e-15   2.850446e-01       0.178803   2.661343e-01   2.935550e-01   2.972461e-01   3.295677e-01  ...
3          2.118263e-01   3.260856e-01   3.502997e-01   3.201925e-01   3.556486e-01   3.303421e-01   3.381415e-01   3.472774e-01   3.391320e-01   3.460374e-01   3.274055e-01   3.207854e-01   3.260281e-01   2.850446e-01  -2.220446e-16       0.297733   2.835825e-01   3.145230e-01   3.292384e-01   3.546967e-01  ...
33         2.588877e-01   3.132299e-01   3.380943e-01   3.052609e-01   3.391014e-01   3.154531e-01   3.228050e-01   3.299024e-01   3.215575e-01   3.286420e-01   3.101779e-01   3.034752e-01   3.088267e-01   1.788034e-01   2.977327e-01       0.000000   1.542675e-01   3.048710e-01   3.138090e-01   3.425507e-01  ...
35         2.443366e-01   2.993731e-01   3.247412e-01   2.975607e-01   3.370409e-01   3.078825e-01   3.153730e-01   3.279203e-01   3.195477e-01   3.266539e-01   3.080195e-01   3.012806e-01   3.066533e-01   2.661343e-01   2.835825e-01       0.154267  -8.881784e-16   2.265959e-01   3.062679e-01   3.292872e-01  ...
37         2.784281e-01   2.875343e-01   3.133599e-01   2.863296e-01   3.268644e-01   2.967917e-01   2.227568e-01   2.420271e-01   2.326078e-01   3.162435e-01   2.974030e-01   2.905715e-01   2.960235e-01   2.935550e-01   3.145230e-01       0.304871   2.265959e-01  -1.332268e-15   1.576333e-01   3.179790e-01  ...
38         2.938242e-01   2.409896e-01   2.471183e-01   2.748002e-01   3.306549e-01   3.060284e-01   2.338126e-01   2.475929e-01   2.382460e-01   3.199678e-01   3.013688e-01   2.945940e-01   3.000099e-01   2.972461e-01   3.292384e-01       0.313809   3.062679e-01   1.576333e-01  -2.220446e-16   1.655250e-01  ...
39         3.225538e-01   2.414255e-01   2.247707e-01   2.810252e-01   3.604447e-01   3.339901e-01   3.415028e-01   3.519184e-01   3.438371e-01   3.506917e-01   3.324278e-01   3.258879e-01   3.310824e-01   3.295677e-01   3.546967e-01       0.342551   3.292872e-01   3.179790e-01   1.655250e-01  -8.881784e-16  ...
4          2.803084e-01   2.814987e-01   3.072120e-01   2.802001e-01   3.223279e-01   2.910576e-01   2.996963e-01   3.139346e-01   3.053616e-01   3.126229e-01   2.925942e-01   2.855731e-01   2.911042e-01   2.910451e-01   2.091921e-01       0.301684   2.801291e-01   2.673036e-01   2.902063e-01   3.119134e-01  ...
42         3.012522e-01   1.729970e-01   2.857993e-01   2.801035e-01   3.398581e-01   3.126834e-01   3.204604e-01   3.310787e-01   3.227369e-01   3.298121e-01   3.109377e-01   3.041843e-01   3.095469e-01   3.080350e-01   3.342961e-01       0.321574   3.078857e-01   2.961873e-01   2.443100e-01   1.445430e-01  ...
45         3.172350e-01   2.147342e-01   3.467848e-01   2.842201e-01   2.942717e-01   2.947398e-01   3.357243e-01   3.459960e-01   2.587618e-01   2.664994e-01   2.467032e-01   2.879742e-01   3.249230e-01   3.234875e-01   3.494321e-01       0.336840   3.234579e-01   3.119985e-01   3.268258e-01   3.511971e-01  ...
46         3.145396e-01   2.712795e-01   3.477218e-01   2.844930e-01   2.934859e-01   2.948722e-01   3.332270e-01   3.407488e-01   2.585818e-01   2.663180e-01   2.469971e-01   2.851921e-01   3.201058e-01   3.174624e-01   3.488273e-01       0.333676   3.263690e-01   3.154825e-01   3.243851e-01   3.521097e-01  ...
47         3.209793e-01   3.220703e-01   3.463304e-01   3.208193e-01   3.605533e-01   3.310652e-01   3.392203e-01   3.526370e-01   3.445475e-01   3.513992e-01   3.324965e-01   3.258709e-01   3.310903e-01   3.310430e-01   3.507453e-01       0.341101   3.207590e-01   3.086527e-01   3.302653e-01   3.507664e-01  ...
48         3.141404e-01   3.152525e-01   3.397570e-01   3.139971e-01   3.541354e-01   3.243456e-01   3.325813e-01   3.461385e-01   3.379678e-01   3.448883e-01   3.257971e-01   3.191052e-01   3.243768e-01   3.243264e-01   3.442119e-01       0.334480   3.139341e-01   3.017076e-01   3.235365e-01   3.442376e-01  ...
50         3.019184e-01   3.074904e-01   3.324568e-01   3.033043e-01   3.408735e-01   3.136342e-01   3.213810e-01   3.320903e-01   3.237617e-01   3.308260e-01   3.119989e-01   3.052583e-01   3.106120e-01   3.090607e-01   3.350196e-01       0.322467   3.087981e-01   2.971388e-01   3.122976e-01   3.369642e-01  ...
51         2.848808e-01   2.942080e-01   3.198044e-01   2.870989e-01   3.225540e-01   2.975207e-01   3.049387e-01   3.130124e-01   3.044604e-01   3.117225e-01   2.929146e-01   2.860601e-01   2.915408e-01   2.887064e-01   3.208227e-01       0.305356   2.977073e-01   2.864035e-01   2.957284e-01   3.243785e-01  ...
53         2.911100e-01   3.003472e-01   3.257205e-01   2.932907e-01   3.284316e-01   3.036223e-01   3.109769e-01   3.189738e-01   3.104959e-01   3.176951e-01   2.990493e-01   2.922541e-01   2.976874e-01   2.948799e-01   3.267339e-01       0.311392   3.038108e-01   2.926038e-01   3.018465e-01   3.302549e-01  ...
54         3.214584e-01   3.265467e-01   3.508124e-01   3.221719e-01   3.585229e-01   3.322337e-01   3.398109e-01   3.500108e-01   3.419046e-01   3.487797e-01   3.304189e-01   3.238536e-01   3.290655e-01   3.276423e-01   3.534490e-01       0.340922   3.276221e-01   3.162313e-01   3.309669e-01   3.551975e-01  ...
55         2.878206e-01   2.935826e-01   3.190356e-01   2.889842e-01   3.271088e-01   2.995391e-01   3.074885e-01   3.181810e-01   3.096778e-01   3.168895e-01   2.976284e-01   2.907415e-01   2.962086e-01   2.943625e-01   3.218057e-01       0.308284   2.947046e-01   2.827545e-01   2.982112e-01   3.236354e-01  ...
56         2.772562e-01   2.586105e-01   2.617082e-01   2.558168e-01   3.175852e-01   2.894987e-01   2.975390e-01   3.085105e-01   2.998872e-01   3.072011e-01   2.876891e-01   2.807077e-01   2.862513e-01   2.843305e-01   3.118497e-01       0.298316   2.845413e-01   2.724472e-01   2.651721e-01   2.666736e-01  ...
57         3.101577e-01   2.919648e-01   2.949335e-01   2.893182e-01   3.483126e-01   3.214746e-01   3.291501e-01   3.396438e-01   3.314089e-01   3.383935e-01   3.197628e-01   3.130961e-01   3.183900e-01   3.168930e-01   3.427899e-01       0.330247   3.167335e-01   3.051872e-01   2.982492e-01   2.996754e-01  ...
58         3.134369e-01   3.003950e-01   3.048386e-01   2.949203e-01   3.492222e-01   3.252853e-01   3.324278e-01   3.400720e-01   3.318563e-01   3.388325e-01   3.207484e-01   3.141614e-01   3.194271e-01   3.167422e-01   3.478628e-01       0.332852   3.255215e-01   3.146436e-01   3.032482e-01   3.094971e-01  ...
59         2.961822e-01   3.056996e-01   3.308934e-01   2.847668e-01   3.213962e-01   2.951693e-01   3.165472e-01   3.247449e-01   3.163399e-01   2.920470e-01   2.597557e-01   2.526458e-01   2.555257e-01   2.650431e-01   3.317458e-01       0.316901   3.093451e-01   2.982795e-01   3.074983e-01   3.353912e-01  ...
6          2.806641e-01   2.219660e-01   1.884269e-01   2.219817e-01   3.205559e-01   2.925457e-01   3.005431e-01   3.115132e-01   3.029276e-01   3.102097e-01   2.907905e-01   2.838405e-01   2.893597e-01   2.877875e-01   1.957452e-01       0.301680   2.875902e-01   2.755579e-01   2.683415e-01   2.698146e-01  ...
61         3.027803e-01   3.152135e-01   3.401881e-01   2.929248e-01   3.265202e-01   3.030875e-01   3.229552e-01   3.285866e-01   3.202415e-01   2.981568e-01   2.674249e-01   2.604456e-01   2.634530e-01   2.710704e-01   3.396983e-01       0.322727   3.206179e-01   3.101874e-01   3.140687e-01   3.446079e-01  ...
62         2.775554e-01   1.991664e-01   1.757603e-01   2.042159e-01   2.922389e-01   2.661531e-01   2.998233e-01   3.062464e-01   2.976257e-01   3.049551e-01   2.613314e-01   2.542833e-01   2.599758e-01   2.810344e-01   3.161971e-01       0.299471   2.971417e-01   2.864560e-01   2.367872e-01   2.462610e-01  ...
63         2.913991e-01   1.583170e-01   1.180000e-01   2.008653e-01   3.062729e-01   2.805882e-01   3.135124e-01   3.199298e-01   3.114796e-01   3.186643e-01   2.759793e-01   2.690730e-01   2.746521e-01   2.951970e-01   3.293778e-01       0.313144   3.108303e-01   3.003738e-01   1.605825e-01   1.082029e-01  ...
65         2.759945e-01   2.315545e-01   2.395437e-01   2.342572e-01   2.906744e-01   1.877966e-01   2.215862e-01   2.699336e-01   2.960786e-01   3.034242e-01   2.596985e-01   2.526347e-01   2.583399e-01   2.794522e-01   3.147139e-01       0.297939   2.956066e-01   2.848961e-01   1.955312e-01   1.634005e-01  ...
67         2.914398e-01   3.046149e-01   3.299989e-01   2.396090e-01   2.157577e-01   1.348683e-01   2.377260e-01   2.850502e-01   2.831230e-01   2.906020e-01   2.472519e-01   2.401005e-01   2.737024e-01   2.944664e-01   3.292614e-01       0.312685   3.104294e-01   2.999247e-01   3.040027e-01   3.344841e-01  ...
68         2.902586e-01   3.002155e-01   3.256233e-01   2.475661e-01   2.205210e-01   2.152258e-01   3.114593e-01   3.199939e-01   2.818894e-01   2.893846e-01   2.570286e-01   2.498951e-01   2.854936e-01   2.958430e-01   3.263233e-01       0.311755   3.041060e-01   2.930109e-01   3.023511e-01   3.301547e-01  ...
69         3.112485e-01   3.168576e-01   3.414910e-01   2.790633e-01   2.896469e-01   2.896442e-01   3.306434e-01   3.412809e-01   3.018145e-01   3.091056e-01   2.900278e-01   2.831192e-01   3.201129e-01   3.185534e-01   3.439700e-01       0.331696   3.182143e-01   3.067306e-01   3.216865e-01   3.459368e-01  ...
7          2.919950e-01   1.371887e-01   1.873225e-01   2.285825e-01   3.337693e-01   3.029136e-01   3.113836e-01   3.254960e-01   3.170681e-01   3.242069e-01   3.045392e-01   2.976397e-01   3.030765e-01   3.029662e-01   3.231611e-01       0.313305   2.921334e-01   2.795511e-01   2.777904e-01   2.734894e-01  ...
71         2.861313e-01   2.923149e-01   3.178308e-01   2.880894e-01   3.265137e-01   2.986429e-01   3.065520e-01   3.175330e-01   3.090230e-01   3.162413e-01   2.970105e-01   2.901240e-01   2.955940e-01   2.936400e-01   3.204248e-01       0.307282   2.936850e-01   2.817793e-01   2.972714e-01   3.224366e-01  ...
72         1.623775e-01   2.854265e-01   3.112572e-01   2.761281e-01   3.107074e-01   2.867819e-01   2.945597e-01   3.012178e-01   2.925127e-01   2.999013e-01   2.805336e-01   2.735280e-01   2.791138e-01   1.958749e-01   2.689242e-01       0.210772   2.454134e-01   2.759970e-01   2.851670e-01   3.158992e-01  ...
9          3.166971e-01   1.761384e-01   2.733306e-01   2.820795e-01   3.540436e-01   3.275648e-01   3.351939e-01   3.454711e-01   3.373083e-01   3.442314e-01   3.257433e-01   3.191324e-01   3.243806e-01   3.229451e-01   3.489172e-01       0.336312   3.229184e-01   3.114492e-01   3.262882e-01   3.506819e-01  ...
a          7.951601e-01   7.493878e-01   7.343165e-01   7.200629e-01   7.372548e-01   7.226772e-01   7.714293e-01   8.140500e-01   7.957642e-01   8.088685e-01   7.670403e-01   7.176405e-01   7.536676e-01   8.172790e-01   8.088614e-01       0.834680   7.920847e-01   7.573840e-01   7.506245e-01   7.376483e-01  ...
about      2.651426e-01   2.789399e-01   3.052667e-01   2.118463e-01   2.249526e-01   2.230323e-01   2.877736e-01   2.942986e-01   2.569800e-01   2.647317e-01   2.198732e-01   2.124636e-01   2.472255e-01   2.686538e-01   3.044428e-01       0.287417   2.850496e-01   2.741786e-01   2.784410e-01   3.099168e-01  ...
abroad     3.098468e-01   3.220197e-01   3.467405e-01   3.129326e-01   3.446871e-01   3.228473e-01   3.295524e-01   3.350125e-01   3.267468e-01   3.337730e-01   3.160487e-01   3.094815e-01   3.147645e-01   3.109503e-01   3.463152e-01       0.329349   3.272892e-01   3.169408e-01   3.207495e-01   3.511170e-01  ...
abstract   4.362815e-01   4.406131e-01   4.607731e-01   4.370759e-01   4.673286e-01   4.454283e-01   4.517083e-01   4.602489e-01   4.535178e-01   4.319082e-01   4.162117e-01   4.105298e-01   4.125943e-01   4.105732e-01   4.629172e-01       0.452614   4.415687e-01   4.321239e-01   4.443656e-01   4.644149e-01  ...
ache       2.928235e-01   3.049559e-01   3.302845e-01   2.952412e-01   3.275385e-01   3.054234e-01   3.123429e-01   3.176478e-01   3.091652e-01   3.163752e-01   2.981479e-01   2.914036e-01   2.968263e-01   2.930018e-01   3.300000e-01       0.312191   3.101521e-01   2.994881e-01   3.033065e-01   3.347731e-01  ...
admired    2.341497e-01   2.329727e-01   2.418605e-01   2.307650e-01   2.886488e-01   2.601770e-01   2.668278e-01   2.775497e-01   2.685866e-01   2.762153e-01   2.576051e-01   2.505616e-01   2.562711e-01   2.506911e-01   2.776963e-01       0.265688   2.479321e-01   2.373585e-01   2.385637e-01   2.468964e-01  ...
afar       2.914981e-01   3.036908e-01   3.290671e-01   2.940024e-01   3.263867e-01   3.042011e-01   3.111282e-01   3.164750e-01   3.079780e-01   3.152003e-01   2.969461e-01   2.901908e-01   2.956227e-01   2.917818e-01   3.287654e-01       0.310970   3.089192e-01   2.982418e-01   3.020767e-01   3.335637e-01  ...
after      3.402158e-01   3.520474e-01   3.756670e-01   3.485419e-01   3.829854e-01   3.579625e-01   3.643878e-01   3.739803e-01   3.661961e-01   3.728113e-01   3.560130e-01   3.498148e-01   3.547934e-01   3.511674e-01   3.753267e-01       0.363988   3.500321e-01   3.400892e-01   3.560300e-01   3.798506e-01  ...
again      4.377642e-01   4.465609e-01   4.666934e-01   4.177527e-01   4.270499e-01   4.261407e-01   4.562395e-01   4.636779e-01   4.340829e-01   4.399889e-01   4.250950e-01   4.195755e-01   4.471079e-01   4.445104e-01   4.668443e-01       0.456321   4.501855e-01   4.415570e-01   4.490670e-01   4.702722e-01  ...
against    3.218557e-01   3.338860e-01   3.581771e-01   3.250391e-01   3.562899e-01   3.347766e-01   3.413550e-01   3.467790e-01   3.386598e-01   3.455616e-01   3.281591e-01   3.217093e-01   3.268984e-01   3.231335e-01   3.577284e-01       0.341143   3.391052e-01   3.289491e-01   3.327087e-01   3.624766e-01  ...
age        3.161491e-01   3.976675e-01   4.193760e-01   3.805035e-01   4.145200e-01   3.896531e-01   4.096280e-01   4.188362e-01   4.115887e-01   4.177357e-01   3.888797e-01   3.829079e-01   3.876588e-01   3.982139e-01   4.216767e-01       0.409968   3.987074e-01   3.885403e-01   4.017222e-01   4.232973e-01  ...
aged       3.026295e-01   3.117690e-01   3.367139e-01   3.044719e-01   3.388161e-01   3.146521e-01   3.219320e-01   3.295416e-01   3.211942e-01   3.282819e-01   3.098854e-01   3.031899e-01   3.085408e-01   3.055412e-01   3.378609e-01       0.322058   3.149718e-01   3.038933e-01   3.129391e-01   3.411761e-01  ...
agree      2.659436e-01   2.759868e-01   3.022636e-01   2.691857e-01   3.058644e-01   2.798526e-01   2.873995e-01   2.960366e-01   2.872746e-01   2.947158e-01   2.754985e-01   2.684824e-01   2.740961e-01   2.710684e-01   3.031007e-01       0.287749   2.798585e-01   2.683383e-01   2.779674e-01   3.069533e-01  ...
...                 ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...            ...  ...
1388 rows × 1388 columns
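The tiny values on the diagonal of this output, such as -8.881784e-16, are floating-point rounding: the cosine distance of a row to itself is mathematically 0. The sketch below reproduces the effect with a made-up ratio (an assumption for illustration, unrelated to the Blake data) and shows how to treat such values as zero with a tolerance.

```python
import numpy as np

# A ratio that equals 1 mathematically, but not necessarily in floating point.
similarity = (0.1 + 0.2) / 0.3
distance = 1 - similarity
print(distance)  # a value extremely close to 0, possibly slightly negative

# Compare with a tolerance instead of testing for exact equality with 0.
print(np.isclose(distance, 0.0))

# If negative distances would cause trouble later, clip them at 0.
clipped = max(distance, 0.0)
print(clipped)
```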
And not only is it easier, but it is also faster. Check this out.
In [ ]:
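from timeit import timeit
# Use metric = "cosine" so that both timings measure the same calculation.
print('1000 iterations of sklearn takes %s seconds' %
      timeit('pairwise_distances(df, metric = "cosine")',
             'from __main__ import pairwise_distances, df',
             number = 1000))
# The setup string must also import df, and a multi-line import needs parentheses.
print('1000 iterations of our function takes %s seconds' %
      timeit('CS_matrix(df)',
             '''from __main__ import (CS_matrix,
             CS_num, CS_den_part, CS_den, cos_sim, df)''',
             number = 1000))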
But how about the results? Write a function below that takes an LL DataFrame as input and compares the results of our function with those of the sklearn function, returning True if they are equal and False if they are not.
Note: $similarity = 1 - distance$
In [ ]:
def CS_compare(LL_DF):
    #insert your code here
#The following lines are to check your function for errors.
print(CS_compare(df))
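When comparing the two results, keep in mind that an exact `==` test will probably fail because of floating-point rounding, so a comparison with a small tolerance is safer. The sketch below shows that idea on made-up 2 × 2 frames; `frames_match_sketch`, the tolerance and the toy values are assumptions, and the sketch deliberately leaves out the calls to `CS_matrix` and sklearn that your own function needs.

```python
import numpy as np
import pandas as pd

def frames_match_sketch(ours, theirs, tol=1e-8):
    """True if two equally shaped DataFrames agree within a small tolerance."""
    return bool(np.allclose(ours.to_numpy(), theirs.to_numpy(), atol=tol))

# Toy data, made up for illustration: similarities and matching distances
# that differ only by rounding-sized amounts.
sims = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]])
dists = pd.DataFrame([[0.0, 0.5 + 1e-12], [0.5, 2.2e-16]])

print(frames_match_sketch(sims, 1 - dists))  # True: similarity = 1 - distance
print(frames_match_sketch(sims, dists))      # False: distances are not similarities
```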
Content source: sonofmun/ESU-2014