The formula

$$\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\|\|B\|} = \frac{\displaystyle \sum_{i=1}^{n} A_i \times B_i}{\displaystyle \sqrt {\displaystyle \sum_{i=1}^{n} (A_i)^2} \displaystyle \sqrt {\displaystyle \sum_{i=1}^{n} (B_i)^2}}$$
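As a quick sanity check of the formula, take two made-up three-element vectors $A = (1, 2, 2)$ and $B = (2, 0, 1)$ (illustrative values, not taken from the data):

$$\cos(\theta) = \frac{1 \times 2 + 2 \times 0 + 2 \times 1}{\sqrt{1^2 + 2^2 + 2^2}\,\sqrt{2^2 + 0^2 + 1^2}} = \frac{4}{3\sqrt{5}} \approx 0.596$$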

Implement the numerator

$$\sum_{i=1}^{n} A_i \times B_i$$

For every element $i = 1 \dots n$, multiply the value of that element in array (Series) $A$ by the value of the same element in array $B$, then sum all of these products.
Implement a function below that takes as input two Series $A$ and $B$ and returns the value required for the numerator of the equation.


In [ ]:
import pandas as pd

def CS_num(A, B):
    #insert your code here
    pass

# Run the code below to check your code.
# .ix was removed in pandas 1.0; .iloc selects the first and second rows by position.
df = pd.read_pickle('test_LL.pickle')
print(CS_num(df.iloc[0], df.iloc[1]))
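For reference, here is one possible CS_num, a minimal sketch assuming A and B are numeric pandas Series with the same index (as two rows of the same DataFrame are). Multiplying two Series multiplies them element by element, matching labels, and .sum() adds up the products; A.dot(B) would give the same result for data without missing values. The small Series are made-up values chosen so the answer can be checked by hand.

```python
import pandas as pd

def CS_num(A, B):
    """Numerator of cosine similarity: the sum over i of A_i * B_i."""
    # Element-wise product (aligned on the index), then add up the products.
    return (A * B).sum()

# Hypothetical hand-checkable example, not from test_LL.pickle.
A = pd.Series([1.0, 2.0, 2.0])
B = pd.Series([2.0, 0.0, 1.0])
print(CS_num(A, B))  # 1*2 + 2*0 + 2*1 = 4.0
```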

Implement the denominator

$$\sqrt {\sum_{i=1}^{n} (A_i)^2} \sqrt {\sum_{i=1}^{n} (B_i)^2}$$

Part I: $\sqrt {\sum\limits_{i=1}^{n} (A_i)^2}$

Similar to the numerator, we multiply each value in Series $A$ by itself and sum all of these squares. Unlike the numerator, we then take the square root of that sum, which gives the length (Euclidean norm) of $A$.
Write a function below that takes a Series $A$ as input and returns the appropriate value for this first part of the denominator of our cosine similarity equation.


In [ ]:
def CS_den_part(A):
    #insert your code here
    pass

#The lines below are to check your code for errors.
print(CS_den_part(df.iloc[0]))
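A matching sketch for CS_den_part, under the same assumption that A is a numeric pandas Series: square every element, sum, and take the square root with NumPy. The example vector is again made up; its norm is $\sqrt{1 + 4 + 4} = 3$.

```python
import numpy as np
import pandas as pd

def CS_den_part(A):
    """Euclidean norm of A: the square root of the sum of A_i squared."""
    return np.sqrt((A ** 2).sum())

# Hypothetical hand-checkable example.
A = pd.Series([1.0, 2.0, 2.0])
print(CS_den_part(A))  # sqrt(1 + 4 + 4) = 3.0
```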

Part II: $\sqrt {\sum\limits_{i=1}^{n} (B_i)^2}$

A brief look at the second square root in the denominator shows that it has the same form as the first, only applied to $B$ instead of $A$, so we can use our previous function (CS_den_part) to calculate the second part of the denominator as well as the first.

Part III: Bring it together

The last pre-calculation for our cosine similarity equation is to bring the two parts of the denominator together by multiplying them. Define a function below that takes two Series $A$ and $B$, calls the CS_den_part function above on each of them, and returns their product, which is the value needed for the denominator.


In [ ]:
def CS_den(A, B):
    #insert your code here
    pass

#The lines below are to check your code for errors.
print(CS_den(df.iloc[0], df.iloc[1]))
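A sketch of CS_den that reuses CS_den_part for both Series, as Part II suggests. CS_den_part is repeated here only so the block runs on its own; with the made-up inputs the denominator is $3 \times \sqrt{5} \approx 6.708$.

```python
import numpy as np
import pandas as pd

def CS_den_part(A):
    # Same helper as in Part I: square root of the sum of squares.
    return np.sqrt((A ** 2).sum())

def CS_den(A, B):
    """Denominator of cosine similarity: the product of the two norms."""
    return CS_den_part(A) * CS_den_part(B)

# Hypothetical hand-checkable example.
A = pd.Series([1.0, 2.0, 2.0])
B = pd.Series([2.0, 0.0, 1.0])
print(CS_den(A, B))  # 3 * sqrt(5)
```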

Calculate

To do the final calculation of our cosine similarity (CS) score, write a function that takes two Series $A$ and $B$ as input, calls the functions above to compute the numerator and the denominator of the CS equation, and returns a single number, the numerator divided by the denominator, which is the cosine similarity of the two Series.


In [ ]:
def cos_sim(A, B):
    #insert your code here
    pass

#The lines below are to check your code for errors.
print(cos_sim(df.iloc[0], df.iloc[1]))
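Putting the pieces together, cos_sim can simply divide the numerator by the denominator. This is a minimal sketch: the helpers are repeated from the sketches above so the block is self-contained. With the same made-up vectors the result is $4 / (3\sqrt{5}) \approx 0.596$, and comparing a vector with itself gives 1. One case to keep in mind: if either Series is all zeros, the denominator is 0 and NumPy returns nan (with a RuntimeWarning) instead of a number.

```python
import numpy as np
import pandas as pd

def CS_num(A, B):
    # Numerator: sum of element-wise products.
    return (A * B).sum()

def CS_den_part(A):
    # Norm of one Series: square root of the sum of squares.
    return np.sqrt((A ** 2).sum())

def CS_den(A, B):
    # Denominator: product of the two norms.
    return CS_den_part(A) * CS_den_part(B)

def cos_sim(A, B):
    """Cosine similarity of two Series: numerator / denominator."""
    return CS_num(A, B) / CS_den(A, B)

# Hypothetical hand-checkable example.
A = pd.Series([1.0, 2.0, 2.0])
B = pd.Series([2.0, 0.0, 1.0])
print(cos_sim(A, B))
print(cos_sim(A, A))
```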

Construct the matrix of answers

Now, finally, we need to wrap everything we have done above in a function that takes a matrix of arrays (DataFrame) $DF\_LL$, calculates the CS of every row with every other row, and builds a new DataFrame $CS\_DF$ whose entry in row $i$ and column $j$ is the cosine similarity of rows $i$ and $j$ of $DF\_LL$. $CS\_DF$ therefore has one row and one column per row of $DF\_LL$; for the square LL matrix used here, that is the same shape as $DF\_LL$.


In [ ]:
def CS_matrix(DF_LL):
    #insert your code here
    pass

#The lines below are meant to check your function for errors.
print(CS_matrix(df))
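One possible shape for CS_matrix is sketched below. It assumes the rows of DF_LL are numeric and labels both axes of the result with DF_LL's row index (for the square LL matrix these are the same labels as its columns); cos_sim is re-declared in compact form only so the block runs on its own. The nested loop mirrors the exercise and uses the fact that cos(A, B) = cos(B, A) to compute each pair once, but it will likely still be slow on a 1388 x 1388 matrix.

```python
import numpy as np
import pandas as pd

def cos_sim(A, B):
    # Compact stand-in for the cos_sim built above: numerator / product of norms.
    return (A * B).sum() / (np.sqrt((A ** 2).sum()) * np.sqrt((B ** 2).sum()))

def CS_matrix(DF_LL):
    """Cosine similarity of every row of DF_LL with every other row."""
    labels = DF_LL.index
    n = len(labels)
    CS_DF = pd.DataFrame(np.nan, index=labels, columns=labels)
    for i in range(n):
        row_i = DF_LL.iloc[i]
        for j in range(i, n):
            # Each pair is computed once and stored in both (i, j) and (j, i).
            value = cos_sim(row_i, DF_LL.iloc[j])
            CS_DF.iloc[i, j] = value
            CS_DF.iloc[j, i] = value
    return CS_DF

# Hypothetical 3 x 3 toy matrix (not the Blake data).
toy = pd.DataFrame([[1.0, 2.0, 2.0],
                    [2.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]],
                   index=['x', 'y', 'z'], columns=['x', 'y', 'z'])
print(CS_matrix(toy).round(3))
```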

What do you notice about your matrix? Do these characteristics that you notice make sense? Why or why not?

Compare to sklearn

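Our last step is to compare our answers with the cosine distances computed by the pairwise_distances function in the scikit-learn package. This function is extremely easy to use. Because it returns cosine distances, we turn them into similarities with 1-CS. Note that the cell defining CS (In [5]) has to be run before the cell that evaluates 1-CS (In [6]), even though it appears second below: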


In [6]:
1-CS


Out[6]:
1 10 12 13 14 15 17 19 20 23 25 26 27 29 3 33 35 37 38 39
1 1.000000 0.707380 0.682076 0.716077 0.680384 0.705289 0.696743 0.688928 0.697534 0.690243 0.710229 0.717261 0.711713 0.756331 0.788174 0.741112 0.755663 0.721572 0.706176 0.677446 ...
10 0.707380 1.000000 0.838920 0.765198 0.668240 0.695764 0.687894 0.677130 0.685574 0.678412 0.697515 0.704351 0.698922 0.700459 0.673914 0.686770 0.700627 0.712466 0.759010 0.758575 ...
12 0.682076 0.838920 1.000000 0.866618 0.708827 0.670517 0.662957 0.652455 0.660590 0.653690 0.672074 0.678657 0.673428 0.674959 0.649700 0.661906 0.675259 0.686640 0.752882 0.775229 ...
13 0.716077 0.765198 0.866618 1.000000 0.850439 0.818521 0.695073 0.686618 0.725198 0.717629 0.750373 0.757581 0.721528 0.710976 0.679807 0.694739 0.702439 0.713670 0.725200 0.718975 ...
14 0.680384 0.668240 0.708827 0.850439 1.000000 0.776405 0.722877 0.655169 0.716952 0.709473 0.739874 0.746893 0.687168 0.678894 0.644351 0.660899 0.662959 0.673136 0.669345 0.639555 ...
15 0.705289 0.695764 0.670517 0.818521 0.776405 1.000000 0.765844 0.777129 0.714802 0.707341 0.739669 0.746777 0.711161 0.700682 0.669658 0.684547 0.692118 0.703208 0.693972 0.666010 ...
17 0.696743 0.687894 0.662957 0.695073 0.722877 0.765844 1.000000 0.802772 0.840065 0.758365 0.689581 0.696270 0.690924 0.693625 0.661858 0.677195 0.684627 0.777243 0.766187 0.658497 ...
19 0.688928 0.677130 0.652455 0.686618 0.655169 0.777129 0.802772 1.000000 0.858612 0.782143 0.779884 0.690375 0.685091 0.688812 0.652723 0.670098 0.672080 0.757973 0.752407 0.648082 ...
20 0.697534 0.685574 0.660590 0.725198 0.716952 0.714802 0.840065 0.858612 1.000000 0.861461 0.860356 0.727163 0.693606 0.697377 0.660868 0.678443 0.680452 0.767392 0.761754 0.656163 ...
23 0.690243 0.678412 0.653690 0.717629 0.709473 0.707341 0.758365 0.782143 0.861461 1.000000 0.820920 0.810178 0.686368 0.690098 0.653963 0.671358 0.673346 0.683757 0.680032 0.649308 ...
25 0.710229 0.697515 0.672074 0.750373 0.739874 0.739669 0.689581 0.779884 0.860356 0.820920 1.000000 0.805390 0.802098 0.708571 0.672594 0.689822 0.691981 0.702597 0.698631 0.667572 ...
26 0.717261 0.704351 0.678657 0.757581 0.746893 0.746777 0.696270 0.690375 0.727163 0.810178 0.805390 1.000000 0.810060 0.800638 0.679215 0.696525 0.698719 0.709429 0.705406 0.674112 ...
27 0.711713 0.698922 0.673428 0.721528 0.687168 0.711161 0.690924 0.685091 0.693606 0.686368 0.802098 0.810060 1.000000 0.807069 0.673972 0.691173 0.693347 0.703977 0.699990 0.668918 ...
29 0.756331 0.700459 0.674959 0.710976 0.678894 0.700682 0.693625 0.688812 0.697377 0.690098 0.708571 0.800638 0.807069 1.000000 0.714955 0.821197 0.733866 0.706445 0.702754 0.670432 ...
3 0.788174 0.673914 0.649700 0.679807 0.644351 0.669658 0.661858 0.652723 0.660868 0.653963 0.672594 0.679215 0.673972 0.714955 1.000000 0.702267 0.716418 0.685477 0.670762 0.645303 ...
33 0.741112 0.686770 0.661906 0.694739 0.660899 0.684547 0.677195 0.670098 0.678443 0.671358 0.689822 0.696525 0.691173 0.821197 0.702267 1.000000 0.845733 0.695129 0.686191 0.657449 ...
35 0.755663 0.700627 0.675259 0.702439 0.662959 0.692118 0.684627 0.672080 0.680452 0.673346 0.691981 0.698719 0.693347 0.733866 0.716418 0.845733 1.000000 0.773404 0.693732 0.670713 ...
37 0.721572 0.712466 0.686640 0.713670 0.673136 0.703208 0.777243 0.757973 0.767392 0.683757 0.702597 0.709429 0.703977 0.706445 0.685477 0.695129 0.773404 1.000000 0.842367 0.682021 ...
38 0.706176 0.759010 0.752882 0.725200 0.669345 0.693972 0.766187 0.752407 0.761754 0.680032 0.698631 0.705406 0.699990 0.702754 0.670762 0.686191 0.693732 0.842367 1.000000 0.834475 ...
39 0.677446 0.758575 0.775229 0.718975 0.639555 0.666010 0.658497 0.648082 0.656163 0.649308 0.667572 0.674112 0.668918 0.670432 0.645303 0.657449 0.670713 0.682021 0.834475 1.000000 ...
4 0.719692 0.718501 0.692788 0.719800 0.677672 0.708942 0.700304 0.686065 0.694638 0.687377 0.707406 0.714427 0.708896 0.708955 0.790808 0.698316 0.719871 0.732696 0.709794 0.688087 ...
42 0.698748 0.827003 0.714201 0.719897 0.660142 0.687317 0.679540 0.668921 0.677263 0.670188 0.689062 0.695816 0.690453 0.691965 0.665704 0.678426 0.692114 0.703813 0.755690 0.855457 ...
45 0.682765 0.785266 0.653215 0.715780 0.705728 0.705260 0.664276 0.654004 0.741238 0.733501 0.753297 0.712026 0.675077 0.676512 0.650568 0.663160 0.676542 0.688002 0.673174 0.648803 ...
46 0.685460 0.728720 0.652278 0.715507 0.706514 0.705128 0.666773 0.659251 0.741418 0.733682 0.753003 0.714808 0.679894 0.682538 0.651173 0.666324 0.673631 0.684518 0.675615 0.647890 ...
47 0.679021 0.677930 0.653670 0.679181 0.639447 0.668935 0.660780 0.647363 0.655453 0.648601 0.667503 0.674129 0.668910 0.668957 0.649255 0.658899 0.679241 0.691347 0.669735 0.649234 ...
48 0.685860 0.684747 0.660243 0.686003 0.645865 0.675654 0.667419 0.653862 0.662032 0.655112 0.674203 0.680895 0.675623 0.675674 0.655788 0.665520 0.686066 0.698292 0.676463 0.655762 ...
50 0.698082 0.692510 0.667543 0.696696 0.659127 0.686366 0.678619 0.667910 0.676238 0.669174 0.688001 0.694742 0.689388 0.690939 0.664980 0.677533 0.691202 0.702861 0.687702 0.663036 ...
51 0.715119 0.705792 0.680196 0.712901 0.677446 0.702479 0.695061 0.686988 0.695540 0.688277 0.707085 0.713940 0.708459 0.711294 0.679177 0.694644 0.702293 0.713597 0.704272 0.675621 ...
53 0.708890 0.699653 0.674279 0.706709 0.671568 0.696378 0.689023 0.681026 0.689504 0.682305 0.700951 0.707746 0.702313 0.705120 0.673266 0.688608 0.696189 0.707396 0.698154 0.669745 ...
54 0.678542 0.673453 0.649188 0.677828 0.641477 0.667766 0.660189 0.649989 0.658095 0.651220 0.669581 0.676146 0.670935 0.672358 0.646551 0.659078 0.672378 0.683769 0.669033 0.644802 ...
55 0.712179 0.706417 0.680964 0.711016 0.672891 0.700461 0.692512 0.681819 0.690322 0.683110 0.702372 0.709259 0.703791 0.705637 0.678194 0.691716 0.705295 0.717245 0.701789 0.676365 ...
56 0.722744 0.741390 0.738292 0.744183 0.682415 0.710501 0.702461 0.691490 0.700113 0.692799 0.712311 0.719292 0.713749 0.715670 0.688150 0.701684 0.715459 0.727553 0.734828 0.733326 ...
57 0.689842 0.708035 0.705066 0.710682 0.651687 0.678525 0.670850 0.660356 0.668591 0.661607 0.680237 0.686904 0.681610 0.683107 0.657210 0.669753 0.683266 0.694813 0.701751 0.700325 ...
58 0.686563 0.699605 0.695161 0.705080 0.650778 0.674715 0.667572 0.659928 0.668144 0.661167 0.679252 0.685839 0.680573 0.683258 0.652137 0.667148 0.674479 0.685356 0.696752 0.690503 ...
59 0.703818 0.694300 0.669107 0.715233 0.678604 0.704831 0.683453 0.675255 0.683660 0.707953 0.740244 0.747354 0.744474 0.734957 0.668254 0.683099 0.690655 0.701720 0.692502 0.664609 ...
6 0.719336 0.778034 0.811573 0.778018 0.679444 0.707454 0.699457 0.688487 0.697072 0.689790 0.709210 0.716159 0.710640 0.712213 0.804255 0.698320 0.712410 0.724442 0.731658 0.730185 ...
61 0.697220 0.684787 0.659812 0.707075 0.673480 0.696912 0.677045 0.671413 0.679759 0.701843 0.732575 0.739554 0.736547 0.728930 0.660302 0.677273 0.679382 0.689813 0.685931 0.655392 ...
62 0.722445 0.800834 0.824240 0.795784 0.707761 0.733847 0.700177 0.693754 0.702374 0.695045 0.738669 0.745717 0.740024 0.718966 0.683803 0.700529 0.702858 0.713544 0.763213 0.753739 ...
63 0.708601 0.841683 0.882000 0.799135 0.693727 0.719412 0.686488 0.680070 0.688520 0.681336 0.724021 0.730927 0.725348 0.704803 0.670622 0.686856 0.689170 0.699626 0.839418 0.891797 ...
65 0.724006 0.768445 0.760456 0.765743 0.709326 0.812203 0.778414 0.730066 0.703921 0.696576 0.740302 0.747365 0.741660 0.720548 0.685286 0.702061 0.704393 0.715104 0.804469 0.836600 ...
67 0.708560 0.695385 0.670001 0.760391 0.784242 0.865132 0.762274 0.714950 0.716877 0.709398 0.752748 0.759900 0.726298 0.705534 0.670739 0.687315 0.689571 0.700075 0.695997 0.665516 ...
68 0.709741 0.699785 0.674377 0.752434 0.779479 0.784774 0.688541 0.680006 0.718111 0.710615 0.742971 0.750105 0.714506 0.704157 0.673677 0.688245 0.695894 0.706989 0.697649 0.669845 ...
69 0.688751 0.683142 0.658509 0.720937 0.710353 0.710356 0.669357 0.658719 0.698185 0.690894 0.709972 0.716881 0.679887 0.681447 0.656030 0.668304 0.681786 0.693269 0.678314 0.654063 ...
7 0.708005 0.862811 0.812678 0.771417 0.666231 0.697086 0.688616 0.674504 0.682932 0.675793 0.695461 0.702360 0.696923 0.697034 0.676839 0.686695 0.707867 0.720449 0.722210 0.726511 ...
71 0.713869 0.707685 0.682169 0.711911 0.673486 0.701357 0.693448 0.682467 0.690977 0.683759 0.702989 0.709876 0.704406 0.706360 0.679575 0.692718 0.706315 0.718221 0.702729 0.677563 ...
72 0.837623 0.714574 0.688743 0.723872 0.689293 0.713218 0.705440 0.698782 0.707487 0.700099 0.719466 0.726472 0.720886 0.804125 0.731076 0.789228 0.754587 0.724003 0.714833 0.684101 ...
9 0.683303 0.823862 0.726669 0.717921 0.645956 0.672435 0.664806 0.654529 0.662692 0.655769 0.674257 0.680868 0.675619 0.677055 0.651083 0.663688 0.677082 0.688551 0.673712 0.649318 ...
a 0.204840 0.250612 0.265683 0.279937 0.262745 0.277323 0.228571 0.185950 0.204236 0.191131 0.232960 0.282360 0.246332 0.182721 0.191139 0.165320 0.207915 0.242616 0.249375 0.262352 ...
about 0.734857 0.721060 0.694733 0.788154 0.775047 0.776968 0.712226 0.705701 0.743020 0.735268 0.780127 0.787536 0.752775 0.731346 0.695557 0.712583 0.714950 0.725821 0.721559 0.690083 ...
abroad 0.690153 0.677980 0.653260 0.687067 0.655313 0.677153 0.670448 0.664987 0.673253 0.666227 0.683951 0.690518 0.685236 0.689050 0.653685 0.670651 0.672711 0.683059 0.679250 0.648883 ...
abstract 0.563718 0.559387 0.539227 0.562924 0.532671 0.554572 0.548292 0.539751 0.546482 0.568092 0.583788 0.589470 0.587406 0.589427 0.537083 0.547386 0.558431 0.567876 0.555634 0.535585 ...
ache 0.707177 0.695044 0.669716 0.704759 0.672462 0.694577 0.687657 0.682352 0.690835 0.683625 0.701852 0.708596 0.703174 0.706998 0.670000 0.687809 0.689848 0.700512 0.696694 0.665227 ...
admired 0.765850 0.767027 0.758139 0.769235 0.711351 0.739823 0.733172 0.722450 0.731413 0.723785 0.742395 0.749438 0.743729 0.749309 0.722304 0.734312 0.752068 0.762642 0.761436 0.753104 ...
afar 0.708502 0.696309 0.670933 0.705998 0.673613 0.695799 0.688872 0.683525 0.692022 0.684800 0.703054 0.709809 0.704377 0.708218 0.671235 0.689030 0.691081 0.701758 0.697923 0.666436 ...
after 0.659784 0.647953 0.624333 0.651458 0.617015 0.642038 0.635612 0.626020 0.633804 0.627189 0.643987 0.650185 0.645207 0.648833 0.624673 0.636012 0.649968 0.659911 0.643970 0.620149 ...
again 0.562236 0.553439 0.533307 0.582247 0.572950 0.573859 0.543760 0.536322 0.565917 0.560011 0.574905 0.580424 0.552892 0.555490 0.533156 0.543679 0.549815 0.558443 0.550933 0.529728 ...
against 0.678144 0.666114 0.641823 0.674961 0.643710 0.665223 0.658645 0.653221 0.661340 0.654438 0.671841 0.678291 0.673102 0.676866 0.642272 0.658857 0.660895 0.671051 0.667291 0.637523 ...
age 0.683851 0.602333 0.580624 0.619497 0.585480 0.610347 0.590372 0.581164 0.588411 0.582264 0.611120 0.617092 0.612341 0.601786 0.578323 0.590032 0.601293 0.611460 0.598278 0.576703 ...
aged 0.697371 0.688231 0.663286 0.695528 0.661184 0.685348 0.678068 0.670458 0.678806 0.671718 0.690115 0.696810 0.691459 0.694459 0.662139 0.677942 0.685028 0.696107 0.687061 0.658824 ...
agree 0.734056 0.724013 0.697736 0.730814 0.694136 0.720147 0.712600 0.703963 0.712725 0.705284 0.724501 0.731518 0.725904 0.728932 0.696899 0.712251 0.720141 0.731662 0.722033 0.693047 ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

1388 rows × 1388 columns


In [5]:
from sklearn.metrics.pairwise import pairwise_distances
import pandas as pd
df = pd.read_pickle('./Data/blake-songs.txt.cooc..LL.pickle')
CS = pd.DataFrame(pairwise_distances(df, metric = 'cosine'), index = df.index, columns = df.columns)
CS


Out[5]:
1 10 12 13 14 15 17 19 20 23 25 26 27 29 3 33 35 37 38 39
1 -8.881784e-16 2.926204e-01 3.179237e-01 2.839232e-01 3.196156e-01 2.947107e-01 3.032572e-01 3.110717e-01 3.024663e-01 3.097569e-01 2.897714e-01 2.827389e-01 2.882873e-01 2.436688e-01 2.118263e-01 0.258888 2.443366e-01 2.784281e-01 2.938242e-01 3.225538e-01 ...
10 2.926204e-01 -6.661338e-16 1.610803e-01 2.348019e-01 3.317601e-01 3.042364e-01 3.121061e-01 3.228705e-01 3.144264e-01 3.215884e-01 3.024852e-01 2.956493e-01 3.010776e-01 2.995410e-01 3.260856e-01 0.313230 2.993731e-01 2.875343e-01 2.409896e-01 2.414255e-01 ...
12 3.179237e-01 1.610803e-01 -6.661338e-16 1.333816e-01 2.911730e-01 3.294827e-01 3.370433e-01 3.475454e-01 3.394096e-01 3.463104e-01 3.279261e-01 3.213425e-01 3.265720e-01 3.250407e-01 3.502997e-01 0.338094 3.247412e-01 3.133599e-01 2.471183e-01 2.247707e-01 ...
13 2.839232e-01 2.348019e-01 1.333816e-01 -2.220446e-16 1.495609e-01 1.814791e-01 3.049270e-01 3.133821e-01 2.748019e-01 2.823712e-01 2.496271e-01 2.424194e-01 2.784717e-01 2.890239e-01 3.201925e-01 0.305261 2.975607e-01 2.863296e-01 2.748002e-01 2.810252e-01 ...
14 3.196156e-01 3.317601e-01 2.911730e-01 1.495609e-01 -4.440892e-16 2.235953e-01 2.771228e-01 3.448314e-01 2.830485e-01 2.905270e-01 2.601264e-01 2.531072e-01 3.128320e-01 3.211057e-01 3.556486e-01 0.339101 3.370409e-01 3.268644e-01 3.306549e-01 3.604447e-01 ...
15 2.947107e-01 3.042364e-01 3.294827e-01 1.814791e-01 2.235953e-01 -2.220446e-16 2.341561e-01 2.228707e-01 2.851981e-01 2.926590e-01 2.603307e-01 2.532234e-01 2.888387e-01 2.993181e-01 3.303421e-01 0.315453 3.078825e-01 2.967917e-01 3.060284e-01 3.339901e-01 ...
17 3.032572e-01 3.121061e-01 3.370433e-01 3.049270e-01 2.771228e-01 2.341561e-01 -2.220446e-16 1.972283e-01 1.599349e-01 2.416355e-01 3.104186e-01 3.037301e-01 3.090763e-01 3.063749e-01 3.381415e-01 0.322805 3.153730e-01 2.227568e-01 2.338126e-01 3.415028e-01 ...
19 3.110717e-01 3.228705e-01 3.475454e-01 3.133821e-01 3.448314e-01 2.228707e-01 1.972283e-01 6.883383e-15 1.413875e-01 2.178575e-01 2.201157e-01 3.096253e-01 3.149087e-01 3.111879e-01 3.472774e-01 0.329902 3.279203e-01 2.420271e-01 2.475929e-01 3.519184e-01 ...
20 3.024663e-01 3.144264e-01 3.394096e-01 2.748019e-01 2.830485e-01 2.851981e-01 1.599349e-01 1.413875e-01 -6.661338e-16 1.385386e-01 1.396440e-01 2.728374e-01 3.063939e-01 3.026232e-01 3.391320e-01 0.321557 3.195477e-01 2.326078e-01 2.382460e-01 3.438371e-01 ...
23 3.097569e-01 3.215884e-01 3.463104e-01 2.823712e-01 2.905270e-01 2.926590e-01 2.416355e-01 2.178575e-01 1.385386e-01 6.550316e-15 1.790805e-01 1.898224e-01 3.136323e-01 3.099019e-01 3.460374e-01 0.328642 3.266539e-01 3.162435e-01 3.199678e-01 3.506917e-01 ...
25 2.897714e-01 3.024852e-01 3.279261e-01 2.496271e-01 2.601264e-01 2.603307e-01 3.104186e-01 2.201157e-01 1.396440e-01 1.790805e-01 -2.220446e-16 1.946096e-01 1.979017e-01 2.914290e-01 3.274055e-01 0.310178 3.080195e-01 2.974030e-01 3.013688e-01 3.324278e-01 ...
26 2.827389e-01 2.956493e-01 3.213425e-01 2.424194e-01 2.531072e-01 2.532234e-01 3.037301e-01 3.096253e-01 2.728374e-01 1.898224e-01 1.946096e-01 6.994405e-15 1.899397e-01 1.993622e-01 3.207854e-01 0.303475 3.012806e-01 2.905715e-01 2.945940e-01 3.258879e-01 ...
27 2.882873e-01 3.010776e-01 3.265720e-01 2.784717e-01 3.128320e-01 2.888387e-01 3.090763e-01 3.149087e-01 3.063939e-01 3.136323e-01 1.979017e-01 1.899397e-01 -4.440892e-16 1.929308e-01 3.260281e-01 0.308827 3.066533e-01 2.960235e-01 3.000099e-01 3.310824e-01 ...
29 2.436688e-01 2.995410e-01 3.250407e-01 2.890239e-01 3.211057e-01 2.993181e-01 3.063749e-01 3.111879e-01 3.026232e-01 3.099019e-01 2.914290e-01 1.993622e-01 1.929308e-01 7.216450e-15 2.850446e-01 0.178803 2.661343e-01 2.935550e-01 2.972461e-01 3.295677e-01 ...
3 2.118263e-01 3.260856e-01 3.502997e-01 3.201925e-01 3.556486e-01 3.303421e-01 3.381415e-01 3.472774e-01 3.391320e-01 3.460374e-01 3.274055e-01 3.207854e-01 3.260281e-01 2.850446e-01 -2.220446e-16 0.297733 2.835825e-01 3.145230e-01 3.292384e-01 3.546967e-01 ...
33 2.588877e-01 3.132299e-01 3.380943e-01 3.052609e-01 3.391014e-01 3.154531e-01 3.228050e-01 3.299024e-01 3.215575e-01 3.286420e-01 3.101779e-01 3.034752e-01 3.088267e-01 1.788034e-01 2.977327e-01 0.000000 1.542675e-01 3.048710e-01 3.138090e-01 3.425507e-01 ...
35 2.443366e-01 2.993731e-01 3.247412e-01 2.975607e-01 3.370409e-01 3.078825e-01 3.153730e-01 3.279203e-01 3.195477e-01 3.266539e-01 3.080195e-01 3.012806e-01 3.066533e-01 2.661343e-01 2.835825e-01 0.154267 -8.881784e-16 2.265959e-01 3.062679e-01 3.292872e-01 ...
37 2.784281e-01 2.875343e-01 3.133599e-01 2.863296e-01 3.268644e-01 2.967917e-01 2.227568e-01 2.420271e-01 2.326078e-01 3.162435e-01 2.974030e-01 2.905715e-01 2.960235e-01 2.935550e-01 3.145230e-01 0.304871 2.265959e-01 -1.332268e-15 1.576333e-01 3.179790e-01 ...
38 2.938242e-01 2.409896e-01 2.471183e-01 2.748002e-01 3.306549e-01 3.060284e-01 2.338126e-01 2.475929e-01 2.382460e-01 3.199678e-01 3.013688e-01 2.945940e-01 3.000099e-01 2.972461e-01 3.292384e-01 0.313809 3.062679e-01 1.576333e-01 -2.220446e-16 1.655250e-01 ...
39 3.225538e-01 2.414255e-01 2.247707e-01 2.810252e-01 3.604447e-01 3.339901e-01 3.415028e-01 3.519184e-01 3.438371e-01 3.506917e-01 3.324278e-01 3.258879e-01 3.310824e-01 3.295677e-01 3.546967e-01 0.342551 3.292872e-01 3.179790e-01 1.655250e-01 -8.881784e-16 ...
4 2.803084e-01 2.814987e-01 3.072120e-01 2.802001e-01 3.223279e-01 2.910576e-01 2.996963e-01 3.139346e-01 3.053616e-01 3.126229e-01 2.925942e-01 2.855731e-01 2.911042e-01 2.910451e-01 2.091921e-01 0.301684 2.801291e-01 2.673036e-01 2.902063e-01 3.119134e-01 ...
42 3.012522e-01 1.729970e-01 2.857993e-01 2.801035e-01 3.398581e-01 3.126834e-01 3.204604e-01 3.310787e-01 3.227369e-01 3.298121e-01 3.109377e-01 3.041843e-01 3.095469e-01 3.080350e-01 3.342961e-01 0.321574 3.078857e-01 2.961873e-01 2.443100e-01 1.445430e-01 ...
45 3.172350e-01 2.147342e-01 3.467848e-01 2.842201e-01 2.942717e-01 2.947398e-01 3.357243e-01 3.459960e-01 2.587618e-01 2.664994e-01 2.467032e-01 2.879742e-01 3.249230e-01 3.234875e-01 3.494321e-01 0.336840 3.234579e-01 3.119985e-01 3.268258e-01 3.511971e-01 ...
46 3.145396e-01 2.712795e-01 3.477218e-01 2.844930e-01 2.934859e-01 2.948722e-01 3.332270e-01 3.407488e-01 2.585818e-01 2.663180e-01 2.469971e-01 2.851921e-01 3.201058e-01 3.174624e-01 3.488273e-01 0.333676 3.263690e-01 3.154825e-01 3.243851e-01 3.521097e-01 ...
47 3.209793e-01 3.220703e-01 3.463304e-01 3.208193e-01 3.605533e-01 3.310652e-01 3.392203e-01 3.526370e-01 3.445475e-01 3.513992e-01 3.324965e-01 3.258709e-01 3.310903e-01 3.310430e-01 3.507453e-01 0.341101 3.207590e-01 3.086527e-01 3.302653e-01 3.507664e-01 ...
48 3.141404e-01 3.152525e-01 3.397570e-01 3.139971e-01 3.541354e-01 3.243456e-01 3.325813e-01 3.461385e-01 3.379678e-01 3.448883e-01 3.257971e-01 3.191052e-01 3.243768e-01 3.243264e-01 3.442119e-01 0.334480 3.139341e-01 3.017076e-01 3.235365e-01 3.442376e-01 ...
50 3.019184e-01 3.074904e-01 3.324568e-01 3.033043e-01 3.408735e-01 3.136342e-01 3.213810e-01 3.320903e-01 3.237617e-01 3.308260e-01 3.119989e-01 3.052583e-01 3.106120e-01 3.090607e-01 3.350196e-01 0.322467 3.087981e-01 2.971388e-01 3.122976e-01 3.369642e-01 ...
51 2.848808e-01 2.942080e-01 3.198044e-01 2.870989e-01 3.225540e-01 2.975207e-01 3.049387e-01 3.130124e-01 3.044604e-01 3.117225e-01 2.929146e-01 2.860601e-01 2.915408e-01 2.887064e-01 3.208227e-01 0.305356 2.977073e-01 2.864035e-01 2.957284e-01 3.243785e-01 ...
53 2.911100e-01 3.003472e-01 3.257205e-01 2.932907e-01 3.284316e-01 3.036223e-01 3.109769e-01 3.189738e-01 3.104959e-01 3.176951e-01 2.990493e-01 2.922541e-01 2.976874e-01 2.948799e-01 3.267339e-01 0.311392 3.038108e-01 2.926038e-01 3.018465e-01 3.302549e-01 ...
54 3.214584e-01 3.265467e-01 3.508124e-01 3.221719e-01 3.585229e-01 3.322337e-01 3.398109e-01 3.500108e-01 3.419046e-01 3.487797e-01 3.304189e-01 3.238536e-01 3.290655e-01 3.276423e-01 3.534490e-01 0.340922 3.276221e-01 3.162313e-01 3.309669e-01 3.551975e-01 ...
55 2.878206e-01 2.935826e-01 3.190356e-01 2.889842e-01 3.271088e-01 2.995391e-01 3.074885e-01 3.181810e-01 3.096778e-01 3.168895e-01 2.976284e-01 2.907415e-01 2.962086e-01 2.943625e-01 3.218057e-01 0.308284 2.947046e-01 2.827545e-01 2.982112e-01 3.236354e-01 ...
56 2.772562e-01 2.586105e-01 2.617082e-01 2.558168e-01 3.175852e-01 2.894987e-01 2.975390e-01 3.085105e-01 2.998872e-01 3.072011e-01 2.876891e-01 2.807077e-01 2.862513e-01 2.843305e-01 3.118497e-01 0.298316 2.845413e-01 2.724472e-01 2.651721e-01 2.666736e-01 ...
57 3.101577e-01 2.919648e-01 2.949335e-01 2.893182e-01 3.483126e-01 3.214746e-01 3.291501e-01 3.396438e-01 3.314089e-01 3.383935e-01 3.197628e-01 3.130961e-01 3.183900e-01 3.168930e-01 3.427899e-01 0.330247 3.167335e-01 3.051872e-01 2.982492e-01 2.996754e-01 ...
58 3.134369e-01 3.003950e-01 3.048386e-01 2.949203e-01 3.492222e-01 3.252853e-01 3.324278e-01 3.400720e-01 3.318563e-01 3.388325e-01 3.207484e-01 3.141614e-01 3.194271e-01 3.167422e-01 3.478628e-01 0.332852 3.255215e-01 3.146436e-01 3.032482e-01 3.094971e-01 ...
59 2.961822e-01 3.056996e-01 3.308934e-01 2.847668e-01 3.213962e-01 2.951693e-01 3.165472e-01 3.247449e-01 3.163399e-01 2.920470e-01 2.597557e-01 2.526458e-01 2.555257e-01 2.650431e-01 3.317458e-01 0.316901 3.093451e-01 2.982795e-01 3.074983e-01 3.353912e-01 ...
6 2.806641e-01 2.219660e-01 1.884269e-01 2.219817e-01 3.205559e-01 2.925457e-01 3.005431e-01 3.115132e-01 3.029276e-01 3.102097e-01 2.907905e-01 2.838405e-01 2.893597e-01 2.877875e-01 1.957452e-01 0.301680 2.875902e-01 2.755579e-01 2.683415e-01 2.698146e-01 ...
61 3.027803e-01 3.152135e-01 3.401881e-01 2.929248e-01 3.265202e-01 3.030875e-01 3.229552e-01 3.285866e-01 3.202415e-01 2.981568e-01 2.674249e-01 2.604456e-01 2.634530e-01 2.710704e-01 3.396983e-01 0.322727 3.206179e-01 3.101874e-01 3.140687e-01 3.446079e-01 ...
62 2.775554e-01 1.991664e-01 1.757603e-01 2.042159e-01 2.922389e-01 2.661531e-01 2.998233e-01 3.062464e-01 2.976257e-01 3.049551e-01 2.613314e-01 2.542833e-01 2.599758e-01 2.810344e-01 3.161971e-01 0.299471 2.971417e-01 2.864560e-01 2.367872e-01 2.462610e-01 ...
63 2.913991e-01 1.583170e-01 1.180000e-01 2.008653e-01 3.062729e-01 2.805882e-01 3.135124e-01 3.199298e-01 3.114796e-01 3.186643e-01 2.759793e-01 2.690730e-01 2.746521e-01 2.951970e-01 3.293778e-01 0.313144 3.108303e-01 3.003738e-01 1.605825e-01 1.082029e-01 ...
65 2.759945e-01 2.315545e-01 2.395437e-01 2.342572e-01 2.906744e-01 1.877966e-01 2.215862e-01 2.699336e-01 2.960786e-01 3.034242e-01 2.596985e-01 2.526347e-01 2.583399e-01 2.794522e-01 3.147139e-01 0.297939 2.956066e-01 2.848961e-01 1.955312e-01 1.634005e-01 ...
67 2.914398e-01 3.046149e-01 3.299989e-01 2.396090e-01 2.157577e-01 1.348683e-01 2.377260e-01 2.850502e-01 2.831230e-01 2.906020e-01 2.472519e-01 2.401005e-01 2.737024e-01 2.944664e-01 3.292614e-01 0.312685 3.104294e-01 2.999247e-01 3.040027e-01 3.344841e-01 ...
68 2.902586e-01 3.002155e-01 3.256233e-01 2.475661e-01 2.205210e-01 2.152258e-01 3.114593e-01 3.199939e-01 2.818894e-01 2.893846e-01 2.570286e-01 2.498951e-01 2.854936e-01 2.958430e-01 3.263233e-01 0.311755 3.041060e-01 2.930109e-01 3.023511e-01 3.301547e-01 ...
69 3.112485e-01 3.168576e-01 3.414910e-01 2.790633e-01 2.896469e-01 2.896442e-01 3.306434e-01 3.412809e-01 3.018145e-01 3.091056e-01 2.900278e-01 2.831192e-01 3.201129e-01 3.185534e-01 3.439700e-01 0.331696 3.182143e-01 3.067306e-01 3.216865e-01 3.459368e-01 ...
7 2.919950e-01 1.371887e-01 1.873225e-01 2.285825e-01 3.337693e-01 3.029136e-01 3.113836e-01 3.254960e-01 3.170681e-01 3.242069e-01 3.045392e-01 2.976397e-01 3.030765e-01 3.029662e-01 3.231611e-01 0.313305 2.921334e-01 2.795511e-01 2.777904e-01 2.734894e-01 ...
71 2.861313e-01 2.923149e-01 3.178308e-01 2.880894e-01 3.265137e-01 2.986429e-01 3.065520e-01 3.175330e-01 3.090230e-01 3.162413e-01 2.970105e-01 2.901240e-01 2.955940e-01 2.936400e-01 3.204248e-01 0.307282 2.936850e-01 2.817793e-01 2.972714e-01 3.224366e-01 ...
72 1.623775e-01 2.854265e-01 3.112572e-01 2.761281e-01 3.107074e-01 2.867819e-01 2.945597e-01 3.012178e-01 2.925127e-01 2.999013e-01 2.805336e-01 2.735280e-01 2.791138e-01 1.958749e-01 2.689242e-01 0.210772 2.454134e-01 2.759970e-01 2.851670e-01 3.158992e-01 ...
9 3.166971e-01 1.761384e-01 2.733306e-01 2.820795e-01 3.540436e-01 3.275648e-01 3.351939e-01 3.454711e-01 3.373083e-01 3.442314e-01 3.257433e-01 3.191324e-01 3.243806e-01 3.229451e-01 3.489172e-01 0.336312 3.229184e-01 3.114492e-01 3.262882e-01 3.506819e-01 ...
a 7.951601e-01 7.493878e-01 7.343165e-01 7.200629e-01 7.372548e-01 7.226772e-01 7.714293e-01 8.140500e-01 7.957642e-01 8.088685e-01 7.670403e-01 7.176405e-01 7.536676e-01 8.172790e-01 8.088614e-01 0.834680 7.920847e-01 7.573840e-01 7.506245e-01 7.376483e-01 ...
about 2.651426e-01 2.789399e-01 3.052667e-01 2.118463e-01 2.249526e-01 2.230323e-01 2.877736e-01 2.942986e-01 2.569800e-01 2.647317e-01 2.198732e-01 2.124636e-01 2.472255e-01 2.686538e-01 3.044428e-01 0.287417 2.850496e-01 2.741786e-01 2.784410e-01 3.099168e-01 ...
abroad 3.098468e-01 3.220197e-01 3.467405e-01 3.129326e-01 3.446871e-01 3.228473e-01 3.295524e-01 3.350125e-01 3.267468e-01 3.337730e-01 3.160487e-01 3.094815e-01 3.147645e-01 3.109503e-01 3.463152e-01 0.329349 3.272892e-01 3.169408e-01 3.207495e-01 3.511170e-01 ...
abstract 4.362815e-01 4.406131e-01 4.607731e-01 4.370759e-01 4.673286e-01 4.454283e-01 4.517083e-01 4.602489e-01 4.535178e-01 4.319082e-01 4.162117e-01 4.105298e-01 4.125943e-01 4.105732e-01 4.629172e-01 0.452614 4.415687e-01 4.321239e-01 4.443656e-01 4.644149e-01 ...
ache 2.928235e-01 3.049559e-01 3.302845e-01 2.952412e-01 3.275385e-01 3.054234e-01 3.123429e-01 3.176478e-01 3.091652e-01 3.163752e-01 2.981479e-01 2.914036e-01 2.968263e-01 2.930018e-01 3.300000e-01 0.312191 3.101521e-01 2.994881e-01 3.033065e-01 3.347731e-01 ...
admired 2.341497e-01 2.329727e-01 2.418605e-01 2.307650e-01 2.886488e-01 2.601770e-01 2.668278e-01 2.775497e-01 2.685866e-01 2.762153e-01 2.576051e-01 2.505616e-01 2.562711e-01 2.506911e-01 2.776963e-01 0.265688 2.479321e-01 2.373585e-01 2.385637e-01 2.468964e-01 ...
afar 2.914981e-01 3.036908e-01 3.290671e-01 2.940024e-01 3.263867e-01 3.042011e-01 3.111282e-01 3.164750e-01 3.079780e-01 3.152003e-01 2.969461e-01 2.901908e-01 2.956227e-01 2.917818e-01 3.287654e-01 0.310970 3.089192e-01 2.982418e-01 3.020767e-01 3.335637e-01 ...
after 3.402158e-01 3.520474e-01 3.756670e-01 3.485419e-01 3.829854e-01 3.579625e-01 3.643878e-01 3.739803e-01 3.661961e-01 3.728113e-01 3.560130e-01 3.498148e-01 3.547934e-01 3.511674e-01 3.753267e-01 0.363988 3.500321e-01 3.400892e-01 3.560300e-01 3.798506e-01 ...
again 4.377642e-01 4.465609e-01 4.666934e-01 4.177527e-01 4.270499e-01 4.261407e-01 4.562395e-01 4.636779e-01 4.340829e-01 4.399889e-01 4.250950e-01 4.195755e-01 4.471079e-01 4.445104e-01 4.668443e-01 0.456321 4.501855e-01 4.415570e-01 4.490670e-01 4.702722e-01 ...
against 3.218557e-01 3.338860e-01 3.581771e-01 3.250391e-01 3.562899e-01 3.347766e-01 3.413550e-01 3.467790e-01 3.386598e-01 3.455616e-01 3.281591e-01 3.217093e-01 3.268984e-01 3.231335e-01 3.577284e-01 0.341143 3.391052e-01 3.289491e-01 3.327087e-01 3.624766e-01 ...
age 3.161491e-01 3.976675e-01 4.193760e-01 3.805035e-01 4.145200e-01 3.896531e-01 4.096280e-01 4.188362e-01 4.115887e-01 4.177357e-01 3.888797e-01 3.829079e-01 3.876588e-01 3.982139e-01 4.216767e-01 0.409968 3.987074e-01 3.885403e-01 4.017222e-01 4.232973e-01 ...
aged 3.026295e-01 3.117690e-01 3.367139e-01 3.044719e-01 3.388161e-01 3.146521e-01 3.219320e-01 3.295416e-01 3.211942e-01 3.282819e-01 3.098854e-01 3.031899e-01 3.085408e-01 3.055412e-01 3.378609e-01 0.322058 3.149718e-01 3.038933e-01 3.129391e-01 3.411761e-01 ...
agree 2.659436e-01 2.759868e-01 3.022636e-01 2.691857e-01 3.058644e-01 2.798526e-01 2.873995e-01 2.960366e-01 2.872746e-01 2.947158e-01 2.754985e-01 2.684824e-01 2.740961e-01 2.710684e-01 3.031007e-01 0.287749 2.798585e-01 2.683383e-01 2.779674e-01 3.069533e-01 ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

1388 rows × 1388 columns

And not only is it easier, but it is also faster. Check this out.


In [ ]:
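from timeit import timeit

n = 1000  # the pure-Python CS_matrix may take a very long time; lower n if needed
print('%d iterations of sklearn takes %s seconds' %
      (n, timeit("pairwise_distances(df, metric='cosine')",  # same metric as above
                 'from __main__ import pairwise_distances, df',
                 number=n)))
# CS_matrix finds CS_num, CS_den_part, CS_den and cos_sim in __main__ when it runs,
# so only CS_matrix and df need to be imported into the timeit namespace.
print('%d iterations of our function takes %s seconds' %
      (n, timeit('CS_matrix(df)',
                 'from __main__ import CS_matrix, df',
                 number=n)))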
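A likely reason for the speed gap is that a Python loop over roughly 1.9 million ordered row pairs does the arithmetic one pair at a time, whereas array code hands the whole computation to compiled routines. The sketch below (a hypothetical CS_matrix_vectorized, not scikit-learn's actual implementation) scales every row to unit length and then gets all pairwise cosine similarities from a single matrix product.

```python
import numpy as np
import pandas as pd

def CS_matrix_vectorized(DF_LL):
    """Hypothetical vectorized CS_matrix: all pairwise row similarities at once."""
    X = DF_LL.to_numpy(dtype=float)
    # Euclidean norm of every row, kept as a column vector of shape (n, 1).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Scale each row to unit length (an all-zero row would produce nan here).
    unit = X / norms
    # Dot products of unit-length rows are the cosine similarities.
    sims = unit @ unit.T
    return pd.DataFrame(sims, index=DF_LL.index, columns=DF_LL.index)

# Hypothetical 3 x 3 toy matrix (not the Blake data).
toy = pd.DataFrame([[1.0, 2.0, 2.0],
                    [2.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]],
                   index=['x', 'y', 'z'], columns=['x', 'y', 'z'])
print(CS_matrix_vectorized(toy).round(3))
```

Because similarity = 1 - distance, 1 - CS_matrix_vectorized(df) should match the pairwise_distances output above up to rounding.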

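But how about the results? Write a function below that takes an LL DataFrame as input and compares the results of our function with those of the sklearn function, returning True if they agree and False if they do not. Because floating-point rounding makes tiny differences inevitable (note values such as -8.881784e-16 on the diagonal of the sklearn output), compare within a small tolerance rather than testing for exact equality.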

Note: $\text{similarity} = 1 - \text{distance}$


In [ ]:
def CS_compare(LL_DF):
    #insert your code here
    pass

#The following lines are to check your function for errors.
print(CS_compare(df))
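One way to structure the comparison, sketched under the assumption that both results are already available as a similarity matrix (from CS_matrix) and a cosine-distance matrix (from pairwise_distances): convert the distances with similarity = 1 - distance and compare with numpy.allclose, which tolerates rounding noise. similarities_match is a hypothetical helper name, and its tolerances are arbitrary defaults you may need to adjust. The commented lines show how CS_compare could wrap it.

```python
import numpy as np
import pandas as pd

def similarities_match(CS_DF, distances, rtol=1e-7, atol=1e-9):
    """Hypothetical helper: does CS_DF agree with 1 - distances, up to rounding?"""
    ours = np.asarray(CS_DF, dtype=float)
    theirs = 1.0 - np.asarray(distances, dtype=float)  # similarity = 1 - distance
    # An exact == comparison would fail on rounding noise such as -8.9e-16.
    return ours.shape == theirs.shape and bool(np.allclose(ours, theirs, rtol=rtol, atol=atol))

# Possible CS_compare, assuming CS_matrix and pairwise_distances are defined in the notebook:
# def CS_compare(LL_DF):
#     return similarities_match(CS_matrix(LL_DF),
#                               pairwise_distances(LL_DF, metric='cosine'))

# Made-up 2 x 2 example with diagonal rounding noise like the sklearn output above.
sim = pd.DataFrame([[1.0, 0.6], [0.6, 1.0]])
dist = np.array([[-8.9e-16, 0.4], [0.4, 2.2e-16]])
print(similarities_match(sim, dist))  # True
```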