In [2]:
import pandas as pd
import graphlab as gl

In [5]:
sf = gl.SFrame('../data/jokes.dat', format='tsv')


Finished parsing file /Users/apoorvc/recommender_caseStudy/data/jokes.dat
Parsing completed. Parsed 100 lines in 0.044875 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /Users/apoorvc/recommender_caseStudy/data/jokes.dat
Parsing completed. Parsed 1276 lines in 0.02049 secs.

In [8]:
sf[3]


Out[8]:
{'1:': 'The man replies, &quot;Well, thank God I don&#039;t have cancer!&quot;'}

In [19]:
# Read the file line by line; each raw line (newline included) becomes one row.
with open('../data/jokes.dat', 'r') as f:
    df = pd.DataFrame(list(f))

print(df)


                                                      0
0                                                  1:\n
1                                               <p>\r\n
2     A man visits the doctor. The doctor says, &quo...
3                                            <br />\r\n
4     The man replies, &quot;Well, thank God I don&#...
5                                              </p>\r\n
6                                                    \n
7                                                  2:\n
8                                               <p>\r\n
9     This couple had an excellent relationship goin...
10                                           <br />\r\n
11    &quot;What could they possibly have said to ma...
12                                           <br />\r\n
13    &quot;They told me that you were a pedophile.&...
14                                           <br />\r\n
15    He replied, &quot;That&#039;s an awfully big w...
16                                             </p>\r\n
17                                                   \n
18                                                 3:\n
19                                              <p>\r\n
20    Q. What&#039;s 200 feet long and has 4 teeth?<...
21                                           <br />\r\n
22     A. The front row at a Willie Nelson concert.\r\n
23                                             </p>\r\n
24                                                   \n
25                                                 4:\n
26                                              <p>\r\n
27    Q. What&#039;s the difference between a man an...
28                                           <br />\r\n
29    A. A toilet doesn&#039;t follow you around aft...
...                                                 ...
1397                                         <br />\r\n
1398  The teacher answered quickly, &quot;That would...
1399                                         <br />\r\n
1400  St. Peter turned to the garbage man and, figur...
1401                                         <br />\r\n
1402  Fortunately for him, the trash man had just se...
1403                                         <br />\r\n
1404  &quot;That&#039;s right! You may enter.&quot;<...
1405                                         <br />\r\n
1406  St. Peter turned to the lawyer: &quot;Name the...
1407                                           </p>\r\n
1408                                                 \n
1409                                             149:\n
1410                                            <p>\r\n
1411  A little girl asked her father, &quot;Daddy? D...
1412                                         <br />\r\n
1413  He replied, &quot;No, there is a whole series ...
1414                                           </p>\r\n
1415                                                 \n
1416                                             150:\n
1417                                            <p>\r\n
1418  In an interview with David Letterman, Carter p...
1419                                         <br />\r\n
1420  He told the joke, then waited for the translat...
1421                                         <br />\r\n
1422  After the speech, Carter wanted to meet the tr...
1423                                         <br />\r\n
1424  When Carter asked how the joke had been told i...
1425                                           </p>\r\n
1426                                                 \n

[1427 rows x 1 columns]

In [22]:
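df.iloc[4]  # .iloc is indexed with square brackets; df.iloc(4) only returns the indexer object


Out[22]:
0    The man replies, &quot;Well, thank God I don&#...
Name: 4, dtype: object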

In [23]:
numpymatrx = df.values  # as_matrix() was deprecated in pandas 0.23 and removed in 1.0; .values gives the same ndarray

In [24]:
numpymatrx


Out[24]:
array([['1:\n'],
       ['<p>\r\n'],
       [ 'A man visits the doctor. The doctor says, &quot;I have bad news for you. You have cancer and Alzheimer&#039;s disease&quot;.<br />\r\n'],
       ..., 
       [ 'When Carter asked how the joke had been told in Japanese, the translator responded, &quot;I told them, &#039;President Carter has told a very funny joke. Please laugh now.&#039;&quot;\r\n'],
       ['</p>\r\n'],
       ['\n']], dtype=object)

In [26]:
numpymatrx[0:30]


Out[26]:
array([['1:\n'],
       ['<p>\r\n'],
       [ 'A man visits the doctor. The doctor says, &quot;I have bad news for you. You have cancer and Alzheimer&#039;s disease&quot;.<br />\r\n'],
       ['<br />\r\n'],
       [ 'The man replies, &quot;Well, thank God I don&#039;t have cancer!&quot;\r\n'],
       ['</p>\r\n'],
       ['\n'],
       ['2:\n'],
       ['<p>\r\n'],
       [ 'This couple had an excellent relationship going until one day he came home from work to find his girlfriend packing. He asked her why she was leaving him and she told him that she had heard awful things about him. <br />\r\n'],
       ['<br />\r\n'],
       [ '&quot;What could they possibly have said to make you move out?&quot;<br />\r\n'],
       ['<br />\r\n'],
       ['&quot;They told me that you were a pedophile.&quot;<br />\r\n'],
       ['<br />\r\n'],
       [ 'He replied, &quot;That&#039;s an awfully big word for a ten year old.&quot;\r\n'],
       ['</p>\r\n'],
       ['\n'],
       ['3:\n'],
       ['<p>\r\n'],
       ['Q. What&#039;s 200 feet long and has 4 teeth?<br />\r\n'],
       ['<br />\r\n'],
       ['A. The front row at a Willie Nelson concert.\r\n'],
       ['</p>\r\n'],
       ['\n'],
       ['4:\n'],
       ['<p>\r\n'],
       ['Q. What&#039;s the difference between a man and a toilet?<br />\r\n'],
       ['<br />\r\n'],
       ['A. A toilet doesn&#039;t follow you around after you use it.\r\n']], dtype=object)
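
The rows above still carry HTML entities (&quot;, &#039;), <p> and <br /> tags, and \r\n line endings. Below is a minimal sketch of one way to clean a single line, assuming Python 3's standard `html` module is available. The helper name `clean_line` is hypothetical and not part of the original notebook.

```python
import html
import re

# Matches any HTML tag such as <p>, </p>, or <br />.
TAG_RE = re.compile(r"<[^>]+>")

def clean_line(raw):
    """Strip HTML tags, decode entities, and trim surrounding whitespace."""
    # Tags are removed before decoding so that a decoded "<" is never mistaken for markup.
    no_tags = TAG_RE.sub("", raw)
    return html.unescape(no_tags).strip()

# Sample line copied from the array output above.
sample = "Q. What&#039;s 200 feet long and has 4 teeth?<br />\r\n"
print(clean_line(sample))
```

Markup-only rows such as '<p>\r\n' come back as empty strings, so they could be filtered out afterwards if that is the desired behaviour.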

In [27]:
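df_n = pd.read_table('../data/jokes.dat')  # first line ('1:') becomes the column name and blank lines are skipped; pass header=None to keep it as data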

In [28]:
df_n


Out[28]:
1:
0 <p>
1 A man visits the doctor. The doctor says, &quo...
2 <br />
3 The man replies, &quot;Well, thank God I don&#...
4 </p>
5 2:
6 <p>
7 This couple had an excellent relationship goin...
8 <br />
9 &quot;What could they possibly have said to ma...
10 <br />
11 &quot;They told me that you were a pedophile.&...
12 <br />
13 He replied, &quot;That&#039;s an awfully big w...
14 </p>
15 3:
16 <p>
17 Q. What&#039;s 200 feet long and has 4 teeth?<...
18 <br />
19 A. The front row at a Willie Nelson concert.
20 </p>
21 4:
22 <p>
23 Q. What&#039;s the difference between a man an...
24 <br />
25 A. A toilet doesn&#039;t follow you around aft...
26 </p>
27 5:
28 <p>
29 Q. What&#039;s O. J. Simpson&#039;s web addres...
... ...
1246 Recently a teacher, a garbage collector, and a...
1247 <br />
1248 St. Peter addressed the teacher and asked, &qu...
1249 <br />
1250 The teacher answered quickly, &quot;That would...
1251 <br />
1252 St. Peter turned to the garbage man and, figur...
1253 <br />
1254 Fortunately for him, the trash man had just se...
1255 <br />
1256 &quot;That&#039;s right! You may enter.&quot;<...
1257 <br />
1258 St. Peter turned to the lawyer: &quot;Name the...
1259 </p>
1260 149:
1261 <p>
1262 A little girl asked her father, &quot;Daddy? D...
1263 <br />
1264 He replied, &quot;No, there is a whole series ...
1265 </p>
1266 150:
1267 <p>
1268 In an interview with David Letterman, Carter p...
1269 <br />
1270 He told the joke, then waited for the translat...
1271 <br />
1272 After the speech, Carter wanted to meet the tr...
1273 <br />
1274 When Carter asked how the joke had been told i...
1275 </p>

1276 rows × 1 columns
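
Every loader tried so far returns one row per physical line, so each joke is still spread across several rows. Below is a rough sketch of grouping the lines back into one record per joke. It assumes, as the output above suggests, that every joke starts with a line of the form `N:` and that lines holding only <p>, </p>, or <br /> carry no text. The function name `parse_jokes` is illustrative only.

```python
import re

# A joke header is a line holding only digits followed by a colon, e.g. "3:".
ID_RE = re.compile(r"^(\d+):\s*$")

# Lines that contain nothing but markup and can be skipped.
MARKUP_ONLY = ("<p>", "</p>", "<br />")

def parse_jokes(lines):
    """Group raw lines into {'joke_id': int, 'lines': [str, ...]} records."""
    jokes = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = ID_RE.match(line)
        if match:
            # Start a new record whenever an ID line is seen.
            current = {"joke_id": int(match.group(1)), "lines": []}
            jokes.append(current)
        elif current is not None and line and line not in MARKUP_ONLY:
            # Keep only lines that carry joke text.
            current["lines"].append(line)
    return jokes

# Sample lines copied from the raw file dump shown earlier.
sample = [
    "3:\n",
    "<p>\r\n",
    "Q. What&#039;s 200 feet long and has 4 teeth?<br />\r\n",
    "<br />\r\n",
    "A. The front row at a Willie Nelson concert.\r\n",
    "</p>\r\n",
    "\n",
    "4:\n",
    "<p>\r\n",
]
for joke in parse_jokes(sample):
    print(joke["joke_id"], len(joke["lines"]))
```

The kept lines still contain entities and trailing <br /> tags, so they would need a separate cleaning pass. A text line that happens to consist only of digits and a colon would be misread as a header. The resulting list of dicts can be passed to pd.DataFrame to get one row per joke.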

