


In [1]:

    
import pandas as pd



In [3]:

    
file = r'.\data\reddit\ethereum\slim_sorted_comments.csv'
# chunksize makes read_csv return an iterator of 1000-row DataFrames
reader = pd.read_csv(file, chunksize=1000, header=0, index_col='commentId')
for df in reader:
    # keep only the first chunk (the first 1000 comments) and stop reading
    break

reader.close()
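Every table below is computed from this first chunk only, i.e. the first 1,000 comments in the file. A minimal sketch of tallying per-author scores over all chunks instead, by summing per-chunk group sums. The in-memory `CSV` string is a hypothetical stand-in for `slim_sorted_comments.csv`, and using the reader in a `with` block assumes pandas 1.2 or later:

```python
import io

import pandas as pd

# Hypothetical stand-in for slim_sorted_comments.csv (same columns, toy values)
CSV = """commentId,author,body,created_utc,postId,score
t1_a,alice,first,1388534400,t3_x,3
t1_b,bob,second,1388534500,t3_x,-1
t1_c,alice,third,1388534600,t3_y,2
t1_d,carol,fourth,1388534700,t3_y,5
t1_e,bob,fifth,1388534800,t3_z,4
"""

partials = []
# chunksize=2 yields DataFrames of at most 2 rows; the file is never fully loaded
with pd.read_csv(io.StringIO(CSV), chunksize=2, index_col='commentId') as reader:
    for chunk in reader:
        # per-author score sum within this chunk only
        partials.append(chunk.groupby('author')['score'].sum())

# an author can appear in several chunks, so combine the partial sums by author
totals = pd.concat(partials).groupby(level=0).sum()
print(totals.sort_values(ascending=False))
```

Sums combine cleanly across chunks; a statistic like a per-author mean would instead need per-chunk sums and counts carried separately.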



In [ ]:

    
## Looking at the comment data



In [5]:

    
df.head(n=10)









    Out[5]:

	author	body	created_utc	postId	score
commentId
t1_ceht9kr	qaezel	The links are out!	1.388923e+09	t3_1ucwto	1
t1_cei0u4u	vbenes	Mind: blown!  Contracts (their Turing complete...	1.388950e+09	t3_1ucwto	3
t1_celpxzb	salwilliam	"is decentralized Bitcoin-Ethereum exchange po...	1.389313e+09	t3_1ucwto	1
t1_celr9kl	vbuterin	Almost. It's pointless to have an Ethereum con...	1.389317e+09	t3_1ucwto	8
t1_cembxgf	salwilliam	http://4.bp.blogspot.com/-dFGIcV-hH5w/Uoz0JKRL...	1.389381e+09	t3_1ucwto	-4
t1_cemsiby	chem_deth	Your site is down, buddy. Get some Cloudflare ...	1.389423e+09	t3_1ucwto	3
t1_cemt8dp	coiv	I don't think many people know what "Turing co...	1.389426e+09	t3_1ucwto	24
t1_cemtctw	free593	This looks really good. I like how the concept...	1.389427e+09	t3_1ucwto	6
t1_cemxn5d	needsTimeMachine	The ethereum site is down, so do you happen to...	1.389453e+09	t3_1ucwto	4
t1_cemzmit	standardcrypto	Already slashdotted.  This is going to be huge :)	1.389459e+09	t3_1ucwto	4
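The `created_utc` column is Reddit's Unix timestamp (seconds since 1970-01-01 UTC), which pandas displays in scientific notation; values around 1.3889e+09 fall in early January 2014. A small sketch of converting it to readable datetimes, using a hypothetical two-row frame in place of the real chunk:

```python
import pandas as pd

# Toy frame mimicking the created_utc column (float Unix seconds, as in the CSV)
df = pd.DataFrame({'created_utc': [1388534400.0, 1388620800.0]})

# unit='s' reads the numbers as seconds since the epoch; utc=True keeps the timezone explicit
df['created'] = pd.to_datetime(df['created_utc'], unit='s', utc=True)
print(df)
```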



In [ ]:

    
## Finding deleted authors



In [8]:

    
df[df['author'] == '[deleted]'].head(n=10)









    Out[8]:

	author	body	created_utc	postId	score
commentId
t1_cewocxg	[deleted]	Hey Vik!   I'd love to get involved with your ...	1.390460e+09	t3_1ucwto	1
t1_cexos6j	[deleted]	[deleted]	1.390571e+09	t3_1ucwto	3
t1_cezwnv3	[deleted]	[deleted]	1.390806e+09	t3_1ucwto	1
t1_cfa3sgi	[deleted]	[deleted]	1.391855e+09	t3_1ucwto	6
t1_cenl9ca	[deleted]	The problem with CPU mining is that it is only...	1.389519e+09	t3_1uzumc	1
t1_cenqlt6	[deleted]	A 5-20x speedup is still dangerous. With that ...	1.389547e+09	t3_1uzumc	1
t1_ceowmzu	[deleted]	[deleted]	1.389662e+09	t3_1v5i4l	2
t1_cepztgx	[deleted]	What conditions should be met to switch to tru...	1.389770e+09	t3_1v78zi	1
t1_ceq456q	[deleted]	> There is currently a lot of reasearch/discus...	1.389794e+09	t3_1v78zi	1
t1_ceq5pnu	[deleted]	Seems so.	1.389799e+09	t3_1v78zi	0
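This output mixes two cases: rows where only the account is gone but the text survives (e.g. "Hey Vik! ..."), and rows where both `author` and `body` read `[deleted]`. A sketch on hypothetical toy rows that counts each case separately with two boolean masks and a cross-tabulation:

```python
import pandas as pd

# Hypothetical rows: some accounts deleted, some comment bodies deleted
df = pd.DataFrame({
    'author': ['[deleted]', '[deleted]', 'alice', '[deleted]'],
    'body':   ['Hey Vik!', '[deleted]', 'hello', '[deleted]'],
})

author_gone = df['author'] == '[deleted]'
body_gone = df['body'] == '[deleted]'

# Rows = account deleted?, columns = body deleted?; each cell is a row count
print(pd.crosstab(author_gone, body_gone,
                  rownames=['author deleted'], colnames=['body deleted']))
```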



In [ ]:

    
## Grouping authors by total score (highest and lowest)



In [18]:

    
# total score per author, highest first (Series.reset_index already returns a DataFrame)
df.groupby('author')['score'].sum().reset_index().sort_values('score', ascending=False).head(n=10)









    Out[18]:

	author	score
216	vbuterin	191
59	[deleted]	93
56	Ursium	92
167	nucleo_io	78
147	malefizer	73
180	que23	72
110	ericcart	58
22	ItsAConspiracy	44
229	zerox102	41
101	ddink7	39
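`[deleted]` ranks second here, but it is not one person: Reddit shows that placeholder for every removed account, so its total pools many users. A sketch, on hypothetical toy data, that drops the placeholder before ranking and uses `as_index=False` plus `nlargest` in place of the `reset_index().sort_values().head()` chain:

```python
import pandas as pd

# Hypothetical comments with a pooled '[deleted]' author
df = pd.DataFrame({
    'author': ['alice', '[deleted]', 'bob', '[deleted]', 'alice'],
    'score':  [5, 4, 2, 3, 1],
})

# as_index=False keeps 'author' as a column instead of the index
totals = (df[df['author'] != '[deleted]']              # drop the pooled placeholder
          .groupby('author', as_index=False)['score'].sum()
          .nlargest(10, 'score'))                      # top 10 by total score
print(totals)
```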



In [21]:

    
# total score per author, lowest first
df.groupby('author')['score'].sum().reset_index().sort_values('score', ascending=True).head(n=10)









    Out[21]:

	author	score
188	salwilliam	-3
66	ambrozy007	-1
32	MaxK	-1
70	antanst	0
201	stop_runs	0
91	coin-table	0
13	Dartanan	0
211	twisthype	0
52	SyncoBeat	1
51	Symphonic_Rainboom	1
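Several authors tie at 0 and at 1 here. `sort_values` defaults to quicksort, which is not stable, so which tied authors make the top-10 cut, and in what order, depends on the sort algorithm rather than on the data. A sketch that adds `author` as a secondary key for a deterministic order, using a few rows copied from the table above:

```python
import pandas as pd

# A subset of the per-author totals shown above
totals = pd.DataFrame({
    'author': ['twisthype', 'antanst', 'salwilliam', 'coin-table'],
    'score':  [0, 0, -3, 0],
})

# Sort by score, then alphabetically by author so ties always come out in the same order
lowest = totals.sort_values(['score', 'author'], ascending=[True, True]).head(10)
print(lowest)
```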



In [ ]:

    
## Grouping authors by number of comments



In [19]:

    
# comments per author: count() tallies the non-missing postId values in each group
df.groupby('author')['postId'].count().reset_index().sort_values('postId', ascending=False).head(n=10)









    Out[19]:

	author	postId
216	vbuterin	57
56	Ursium	53
59	[deleted]	48
180	que23	47
167	nucleo_io	45
110	ericcart	37
147	malefizer	34
229	zerox102	28
22	ItsAConspiracy	20
19	Haposhi	18
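The column here is labelled `postId`, but it holds each author's number of comments, because `.count()` counts the non-missing `postId` values per group. A sketch on hypothetical data showing how `count()`, `size()`, and `value_counts()` differ once a value is missing:

```python
import pandas as pd

# Hypothetical comments; carol's postId is missing
df = pd.DataFrame({
    'author': ['alice', 'bob', 'alice', 'carol', 'alice', 'bob'],
    'postId': ['t3_x', 't3_x', 't3_y', None, 't3_z', 't3_z'],
})

# count() skips missing values in the chosen column; size() counts every row
via_count = df.groupby('author')['postId'].count()
via_size = df.groupby('author').size()
# value_counts() gives rows per author, already sorted from most to fewest
via_value_counts = df['author'].value_counts()

print(pd.DataFrame({'count': via_count, 'size': via_size}))
print(via_value_counts)
```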



In [ ]:

    
## Grouping authors by total comment length



In [28]:

    
# total characters per author: sum each comment's length instead of concatenating all bodies
df['body'].str.len().groupby(df['author']).sum().reset_index().sort_values('body', ascending=False).head(n=10)









    Out[28]:

	author	body
216	vbuterin	26156
167	nucleo_io	13325
63	aaron-lebo	12099
56	Ursium	11219
180	que23	10230
147	malefizer	9409
110	ericcart	7750
59	[deleted]	7321
22	ItsAConspiracy	7229
30	Maegfaer	6954
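The score, comment-count, and length rankings above are computed in separate cells. A sketch using named aggregation (available since pandas 0.25) to put them side by side in one table, plus an average score per comment. The toy frame and the output column names (`total_score`, `comments`, `total_chars`, `score_per_comment`) are hypothetical:

```python
import pandas as pd

# Hypothetical comments with the same author/body/score columns as the chunk
df = pd.DataFrame({
    'author': ['alice', 'bob', 'alice', 'bob', 'carol'],
    'body':   ['short', 'a longer reply', 'ok', 'hi', 'hello there'],
    'score':  [3, 1, 5, -2, 4],
})

summary = (df.assign(chars=df['body'].str.len())     # characters per comment
             .groupby('author')
             .agg(total_score=('score', 'sum'),
                  comments=('score', 'count'),       # non-missing scores = comments
                  total_chars=('chars', 'sum'))
             .sort_values('total_chars', ascending=False))
# average score per comment puts prolific and occasional authors on one scale
summary['score_per_comment'] = summary['total_score'] / summary['comments']
print(summary)
```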


