Data Exploration

Exploratory data analysis of highlights and full texts of articles from Medium.com.

Before trying to train a model, I performed exploratory analysis to visualize features of sentences from the corpus that I had scraped from Medium.com.


In [4]:
import os, time, re, pickle
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import timedelta, date
import urllib
import html5lib
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup, SoupStrainer

NOTES

  • FIND DUPLICATES

  • To calculate:

    • Length of highlight

    • Find highlight in fulltext (calculate when it starts, as percentage) -- break fulltext and highlight into words?

    • Percent length of highlight in fulltext

    • TF/IDF

    • Grade level of fulltext and highlight
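A minimal sketch of two of the calculations listed above: where a highlight starts in the full text (as a percentage) and how much of the full text it covers. The function name `highlight_stats` and the toy data are hypothetical. The sketch assumes the highlight appears verbatim in the full text and that paragraphs are joined with single spaces, which may not hold for every scraped article.

```python
def highlight_stats(paragraphs, highlight):
    """Describe a highlight relative to the article it came from.

    paragraphs: list of paragraph strings for one article
    highlight:  the highlighted passage as a single string
    Returns None when the highlight is not found verbatim.
    """
    # Assumption: paragraphs are joined with a single space
    fulltext = ' '.join(paragraphs)
    start = fulltext.find(highlight)      # character offset, -1 if absent
    if start == -1 or not fulltext:
        return None
    return {
        'length_words': len(highlight.split()),
        'length_chars': len(highlight),
        # position of the first highlighted character, as % of the full text
        'start_pct': 100.0 * start / len(fulltext),
        # share of the full text covered by the highlight, in %
        'length_pct': 100.0 * len(highlight) / len(fulltext),
    }


# Toy data, invented for illustration only
paragraphs = ['aaaa bbbb', 'cccc dddd']
stats = highlight_stats(paragraphs, 'cccc')
print(stats)
```

Working in characters sidesteps tokenization; splitting both strings into words, as the note suggests, would give word-based positions instead.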

CHECK LENGTH OF HIGHLIGHTS


In [3]:
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/highlights_20170606_00-54-27.txt','r') as fhigh:
    hlist = fhigh.readlines()
    
matword = []
matchar = []
id_hl = []
id_no = []
ct_hl = 0
ct_no = 0
htext = []

for line in hlist:
    
    # Analyze length of highlights and number of lines with highlights
    line = line.strip().split('\t')
    if len(line) <= 2:
        id_no.append(line[1])
        ct_no += 1
    else:
        lenword = len(line[2].split())
        lenchar = len(line[2])
        # print( line )
        # print( length )
        matword.append(lenword)
        matchar.append(lenchar)
        id_hl.append(line[1])
        htext.append(line[2])
        ct_hl += 1
        
print( 'articles without highlights: '+str(ct_no) )
print( 'articles with highlights: '+str(ct_hl) )
print( htext[:10] )
print( id_hl[:10] )

plt.hist(matword, bins=50)
plt.title("Highlights, 20170606_00-54-27")
plt.xlabel("Number of words")
plt.ylabel("Frequency")
plt.show()

plt.hist(matchar, bins=50)
plt.title("Highlights, 20170606_00-54-27")
plt.xlabel("Number of characters")
plt.ylabel("Frequency")
plt.show()


articles without highlights: 391
articles with highlights: 1993
['Until you appreciate what you currently have, more won’t make your life better.', 'And let’s not feel terrified, but full of renewed solidarity for one another as terrorists try\u200a—\u200aand fail\u200a—\u200ato spread terror.', 'Hard work is doing the work other people don’t want to do.', 'He’d made up his mind to skip the dog thing altogether, decided to collect pussy cats instead, something like that.', 'Realizing that our actions, feelings and behaviour are the result of our own images and beliefs gives us the level that psychology has always needed for changing personality.', 'So, the advice I have is to find what your north star is, to understand yourself, to understand what your work/life balance and ambitions are. What your financial ambitions are. What your life challenges are. The problems that you want to try and solve. The mountains that you want to climb while you’re here. You can’t listen to anyone else. You need to listen to yourself, and do you.', 'Cryptoeconomic approaches combine cryptography and economics to create robust decentralized P2P networks that thrive over time despite adversaries attempting to disrupt the network.', 'You can feel however you want about Durant’s decision to join a 73-win team. He very well might win four consecutive championships and it might mean nothing to you. But there’s no denying the pure splendor of watching a top-50 all-time player flourish in a system that heightens his talents, especially after eight years of Oklahoma City caveman ball.', 'A good principle should be a tradeoff, a choice.', 'But do I think there are going to be 12 people that listen to my podcast or an appearance I’ve made as a guest on another show who don’t know a thing about me that are going to become a fan and like what I do? Absolutely.']
['2', '4', '5', '10', '12', '13', '14', '15', '16', '17']

CHECK FOR DUPLICATES

Because I scraped 30 popular articles per day at intervals of only 5 days, some articles may appear more than once. Here, I check for duplicate highlights in the dataset.


In [4]:
seen = set()
seen_add = seen.add

htext_uniq = []
id_hl_uniq = []    # store ids of first unique highlights
id_hl_nonu = []    # store ids of non-unique highlights
idnum = 0

for x in htext:    # get unique highlights, preserving order
    if x in seen:
        id_hl_nonu.append(id_hl[idnum])
        idnum += 1
        continue
    seen_add(x)
    htext_uniq.append(x)
    id_hl_uniq.append(id_hl[idnum])
    idnum += 1

print(id_hl_uniq[49])    # check that id_hl_uniq matches htext_uniq -- it does!!
print(htext_uniq[49])
counts = []
counts2 = []
for x in htext_uniq:
    counts.append(htext_uniq.count(x))
    counts2.append(htext.count(x))
plt.hist(counts)
plt.show()
plt.hist(counts2)
plt.show()

print('number of unique highlights: '+str(len(id_hl_uniq)))
print('number of non-unique highlights: '+str(len(id_hl_nonu)))


66
If you want to be good at something, do it every day. No exceptions. You can use your time effectively and productivity if you are consistent.
number of unique highlights: 1908
number of non-unique highlights: 85
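The order-preserving duplicate check above can also be sketched without a manual index counter. This is an illustrative variant, not the code that produced the numbers above: `split_duplicates` is a hypothetical helper, and `collections.Counter` stands in for the repeated `list.count` calls.

```python
from collections import Counter


def split_duplicates(ids, texts):
    """Return (unique_ids, unique_texts, duplicate_ids), keeping first occurrences in order."""
    seen = set()
    unique_ids, unique_texts, duplicate_ids = [], [], []
    for item_id, text in zip(ids, texts):
        if text in seen:
            duplicate_ids.append(item_id)   # a later copy of an earlier highlight
        else:
            seen.add(text)
            unique_ids.append(item_id)
            unique_texts.append(text)
    return unique_ids, unique_texts, duplicate_ids


# Toy data, invented for illustration only
ids = ['2', '4', '5', '10']
texts = ['alpha', 'beta', 'alpha', 'gamma']

unique_ids, unique_texts, duplicate_ids = split_duplicates(ids, texts)
repeats = Counter(texts)   # how many times each highlight occurs, in one pass

print(unique_ids)
print(duplicate_ids)
print(repeats.most_common(1))
```

Pairing ids and texts with `zip` keeps the two lists aligned by construction, so the spot check on a single index becomes unnecessary.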

RE-CHECK LENGTH OF HIGHLIGHTS AFTER REMOVING DUPLICATES


In [5]:
matword_uniq = []
matchar_uniq = []
idnum = 0
for line in htext_uniq:
#     print(id_hl_uniq[idnum]+'\t'+line)
#     print(htext_uniq)
    lenword = len(line.split())
    matword_uniq.append(lenword)
    lenchar = len(line)
    matchar_uniq.append(lenchar)
    idnum += 1
    
plt.hist(matword_uniq, bins=50)
plt.title("Highlights (Unique), 20170606_00-54-27")
plt.xlabel("Number of words")
plt.ylabel("Frequency")
plt.show()

plt.hist(matchar_uniq, bins=50)
plt.title("Highlights (Unique), 20170606_00-54-27")
plt.xlabel("Number of letters")
plt.ylabel("Frequency")
plt.show()
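Alongside the histograms, a few summary numbers make it easier to compare the distributions before and after removing duplicates. A small sketch using only the standard library; `length_summary` and the sample strings are hypothetical, not taken from the dataset.

```python
import statistics


def length_summary(texts):
    """Summarize highlight lengths in words."""
    word_counts = [len(t.split()) for t in texts]
    return {
        'n': len(word_counts),
        'min': min(word_counts),
        'median': statistics.median(word_counts),
        'mean': statistics.mean(word_counts),
        'max': max(word_counts),
    }


# Toy data, invented for illustration only
sample = ['one two three', 'one', 'one two three four five']
summary = length_summary(sample)
print(summary)
```

Calling it on `htext` and on `htext_uniq` would show whether the duplicates shift the typical highlight length.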


CLEAN UP FULLTEXT

Extract the full text of each article from its HTML.

  • Note: fixed so that comments are not extracted along with the main text.

In [25]:
fhtml = open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/fullhtml_20170606_00-54-27_isolate.txt','r')

id_ft = []
fullt = []

# num = 0
for line in fhtml:
    text = line.strip().split('\t')
    fullh = text[2]
    fullt_line = []

    # testing SoupStrainer
# #     content = SoupStrainer('div',attrs={'data-source':'post_page'})
#     soup = BeautifulSoup(fullh,'lxml')#,parse_only=content)
#     txt0 = soup.find('div',attrs={'data-source':'post_page'})#class_='postArticle-content')
# #     print(txt0)
#     txt1 = txt0.find_all('p',class_='graf')
# #     print(txt1)

    soup = BeautifulSoup(fullh,'lxml')                            #,parse_only=content)
    txt0 = soup.find('div',attrs={'data-source':'post_page'})     #class_='postArticle-content')
    if not txt0:
        print('error! skipping '+text[1])
        continue
    txt1 = txt0.find_all('p',class_='graf')
    id_ft.append(text[1])

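    for para in txt1:    # separate name so the outer loop variable `line` is not shadowed
        txt2 = re.sub('<[^>]+>', '', str(para) )    # strip HTML tags from the paragraph
        fullt_line.append(txt2)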
#     num+=1
#     if num == 10:
#         break

#     print(fullt_line)
    
    fullt.append( fullt_line )

print(id_ft[2])
print(fullt[2])

    
fhtml.close()    # release the file handle opened at the top of the cell


error! skipping 46
error! skipping 63
error! skipping 79
error! skipping 102
error! skipping 104
error! skipping 132
error! skipping 137
error! skipping 146
error! skipping 151
error! skipping 193
error! skipping 211
error! skipping 233
error! skipping 255
error! skipping 259
error! skipping 265
error! skipping 271
error! skipping 310
error! skipping 313
error! skipping 322
error! skipping 323
error! skipping 329
error! skipping 440
error! skipping 464
error! skipping 500
error! skipping 557
error! skipping 569
error! skipping 607
error! skipping 609
error! skipping 626
error! skipping 656
error! skipping 676
error! skipping 689
error! skipping 693
error! skipping 719
error! skipping 727
error! skipping 735
error! skipping 786
error! skipping 802
error! skipping 817
error! skipping 825
error! skipping 834
error! skipping 855
error! skipping 881
error! skipping 894
error! skipping 908
error! skipping 918
error! skipping 929
error! skipping 972
error! skipping 981
error! skipping 985
error! skipping 988
error! skipping 1029
error! skipping 1032
error! skipping 1037
error! skipping 1058
error! skipping 1069
error! skipping 1085
error! skipping 1086
error! skipping 1091
error! skipping 1111
error! skipping 1144
error! skipping 1187
error! skipping 1219
error! skipping 1232
error! skipping 1259
error! skipping 1266
error! skipping 1363
error! skipping 1374
error! skipping 1409
error! skipping 1432
error! skipping 1442
error! skipping 1446
error! skipping 1454
error! skipping 1460
error! skipping 1497
error! skipping 1500
error! skipping 1514
error! skipping 1563
error! skipping 1603
error! skipping 1619
error! skipping 1632
error! skipping 1641
error! skipping 1652
error! skipping 1708
error! skipping 1714
error! skipping 1725
error! skipping 1742
error! skipping 1747
error! skipping 1761
error! skipping 1772
error! skipping 1781
error! skipping 1790
error! skipping 1827
error! skipping 1879
error! skipping 1923
error! skipping 1970
error! skipping 2046
error! skipping 2088
error! skipping 2101
error! skipping 2129
error! skipping 2138
error! skipping 2148
error! skipping 2202
error! skipping 2223
error! skipping 2228
error! skipping 2253
error! skipping 2330
error! skipping 2349
3
['As a little girl, her first memories were clouded by confusion. It was the 1960’s, and she was six years old when her father’s Air Force unit got deployed to Vietnam. Her dad was heading to a jungle, and her mother was no longer excited about life.', 'She was the youngest of four, and her home was chaotic. She laid in bed at night and wondered if anything would ever be the same.', 'Soon, the Vietnam War permeated all aspects of the culture. Her mother tried to shield her, but the media was relentless. She would watch cartoons, and immediately after they ended, footage of jungles and dead American soldiers filled the screen. Every image filled her mind with terror and added to the uncertainty. Years passed, and her father eventually returned, but things would never be the same. The time apart, the fear she felt worrying about him, and the images on TV would never leave her mind.', 'The little girl grew up, and like everyone else affected by Vietnam, she did her best to put it behind her. She turned her focus towards college, and completed a bachelor’s degree in theatre arts and a master’s degree in writing. She worked odd jobs before catching a break at 29, when she landed a gig writing for TV. With a wealth of life experiences under her belt, she churned out scripts for Nickelodeon, working on shows like Clarissa Explains it All, The Mystery Files of Shelby Woo, and Clifford.', 'She had turned her passion for writing into a job. At this point, it would be easy for anyone to get comfortable, and she did. A decade passed, and without knowing it, each stroke of her pen was bringing her closer to facing her own demons and creating a story that would resonate around the world.', 'At 40, she took the leap and wrote her own book. It was released the next year, and sold enough copies for her publisher to order the next book in the series. For the next five years she wrote sequels in that same series of books, and in 2007, at 45 years old, she called it quits. 
It hadn’t broken out in a big way, and she was exhausted. She was married with two children, and knew she needed to find a new story to write, but it was hard to find time. There wasn’t that one BIG idea that was pulling her forward.', 'One late night, in a daze, she was flipping through the channels on TV. Reality television was on one channel and footage of the Iraq War on the next. In this light, she caught a glimpse of what the media had become: one big reality show. The media had turned war into a horrible reality show… In that moment, the feeling of separation from her father reemerged. She was no longer watching a reality show, but instead watching a story where she had an emotional connection. She knew what it felt like to be a girl and have her father’s number called. The powerlessness she felt as a child crept back over her as she watched what was, for most people, a reality show. It was real to her, and she knew it was even more real for the men and women over there.', 'But now she was no longer powerless. She had her pen, and immense skills that had developed over decades of honing her craft and putting in the work. She didn’t have this new idea, it had her. And in a reality TV obsessed culture, the time was right to tell it.', 'At 46 years old, she put pen to paper and wrote her story. A sequel followed, and then it became a trilogy. Within 14 months, 1.5 million copies were in circulation and the book was taking off. A film adaptation followed.', 'Today, Suzanne Collins has sold over 87 million books.', 'Her trilogy, The Hunger Games, its message, and its impact on culture are hardly ever discussed openly. Collins’ message isn’t just brave… it’s a radical deviation from stereotypical norms in entertainment. Collins’ story is a stark contrast to others in modern day pop culture. Most modern “artists” chose to glorify war without presenting the whole truth\u200a—\u200ait doesn’t end well.', 'So what type of debate did she want to spark? 
Collins’ answers are illuminating:', 'At least in reality shows, empathy is created with characters. Collins points out that the media treats war as a type of reality show, but doesn’t even cover the basics of creating empathetic connections with the cast. She makes the point that most of the audience is prevented from making connections with those who risk their lives. It’s not like Survivor where you follow each cast member religiously for weeks and weeks.', 'Massive Spoiler Alert ahead!', 'Suzanne Collins is one of the first authors to create a story with widespread pop culture appeal that doesn’t have a happy ending or a glorified message about war. For good reason. There is rarely a good reason that justifies war.', 'So will we continue to accept the same, boring, sacrificial altar because politicians can’t learn to negotiate? Now that Collins’ message is deep within the culture, we may be able to escape it.', 'The beauty of the story vehicle that Collins created was that it was disguised as a “Young Adult” story. As she says:', 'She wants people to identify their relationship with reality TV and the news, to realize what they take for granted, and to act against questionable government decisions. Ultimately, she wants individuals to be aware of the news and to act or make changes where they think necessary.', 'There was a good chance that her story wouldn’t work out. With her touchy subject, she could have offended someone at the publisher, or gotten blacklisted by the entertainment industry. Hollywood pays lip service to promoting a meritocracy of ideas, but they usually won’t promote anything that prompts serious thought or debate. The ideas that threaten their egos are the first to be killed… and they aren’t usually tolerant of any messages about virtue. 
In a day and age when people are terrified of speaking up for what they believe for fear of losing a “job,” Suzanne Collins decided to tell a story that would put her entire career at risk.', 'The Hunger Games is an example of a creative genius hiding philosophy inside a story. Collins used just the right amount of war glorification to introduce the series, then took a radical deviation once the reader was inside.', 'She articulates an unapologetic analysis of the United States. In The Hunger Games trilogy, Suzanne Collins carefully renames the U.S. to ‘Panem,’ after the Latin saying, ‘panem et circenses’ (bread and circuses)\u200a—\u200atwo things Roman rulers used to keep the public distracted.', 'Without the horrible separation from her father, and not knowing if he would live or die, she wouldn’t have been able to ensoul her books with such emotional authenticity. Herein lies the magic of The Hunger Games, and what the love (and separation) between a father and daughter can create. Collins chose a peaceful presentation of ideas and debate, and reframed how young people view war.', 'As children, we might be powerless to free those we care about from society’s sacrificial altars, but as adults, we don’t have to spend our lives on the sidelines. We can build up our own expertise, become powerful, and prove that the pen can be mightier than the sword.', 'Chad Grills is the founder of The Mission, your #1 source for accelerated learning. You can subscribe to their M-F newsletter here.', 'If you enjoyed this story, please recommend and share to help others find it! Feel free to leave a comment below and let us know what story you want to hear next\xa0:)']

Check that id_ft matches fullt matches id_hl_uniq matches htext_uniq -- all good!!


In [27]:
print(id_ft[64])
print(fullt[64])


67
['Hot on the heels of Sketch 43, we’ve used this release to focus on one of Sketch’s most fundamental features\u200a—\u200aArtboards. We’ve also added better support for missing fonts, dramatically improved resizing controls and added a great new feature to Sketch Cloud. Here’s an overview of the headline updates in Sketch 44:', 'They might not be the most exciting feature, but Artboards are a common starting point for so many Sketch projects. They’re already pretty flexible, but we figured they could do more for you in Sketch 44.', 'With this release, we’ve redesigned the Artboard picker to make it clearer and more flexible. We’ve simplified that long list of Artboard presets\u200a—\u200asplitting them up into categories that you can select using a drop-down menu. Also, instead of having a preset for each orientation of a device, you can now select portrait or landscape using a set of dedicated buttons. The result? Creating and controlling Artboards is now quicker and easier than ever before.', 'Better still, you can now select an Artboard and change its size on the fly by selecting a new preset in the Inspector. If you want to create a new preset, it’ll now take into account the size of the Artboard you have selected.', 'We’ve made it easier for you to select multiple Artboards in Sketch 44, too. Simply click and drag on the Canvas and ensure that all of the Artboards you want are fully contained within your selection.', 'Finally\u200a—\u200aand this is a big one\u200a—\u200ayou can now have an Artboard adjust the size of its contents as you resize the Artboard itself. We reckon this will seriously speed up the design process as you work across different screen sizes.', 'If you’re used to collaborating on designs with other Sketch users, there’s a good chance that (at some point) you’ll have been sent a document containing fonts that you’re missing. 
Having your whole design thrown slightly by a font change is never fun, but we hope this update will at least make it a little easier to deal with.', 'If you find yourself in this position with Sketch 44, you’ll now be notified about missing fonts and we’ll give you a couple of options to help fix this.', 'Even if you don’t have the font you need for the document you’re working on, we’ve made sure you’ll know about it, so you can adjust your work accordingly.', 'Select an object inside a group or Artboard and you’ll see a big change in the Inspector. We’ve completely overhauled Sketch 44’s resizing tools to be more powerful and flexible.', 'Firstly, the new controls are clearer and more intuitive. We’ve taken away the guesswork of figuring out which edge an object will pin to by giving you the option to choose yourself. Secondly, along with more powerful pinning, resizing itself is more flexible. In Sketch 44, you can select whether an object’s height or width (or both) change as you resize its parent.', 'We think these controls will make a huge difference to how you use resizable and responsive objects in your workflow.', 'Previously in Sketch, rounded corners on vector paths weren’t always handled correctly when the angle of the paths going into a point was particularly wide or narrow. That meant some shapes didn’t look quite right.', 'With Sketch 44, we’ve improved this behaviour, so rounded corners run smoothly, particularly when you choose to round a straight point type that’s connected to a curve.', 'Sketch Cloud is great for quickly sharing your designs with colleagues for feedback, but there’s been a missing piece to the puzzle…until now.', 'The Sketch 44 update coincides with a fresh version of Sketch Cloud\u200a—\u200aone that includes commenting. Now, right alongside your Artboards in Sketch Cloud, you’ll see a comments area where anyone you’ve shared your design with can leave their thoughts on what you’ve been working on. 
It’s ideal for quick rounds of feedback as you work on your next big project.', 'To make collaboration and feedback even easier, we’ve also updated Sketch Mirror so it’ll open and display any Sketch Cloud link that’s shared with you.', 'As ever, thanks for all of your feedback and suggestions\u200a—\u200ait’s what makes Sketch great. Here are a few highlights of other changes you’ll see in Sketch 44:', 'You can find a full list of bug fixes and improvements on our updates page.', 'Sketch 44 is a free update for everyone with an active license. If you need to renew your license, you’ll get Sketch 44 and a whole year’s worth of updates after that.', 'Do let us know what you think of this update\u200a—\u200awe’d love to hear from you. If you’ve got questions or feedback, you can get in touch with us visa our support page or join in the conversation on Twitter, or on our Facebook group.', 'Stay tuned for news about Sketch 45\u200a—\u200awe’re already working hard on it!']
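Because some articles are skipped during full-text extraction, the highlight lists and the full-text lists do not share the same indices, only the same ids. A sketch of pairing them by id instead of by position follows. `align_by_id` is a hypothetical helper, and it assumes ids are unique within each list.

```python
def align_by_id(id_hl, htext, id_ft, fullt):
    """Pair each highlight with the full text that has the same article id.

    Returns (pairs, missing): pairs is a list of (id, highlight, paragraphs),
    missing lists highlight ids that have no extracted full text.
    """
    fulltext_by_id = dict(zip(id_ft, fullt))
    pairs, missing = [], []
    for article_id, highlight in zip(id_hl, htext):
        if article_id in fulltext_by_id:
            pairs.append((article_id, highlight, fulltext_by_id[article_id]))
        else:
            missing.append(article_id)
    return pairs, missing


# Toy data, invented for illustration only
id_hl_demo = ['2', '4', '5']
htext_demo = ['first highlight', 'second highlight', 'third highlight']
id_ft_demo = ['2', '3', '5']          # article 4 was skipped during extraction
fullt_demo = [['p1', 'p2'], ['p3'], ['p4']]

pairs, missing = align_by_id(id_hl_demo, htext_demo, id_ft_demo, fullt_demo)
print(len(pairs), missing)
```

The `missing` list doubles as a check on which highlighted articles were lost to the "error! skipping" cases.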

CHECK LENGTH OF HIGHLIGHTS for second dataset

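I originally scraped articles at 10-day intervals. This second dataset was also scraped at 10-day intervals, but offset by 5 days relative to the original dates. In the code below, ids are shifted by 2384 (391 + 1993, the number of articles in the first dataset) so that they continue on from the first dataset's ids.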


In [16]:
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/highlights_20170606_10-45-58.txt','r') as fhigh2:
    hlist2 = fhigh2.readlines()
    
matword2 = []
matchar2 = []
id_hl2 = []
id_no2 = []
ct_hl2 = 0
ct_no2 = 0
htext2 = []

for line in hlist2:
    
    # Analyze length of highlights and number of lines with highlights
    line = line.strip().split('\t')
    if len(line) <= 2:
        id_no2.append(str(int(line[1])+2384))
        ct_no2 += 1
    else:
        lenword2 = len(line[2].split())
        lenchar2 = len(line[2])
        # print( line )
        # print( length )
        matword2.append(lenword2)
        matchar2.append(lenchar2)
        id_hl2.append(str(int(line[1])+2384))
        htext2.append(line[2])
        ct_hl2 += 1
        
print( 'articles without highlights: '+str(ct_no2) )
print( 'articles with highlights: '+str(ct_hl2) )
print( htext2[:10] )
print( id_hl2[:10] )

plt.hist(matword2, bins=50)
plt.title("Highlights, 20170606_10-45-58")
plt.xlabel("Number of words")
plt.ylabel("Frequency")
plt.show()

plt.hist(matchar2, bins=50)
plt.title("Highlights, 20170606_10-45-58")
plt.xlabel("Number of characters")
plt.ylabel("Frequency")
plt.show()


articles without highlights: 297
articles with highlights: 1879
['If you continue to treat as many of the people you meet with kindness as you can, over the course of time, you will create a better world. Take the pressure off.', 'Getting where you want to be isn’t always about accumulating things, sometimes it’s about letting things go, learning to be more content, and being a more compassionate person.', '“Every next level of your life will demand a different you.”\u200a—\u200aLeonardo DiCaprio', 'Prioritize placing buttons at the bottom of the screen.', 'Amazon gives you the online price if you’re a Prime member, but the jacket price if you’re not.', 'I basically learned programming through just random online resources and books, but an important way that I learned was just by reading other people’s code.', 'When you focus on the process, the outcome will follow automatically.', 'It’s all proven to me that the collective power of human ambition cannot be stopped and it cannot be contained by anything except the limitations of our own time.', 'duality is both everything, nothing and one.', 'I’ll have another human in my house one day and they’ll be all like “hey, you got any lemon juice?” and I’ll be all like “sure do, in the fridge” and they’ll be like, “thanks, mate. Hey, if I want to set z-index on a flex item, do I need to specify a position?” and I’ll be all “nah bro, not for flex items.”']
['2387', '2388', '2389', '2390', '2391', '2392', '2394', '2396', '2397', '2400']
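The hard-coded 2384 matches the size of the first dataset (391 articles without highlights plus 1993 with). A sketch of deriving the offset from the first file rather than typing it in follows. `offset_ids` is a hypothetical helper, the toy lines are invented, and the sketch assumes ids in each file start at 1 and that the first tab-separated column (shown here as the placeholder `d`) is not needed.

```python
def offset_ids(lines, offset):
    """Shift the numeric id in column 1 of each tab-separated line by `offset`."""
    shifted = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        fields[1] = str(int(fields[1]) + offset)
        shifted.append(fields)
    return shifted


# Toy data, invented for illustration only; 'd' is a placeholder first column
first = ['d\t1\n', 'd\t2\tsome highlight\n', 'd\t3\n']
second = ['d\t1\tanother highlight\n', 'd\t2\n']

offset = len(first)                 # plays the role of 2384 for the real files
shifted = offset_ids(second, offset)
print([fields[1] for fields in shifted])
```

Deriving the offset this way keeps the ids consistent if the first dataset is ever re-scraped with a different number of articles.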

CLEAN UP FULLTEXT for second dataset


In [29]:
fhtml2 = open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/fullhtml_20170606_10-45-58_edit_isolate.txt','r')

id_ft2 = []
fullt2 = []

# num = 0
for line in fhtml2:
    text = line.strip().split('\t')
    fullh = text[2]
    fullt_line = []

#     insetCol = SoupStrainer('div',{'class':'section-inner sectionLayout--insetColumn'})
    soup = BeautifulSoup(fullh,'lxml')                            #,parse_only=content)
    txt0 = soup.find('div',attrs={'data-source':'post_page'})     #class_='postArticle-content')
    if not txt0:
        print('error! skipping '+text[1])
        continue
    txt1 = txt0.find_all('p',class_='graf')
    id_ft2.append(str(int(text[1])+2384))

#     soup = BeautifulSoup(fullh,'lxml')#,parse_only=insetCol)
#     txt0 = soup.find('div',class_='section-inner sectionLayout--insetColumn')
#     txt1 = txt0.find_all('p',class_='graf')

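    for para in txt1:    # separate name so the outer loop variable `line` is not shadowed
        txt2 = re.sub('<[^>]+>', '', str(para) )    # strip HTML tags from the paragraph
        fullt_line.append(txt2)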

#     print(fullt_line)
    fullt2.append( fullt_line )
#     num+=1
#     if num == 22:
#         break

print(id_ft2[9])
print(fullt2[9])
print(id_ft2)
print(str(len(id_ft2)))
print(str(len(fullt2)))

    
fhtml2.close()    # release the file handle opened at the top of the cell


error! skipping 20
error! skipping 68
error! skipping 70
error! skipping 121
error! skipping 180
error! skipping 237
error! skipping 257
error! skipping 266
error! skipping 287
error! skipping 298
error! skipping 302
error! skipping 317
error! skipping 333
error! skipping 380
error! skipping 382
error! skipping 400
error! skipping 422
error! skipping 439
error! skipping 441
error! skipping 450
error! skipping 482
error! skipping 523
error! skipping 532
error! skipping 544
error! skipping 556
error! skipping 584
error! skipping 594
error! skipping 615
error! skipping 638
error! skipping 660
error! skipping 684
error! skipping 705
error! skipping 707
error! skipping 708
error! skipping 731
error! skipping 739
error! skipping 771
error! skipping 776
error! skipping 843
error! skipping 863
error! skipping 873
error! skipping 881
error! skipping 890
error! skipping 904
error! skipping 924
error! skipping 946
error! skipping 949
error! skipping 951
error! skipping 962
error! skipping 970
error! skipping 971
error! skipping 976
error! skipping 1006
error! skipping 1010
error! skipping 1012
error! skipping 1047
error! skipping 1048
error! skipping 1059
error! skipping 1113
error! skipping 1119
error! skipping 1157
error! skipping 1197
error! skipping 1237
error! skipping 1246
error! skipping 1254
error! skipping 1293
error! skipping 1329
error! skipping 1336
error! skipping 1373
error! skipping 1380
error! skipping 1484
error! skipping 1514
error! skipping 1541
error! skipping 1588
error! skipping 1593
error! skipping 1594
error! skipping 1601
error! skipping 1605
error! skipping 1611
error! skipping 1647
error! skipping 1653
error! skipping 1695
error! skipping 1698
error! skipping 1701
error! skipping 1703
error! skipping 1710
error! skipping 1735
error! skipping 1782
error! skipping 1791
error! skipping 1827
error! skipping 1829
error! skipping 1851
error! skipping 1862
error! skipping 1874
error! skipping 1951
error! skipping 1953
error! skipping 1956
error! skipping 1958
error! skipping 1984
error! skipping 1998
error! skipping 2018
error! skipping 2025
error! skipping 2038
error! skipping 2045
error! skipping 2059
error! skipping 2070
error! skipping 2113
error! skipping 2122
error! skipping 2143
error! skipping 2161
error! skipping 2173
2394
['I love practical advice that you can immediately apply to your life. And Zen, a school of Mahayana Buddhism, is full of practical wisdom.', 'When I tell my friends, colleagues, and people I work with that I like reading about Zen Buddhism, they often make remarks like: “When are you going to grow your hair, walk around bare feet, and talk about yoga all day?”', 'That’s the hipster way of life. Not the Zen way.', 'What is Zen, actually? To be honest, I don’t know. It’s not a religion, belief, or piece of knowledge.', 'I started reading more about Zen when I learned that legendary basketball coach Phil Jackson is very into Zen and used the concepts to coach Michael Jordan and Kobe Bryant.', 'And especially Kobe, a person who I have immense respect for, embraced Zen principles. When I found out about that, I wanted to learn more about Zen.', 'Phil Jackson also mentions a Zen quote in his book Eleven Rings (which is about the championship runs of the Chicago Bulls and LA Lakers):', 'My interpretation is that no matter what happens in your life; you must keep doing your task. I live by that philosophy too. You can replace enlightenment with any life goal. Nothing changes once you achieve something. You still have to do what you’re meant to do.', 'Over the past few years, I’ve read more about Zen and everything that’s related to it. What I’ve found is that it’s not a smart thing to get hung up on definitions, movements, and groups. Buddhism, Taoism, Zen\u200a—\u200athey share many of the same ideas. I also don’t care what is what and who invented certain ideas. I’ll leave that to the pseudo intellectualists of this world.', 'All I know is that many of the Zen teachings are very useful for living a peaceful and happy life. So I’ve made a list of 5 Zen lessons I’ve found practical and easily applicable to modern day life. Here we go.', 'The most important part of a Zen monk’s life is meditation. I’ve tried sitting meditation in the past. 
It’s not for me.', 'So I’ve turned running and strength training into my meditation. The most important thing about meditation is this: Practice being in the moment.', 'I’ve found it doesn’t matter what type of activity you use. Sitting meditation, yoga, running, strength training\u200a—\u200ayou can MAKE it work for you. Make sure you’re one with your body, clear your mind, and do it regularly.', 'One note: Meditation doesn’t work when you try to do six thousand things at the same time. I’ve recently learned to do one thing at a time.', 'I’ve stopped doing things like listening to audiobooks and podcasts when you’re working on something important, or when you’re exercising.', 'Ever since I quit that type of multitasking behavior, my workouts have improved drastically. These days, I completely focus on the task at hand: Running, lifting weights, my muscles, the way I breath, etc. I still like to listen to music because that easily moves to the background. You don’t have to focus on it.', 'This quote from Thích Nhất Hạnh, a Vietnamese Zen Monk, says it all:', 'Look, you don’t have to do groundbreaking things to live a meaningful life. You don’t need to be the youngest person to climb Everest. Actually, you don’t need to be the first person who does anything.', 'Just makes sure you enjoy most moments of your day. I say most because you’re probably way too busy to enjoy every moment. That’s not realistic unless you’re a Monk. But stopping for a few seconds a day, and enjoying the moment, that’s something everyone can do. No excuses.', 'We often look at outside sources for happiness: Travel, a new job, moving to a different city or county, a new partner, more experiences, etc. But if you’re unhappy now, you will probably be an unhappy person with new experiences.', 'A quote from the Japanese Zen Master Dogen explains it well:', 'Don’t look for happiness in other places. Find it right where you are. 
Once you become happy, it’s easier to stay happy.', 'Zen Monks and Masters don’t care about results. They focus on habits, rituals, and processes that support the Zen way of living.', 'Too often, we stare blindly on the results we want to achieve that we forget why we do something in the first place.', 'I don’t think there’s anything wrong with trying to achieve things. You don’t have to give up everything and move to a monastery.', 'But make sure you develop habits and rituals that support what you’re trying to achieve in life. When you focus on the process, the outcome will follow automatically.', 'Alan Watts was a British philosopher who was introduced to Zen in 1936, when he attended a conference where D. T. Suzuki spoke. Suzuki, a Japanese author, singlehandedly influenced the spreading of Zen in the West.', 'And ever since that moment, Watts (21 years old at the time) was fascinated with Zen. He wrote many books. One of the most popular books is Way of Zen. Watts also built a large following in the West. And I have to say that I like his work a lot.', 'Especially his perspective on the meaning of life. He said:', 'This sounds fucking obvious, but I’m going to say it anyway: Instead of thinking, spend your life living. Make yourself useful, solve problems, add value, and most importantly: Enjoy it.', 'Don’t rush life. Before you know, it will all be over. To me, that’s the true Zen way of living.']
['2385', '2386', '2387', '2388', '2389', '2390', '2391', '2392', '2393', '2394', '2395', '2396', '2397', '2398', '2399', '2400', '2401', '2402', '2403', '2405', '2406', '2407', '2408', '2409', '2410', '2411', '2412', '2413', '2414', '2415', '2416', '2417', '2418', '2419', '2420', '2421', '2422', '2423', '2424', '2425', '2426', '2427', '2428', '2429', '2430', '2431', '2432', '2433', '2434', '2435', '2436', '2437', '2438', '2439', '2440', '2441', '2442', '2443', '2444', '2445', '2446', '2447', '2448', '2449', '2450', '2451', '2453', '2455', '2456', '2457', '2458', '2459', '2460', '2461', '2462', '2463', '2464', '2465', '2466', '2467', '2468', '2469', '2470', '2471', '2472', '2473', '2474', '2475', '2476', '2477', '2478', '2479', '2480', '2481', '2482', '2483', '2484', '2485', '2486', '2487', '2488', '2489', '2490', '2491', '2492', '2493', '2494', '2495', '2496', '2497', '2498', '2499', '2500', '2501', '2502', '2503', '2504', '2506', '2507', '2508', '2509', '2510', '2511', '2512', '2513', '2514', '2515', '2516', '2517', '2518', '2519', '2520', '2521', '2522', '2523', '2524', '2525', '2526', '2527', '2528', '2529', '2530', '2531', '2532', '2533', '2534', '2535', '2536', '2537', '2538', '2539', '2540', '2541', '2542', '2543', '2544', '2545', '2546', '2547', '2548', '2549', '2550', '2551', '2552', '2553', '2554', '2555', '2556', '2557', '2558', '2559', '2560', '2561', '2562', '2563', '2565', '2566', '2567', '2568', '2569', '2570', '2571', '2572', '2573', '2574', '2575', '2576', '2577', '2578', '2579', '2580', '2581', '2582', '2583', '2584', '2585', '2586', '2587', '2588', '2589', '2590', '2591', '2592', '2593', '2594', '2595', '2596', '2597', '2598', '2599', '2600', '2601', '2602', '2603', '2604', '2605', '2606', '2607', '2608', '2609', '2610', '2611', '2612', '2613', '2614', '2615', '2616', '2617', '2618', '2619', '2620', '2622', '2623', '2624', '2625', '2626', '2627', '2628', '2629', '2630', '2631', '2632', '2633', '2634', '2635', '2636', '2637', '2638', '2639', 
'2640', '2642', '2643', '2644', '2645', '2646', '2647', '2648', '2649', '2651', '2652', '2653', '2654', '2655', '2656', '2657', '2658', '2659', '2660', '2661', '2662', '2663', '2664', '2665', '2666', '2667', '2668', '2669', '2670', '2672', '2673', '2674', '2675', '2676', '2677', '2678', '2679', '2680', '2681', '2683', '2684', '2685', '2687', '2688', '2689', '2690', '2691', '2692', '2693', '2694', '2695', '2696', '2697', '2698', '2699', '2700', '2702', '2703', '2704', '2705', '2706', '2707', '2708', '2709', '2710', '2711', '2712', '2713', '2714', '2715', '2716', '2718', '2719', '2720', '2721', '2722', '2723', '2724', '2725', '2726', '2727', '2728', '2729', '2730', '2731', '2732', '2733', '2734', '2735', '2736', '2737', '2738', '2739', '2740', '2741', '2742', '2743', '2744', '2745', '2746', '2747', '2748', '2749', '2750', '2751', '2752', '2753', '2754', '2755', '2756', '2757', '2758', '2759', '2760', '2761', '2762', '2763', '2765', '2767', '2768', '2769', '2770', '2771', '2772', '2773', '2774', '2775', '2776', '2777', '2778', '2779', '2780', '2781', '2782', '2783', '2785', '2786', '2787', '2788', '2789', '2790', '2791', '2792', '2793', '2794', '2795', '2796', '2797', '2798', '2799', '2800', '2801', '2802', '2803', '2804', '2805', '2807', '2808', '2809', '2810', '2811', '2812', '2813', '2814', '2815', '2816', '2817', '2818', '2819', '2820', '2821', '2822', '2824', '2826', '2827', '2828', '2829', '2830', '2831', '2832', '2833', '2835', '2836', '2837', '2838', '2839', '2840', '2841', '2842', '2843', '2844', '2845', '2846', '2847', '2848', '2849', '2850', '2851', '2852', '2853', '2854', '2855', '2856', '2857', '2858', '2859', '2860', '2861', '2862', '2863', '2864', '2865', '2867', '2868', '2869', '2870', '2871', '2872', '2873', '2874', '2875', '2876', '2877', '2878', '2879', '2880', '2881', '2882', '2883', '2884', '2885', '2886', '2887', '2888', '2889', '2890', '2891', '2892', '2893', '2894', '2895', '2896', '2897', '2898', '2899', '2900', '2901', '2902', '2903', '2904', 
'2905', '2906', '2908', '2909', '2910', '2911', '2912', '2913', '2914', '2915', '2917', '2918', '2919', '2920', '2921', '2922', '2923', '2924', '2925', '2926', '2927', '2929', '2930', '2931', '2932', '2933', '2934', '2935', '2936', '2937', '2938', '2939', '2941', '2942', '2943', '2944', '2945', '2946', '2947', '2948', '2949', '2950', '2951', '2952', '2953', '2954', '2955', '2956', '2957', '2958', '2959', '2960', '2961', '2962', '2963', '2964', '2965', '2966', '2967', '2969', '2970', '2971', '2972', '2973', '2974', '2975', '2976', '2977', '2979', '2980', '2981', '2982', '2983', '2984', '2985', '2986', '2987', '2988', '2989', '2990', '2991', '2992', '2993', '2994', '2995', '2996', '2997', '2998', '3000', '3001', '3002', '3003', '3004', '3005', '3006', '3007', '3008', '3009', '3010', '3011', '3012', '3013', '3014', '3015', '3016', '3017', '3018', '3019', '3020', '3021', '3023', '3024', '3025', '3026', '3027', '3028', '3029', '3030', '3031', '3032', '3033', '3034', '3035', '3036', '3037', '3038', '3039', '3040', '3041', '3042', '3043', '3045', '3046', '3047', '3048', '3049', '3050', '3051', '3052', '3053', '3054', '3055', '3056', '3057', '3058', '3059', '3060', '3061', '3062', '3063', '3064', '3065', '3066', '3067', '3069', '3070', '3071', '3072', '3073', '3074', '3075', '3076', '3077', '3078', '3079', '3080', '3081', '3082', '3083', '3084', '3085', '3086', '3087', '3088', '3090', '3093', '3094', '3095', '3096', '3097', '3098', '3099', '3100', '3101', '3102', '3103', '3104', '3105', '3106', '3107', '3108', '3109', '3110', '3111', '3112', '3113', '3114', '3116', '3117', '3118', '3119', '3120', '3121', '3122', '3124', '3125', '3126', '3127', '3128', '3129', '3130', '3131', '3132', '3133', '3134', '3135', '3136', '3137', '3138', '3139', '3140', '3141', '3142', '3143', '3144', '3145', '3146', '3147', '3148', '3149', '3150', '3151', '3152', '3153', '3154', '3156', '3157', '3158', '3159', '3161', '3162', '3163', '3164', '3165', '3166', '3167', '3168', '3169', '3170', '3171', 
'3172', '3173', '3174', '3175', '3176', '3177', '3178', '3179', '3180', '3181', '3182', '3183', '3184', '3185', '3186', '3187', '3188', '3189', '3190', '3191', '3192', '3193', '3194', '3195', '3196', '3197', '3198', '3199', '3200', '3201', '3202', '3203', '3204', '3205', '3206', '3207', '3208', '3209', '3210', '3211', '3212', '3213', '3214', '3215', '3216', '3217', '3218', '3219', '3220', '3221', '3222', '3223', '3224', '3225', '3226', '3228', '3229', '3230', '3231', '3232', '3233', '3234', '3235', '3236', '3237', '3238', '3239', '3240', '3241', '3242', '3243', '3244', '3245', '3246', '3248', '3249', '3250', '3251', '3252', '3253', '3254', '3255', '3256', '3258', '3259', '3260', '3261', '3262', '3263', '3264', '3266', '3267', '3268', '3269', '3270', '3271', '3272', '3273', '3275', '3276', '3277', '3278', '3279', '3280', '3281', '3282', '3283', '3284', '3285', '3286', '3287', '3289', '3290', '3291', '3292', '3293', '3294', '3295', '3296', '3297', '3298', '3299', '3300', '3301', '3302', '3303', '3304', '3305', '3306', '3307', '3309', '3310', '3311', '3312', '3313', '3314', '3315', '3316', '3317', '3318', '3319', '3320', '3321', '3322', '3323', '3324', '3325', '3326', '3327', '3328', '3329', '3331', '3332', '3334', '3336', '3337', '3338', '3339', '3340', '3341', '3342', '3343', '3344', '3345', '3347', '3348', '3349', '3350', '3351', '3352', '3353', '3356', '3357', '3358', '3359', '3361', '3362', '3363', '3364', '3365', '3366', '3367', '3368', '3369', '3370', '3371', '3372', '3373', '3374', '3375', '3376', '3377', '3378', '3379', '3380', '3381', '3382', '3383', '3384', '3385', '3386', '3387', '3388', '3389', '3391', '3392', '3393', '3395', '3397', '3398', '3399', '3400', '3401', '3402', '3403', '3404', '3405', '3406', '3407', '3408', '3409', '3410', '3411', '3412', '3413', '3414', '3415', '3416', '3417', '3418', '3419', '3420', '3421', '3422', '3423', '3424', '3425', '3426', '3427', '3428', '3429', '3430', '3433', '3434', '3435', '3436', '3437', '3438', '3439', '3440', 
'3441', '3442', '3444', '3445', '3446', '3447', '3448', '3449', '3450', '3451', '3452', '3453', '3454', '3455', '3456', '3457', '3458', '3459', '3460', '3461', '3462', '3463', '3464', '3465', '3466', '3467', '3468', '3469', '3470', '3471', '3472', '3473', '3474', '3475', '3476', '3477', '3478', '3479', '3480', '3481', '3482', '3483', '3484', '3485', '3486', '3487', '3488', '3489', '3490', '3491', '3492', '3493', '3494', '3495', '3496', '3498', '3499', '3500', '3501', '3502', '3504', '3505', '3506', '3507', '3508', '3509', '3510', '3511', '3512', '3513', '3514', '3515', '3516', '3517', '3518', '3519', '3520', '3521', '3522', '3523', '3524', '3525', '3526', '3527', '3528', '3529', '3530', '3531', '3532', '3533', '3534', '3535', '3536', '3537', '3538', '3539', '3540', '3542', '3543', '3544', '3545', '3546', '3547', '3548', '3549', '3550', '3551', '3552', '3553', '3554', '3555', '3556', '3557', '3558', '3559', '3560', '3561', '3562', '3563', '3564', '3565', '3566', '3567', '3568', '3569', '3570', '3571', '3572', '3573', '3574', '3575', '3576', '3577', '3578', '3579', '3580', '3582', '3583', '3584', '3585', '3586', '3587', '3588', '3589', '3590', '3591', '3592', '3593', '3594', '3595', '3596', '3597', '3598', '3599', '3600', '3601', '3602', '3603', '3604', '3605', '3606', '3607', '3608', '3609', '3610', '3611', '3612', '3613', '3614', '3615', '3616', '3617', '3618', '3619', '3620', '3622', '3623', '3624', '3625', '3626', '3627', '3628', '3629', '3631', '3632', '3633', '3634', '3635', '3636', '3637', '3639', '3640', '3641', '3642', '3643', '3644', '3645', '3646', '3647', '3648', '3649', '3650', '3651', '3652', '3653', '3654', '3655', '3656', '3657', '3658', '3659', '3660', '3661', '3662', '3663', '3664', '3665', '3666', '3667', '3668', '3669', '3670', '3671', '3672', '3673', '3674', '3675', '3676', '3678', '3679', '3680', '3681', '3682', '3683', '3684', '3685', '3686', '3687', '3688', '3689', '3690', '3691', '3692', '3693', '3694', '3695', '3696', '3697', '3698', '3699', 
'3700', '3701', '3702', '3703', '3704', '3705', '3706', '3707', '3708', '3709', '3710', '3711', '3712', '3714', '3715', '3716', '3717', '3718', '3719', '3721', '3722', '3723', '3724', '3725', '3726', '3727', '3728', '3729', '3730', '3731', '3732', '3733', '3734', '3735', '3736', '3737', '3738', '3739', '3740', '3741', '3742', '3743', '3744', '3745', '3746', '3747', '3748', '3749', '3750', '3751', '3752', '3753', '3754', '3755', '3756', '3758', '3759', '3760', '3761', '3762', '3763', '3765', '3766', '3767', '3768', '3769', '3770', '3771', '3772', '3773', '3774', '3775', '3776', '3777', '3778', '3779', '3780', '3781', '3782', '3783', '3784', '3785', '3786', '3787', '3788', '3789', '3790', '3791', '3792', '3793', '3794', '3795', '3796', '3797', '3798', '3799', '3800', '3801', '3802', '3803', '3804', '3805', '3806', '3807', '3808', '3809', '3810', '3811', '3812', '3813', '3814', '3815', '3816', '3817', '3818', '3819', '3820', '3821', '3822', '3823', '3824', '3825', '3826', '3827', '3828', '3829', '3830', '3831', '3832', '3833', '3834', '3835', '3836', '3837', '3838', '3839', '3840', '3841', '3842', '3843', '3844', '3845', '3846', '3847', '3848', '3849', '3850', '3851', '3852', '3853', '3854', '3855', '3856', '3857', '3858', '3859', '3860', '3861', '3862', '3863', '3864', '3865', '3866', '3867', '3869', '3870', '3871', '3872', '3873', '3874', '3875', '3876', '3877', '3878', '3879', '3880', '3881', '3882', '3883', '3884', '3885', '3886', '3887', '3888', '3889', '3890', '3891', '3892', '3893', '3894', '3895', '3896', '3897', '3899', '3900', '3901', '3902', '3903', '3904', '3905', '3906', '3907', '3908', '3909', '3910', '3911', '3912', '3913', '3914', '3915', '3916', '3917', '3918', '3919', '3920', '3921', '3922', '3923', '3924', '3926', '3927', '3928', '3929', '3930', '3931', '3932', '3933', '3934', '3935', '3936', '3937', '3938', '3939', '3940', '3941', '3942', '3943', '3944', '3945', '3946', '3947', '3948', '3949', '3950', '3951', '3952', '3953', '3954', '3955', '3956', 
'3957', '3958', '3959', '3960', '3961', '3962', '3963', '3964', '3965', '3966', '3967', '3968', '3969', '3970', '3971', '3973', '3974', '3975', '3976', '3979', '3980', '3981', '3982', '3983', '3984', '3986', '3987', '3988', '3990', '3991', '3992', '3993', '3994', '3996', '3997', '3998', '3999', '4000', '4001', '4002', '4003', '4004', '4005', '4006', '4007', '4008', '4009', '4010', '4011', '4012', '4013', '4014', '4015', '4016', '4017', '4018', '4019', '4020', '4021', '4022', '4023', '4024', '4025', '4026', '4027', '4028', '4029', '4030', '4032', '4033', '4034', '4035', '4036', '4038', '4039', '4040', '4041', '4042', '4043', '4044', '4045', '4046', '4047', '4048', '4049', '4050', '4051', '4052', '4053', '4054', '4055', '4056', '4057', '4058', '4059', '4060', '4061', '4062', '4063', '4064', '4065', '4066', '4067', '4068', '4069', '4070', '4071', '4072', '4073', '4074', '4075', '4076', '4077', '4078', '4080', '4081', '4083', '4084', '4086', '4088', '4089', '4090', '4091', '4092', '4093', '4095', '4096', '4097', '4098', '4099', '4100', '4101', '4102', '4103', '4104', '4105', '4106', '4107', '4108', '4109', '4110', '4111', '4112', '4113', '4114', '4115', '4116', '4117', '4118', '4120', '4121', '4122', '4123', '4124', '4125', '4126', '4127', '4128', '4129', '4130', '4131', '4132', '4133', '4134', '4135', '4136', '4137', '4138', '4139', '4140', '4141', '4142', '4143', '4144', '4145', '4146', '4147', '4148', '4149', '4150', '4151', '4152', '4153', '4154', '4155', '4156', '4157', '4158', '4159', '4160', '4161', '4162', '4163', '4164', '4165', '4167', '4168', '4169', '4170', '4171', '4172', '4173', '4174', '4176', '4177', '4178', '4179', '4180', '4181', '4182', '4183', '4184', '4185', '4186', '4187', '4188', '4189', '4190', '4191', '4192', '4193', '4194', '4195', '4196', '4197', '4198', '4199', '4200', '4201', '4202', '4203', '4204', '4205', '4206', '4207', '4208', '4209', '4210', '4212', '4214', '4215', '4216', '4217', '4218', '4219', '4220', '4221', '4222', '4223', '4224', 
'4225', '4226', '4227', '4228', '4229', '4230', '4231', '4232', '4233', '4234', '4236', '4237', '4238', '4239', '4240', '4241', '4242', '4243', '4244', '4245', '4247', '4248', '4249', '4250', '4251', '4252', '4253', '4254', '4255', '4256', '4257', '4259', '4260', '4261', '4262', '4263', '4264', '4265', '4266', '4267', '4268', '4269', '4270', '4271', '4272', '4273', '4274', '4275', '4276', '4277', '4278', '4279', '4280', '4281', '4282', '4283', '4284', '4285', '4286', '4287', '4288', '4289', '4290', '4291', '4292', '4293', '4294', '4295', '4296', '4297', '4298', '4299', '4300', '4301', '4302', '4303', '4304', '4305', '4306', '4307', '4308', '4309', '4310', '4311', '4312', '4313', '4314', '4315', '4316', '4317', '4318', '4319', '4320', '4321', '4322', '4323', '4324', '4325', '4326', '4327', '4328', '4329', '4330', '4331', '4332', '4333', '4334', '4336', '4338', '4339', '4341', '4343', '4344', '4345', '4346', '4347', '4348', '4349', '4350', '4351', '4352', '4353', '4354', '4355', '4356', '4357', '4358', '4359', '4360', '4361', '4362', '4363', '4364', '4365', '4366', '4367', '4369', '4370', '4371', '4372', '4373', '4374', '4375', '4376', '4377', '4378', '4379', '4380', '4381', '4383', '4384', '4385', '4386', '4387', '4388', '4389', '4390', '4391', '4392', '4393', '4394', '4395', '4396', '4397', '4398', '4399', '4400', '4401', '4403', '4404', '4405', '4406', '4407', '4408', '4410', '4411', '4412', '4413', '4414', '4415', '4416', '4417', '4418', '4419', '4420', '4421', '4423', '4424', '4425', '4426', '4427', '4428', '4430', '4431', '4432', '4433', '4434', '4435', '4436', '4437', '4438', '4439', '4440', '4441', '4442', '4444', '4445', '4446', '4447', '4448', '4449', '4450', '4451', '4452', '4453', '4455', '4456', '4457', '4458', '4459', '4460', '4461', '4462', '4463', '4464', '4465', '4466', '4467', '4468', '4469', '4470', '4471', '4472', '4473', '4474', '4475', '4476', '4477', '4478', '4479', '4480', '4481', '4482', '4483', '4484', '4485', '4486', '4487', '4488', '4489', 
'4490', '4491', '4492', '4493', '4494', '4495', '4496', '4498', '4499', '4500', '4501', '4502', '4503', '4504', '4505', '4507', '4508', '4509', '4510', '4511', '4512', '4513', '4514', '4515', '4516', '4517', '4518', '4519', '4520', '4521', '4522', '4523', '4524', '4525', '4526', '4528', '4529', '4530', '4531', '4532', '4533', '4534', '4535', '4536', '4537', '4538', '4539', '4540', '4541', '4542', '4543', '4544', '4546', '4547', '4548', '4549', '4550', '4551', '4552', '4553', '4554', '4555', '4556', '4558', '4559', '4560']
2065
2065

COMBINE DATASETS 1 AND 2


In [ ]:
# Concatenate datasets 1 and 2; ids and contents stay index-aligned
id_ft_all = id_ft + id_ft2
fullt_all = fullt + fullt2

id_hl_all = id_hl + id_hl2
htext_all = htext + htext2

# Sanity check: each combined length should be the sum of its two parts
print(len(id_ft))
print(len(id_ft2))
print(len(id_ft_all))
print(len(fullt))
print(len(fullt2))
print(len(fullt_all))

print(len(id_hl))
print(len(id_hl2))
print(len(id_hl_all))
print(len(htext))
print(len(htext2))
print(len(htext_all))

print(id_ft[-1])    # last fulltext id in dataset 1

CHECK FOR DUPLICATES AFTER COMBINING DATASETS


In [33]:
from collections import Counter

seen = set()

htext_unq = []
id_hl_unq = []    # ids of the first occurrence of each highlight
id_hl_non = []    # ids of repeated (non-unique) highlights

# Keep the first occurrence of each highlight, preserving order
for hl_id, x in zip(id_hl_all, htext_all):
    if x in seen:
        id_hl_non.append(hl_id)
        continue
    seen.add(x)
    htext_unq.append(x)
    id_hl_unq.append(hl_id)

print(id_hl_unq[49])    # check that id_hl_unq matches htext_unq -- it does!!
print(htext_unq[49])

# Count occurrences of each unique highlight with Counter (one pass per list)
# instead of calling list.count() inside the loop, which is quadratic
freq_unq = Counter(htext_unq)    # always 1 after deduplication
freq_all = Counter(htext_all)    # >1 for highlights that were repeated
counts = [freq_unq[x] for x in htext_unq]
counts2 = [freq_all[x] for x in htext_unq]
plt.hist(counts)
plt.show()
plt.hist(counts2)
plt.show()

print('number of unique highlights: '+str(len(id_hl_unq)))
print('number of non-unique highlights: '+str(len(id_hl_non)))


66
If you want to be good at something, do it every day. No exceptions. You can use your time effectively and productivity if you are consistent.
number of unique highlights: 3357
number of non-unique highlights: 515
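
The same order-preserving deduplication can be wrapped in a small helper. This is only a sketch on toy data: `dedupe_with_ids`, `toy_texts` and `toy_ids` are hypothetical names standing in for the notebook's `htext_all` / `id_hl_all`, not part of the original analysis.

```python
from collections import Counter


def dedupe_with_ids(texts, ids):
    """Return (unique_texts, first_ids, repeat_ids), keeping first-seen order."""
    seen = set()
    unique_texts, first_ids, repeat_ids = [], [], []
    for text, id_ in zip(texts, ids):
        if text in seen:
            repeat_ids.append(id_)      # id of a repeated highlight
            continue
        seen.add(text)
        unique_texts.append(text)
        first_ids.append(id_)           # id of the first occurrence
    return unique_texts, first_ids, repeat_ids


# Toy data (assumption): three distinct highlights, two of them repeated
toy_texts = ["a b c", "d e", "a b c", "f", "d e", "a b c"]
toy_ids = ["1", "2", "3", "4", "5", "6"]

uniq, first, repeats = dedupe_with_ids(toy_texts, toy_ids)
occurrences = Counter(toy_texts)        # how often each highlight occurs overall
print(len(uniq), len(repeats))
```

The lengths of `uniq` and `repeats` always add up to the length of the input, which is a quick check on the counts printed above.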

Create dictionaries to identify only articles containing a top highlight

• build one dict of ids -> highlights and one of ids -> fulltext; intersect their keys and keep only the fulltexts that have a highlight


In [86]:
keys_fullt = id_ft_all
vals_fullt = fullt_all
dict_fullt = dict(zip(keys_fullt,vals_fullt))

keys_htext = id_hl_unq
vals_htext = htext_unq
dict_htext = dict(zip(keys_htext,vals_htext))

keysf = set(dict_fullt.keys())
keysh = set(dict_htext.keys())
intersect = keysf & keysh
# print(intersect)

print(len(keysf))
print(len(keysh))
print(len(intersect))

# Sort the shared ids numerically, then turn them back into strings for dict lookups
interx = list(map(int,intersect))
interx.sort()
interx = list(map(str,interx))
# print(interx)

keys_h = []
vals_h = []
keys_f = []
vals_f = []
for i in interx:
    keys_h.append(i)
    vals_h.append(dict_htext[i])
    keys_f.append(i)
    vals_f.append(dict_fullt[i])

dict_h = dict(zip(keys_h,vals_h))
dict_f = dict(zip(keys_f,vals_f))

print(len(keys_h))
print(len(vals_h))
print(len(keys_f))
print(len(vals_f))

keys_h == keys_f # True

vals_all = zip(vals_h, vals_f)
dict_all = dict(zip(keys_h, vals_all))

dict_all['2']


4341
3357
3201
3201
3201
3201
3201
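
The key-intersection step can also be written as a single dict comprehension. A minimal sketch on made-up data, assuming ids are numeric strings as in this notebook; `join_on_ids` and the toy dicts are hypothetical names used only for illustration.

```python
def join_on_ids(highlights_by_id, fulltext_by_id):
    """Map each shared id to a (highlight, fulltext) pair, in numeric id order."""
    # dict key views support set operations such as intersection
    shared = sorted(highlights_by_id.keys() & fulltext_by_id.keys(), key=int)
    return {k: (highlights_by_id[k], fulltext_by_id[k]) for k in shared}


# Toy data (assumption): id '7' has no fulltext, id '3' has no highlight
toy_high = {"10": "h10", "2": "h2", "7": "h7"}
toy_full = {"2": ["p1", "p2"], "10": ["p"], "3": ["x"]}

joined = join_on_ids(toy_high, toy_full)
print(list(joined))
```

Sorting with `key=int` keeps '2' ahead of '10', which a plain string sort would not.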

Save dict_all with pickle


In [89]:
# fdict_all = open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/dict_all','wb')
# pickle.dump(dict_all, fdict_all)


Out[89]:
True

Retrieve with pickle


In [9]:
# Use a context manager so the file handle is closed after loading
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/dict_all','rb') as fdict_all:
    dict_temp = pickle.load(fdict_all)

# dict_temp == dict_all

Write to files


In [81]:
# One tab-separated line per article id; fulltext paragraphs are joined with '|'
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/final_highlights.txt','w') as file_high, \
     open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/final_fulltext.txt','w') as file_full, \
     open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/final_all.txt','w') as file_all:
    for i in interx:
        file_high.write(i+'\t'+dict_h[i]+'\n')
        file_full.write(i+'\t'+'|'.join(dict_f[i])+'\n')
        file_all.write(i+'\t'+dict_h[i]+'\t'+'|'.join(dict_f[i])+'\n')

In [207]:
# DELETE HIGHLIGHTS FROM FULLTEXT SENTENCES
# (`data` and `tokenizer` are defined in other cells of this notebook)

sent_all = []
for text, high in zip(data['text'], data['highlights']):
    full = ' '.join(text)             # join paragraphs into one string
    fnoh = full.replace(high,' ')     # full text with the highlight removed

    # collect all sentences from all full texts into one list
    sent_all.extend(tokenizer.tokenize(fnoh))

print(len(sent_all))


238462

Save all sentences with pickle and save dataframe with pickle


In [209]:
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/sent_all','wb') as fsent_all:
    pickle.dump(sent_all, fsent_all)

In [214]:
# data = pd.DataFrame({'ids':keys_h, 'highlights':vals_h, 'text':vals_f})
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/data_pd','wb') as fdata:
    pickle.dump(data, fdata)

In [215]:
with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/data_pd','rb') as fdata:
    data_temp = pickle.load(fdata)
data_temp == data    # elementwise comparison; every cell should be True


Out[215]:
highlights ids text
0 True True True
1 True True True
2 True True True
3 True True True
4 True True True
5 True True True
6 True True True
7 True True True
8 True True True
9 True True True
10 True True True
11 True True True
12 True True True
13 True True True
14 True True True
15 True True True
16 True True True
17 True True True
18 True True True
19 True True True
20 True True True
21 True True True
22 True True True
23 True True True
24 True True True
25 True True True
26 True True True
27 True True True
28 True True True
29 True True True
... ... ... ...
3171 True True True
3172 True True True
3173 True True True
3174 True True True
3175 True True True
3176 True True True
3177 True True True
3178 True True True
3179 True True True
3180 True True True
3181 True True True
3182 True True True
3183 True True True
3184 True True True
3185 True True True
3186 True True True
3187 True True True
3188 True True True
3189 True True True
3190 True True True
3191 True True True
3192 True True True
3193 True True True
3194 True True True
3195 True True True
3196 True True True
3197 True True True
3198 True True True
3199 True True True
3200 True True True

3201 rows × 3 columns



REANALYSIS - 6/8/2017

Load dict_all from the pickled file as dict_temp and rebuild the dataframe


In [17]:
# print(dict_temp['2'])
keys = []
vals_h = []
vals_f = []
for key, val in dict_temp.items():
    keys.append(key)
    vals_h.append(val[0])
    vals_f.append(val[1])
data_tmp = pd.DataFrame({'ids':keys, 'highlights':vals_h, 'text':vals_f})

with open('/Users/clarencecheng/Dropbox/~Insight/skimr/datasets/data_pd','rb') as fdata:
    data_temp = pickle.load(fdata)
# equals() returns a single boolean for the whole dataframe
data_tmp.equals(data_temp)


Out[17]:
True

FIND HIGHLIGHT IN FULL TEXT (START AND END POSITION)

  • calculate how far into the document the highlight starts, as a fraction of sentences (maybe words later? why not characters?)
  • calculate fraction length (highlight length / document length), in words and in sentences
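
For the second bullet, the fraction length could be computed roughly as follows. This is a sketch, not the notebook's own code: `length_fractions` and `split_sentences` are hypothetical helpers, the regex splitter is a crude stand-in for the Punkt tokenizer, and the text is assumed to be non-empty.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def length_fractions(highlight, paragraphs):
    """Return (word fraction, sentence fraction) of the highlight within the full text."""
    full = ' '.join(paragraphs)       # same paragraph join as used for the 'text' column
    frac_words = len(highlight.split()) / len(full.split())
    frac_sents = len(split_sentences(highlight)) / len(split_sentences(full))
    return frac_words, frac_sents


# Toy article (assumption): 10 words in 3 sentences, spread over 2 paragraphs
toy_paragraphs = ["One two three. Four five.", "Six seven eight nine ten."]
frac_w, frac_s = length_fractions("Four five.", toy_paragraphs)
print(frac_w, frac_s)
```

Swapping `split_sentences` for `tokenizer.tokenize` would make the sentence counts consistent with the position analysis in this notebook.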

In [ ]:
import nltk.data

# # read dict_all into list
# data = [(k,v) for k,v in dict_all.items()]
# # d_high = data[1][0]
# # print(data[1])

# READ dict_all into pandas
sent_join = []

# CONVERT list of paragraphs in 'text' column into string containing all text
for i in data['text']:
    sent = str(' '.join(i))
    sent_join.append(str(sent))
# print(sent) # '\u200a' appears when not using print fxn
# sent
# print(sent_join[-1])
# sent_join[-1]

# BREAK text into sentences and find how far into the text (as a fraction of sentences) the highlight appears
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

n = 0
count_in = 0
count_out = 0
lenfs = []
poshs = []
fracs = []
# sent_all = []
for i in data['ids']:
    full = str(' '.join(data['text'][n]))
    sent = tokenizer.tokenize(full)
#     for j in sent: 
#         sent_all.append(j)    # collect all sentences from all full texts into one list
    lenf = len(sent)
    lenfs.append(lenf)
    high = tokenizer.tokenize(data['highlights'][n])
    try:
        # index of the highlight's first sentence among the article's sentences;
        # raises ValueError if there is no exact sentence match
        posh = sent.index(high[0])
        poshs.append(posh)
        fracs.append(posh/lenf)
#         print('highlight pos: '+str(posh)+'\tnumber of sent: '+str(lenf)+'\tfraction: '+str(posh/lenf))
        count_in += 1
    except ValueError:
#         print('highlight not in sentence list!')
        count_out += 1
    n += 1
#     if n == 6:
#         break
    
print(count_in)
print(count_out)

# print(len(sent_all))

ALTERNATIVE:

  • Find how far into the text the highlight starts, as a fraction of words, then...
  • Find which sentence number that word belongs to and calculate the sentence fraction for the highlight
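
One way these two steps could look, as a sketch only. It assumes the highlight appears verbatim in the joined full text, and it counts sentence breaks with a simple regex rather than the Punkt tokenizer; `highlight_position` is a hypothetical helper name.

```python
import re

SENT_BREAK = re.compile(r'(?<=[.!?])\s+')   # whitespace that follows sentence-ending punctuation


def highlight_position(highlight, full):
    """Return (word fraction, sentence index, sentence fraction), or None if not found."""
    start = full.find(highlight)            # character offset of the highlight, -1 if absent
    if start == -1:
        return None
    prefix = full[:start]
    word_frac = len(prefix.split()) / len(full.split())
    # sentence breaks before the highlight = index of the sentence it starts in
    sent_index = len(SENT_BREAK.findall(prefix))
    total_sents = len(SENT_BREAK.findall(full.strip())) + 1
    return word_frac, sent_index, sent_index / total_sents


toy_full = "One two three. Four five. Six seven eight nine ten."
pos_start = highlight_position("Six seven", toy_full)     # starts at a sentence boundary
pos_mid = highlight_position("five. Six", toy_full)       # starts mid-sentence
pos_none = highlight_position("not present", toy_full)
print(pos_start, pos_mid, pos_none)
```

Because the lookup works on characters, a highlight that starts mid-sentence or spans several sentences is still located, unlike an exact sentence match.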

In [ ]: