Import needed libraries


In [1]:
import requests
from lxml import html

We used the "requests" library last time to get Twitter data from its RESTful API. Here we introduce a new library, "lxml", for parsing HTML and extracting its elements and attributes.
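To make "extracting elements and attributes" concrete, here is a minimal sketch of the basic lxml workflow. It parses a small, hand-written HTML fragment with `html.fromstring`, then uses XPath to select elements by tag and class, read their text, and read their attributes. The fragment and its example.com URLs are made up for illustration; it only loosely mirrors the Hacker News markup fetched below.

```python
from lxml import html

# A small, hand-written HTML fragment used only for illustration.
SNIPPET = """
<table>
  <tr class="athing" id="1"><td class="title"><a href="https://example.com/a" class="storylink">First story</a></td></tr>
  <tr class="athing" id="2"><td class="title"><a href="https://example.com/b" class="storylink">Second story</a></td></tr>
</table>
"""

# Parse the markup into an element tree.
tree = html.fromstring(SNIPPET)

# Select every <a> element whose class attribute is exactly "storylink".
links = tree.xpath('//a[@class="storylink"]')

# Each element exposes its text and its attributes.
titles = [a.text_content() for a in links]
hrefs = [a.get("href") for a in links]

# XPath can also return attribute values directly as strings.
row_ids = tree.xpath('//tr[@class="athing"]/@id')

print(titles)
print(hrefs)
print(row_ids)
```

`.get()` reads one attribute from an element you already hold, while an `/@attr` step in the XPath expression collects attribute values in a single query.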

Use Requests to get Hacker News content

Hacker News (HN) is a community-contributed news website with an emphasis on technology-related content. Let's grab the set of articles at the top of the HN front page.


In [2]:
response = requests.get('http://news.ycombinator.com/')
response


Out[2]:
<Response [200]>

In [3]:
response.content


Out[3]:
'<html op="news"><head><meta name="referrer" content="origin"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" type="text/css" href="news.css?WwbEhbljl4NoDa7axYx5">\n        <link rel="shortcut icon" href="favicon.ico">\n          <link rel="alternate" type="application/rss+xml" title="RSS" href="rss">\n        <title>Hacker News</title>\n      </head><body><center><table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">\n        <tr><td bgcolor="#ff6600"><table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px"><tr><td style="width:18px;padding-right:4px"><a href="http://www.ycombinator.com"><img src="y18.gif" width="18" height="18" style="border:1px white solid;"></a></td>\n                  <td style="line-height:12pt; height:10px;"><span class="pagetop"><b class="hnname"><a href="news">Hacker News</a></b>\n              <a href="newest">new</a> | <a href="newcomments">comments</a> | <a href="show">show</a> | <a href="ask">ask</a> | <a href="jobs">jobs</a> | <a href="submit">submit</a>            </span></td><td style="text-align:right;padding-right:4px;"><span class="pagetop">\n                              <a href="login?goto=news">login</a>\n                          </span></td>\n              </tr></table></td></tr>\n<tr style="height:10px"></tr><tr><td><table border="0" cellpadding="0" cellspacing="0" class="itemlist">\n              <tr class=\'athing\' id=\'12768768\'>\n      <td align="right" valign="top" class="title"><span class="rank">1.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12768768\' href=\'vote?id=12768768&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.brainpickings.org/2014/10/13/kierkegaard-diary-bullying-trolling-haters/" class="storylink">Why Haters Hate: Kierkegaard Explains the Psychology of Trolling in 1847</a><span 
class="sitebit comhead"> (<a href="from?site=brainpickings.org"><span class="sitestr">brainpickings.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768768">74 points</span> by <a href="user?id=DyslexicAtheist" class="hnuser">DyslexicAtheist</a> <span class="age"><a href="item?id=12768768">3 hours ago</a></span> <span id="unv_12768768"></span> | <a href="hide?id=12768768&amp;goto=news">hide</a> | <a href="item?id=12768768">63&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12769261\'>\n      <td align="right" valign="top" class="title"><span class="rank">2.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12769261\' href=\'vote?id=12769261&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://libvmi.com/" class="storylink">LibVMI: virtual machine introspection</a><span class="sitebit comhead"> (<a href="from?site=libvmi.com"><span class="sitestr">libvmi.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12769261">25 points</span> by <a href="user?id=ingve" class="hnuser">ingve</a> <span class="age"><a href="item?id=12769261">1 hour ago</a></span> <span id="unv_12769261"></span> | <a href="hide?id=12769261&amp;goto=news">hide</a> | <a href="item?id=12769261">2&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12768881\'>\n      <td align="right" valign="top" class="title"><span class="rank">3.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12768881\' href=\'vote?id=12768881&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a 
href="https://about.gitlab.com/2016/10/22/gitlab-8-13-released/" class="storylink">GitLab 8.13 Released with Multiple Issue Boards and Merge Conflict Editor</a><span class="sitebit comhead"> (<a href="from?site=gitlab.com"><span class="sitestr">gitlab.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768881">90 points</span> by <a href="user?id=Smibu" class="hnuser">Smibu</a> <span class="age"><a href="item?id=12768881">2 hours ago</a></span> <span id="unv_12768881"></span> | <a href="hide?id=12768881&amp;goto=news">hide</a> | <a href="item?id=12768881">14&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12768319\'>\n      <td align="right" valign="top" class="title"><span class="rank">4.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12768319\' href=\'vote?id=12768319&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.bloomberg.com/news/articles/2016-09-22/the-professor-who-was-right-about-index-funds-all-along" class="storylink">A Professor Who Was Right About Index Funds All Along</a><span class="sitebit comhead"> (<a href="from?site=bloomberg.com"><span class="sitestr">bloomberg.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768319">128 points</span> by <a href="user?id=carlosgg" class="hnuser">carlosgg</a> <span class="age"><a href="item?id=12768319">5 hours ago</a></span> <span id="unv_12768319"></span> | <a href="hide?id=12768319&amp;goto=news">hide</a> | <a href="item?id=12768319">111&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12768425\'>\n      <td align="right" valign="top" class="title"><span class="rank">5.</span></td>      <td 
valign="top" class="votelinks"><center><a id=\'up_12768425\' href=\'vote?id=12768425&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.nongnu.org/lzip/xz_inadequate.html" class="storylink">Xz format inadequate for long-term archiving</a><span class="sitebit comhead"> (<a href="from?site=nongnu.org"><span class="sitestr">nongnu.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768425">106 points</span> by <a href="user?id=martianh" class="hnuser">martianh</a> <span class="age"><a href="item?id=12768425">4 hours ago</a></span> <span id="unv_12768425"></span> | <a href="hide?id=12768425&amp;goto=news">hide</a> | <a href="item?id=12768425">48&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12768719\'>\n      <td align="right" valign="top" class="title"><span class="rank">6.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12768719\' href=\'vote?id=12768719&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://en.wikipedia.org/wiki/ZMODEM" class="storylink">ZMODEM</a><span class="sitebit comhead"> (<a href="from?site=wikipedia.org"><span class="sitestr">wikipedia.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768719">48 points</span> by <a href="user?id=turrini" class="hnuser">turrini</a> <span class="age"><a href="item?id=12768719">3 hours ago</a></span> <span id="unv_12768719"></span> | <a href="hide?id=12768719&amp;goto=news">hide</a> | <a href="item?id=12768719">29&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12767821\'>\n      <td align="right" valign="top" class="title"><span 
class="rank">7.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12767821\' href=\'vote?id=12767821&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.youtube.com/watch?v=hyry8mgXiTk" class="storylink">1177 BC \xe2\x80\x93 The Year Civilization Collapsed [video]</a><span class="sitebit comhead"> (<a href="from?site=youtube.com"><span class="sitestr">youtube.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12767821">207 points</span> by <a href="user?id=dmlhllnd" class="hnuser">dmlhllnd</a> <span class="age"><a href="item?id=12767821">8 hours ago</a></span> <span id="unv_12767821"></span> | <a href="hide?id=12767821&amp;goto=news">hide</a> | <a href="item?id=12767821">43&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12769178\'>\n      <td align="right" valign="top" class="title"><span class="rank">8.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12769178\' href=\'vote?id=12769178&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://tech.slashdot.org/story/16/10/22/008216/google-has-quietly-dropped-ban-on-personally-identifiable-web-tracking?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29" class="storylink">Google Has Dropped Ban on Personally Identifiable Web Tracking</a><span class="sitebit comhead"> (<a href="from?site=slashdot.org"><span class="sitestr">slashdot.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12769178">50 points</span> by <a href="user?id=pcunite" class="hnuser">pcunite</a> <span class="age"><a href="item?id=12769178">1 hour ago</a></span> <span id="unv_12769178"></span> 
| <a href="hide?id=12769178&amp;goto=news">hide</a> | <a href="item?id=12769178">8&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12767560\'>\n      <td align="right" valign="top" class="title"><span class="rank">9.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12767560\' href=\'vote?id=12767560&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://vuejs.org/guide/comparison.html" class="storylink">Comparison with Other Frameworks</a><span class="sitebit comhead"> (<a href="from?site=vuejs.org"><span class="sitestr">vuejs.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12767560">192 points</span> by <a href="user?id=wanderer42" class="hnuser">wanderer42</a> <span class="age"><a href="item?id=12767560">10 hours ago</a></span> <span id="unv_12767560"></span> | <a href="hide?id=12767560&amp;goto=news">hide</a> | <a href="item?id=12767560">97&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766493\'>\n      <td align="right" valign="top" class="title"><span class="rank">10.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766493\' href=\'vote?id=12766493&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.infoq.com/presentations/category-theory-propositions-principle" class="storylink">Category Theory for the Working Hacker [video]</a><span class="sitebit comhead"> (<a href="from?site=infoq.com"><span class="sitestr">infoq.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766493">45 points</span> by <a href="user?id=louthy" class="hnuser">louthy</a> <span 
class="age"><a href="item?id=12766493">5 hours ago</a></span> <span id="unv_12766493"></span> | <a href="hide?id=12766493&amp;goto=news">hide</a> | <a href="item?id=12766493">7&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12769105\'>\n      <td align="right" valign="top" class="title"><span class="rank">11.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12769105\' href=\'vote?id=12769105&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.juliabloggers.com/optimizing-details-of-vectorization-and-metaprogramming/?utm_source=ReviveOldPost&amp;utm_medium=social&amp;utm_campaign=ReviveOldPost" class="storylink">Optimizing .*: Details of Vectorization and Metaprogramming \xe2\x80\x93 Juliabloggers.com</a><span class="sitebit comhead"> (<a href="from?site=juliabloggers.com"><span class="sitestr">juliabloggers.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12769105">9 points</span> by <a href="user?id=leephillips" class="hnuser">leephillips</a> <span class="age"><a href="item?id=12769105">1 hour ago</a></span> <span id="unv_12769105"></span> | <a href="hide?id=12769105&amp;goto=news">hide</a> | <a href="item?id=12769105">1&nbsp;comment</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766847\'>\n      <td align="right" valign="top" class="title"><span class="rank">12.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766847\' href=\'vote?id=12766847&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://github.com/SamyPesse/How-to-Make-a-Computer-Operating-System" class="storylink">How to Make a Computer Operating System</a><span class="sitebit 
comhead"> (<a href="from?site=github.com"><span class="sitestr">github.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766847">24 points</span> by <a href="user?id=hitr" class="hnuser">hitr</a> <span class="age"><a href="item?id=12766847">4 hours ago</a></span> <span id="unv_12766847"></span> | <a href="hide?id=12766847&amp;goto=news">hide</a> | <a href="item?id=12766847">discuss</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12767038\'>\n      <td align="right" valign="top" class="title"><span class="rank">13.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12767038\' href=\'vote?id=12767038&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.blikstein.com/paulo/projects/project_water.html" class="storylink">Programmable Water (2003)</a><span class="sitebit comhead"> (<a href="from?site=blikstein.com"><span class="sitestr">blikstein.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12767038">31 points</span> by <a href="user?id=Phithagoras" class="hnuser">Phithagoras</a> <span class="age"><a href="item?id=12767038">6 hours ago</a></span> <span id="unv_12767038"></span> | <a href="hide?id=12767038&amp;goto=news">hide</a> | <a href="item?id=12767038">6&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12767747\'>\n      <td align="right" valign="top" class="title"><span class="rank">14.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12767747\' href=\'vote?id=12767747&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a 
href="http://www.bbc.co.uk/news/resources/idt-150d11df-c541-44a9-9332-560a19828c47" class="storylink">Aberfan: The mistake that cost a village its children</a><span class="sitebit comhead"> (<a href="from?site=bbc.co.uk"><span class="sitestr">bbc.co.uk</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12767747">84 points</span> by <a href="user?id=Patient0" class="hnuser">Patient0</a> <span class="age"><a href="item?id=12767747">9 hours ago</a></span> <span id="unv_12767747"></span> | <a href="hide?id=12767747&amp;goto=news">hide</a> | <a href="item?id=12767747">35&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12769264\'>\n      <td align="right" valign="top" class="title"><span class="rank">15.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12769264\' href=\'vote?id=12769264&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.popularmechanics.com/science/energy/news/a23490/iceland-3-mile-hole-magma/" class="storylink">Iceland Is Drilling a 3-Mile Hole to Tap Magma Power</a><span class="sitebit comhead"> (<a href="from?site=popularmechanics.com"><span class="sitestr">popularmechanics.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12769264">24 points</span> by <a href="user?id=jonbaer" class="hnuser">jonbaer</a> <span class="age"><a href="item?id=12769264">1 hour ago</a></span> <span id="unv_12769264"></span> | <a href="hide?id=12769264&amp;goto=news">hide</a> | <a href="item?id=12769264">5&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12769196\'>\n      <td align="right" valign="top" class="title"><span class="rank">16.</span></td>      <td valign="top" 
class="votelinks"><center><a id=\'up_12769196\' href=\'vote?id=12769196&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="item?id=12769196" class="storylink">Ask HN: How did Dyn fail to fend off DDOS?</a></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12769196">18 points</span> by <a href="user?id=ruler88" class="hnuser">ruler88</a> <span class="age"><a href="item?id=12769196">1 hour ago</a></span> <span id="unv_12769196"></span> | <a href="hide?id=12769196&amp;goto=news">hide</a> | <a href="item?id=12769196">12&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12768782\'>\n      <td align="right" valign="top" class="title"><span class="rank">17.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12768782\' href=\'vote?id=12768782&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://github.com/okTurtles/dnschain" class="storylink" rel="nofollow">OkTurtles/dnschain: A blockchain-based DNS and HTTP server</a><span class="sitebit comhead"> (<a href="from?site=github.com"><span class="sitestr">github.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12768782">9 points</span> by <a href="user?id=callaars" class="hnuser">callaars</a> <span class="age"><a href="item?id=12768782">2 hours ago</a></span> <span id="unv_12768782"></span> | <a href="hide?id=12768782&amp;goto=news">hide</a> | <a href="item?id=12768782">1&nbsp;comment</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766846\'>\n      <td align="right" valign="top" class="title"><span class="rank">18.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766846\' 
href=\'vote?id=12766846&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.jayconrod.com/posts/52/a-tour-of-v8-object-representation" class="storylink">A tour of V8: object representation (2013)</a><span class="sitebit comhead"> (<a href="from?site=jayconrod.com"><span class="sitestr">jayconrod.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766846">23 points</span> by <a href="user?id=tambourine_man" class="hnuser">tambourine_man</a> <span class="age"><a href="item?id=12766846">6 hours ago</a></span> <span id="unv_12766846"></span> | <a href="hide?id=12766846&amp;goto=news">hide</a> | <a href="item?id=12766846">7&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766839\'>\n      <td align="right" valign="top" class="title"><span class="rank">19.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766839\' href=\'vote?id=12766839&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.theparisreview.org/interviews/3605/the-art-of-fiction-no-64-kurt-vonnegut" class="storylink" rel="nofollow">Vonnegut: the art of fiction</a><span class="sitebit comhead"> (<a href="from?site=theparisreview.org"><span class="sitestr">theparisreview.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766839">9 points</span> by <a href="user?id=kapitza" class="hnuser">kapitza</a> <span class="age"><a href="item?id=12766839">3 hours ago</a></span> <span id="unv_12766839"></span> | <a href="hide?id=12766839&amp;goto=news">hide</a> | <a href="item?id=12766839">discuss</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' 
id=\'12766174\'>\n      <td align="right" valign="top" class="title"><span class="rank">20.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766174\' href=\'vote?id=12766174&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://medium.com/@fagnerbrack/how-to-accept-over-engineering-for-what-it-really-is-6fca9a919263" class="storylink">How to Accept Over-Engineering for What It Really Is</a><span class="sitebit comhead"> (<a href="from?site=medium.com"><span class="sitestr">medium.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766174">142 points</span> by <a href="user?id=fagnerbrack" class="hnuser">fagnerbrack</a> <span class="age"><a href="item?id=12766174">14 hours ago</a></span> <span id="unv_12766174"></span> | <a href="hide?id=12766174&amp;goto=news">hide</a> | <a href="item?id=12766174">88&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766458\'>\n      <td align="right" valign="top" class="title"><span class="rank">21.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766458\' href=\'vote?id=12766458&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.nextplatform.com/2016/09/01/cpu-gpu-put-deep-learning-framework-test/" class="storylink">CPU, GPU Put to Deep Learning Framework Test</a><span class="sitebit comhead"> (<a href="from?site=nextplatform.com"><span class="sitestr">nextplatform.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766458">22 points</span> by <a href="user?id=adamnemecek" class="hnuser">adamnemecek</a> <span class="age"><a href="item?id=12766458">6 hours ago</a></span> <span id="unv_12766458"></span> | <a 
href="hide?id=12766458&amp;goto=news">hide</a> | <a href="item?id=12766458">11&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766691\'>\n      <td align="right" valign="top" class="title"><span class="rank">22.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766691\' href=\'vote?id=12766691&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://blog.adacore.com/how-to-prevent-drone-crashes-using-spark" class="storylink">How to avoid runtime errors on drones using SPARK (2015)</a><span class="sitebit comhead"> (<a href="from?site=adacore.com"><span class="sitestr">adacore.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766691">49 points</span> by <a href="user?id=0srv" class="hnuser">0srv</a> <span class="age"><a href="item?id=12766691">14 hours ago</a></span> <span id="unv_12766691"></span> | <a href="hide?id=12766691&amp;goto=news">hide</a> | <a href="item?id=12766691">20&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12748863\'>\n      <td align="right" valign="top" class="title"><span class="rank">23.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12748863\' href=\'vote?id=12748863&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.tesla.com/blog/all-tesla-cars-being-produced-now-have-full-self-driving-hardware" class="storylink">All Tesla Cars Being Produced Now Have Full Self-Driving Hardware</a><span class="sitebit comhead"> (<a href="from?site=tesla.com"><span class="sitestr">tesla.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12748863">1423 
points</span> by <a href="user?id=impish19" class="hnuser">impish19</a> <span class="age"><a href="item?id=12748863">2 days ago</a></span> <span id="unv_12748863"></span> | <a href="hide?id=12748863&amp;goto=news">hide</a> | <a href="item?id=12748863">1069&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766613\'>\n      <td align="right" valign="top" class="title"><span class="rank">24.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766613\' href=\'vote?id=12766613&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://internetcensus2012.bitbucket.org/paper.html" class="storylink" rel="nofollow">Internet Census: Port scanning /0 using insecure embedded devices (2012)</a><span class="sitebit comhead"> (<a href="from?site=bitbucket.org"><span class="sitestr">bitbucket.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766613">12 points</span> by <a href="user?id=bootload" class="hnuser">bootload</a> <span class="age"><a href="item?id=12766613">5 hours ago</a></span> <span id="unv_12766613"></span> | <a href="hide?id=12766613&amp;goto=news">hide</a> | <a href="item?id=12766613">discuss</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766419\'>\n      <td align="right" valign="top" class="title"><span class="rank">25.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766419\' href=\'vote?id=12766419&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.japantimes.co.jp/community/2016/06/22/issues/japans-koseki-system-dull-uncaring-terribly-efficient/#.WAq-taOZNP0" class="storylink">Japan\xe2\x80\x99s koseki system: dull, uncaring but 
efficient</a><span class="sitebit comhead"> (<a href="from?site=japantimes.co.jp"><span class="sitestr">japantimes.co.jp</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766419">204 points</span> by <a href="user?id=Thevet" class="hnuser">Thevet</a> <span class="age"><a href="item?id=12766419">16 hours ago</a></span> <span id="unv_12766419"></span> | <a href="hide?id=12766419&amp;goto=news">hide</a> | <a href="item?id=12766419">109&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12760235\'>\n      <td align="right" valign="top" class="title"><span class="rank">26.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12760235\' href=\'vote?id=12760235&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.bbc.com/news/technology-37713939" class="storylink">Samsung \'blocks\' exploding Note 7 parody videos</a><span class="sitebit comhead"> (<a href="from?site=bbc.com"><span class="sitestr">bbc.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12760235">584 points</span> by <a href="user?id=Lio" class="hnuser">Lio</a> <span class="age"><a href="item?id=12760235">1 day ago</a></span> <span id="unv_12760235"></span> | <a href="hide?id=12760235&amp;goto=news">hide</a> | <a href="item?id=12760235">206&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12767214\'>\n      <td align="right" valign="top" class="title"><span class="rank">27.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12767214\' href=\'vote?id=12767214&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a 
href="http://news.stanford.edu/2016/10/20/stanford-researchers-create-new-special-purpose-computer/" class="storylink">Researchers create new computer combining optical and electronic technology</a><span class="sitebit comhead"> (<a href="from?site=stanford.edu"><span class="sitestr">stanford.edu</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12767214">25 points</span> by <a href="user?id=hn-user" class="hnuser">hn-user</a> <span class="age"><a href="item?id=12767214">12 hours ago</a></span> <span id="unv_12767214"></span> | <a href="hide?id=12767214&amp;goto=news">hide</a> | <a href="item?id=12767214">5&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12762462\'>\n      <td align="right" valign="top" class="title"><span class="rank">28.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12762462\' href=\'vote?id=12762462&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.esa.int/Our_Activities/Space_Science/ExoMars/Mars_Reconnaissance_Orbiter_views_Schiaparelli_landing_site" class="storylink">Mars Reconnaissance Orbiter views Schiaparelli landing site</a><span class="sitebit comhead"> (<a href="from?site=esa.int"><span class="sitestr">esa.int</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12762462">265 points</span> by <a href="user?id=okket" class="hnuser">okket</a> <span class="age"><a href="item?id=12762462">1 day ago</a></span> <span id="unv_12762462"></span> | <a href="hide?id=12762462&amp;goto=news">hide</a> | <a href="item?id=12762462">108&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12759697\'>\n      <td align="right" valign="top" class="title"><span 
class="rank">29.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12759697\' href=\'vote?id=12759697&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.dynstatus.com/incidents/nlr4yrr162t8" class="storylink">DDoS Attack Against Dyn Managed DNS</a><span class="sitebit comhead"> (<a href="from?site=dynstatus.com"><span class="sitestr">dynstatus.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12759697">1538 points</span> by <a href="user?id=owenwil" class="hnuser">owenwil</a> <span class="age"><a href="item?id=12759697">1 day ago</a></span> <span id="unv_12759697"></span> | <a href="hide?id=12759697&amp;goto=news">hide</a> | <a href="item?id=12759697">658&nbsp;comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12766123\'>\n      <td align="right" valign="top" class="title"><span class="rank">30.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12766123\' href=\'vote?id=12766123&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.flashpoint-intel.com/mirai-botnet-linked-dyn-dns-ddos-attacks/" class="storylink">Mirai Botnet Linked to Dyn DNS DDoS Attacks</a><span class="sitebit comhead"> (<a href="from?site=flashpoint-intel.com"><span class="sitestr">flashpoint-intel.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12766123">111 points</span> by <a href="user?id=ashitlerferad" class="hnuser">ashitlerferad</a> <span class="age"><a href="item?id=12766123">17 hours ago</a></span> <span id="unv_12766123"></span> | <a href="hide?id=12766123&amp;goto=news">hide</a> | <a href="item?id=12766123">93&nbsp;comments</a>              </td></tr>\n      <tr 
class="spacer" style="height:5px"></tr>\n            <tr class="morespace" style="height:10px"></tr><tr><td colspan="2"></td><td class="title"><a href="news?p=2" class="morelink" rel="nofollow">More</a></td></tr>\n  </table>\n</td></tr>\n<tr><td><img src="s.gif" height="10" width="0"><table width="100%" cellspacing="0" cellpadding="1"><tr><td bgcolor="#ff6600"></td></tr></table><br><center><span class="yclinks"><a href="newsguidelines.html">Guidelines</a>\n        | <a href="newsfaq.html">FAQ</a>\n        | <a href="mailto:hn@ycombinator.com">Support</a>\n        | <a href="https://github.com/HackerNews/API">API</a>\n        | <a href="security.html">Security</a>\n        | <a href="lists">Lists</a>\n        | <a href="bookmarklet.html">Bookmarklet</a>\n        | <a href="dmca.html">DMCA</a>\n        | <a href="http://www.ycombinator.com/apply/">Apply to YC</a>\n        | <a href="mailto:hn@ycombinator.com">Contact</a></span><br><br><form method="get" action="//hn.algolia.com/">Search:\n          <input type="text" name="q" value="" size="17" autocorrect="off" spellcheck="false" autocapitalize="off" autocomplete="false"></form>\n            </center></td></tr>      </table></center></body><script type=\'text/javascript\' src=\'hn.js?WwbEhbljl4NoDa7axYx5\'></script></html>\n'

We will now use lxml to parse this content so that we can access it programmatically.

Analyzing HTML Content


In [5]:
page = html.fromstring(response.content)
page


Out[5]:
<Element html at 0x103fc0578>
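The live page changes constantly, so it can help to experiment on a small, fixed piece of HTML. The sketch below uses a hand-written snippet modeled on the HackerNews markup shown above. The snippet and its contents are illustrative assumptions, not a copy of the live page. It shows that html.fromstring returns the root element, which you can then search and navigate:

```python
from lxml import html

# Hand-written HTML modeled on the HackerNews markup above
# (an illustrative assumption, not the live page).
sample = """<html><body><table class="itemlist">
<tr class="athing" id="1">
<td class="title"><span class="rank">1.</span></td>
<td class="title"><a href="https://example.com/" class="storylink">Example story</a></td>
</tr>
</table></body></html>"""

# A string that starts with <html> is parsed as a full document,
# so fromstring returns the root <html> element.
page = html.fromstring(sample)
print(page.tag)           # html

# iter() walks every descendant with the given tag name.
cells = list(page.iter("td"))
print(len(cells))         # 2

# find() takes an ElementPath expression and returns the first match.
row = page.find(".//tr")
print(row.get("id"))      # 1  (same value as row.attrib["id"])
```

The same iter, find and get calls also work on the page object parsed from the live response.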

CSS Selectors

For those of you who are web designers, you are likely very familiar with Cascading Style Sheets (CSS). Here is an example of how to use a CSS selector to find every element with the class "title":


In [7]:
# .cssselect() needs the separate "cssselect" package (pip install cssselect)
posts = page.cssselect('.title')

In [8]:
len(posts)


Out[8]:
61

Details on how to use CSS selectors can be found on the W3Schools site:

http://www.w3schools.com/cssref/css_selectors.asp

XPath

Alternatively, we can use a standard called "XPath" to find specific content in the HTML.


In [9]:
posts = page.xpath('//td[contains(@class, "title")]')

In [10]:
len(posts)


Out[10]:
61
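The XPath predicate contains(@class, "title") is a substring test on the whole attribute value, so it would also match a class such as "subtitle". On this page it happens to find the same 61 cells as the CSS selector. The sketch below, using a made-up snippet, contrasts it with a common XPath 1.0 idiom that matches "title" only as a whole class name, which is how the CSS selector ".title" behaves:

```python
from lxml import html

# Made-up snippet for illustration only.
sample = """<html><body><table>
<tr><td class="title">A</td><td class="subtitle">B</td><td class="title big">C</td></tr>
</table></body></html>"""
page = html.fromstring(sample)

# contains() is a plain substring test, so "subtitle" matches too.
loose = page.xpath('//td[contains(@class, "title")]')
print([td.text for td in loose])   # ['A', 'B', 'C']

# Pad the normalized class list with spaces and look for " title ",
# so only the whole class token "title" matches.
strict = page.xpath(
    '//td[contains(concat(" ", normalize-space(@class), " "), " title ")]'
)
print([td.text for td in strict])  # ['A', 'C']
```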

We are only interested in the "td" tags that contain an anchor link to the referenced article, so we add "/a" to the path to select those anchors directly.


In [11]:
posts = page.xpath('//td[contains(@class, "title")]/a')

In [12]:
len(posts)


Out[12]:
31

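So only 31 of the 61 "td" tags with the "title" class contain an anchor; the other 30 hold the rank numbers. Of those 31 anchors, 30 are stories and the last one is the "More" link to the next page. Let's take a look at the first post.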


In [13]:
first_post = posts[0]
first_post.text


Out[13]:
'Why Haters Hate: Kierkegaard Explains the Psychology of Trolling in 1847'
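Note that .text returns only the text that appears before an element's first child element. It works here because the story anchors contain plain text. If a title contained nested markup, .text would be cut short, or would be None if the element started with a child. lxml.html elements also provide text_content(), which joins the text of all descendants. A minimal illustration on a made-up link:

```python
from lxml import html

# Made-up anchor with nested markup; a single-element fragment
# makes fromstring return that element (the <a>) directly.
link = html.fromstring('<a href="x">Show HN: <b>bold</b> tool</a>')

print(repr(link.text))            # 'Show HN: '  (stops at the <b> child)
print(repr(link.text_content()))  # 'Show HN: bold tool'
```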

The anchor element's attributes contain more useful content, such as the article URL in "href".


In [14]:
first_post.attrib


Out[14]:
{'href': 'https://www.brainpickings.org/2014/10/13/kierkegaard-diary-bullying-trolling-haters/', 'class': 'storylink'}

In [15]:
first_post.attrib["href"]


Out[15]:
'https://www.brainpickings.org/2014/10/13/kierkegaard-diary-bullying-trolling-haters/'
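Not every href on the page is a full URL. Some, such as "item?id=12769196" for an "Ask HN" post and "news?p=2" for the "More" link in the list below, are relative to the page that was requested. A sketch using the standard library's urljoin to turn them into absolute URLs, assuming the same base URL as the request above (lxml.html elements also offer make_links_absolute for rewriting every link in a parsed page):

```python
from urllib.parse import urljoin

# Same URL that was passed to requests.get above.
BASE_URL = "http://news.ycombinator.com/"

hrefs = [
    "https://www.brainpickings.org/2014/10/13/kierkegaard-diary-bullying-trolling-haters/",
    "item?id=12769196",  # relative link to an "Ask HN" discussion
    "news?p=2",          # relative link to the next page
]

# urljoin leaves absolute URLs unchanged and resolves relative ones
# against the base URL.
absolute = [urljoin(BASE_URL, h) for h in hrefs]
for url in absolute:
    print(url)
```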

In [16]:
all_links = []
for p in posts:
    all_links.append((p.text, p.attrib["href"]))

In [17]:
all_links


Out[17]:
[('Why Haters Hate: Kierkegaard Explains the Psychology of Trolling in 1847',
  'https://www.brainpickings.org/2014/10/13/kierkegaard-diary-bullying-trolling-haters/'),
 ('LibVMI: virtual machine introspection', 'http://libvmi.com/'),
 ('GitLab 8.13 Released with Multiple Issue Boards and Merge Conflict Editor',
  'https://about.gitlab.com/2016/10/22/gitlab-8-13-released/'),
 ('A Professor Who Was Right About Index Funds All Along',
  'http://www.bloomberg.com/news/articles/2016-09-22/the-professor-who-was-right-about-index-funds-all-along'),
 ('Xz format inadequate for long-term archiving',
  'http://www.nongnu.org/lzip/xz_inadequate.html'),
 ('ZMODEM', 'https://en.wikipedia.org/wiki/ZMODEM'),
 (u'1177 BC \xe2\x80\x93 The Year Civilization Collapsed [video]',
  'https://www.youtube.com/watch?v=hyry8mgXiTk'),
 ('Google Has Dropped Ban on Personally Identifiable Web Tracking',
  'https://tech.slashdot.org/story/16/10/22/008216/google-has-quietly-dropped-ban-on-personally-identifiable-web-tracking?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot+%28Slashdot%29'),
 ('Comparison with Other Frameworks',
  'https://vuejs.org/guide/comparison.html'),
 ('Category Theory for the Working Hacker [video]',
  'https://www.infoq.com/presentations/category-theory-propositions-principle'),
 (u'Optimizing .*: Details of Vectorization and Metaprogramming \xe2\x80\x93 Juliabloggers.com',
  'http://www.juliabloggers.com/optimizing-details-of-vectorization-and-metaprogramming/?utm_source=ReviveOldPost&utm_medium=social&utm_campaign=ReviveOldPost'),
 ('How to Make a Computer Operating System',
  'https://github.com/SamyPesse/How-to-Make-a-Computer-Operating-System'),
 ('Programmable Water (2003)',
  'http://www.blikstein.com/paulo/projects/project_water.html'),
 ('Aberfan: The mistake that cost a village its children',
  'http://www.bbc.co.uk/news/resources/idt-150d11df-c541-44a9-9332-560a19828c47'),
 ('Iceland Is Drilling a 3-Mile Hole to Tap Magma Power',
  'http://www.popularmechanics.com/science/energy/news/a23490/iceland-3-mile-hole-magma/'),
 ('Ask HN: How did Dyn fail to fend off DDOS?', 'item?id=12769196'),
 ('OkTurtles/dnschain: A blockchain-based DNS and HTTP server',
  'https://github.com/okTurtles/dnschain'),
 ('A tour of V8: object representation (2013)',
  'http://www.jayconrod.com/posts/52/a-tour-of-v8-object-representation'),
 ('Vonnegut: the art of fiction',
  'http://www.theparisreview.org/interviews/3605/the-art-of-fiction-no-64-kurt-vonnegut'),
 ('How to Accept Over-Engineering for What It Really Is',
  'https://medium.com/@fagnerbrack/how-to-accept-over-engineering-for-what-it-really-is-6fca9a919263'),
 ('CPU, GPU Put to Deep Learning Framework Test',
  'http://www.nextplatform.com/2016/09/01/cpu-gpu-put-deep-learning-framework-test/'),
 ('How to avoid runtime errors on drones using SPARK (2015)',
  'http://blog.adacore.com/how-to-prevent-drone-crashes-using-spark'),
 ('All Tesla Cars Being Produced Now Have Full Self-Driving Hardware',
  'https://www.tesla.com/blog/all-tesla-cars-being-produced-now-have-full-self-driving-hardware'),
 ('Internet Census: Port scanning /0 using insecure embedded devices (2012)',
  'http://internetcensus2012.bitbucket.org/paper.html'),
 (u'Japan\xe2\x80\x99s koseki system: dull, uncaring but efficient',
  'http://www.japantimes.co.jp/community/2016/06/22/issues/japans-koseki-system-dull-uncaring-terribly-efficient/#.WAq-taOZNP0'),
 ("Samsung 'blocks' exploding Note 7 parody videos",
  'http://www.bbc.com/news/technology-37713939'),
 ('Researchers create new computer combining optical and electronic technology',
  'http://news.stanford.edu/2016/10/20/stanford-researchers-create-new-special-purpose-computer/'),
 ('Mars Reconnaissance Orbiter views Schiaparelli landing site',
  'http://www.esa.int/Our_Activities/Space_Science/ExoMars/Mars_Reconnaissance_Orbiter_views_Schiaparelli_landing_site'),
 ('DDoS Attack Against Dyn Managed DNS',
  'https://www.dynstatus.com/incidents/nlr4yrr162t8'),
 ('Mirai Botnet Linked to Dyn DNS DDoS Attacks',
  'https://www.flashpoint-intel.com/mirai-botnet-linked-dyn-dns-ddos-attacks/'),
 ('More', 'news?p=2')]
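A few titles in the list above contain sequences like \xe2\x80\x93. These are the three UTF-8 bytes of an en dash (–), each read as a separate Latin-1 character. The most likely cause is that html.fromstring was given the raw bytes in response.content, and because the page does not declare a charset inside its HTML, lxml fell back to Latin-1. Parsing decoded text instead, for example html.fromstring(response.content.decode("utf-8")), should avoid the problem, assuming the page is served as UTF-8. Strings that are already garbled can be repaired by reversing the wrong decoding, as in this Python 3 sketch:

```python
# The title as it appears in the output above: each UTF-8 byte of the
# en dash became a separate Latin-1 character.
garbled = u"1177 BC \xe2\x80\x93 The Year Civilization Collapsed [video]"

# Reverse the mistake: encode back to the original bytes with Latin-1,
# then decode those bytes as UTF-8.
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)  # 1177 BC – The Year Civilization Collapsed [video]
```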

Note that if you rerun the code above (starting from the HTTP request), this list will change as new stories reach the top of HackerNews.

More details on how to use XPath can be found on the W3Schools site:

http://www.w3schools.com/xsl/xpath_syntax.asp
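The list above mixes the 30 stories with the "More" link and drops each story's rank. One way to keep related fields together is to loop over the story rows (tr elements with class "athing") and run XPath queries relative to each row. Below is a sketch on a hand-written sample modeled on the markup above. The sample is an assumption, and the live HackerNews markup may have changed since this output was captured.

```python
from lxml import html

# Hand-written sample modeled on the HackerNews rows shown earlier
# (an illustrative assumption, not the live page).
sample = """<html><body><table class="itemlist">
<tr class="athing" id="101"><td class="title"><span class="rank">1.</span></td>
<td class="title"><a href="https://example.com/a" class="storylink">Story A</a></td></tr>
<tr class="athing" id="102"><td class="title"><span class="rank">2.</span></td>
<td class="title"><a href="item?id=102" class="storylink">Ask HN: Story B</a></td></tr>
<tr><td></td><td class="title"><a href="news?p=2" class="morelink">More</a></td></tr>
</table></body></html>"""
page = html.fromstring(sample)

stories = []
for row in page.xpath('//tr[@class="athing"]'):
    # Paths starting with "." are evaluated relative to the current row,
    # so the rank and the link always come from the same story.
    rank = row.xpath('./td/span[@class="rank"]/text()')[0]   # e.g. "1."
    link = row.xpath('./td/a[@class="storylink"]')[0]
    stories.append((int(rank.rstrip(".")), row.get("id"), link.text, link.get("href")))

for story in stories:
    print(story)
```

Because the "More" row has no "athing" class, it is skipped automatically.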

