Import needed libraries


In [1]:
import requests
from lxml import html

We used the "requests" library last time to fetch Twitter data through its REST API. Here we introduce the "lxml" library for parsing HTML and extracting elements and attributes.

Use Requests to get HackerNews content

HackerNews is a community-contributed news website with an emphasis on technology-related content. Let's grab the set of articles currently at the top of the HN list.


In [57]:
response = requests.get('http://news.ycombinator.com/')
response


Out[57]:
<Response [200]>
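A `<Response [200]>` means the request succeeded. Before parsing, it can help to check this in code rather than by eye. The sketch below is one possible way to do that; the helper name `check_response` is made up for illustration, and the hand-built `requests.Response` objects are offline stand-ins so the example runs without network access.

```python
import requests


def check_response(response):
    """Hypothetical helper: raise on 4xx/5xx, return True only for a 200."""
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx codes
    response.raise_for_status()
    return response.status_code == requests.codes.ok


# Offline stand-ins for what requests.get(...) would return
ok_response = requests.Response()
ok_response.status_code = 200

missing_response = requests.Response()
missing_response.status_code = 404

print(check_response(ok_response))

try:
    check_response(missing_response)
except requests.HTTPError as err:
    print("request failed:", err)
```

In the notebook itself, the same check would be applied to the `response` object returned by `requests.get`.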

In [78]:
response.content


Out[78]:
'<html op="news"><head><meta name="referrer" content="origin"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" type="text/css" href="news.css?vvS6khQlZQ8ssGkyEBXp">\n        <link rel="shortcut icon" href="favicon.ico">\n          <link rel="alternate" type="application/rss+xml" title="RSS" href="rss">\n        <title>Hacker News</title></head><body><center><table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">\n        <tr><td bgcolor="#ff6600"><table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px"><tr><td style="width:18px;padding-right:4px"><a href="http://www.ycombinator.com"><img src="y18.gif" width="18" height="18" style="border:1px white solid;"></a></td>\n                  <td style="line-height:12pt; height:10px;"><span class="pagetop"><b class="hnname"><a href="news">Hacker News</a></b>\n              <a href="newest">new</a> | <a href="newcomments">comments</a> | <a href="show">show</a> | <a href="ask">ask</a> | <a href="jobs">jobs</a> | <a href="submit">submit</a>            </span></td><td style="text-align:right;padding-right:4px;"><span class="pagetop">\n                              <a href="login?goto=news">login</a>\n                          </span></td>\n              </tr></table></td></tr>\n<tr style="height:10px"></tr><tr><td><table border="0" cellpadding="0" cellspacing="0" class="itemlist">\n              <tr class=\'athing\' id=\'12144371\'>\n      <td align="right" valign="top" class="title"><span class="rank">1.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12144371\' href=\'vote?id=12144371&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://facebook.github.io/react/blog/2016/07/22/create-apps-with-no-configuration.html" class="storylink">Create React Apps with No Configuration</a><span class="sitebit comhead"> (<a 
href="from?site=facebook.github.io"><span class="sitestr">facebook.github.io</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12144371">318 points</span> by <a href="user?id=vjeux" class="hnuser">vjeux</a> <span class="age"><a href="item?id=12144371">3 hours ago</a></span> <span id="unv_12144371"></span> | <a href="hide?id=12144371&amp;goto=news">hide</a> | <a href="item?id=12144371">106 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12145751\'>\n      <td align="right" valign="top" class="title"><span class="rank">2.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12145751\' href=\'vote?id=12145751&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://techcrunch.com/2016/07/22/apple-says-pokemon-go-is-the-most-downloaded-app-in-its-first-week-ever/" class="storylink">Apple says Pok\xc3\xa9mon Go is the most downloaded app in its first week ever</a><span class="sitebit comhead"> (<a href="from?site=techcrunch.com"><span class="sitestr">techcrunch.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12145751">18 points</span> by <a href="user?id=doppp" class="hnuser">doppp</a> <span class="age"><a href="item?id=12145751">18 minutes ago</a></span> <span id="unv_12145751"></span> | <a href="hide?id=12145751&amp;goto=news">hide</a> | <a href="item?id=12145751">discuss</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143386\'>\n      <td align="right" valign="top" class="title"><span class="rank">3.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143386\' href=\'vote?id=12143386&amp;how=up&amp;goto=news\'><div class=\'votearrow\' 
title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.bloomberg.com/news/articles/2016-07-22/verizon-said-nearing-deal-to-buy-yahoo-beating-rival-bidders" class="storylink">Verizon nears deal to acquire Yahoo</a><span class="sitebit comhead"> (<a href="from?site=bloomberg.com"><span class="sitestr">bloomberg.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143386">217 points</span> by <a href="user?id=kshatrea" class="hnuser">kshatrea</a> <span class="age"><a href="item?id=12143386">5 hours ago</a></span> <span id="unv_12143386"></span> | <a href="hide?id=12143386&amp;goto=news">hide</a> | <a href="item?id=12143386">118 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143952\'>\n      <td align="right" valign="top" class="title"><span class="rank">4.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143952\' href=\'vote?id=12143952&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://opus-codec.com/" class="storylink">Opus Interactive Audio Codec v1.1.3 released</a><span class="sitebit comhead"> (<a href="from?site=opus-codec.com"><span class="sitestr">opus-codec.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143952">100 points</span> by <a href="user?id=MrZeus" class="hnuser">MrZeus</a> <span class="age"><a href="item?id=12143952">4 hours ago</a></span> <span id="unv_12143952"></span> | <a href="hide?id=12143952&amp;goto=news">hide</a> | <a href="item?id=12143952">39 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143473\'>\n      <td align="right" valign="top" class="title"><span class="rank">5.</span></td>      <td valign="top" 
class="votelinks"><center><a id=\'up_12143473\' href=\'vote?id=12143473&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.wired.com/2016/07/chef-david-chang-on-deliciousness/" class="storylink">David Chang\xe2\x80\x99s Unified Theory of Deliciousness</a><span class="sitebit comhead"> (<a href="from?site=wired.com"><span class="sitestr">wired.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143473">105 points</span> by <a href="user?id=tptacek" class="hnuser">tptacek</a> <span class="age"><a href="item?id=12143473">5 hours ago</a></span> <span id="unv_12143473"></span> | <a href="hide?id=12143473&amp;goto=news">hide</a> | <a href="item?id=12143473">51 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12144325\'>\n      <td align="right" valign="top" class="title"><span class="rank">6.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12144325\' href=\'vote?id=12144325&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://jangorecki.github.io/blog/2016-06-30/Boost-Your-Data-Munging-with-R.html" class="storylink">Boost Your Data Munging with R</a><span class="sitebit comhead"> (<a href="from?site=jangorecki.github.io"><span class="sitestr">jangorecki.github.io</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12144325">33 points</span> by <a href="user?id=michaelsbradley" class="hnuser">michaelsbradley</a> <span class="age"><a href="item?id=12144325">3 hours ago</a></span> <span id="unv_12144325"></span> | <a href="hide?id=12144325&amp;goto=news">hide</a> | <a href="item?id=12144325">9 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n              
  <tr class=\'athing\' id=\'12143482\'>\n      <td align="right" valign="top" class="title"><span class="rank">7.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143482\' href=\'vote?id=12143482&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.disneyresearch.com/publication/machine-knitting-compiler/" class="storylink">A Compiler for 3D Machine Knitting</a><span class="sitebit comhead"> (<a href="from?site=disneyresearch.com"><span class="sitestr">disneyresearch.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143482">70 points</span> by <a href="user?id=fitzwatermellow" class="hnuser">fitzwatermellow</a> <span class="age"><a href="item?id=12143482">5 hours ago</a></span> <span id="unv_12143482"></span> | <a href="hide?id=12143482&amp;goto=news">hide</a> | <a href="item?id=12143482">6 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12145423\'>\n      <td align="right" valign="top" class="title"><span class="rank">8.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12145423\' href=\'vote?id=12145423&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.box.com/blog/kubernetes-box-microservices-maximum-velocity/" class="storylink">Kubernetes at Box: Microservices at Maximum Velocity</a><span class="sitebit comhead"> (<a href="from?site=box.com"><span class="sitestr">box.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12145423">46 points</span> by <a href="user?id=robszumski" class="hnuser">robszumski</a> <span class="age"><a href="item?id=12145423">57 minutes ago</a></span> <span id="unv_12145423"></span> | <a 
href="hide?id=12145423&amp;goto=news">hide</a> | <a href="item?id=12145423">1 comment</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142770\'>\n      <td align="right" valign="top" class="title"><span class="rank">9.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142770\' href=\'vote?id=12142770&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://akat1.pl/?id=2" class="storylink">Spawn your shell like it\'s the 90s again</a><span class="sitebit comhead"> (<a href="from?site=akat1.pl"><span class="sitestr">akat1.pl</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12142770">112 points</span> by <a href="user?id=mulander" class="hnuser">mulander</a> <span class="age"><a href="item?id=12142770">8 hours ago</a></span> <span id="unv_12142770"></span> | <a href="hide?id=12142770&amp;goto=news">hide</a> | <a href="item?id=12142770">27 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142987\'>\n      <td align="right" valign="top" class="title"><span class="rank">10.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142987\' href=\'vote?id=12142987&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://mesosphere.com/blog/2016/07/20/serverless-computing-dcos-galactic-fog/" class="storylink">Serverless computing on DC/OS with Galactic Fog</a><span class="sitebit comhead"> (<a href="from?site=mesosphere.com"><span class="sitestr">mesosphere.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12142987">76 points</span> by <a href="user?id=realbot" class="hnuser">realbot</a> <span class="age"><a 
href="item?id=12142987">7 hours ago</a></span> <span id="unv_12142987"></span> | <a href="hide?id=12142987&amp;goto=news">hide</a> | <a href="item?id=12142987">15 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143627\'>\n      <td align="right" valign="top" class="title"><span class="rank">11.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143627\' href=\'vote?id=12143627&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://techcrunch.com/2016/07/22/dear-google-the-future-is-fewer-people-writing-code/" class="storylink">The future is fewer people writing code?</a><span class="sitebit comhead"> (<a href="from?site=techcrunch.com"><span class="sitestr">techcrunch.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143627">44 points</span> by <a href="user?id=pratap103" class="hnuser">pratap103</a> <span class="age"><a href="item?id=12143627">4 hours ago</a></span> <span id="unv_12143627"></span> | <a href="hide?id=12143627&amp;goto=news">hide</a> | <a href="item?id=12143627">93 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143713\'>\n      <td align="right" valign="top" class="title"><span class="rank">12.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143713\' href=\'vote?id=12143713&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://deis.com/blog/2016/docker-storage-introduction/" class="storylink">Docker Storage: An Introduction</a><span class="sitebit comhead"> (<a href="from?site=deis.com"><span class="sitestr">deis.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" 
id="score_12143713">57 points</span> by <a href="user?id=nslater" class="hnuser">nslater</a> <span class="age"><a href="item?id=12143713">4 hours ago</a></span> <span id="unv_12143713"></span> | <a href="hide?id=12143713&amp;goto=news">hide</a> | <a href="item?id=12143713">16 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12145879\'>\n      <td align="right" valign="top" class="title"><span class="rank">13.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12145879\' href=\'vote?id=12145879&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="item?id=12145879" class="storylink">Ask HN: Best monitoring system?</a></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12145879">3 points</span> by <a href="user?id=mspaulding06" class="hnuser">mspaulding06</a> <span class="age"><a href="item?id=12145879">2 minutes ago</a></span> <span id="unv_12145879"></span> | <a href="hide?id=12145879&amp;goto=news">hide</a> | <a href="item?id=12145879">discuss</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143266\'>\n      <td align="right" valign="top" class="title"><span class="rank">14.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143266\' href=\'vote?id=12143266&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="item?id=12143266" class="storylink">Ask HN: When you feel stuck in life</a></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143266">180 points</span> by <a href="user?id=msleona" class="hnuser">msleona</a> <span class="age"><a href="item?id=12143266">5 hours ago</a></span> <span id="unv_12143266"></span> | <a 
href="hide?id=12143266&amp;goto=news">hide</a> | <a href="item?id=12143266">148 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143121\'>\n      <td align="right" valign="top" class="title"><span class="rank">15.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143121\' href=\'vote?id=12143121&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.theverge.com/2016/7/22/12255426/kickasstorrents-alternate-sites-spring-up" class="storylink">KickassTorrents resurfaces online</a><span class="sitebit comhead"> (<a href="from?site=theverge.com"><span class="sitestr">theverge.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12143121">136 points</span> by <a href="user?id=noxin" class="hnuser">noxin</a> <span class="age"><a href="item?id=12143121">6 hours ago</a></span> <span id="unv_12143121"></span> | <a href="hide?id=12143121&amp;goto=news">hide</a> | <a href="item?id=12143121">49 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12143199\'>\n      <td align="right" valign="top" class="title"><span class="rank">16.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12143199\' href=\'vote?id=12143199&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://www.theatlantic.com/science/archive/2016/07/how-a-guy-from-a-montana-trailer-park-upturned-150-years-of-biology/491702/?single_page=true" class="storylink">How Toby Spribille Overturned 150 Years of Biology about Lichens</a><span class="sitebit comhead"> (<a href="from?site=theatlantic.com"><span class="sitestr">theatlantic.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n  
      <span class="score" id="score_12143199">137 points</span> by <a href="user?id=virmundi" class="hnuser">virmundi</a> <span class="age"><a href="item?id=12143199">6 hours ago</a></span> <span id="unv_12143199"></span> | <a href="hide?id=12143199&amp;goto=news">hide</a> | <a href="item?id=12143199">57 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12144859\'>\n      <td align="right" valign="top" class="title"><span class="rank">17.</span></td>      <td></td><td class="title"><a href="item?id=12144859" class="storylink">Lead the front-end engineering team at Pachyderm (YC W15)</a></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="age"><a href="item?id=12144859">2 hours ago</a></span> | <a href="hide?id=12144859&amp;goto=news">hide</a>      </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12144005\'>\n      <td align="right" valign="top" class="title"><span class="rank">18.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12144005\' href=\'vote?id=12144005&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://wikileaks.org/dnc-emails/" class="storylink">Search the DNC email database</a><span class="sitebit comhead"> (<a href="from?site=wikileaks.org"><span class="sitestr">wikileaks.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12144005">78 points</span> by <a href="user?id=aburan28" class="hnuser">aburan28</a> <span class="age"><a href="item?id=12144005">3 hours ago</a></span> <span id="unv_12144005"></span> | <a href="hide?id=12144005&amp;goto=news">hide</a> | <a href="item?id=12144005">51 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12136578\'>\n    
  <td align="right" valign="top" class="title"><span class="rank">19.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12136578\' href=\'vote?id=12136578&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.bunniestudios.com/blog/?p=4782" class="storylink">Why I\xe2\x80\x99m Suing the US Government</a><span class="sitebit comhead"> (<a href="from?site=bunniestudios.com"><span class="sitestr">bunniestudios.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12136578">1724 points</span> by <a href="user?id=ivank" class="hnuser">ivank</a> <span class="age"><a href="item?id=12136578">1 day ago</a></span> <span id="unv_12136578"></span> | <a href="hide?id=12136578&amp;goto=news">hide</a> | <a href="item?id=12136578">296 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12140477\'>\n      <td align="right" valign="top" class="title"><span class="rank">20.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12140477\' href=\'vote?id=12140477&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://github.com/FallibleInc/security-guide-for-developers" class="storylink">A practical security guide for web developers</a><span class="sitebit comhead"> (<a href="from?site=github.com"><span class="sitestr">github.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12140477">756 points</span> by <a href="user?id=zianwar" class="hnuser">zianwar</a> <span class="age"><a href="item?id=12140477">20 hours ago</a></span> <span id="unv_12140477"></span> | <a href="hide?id=12140477&amp;goto=news">hide</a> | <a href="item?id=12140477">58 comments</a>              </td></tr>\n      <tr 
class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12145608\'>\n      <td align="right" valign="top" class="title"><span class="rank">21.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12145608\' href=\'vote?id=12145608&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://qz.com/739685/wework-evicted-a-startup-after-it-published-a-negative-blog-post-about-the-co-working-space/" class="storylink" rel="nofollow">WeWork evicted a startup after it published a negative blog post about it</a><span class="sitebit comhead"> (<a href="from?site=qz.com"><span class="sitestr">qz.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12145608">8 points</span> by <a href="user?id=minimaxir" class="hnuser">minimaxir</a> <span class="age"><a href="item?id=12145608">37 minutes ago</a></span> <span id="unv_12145608"></span> | <a href="hide?id=12145608&amp;goto=news">hide</a> | <a href="item?id=12145608">1 comment</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12144445\'>\n      <td align="right" valign="top" class="title"><span class="rank">22.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12144445\' href=\'vote?id=12144445&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://slatestarcodex.com/2016/04/04/the-ideology-is-not-the-movement/" class="storylink">The Ideology Is Not the Movement</a><span class="sitebit comhead"> (<a href="from?site=slatestarcodex.com"><span class="sitestr">slatestarcodex.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12144445">26 points</span> by <a href="user?id=l1n" class="hnuser">l1n</a> <span class="age"><a 
href="item?id=12144445">2 hours ago</a></span> <span id="unv_12144445"></span> | <a href="hide?id=12144445&amp;goto=news">hide</a> | <a href="item?id=12144445">2 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142868\'>\n      <td align="right" valign="top" class="title"><span class="rank">23.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142868\' href=\'vote?id=12142868&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://github.com/huynhquancam/worque" class="storylink">Worque \xe2\x80\x93 CLI written in Ruby to manage and push your daily notes to Slack</a><span class="sitebit comhead"> (<a href="from?site=github.com"><span class="sitestr">github.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12142868">19 points</span> by <a href="user?id=hqc" class="hnuser">hqc</a> <span class="age"><a href="item?id=12142868">7 hours ago</a></span> <span id="unv_12142868"></span> | <a href="hide?id=12142868&amp;goto=news">hide</a> | <a href="item?id=12142868">16 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142192\'>\n      <td align="right" valign="top" class="title"><span class="rank">24.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142192\' href=\'vote?id=12142192&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://rocksandwater.net/blog/2016/07/wrightwood-recurrence/" class="storylink">How long should we wait for an overdue earthquake on the San Andreas?</a><span class="sitebit comhead"> (<a href="from?site=rocksandwater.net"><span class="sitestr">rocksandwater.net</span></a>)</span></td></tr><tr><td colspan="2"></td><td 
class="subtext">\n        <span class="score" id="score_12142192">54 points</span> by <a href="user?id=cossatot" class="hnuser">cossatot</a> <span class="age"><a href="item?id=12142192">11 hours ago</a></span> <span id="unv_12142192"></span> | <a href="hide?id=12142192&amp;goto=news">hide</a> | <a href="item?id=12142192">68 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12138523\'>\n      <td align="right" valign="top" class="title"><span class="rank">25.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12138523\' href=\'vote?id=12138523&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://medium.com/@peretzp/i-got-arrested-in-kazakhstan-and-represented-myself-in-court-d3764fb738f1#.e2fu9nw2w" class="storylink">I got arrested in Kazakhstan and represented myself in court</a><span class="sitebit comhead"> (<a href="from?site=medium.com"><span class="sitestr">medium.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12138523">727 points</span> by <a href="user?id=drpp" class="hnuser">drpp</a> <span class="age"><a href="item?id=12138523">1 day ago</a></span> <span id="unv_12138523"></span> | <a href="hide?id=12138523&amp;goto=news">hide</a> | <a href="item?id=12138523">216 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12140603\'>\n      <td align="right" valign="top" class="title"><span class="rank">26.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12140603\' href=\'vote?id=12140603&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://techcrunch.com/2016/07/21/reddit-is-still-in-turmoil/" class="storylink">Reddit is still in turmoil</a><span 
class="sitebit comhead"> (<a href="from?site=techcrunch.com"><span class="sitestr">techcrunch.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12140603">309 points</span> by <a href="user?id=minimaxir" class="hnuser">minimaxir</a> <span class="age"><a href="item?id=12140603">20 hours ago</a></span> <span id="unv_12140603"></span> | <a href="hide?id=12140603&amp;goto=news">hide</a> | <a href="item?id=12140603">425 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12133766\'>\n      <td align="right" valign="top" class="title"><span class="rank">27.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12133766\' href=\'vote?id=12133766&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="https://www.tesla.com/blog/master-plan-part-deux" class="storylink">Master Plan, Part Deux</a><span class="sitebit comhead"> (<a href="from?site=tesla.com"><span class="sitestr">tesla.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12133766">1770 points</span> by <a href="user?id=arturogarrido" class="hnuser">arturogarrido</a> <span class="age"><a href="item?id=12133766">1 day ago</a></span> <span id="unv_12133766"></span> | <a href="hide?id=12133766&amp;goto=news">hide</a> | <a href="item?id=12133766">661 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142177\'>\n      <td align="right" valign="top" class="title"><span class="rank">28.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142177\' href=\'vote?id=12142177&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a 
href="http://boilingsteam.com/life-is-strange-a-groundhog-day-simulator/" class="storylink">Life is Strange is now on Linux (Square Enix game)</a><span class="sitebit comhead"> (<a href="from?site=boilingsteam.com"><span class="sitestr">boilingsteam.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12142177">143 points</span> by <a href="user?id=ekianjo" class="hnuser">ekianjo</a> <span class="age"><a href="item?id=12142177">11 hours ago</a></span> <span id="unv_12142177"></span> | <a href="hide?id=12142177&amp;goto=news">hide</a> | <a href="item?id=12142177">56 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12140997\'>\n      <td align="right" valign="top" class="title"><span class="rank">29.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12140997\' href=\'vote?id=12140997&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="http://llvm.org/docs/ProgrammersManual.html" class="storylink">LLVM Programmer\xe2\x80\x99s Manual</a><span class="sitebit comhead"> (<a href="from?site=llvm.org"><span class="sitestr">llvm.org</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12140997">133 points</span> by <a href="user?id=adamnemecek" class="hnuser">adamnemecek</a> <span class="age"><a href="item?id=12140997">18 hours ago</a></span> <span id="unv_12140997"></span> | <a href="hide?id=12140997&amp;goto=news">hide</a> | <a href="item?id=12140997">62 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n                <tr class=\'athing\' id=\'12142728\'>\n      <td align="right" valign="top" class="title"><span class="rank">30.</span></td>      <td valign="top" class="votelinks"><center><a id=\'up_12142728\' 
href=\'vote?id=12142728&amp;how=up&amp;goto=news\'><div class=\'votearrow\' title=\'upvote\'></div></a></center></td><td class="title"><a href="item?id=12142728" class="storylink">Ask HN: Anyone else having no email deliver with SendGrid?</a></td></tr><tr><td colspan="2"></td><td class="subtext">\n        <span class="score" id="score_12142728">69 points</span> by <a href="user?id=samwillis" class="hnuser">samwillis</a> <span class="age"><a href="item?id=12142728">8 hours ago</a></span> <span id="unv_12142728"></span> | <a href="hide?id=12142728&amp;goto=news">hide</a> | <a href="item?id=12142728">62 comments</a>              </td></tr>\n      <tr class="spacer" style="height:5px"></tr>\n            <tr class="morespace" style="height:10px"></tr><tr><td colspan="2"></td><td class="title"><a href="news?p=2" class="morelink" rel="nofollow">More</a></td></tr>\n  </table>\n</td></tr>\n<tr><td><img src="s.gif" height="10" width="0"><table width="100%" cellspacing="0" cellpadding="1"><tr><td bgcolor="#ff6600"></td></tr></table><br><center><span class="yclinks"><a href="newsguidelines.html">Guidelines</a>\n        | <a href="newsfaq.html">FAQ</a>\n        | <a href="mailto:hn@ycombinator.com">Support</a>\n        | <a href="https://github.com/HackerNews/API">API</a>\n        | <a href="security.html">Security</a>\n        | <a href="lists">Lists</a>\n        | <a href="bookmarklet.html">Bookmarklet</a>\n        | <a href="dmca.html">DMCA</a>\n        | <a href="http://www.ycombinator.com/apply/">Apply to YC</a>\n        | <a href="mailto:hn@ycombinator.com">Contact</a></span><br><br><form method="get" action="//hn.algolia.com/">Search:\n          <input type="text" name="q" value="" size="17" autocorrect="off" spellcheck="false" autocapitalize="off" autocomplete="false"></form>\n            </center></td></tr>      </table></center></body><script type=\'text/javascript\' src=\'hn.js?vvS6khQlZQ8ssGkyEBXp\'></script></html>\n'

We will now use lxml to access the HackerNews content programmatically.

Analyzing HTML Content


In [79]:
page = html.fromstring(response.content)
page


Out[79]:
<Element html at 0x10482b578>
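
One thing to watch when parsing raw bytes: the HN markup shown above does not declare a character set, so lxml has to guess the encoding, and depending on the underlying libxml2 version, non-ASCII characters in titles may come out garbled. A minimal sketch of one way to avoid this, assuming the page is UTF-8 encoded, is to pass an explicit parser. The `raw` bytes below are a made-up stand-in for `response.content`, not real HN markup.

```python
from lxml import html

# Hypothetical stand-in for response.content: UTF-8 bytes with no charset declaration.
raw = u'<html><body><p>Pok\u00e9mon Go</p></body></html>'.encode('utf-8')

# Tell lxml the encoding explicitly instead of letting it guess.
utf8_parser = html.HTMLParser(encoding='utf-8')
page_utf8 = html.fromstring(raw, parser=utf8_parser)

# The accented character survives as a single code point.
title = page_utf8.xpath('//p/text()')[0]
print(title)
```

Decoding first, as in html.fromstring(raw.decode('utf-8')), should work as well; which one reads better is a matter of taste.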

CSS Selectors

Those of you who are web designers are likely very familiar with Cascading Style Sheets (CSS). Here is an example of how to use a CSS selector to find specific HTML elements; the selector '.title' matches every element whose class is "title".


In [80]:
posts = page.cssselect('.title')

In [81]:
len(posts)


Out[81]:
61

Details of how to use CSS selectors can be found on the W3Schools site:

http://www.w3schools.com/cssref/css_selectors.asp

XPath

Alternatively, we can use XPath, a standard query language for selecting nodes in XML and HTML documents, to find specific content in the HTML.


In [61]:
posts = page.xpath('//td[contains(@class, "title")]')

In [62]:
len(posts)


Out[62]:
61
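
The XPath above returns the same 61 elements as the CSS selector on this page, but the two are not strictly equivalent: contains(@class, "title") is a substring test, so it would also match a class such as "subtitle". The sketch below illustrates the difference on a made-up snippet (not real HN markup) and shows a commonly used XPath idiom for matching a whole class name.

```python
from lxml import html

# Hypothetical snippet: one exact class, one multi-class cell, one look-alike class.
snippet = """<table>
  <tr><td class="title">exact</td></tr>
  <tr><td class="title wide">two classes</td></tr>
  <tr><td class="subtitle">substring only</td></tr>
</table>"""
doc = html.fromstring(snippet)

# Substring test: matches all three cells, including "subtitle".
loose = doc.xpath('//td[contains(@class, "title")]')

# Whole-word test: pad the class list with spaces and look for " title ".
strict = doc.xpath(
    '//td[contains(concat(" ", normalize-space(@class), " "), " title ")]'
)

loose_text = [td.text for td in loose]
strict_text = [td.text for td in strict]
print(loose_text)
print(strict_text)
```

On the HN page the looser test happens to be enough, which is why both approaches return 61 elements.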

We are only interested in those "td" tags that directly contain an anchor ("a") linking to the referred article.


In [84]:
posts = page.xpath('//td[contains(@class, "title")]/a')

In [85]:
len(posts)


Out[85]:
31

So, only about half (31 of 61) of the "td" tags with class "title" contain the links we are interested in; the others hold each post's rank number. Let's take a look at the first such post.
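
To see where these counts come from, here is a small sketch on trimmed-down rows modeled on the HN markup in the response above (the URLs and titles are placeholders). It also shows one possible way to keep only article links by filtering on the "storylink" class, which excludes the "More" pagination link; this relies on the class names seen in this particular response and may break if the site changes its markup.

```python
from lxml import html

# Placeholder rows modeled on the HN response: a rank cell and a title cell per
# story, plus the "More" pagination row at the end.
sample = """<table class="itemlist">
  <tr class="athing">
    <td class="title"><span class="rank">1.</span></td>
    <td class="title"><a href="https://example.com/a" class="storylink">Story A</a></td>
  </tr>
  <tr class="athing">
    <td class="title"><span class="rank">2.</span></td>
    <td class="title"><a href="item?id=1" class="storylink">Story B</a></td>
  </tr>
  <tr><td class="title"><a href="news?p=2" class="morelink">More</a></td></tr>
</table>"""
doc = html.fromstring(sample)

# Every cell with class "title": rank cells and title cells alike.
title_cells = doc.xpath('//td[contains(@class, "title")]')
# Only the anchors that are direct children of those cells.
anchors = doc.xpath('//td[contains(@class, "title")]/a')
# Only the anchors marked as story links, which drops "More".
stories = doc.xpath('//td[contains(@class, "title")]/a[@class="storylink"]')

print(len(title_cells), len(anchors), len(stories))  # prints: 5 3 2
```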


In [86]:
first_post = posts[0]
first_post.text


Out[86]:
'Create React Apps with No Configuration'

The anchor tag's attributes carry useful content too, most importantly the link target in "href".


In [88]:
first_post.attrib


Out[88]:
{'href': 'https://facebook.github.io/react/blog/2016/07/22/create-apps-with-no-configuration.html', 'class': 'storylink'}

In [89]:
first_post.attrib["href"]


Out[89]:
'https://facebook.github.io/react/blog/2016/07/22/create-apps-with-no-configuration.html'
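
Indexing attrib raises a KeyError when the attribute is missing, and .text only returns the text before the first child element. The sketch below, on a made-up anchor, shows the more forgiving lxml alternatives get() and text_content().

```python
from lxml import html

# Hypothetical anchor with nested markup inside the link text.
link = html.fromstring(
    '<a href="item?id=1" class="storylink">Ask HN: <b>example</b> title</a>'
)

href = link.get("href")          # same value as link.attrib["href"]
missing = link.get("rel")        # None instead of a KeyError
leading_text = link.text         # text up to the first child element only
full_text = link.text_content()  # all text, including nested elements

print(href, missing)
print(repr(leading_text), repr(full_text))
```

For the HN titles above, .text is enough because the anchors contain no nested tags.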

In [90]:
all_links = []
for p in posts:
    all_links.append((p.text, p.attrib["href"]))

In [91]:
all_links


Out[91]:
[('Create React Apps with No Configuration',
  'https://facebook.github.io/react/blog/2016/07/22/create-apps-with-no-configuration.html'),
 (u'Apple says Pok\xc3\xa9mon Go is the most downloaded app in its first week ever',
  'https://techcrunch.com/2016/07/22/apple-says-pokemon-go-is-the-most-downloaded-app-in-its-first-week-ever/'),
 ('Verizon nears deal to acquire Yahoo',
  'http://www.bloomberg.com/news/articles/2016-07-22/verizon-said-nearing-deal-to-buy-yahoo-beating-rival-bidders'),
 ('Opus Interactive Audio Codec v1.1.3 released', 'http://opus-codec.com/'),
 (u'David Chang\xe2\x80\x99s Unified Theory of Deliciousness',
  'http://www.wired.com/2016/07/chef-david-chang-on-deliciousness/'),
 ('Boost Your Data Munging with R',
  'http://jangorecki.github.io/blog/2016-06-30/Boost-Your-Data-Munging-with-R.html'),
 ('A Compiler for 3D Machine Knitting',
  'https://www.disneyresearch.com/publication/machine-knitting-compiler/'),
 ('Kubernetes at Box: Microservices at Maximum Velocity',
  'https://www.box.com/blog/kubernetes-box-microservices-maximum-velocity/'),
 ("Spawn your shell like it's the 90s again", 'http://akat1.pl/?id=2'),
 ('Serverless computing on DC/OS with Galactic Fog',
  'https://mesosphere.com/blog/2016/07/20/serverless-computing-dcos-galactic-fog/'),
 ('The future is fewer people writing code?',
  'https://techcrunch.com/2016/07/22/dear-google-the-future-is-fewer-people-writing-code/'),
 ('Docker Storage: An Introduction',
  'https://deis.com/blog/2016/docker-storage-introduction/'),
 ('Ask HN: Best monitoring system?', 'item?id=12145879'),
 ('Ask HN: When you feel stuck in life', 'item?id=12143266'),
 ('KickassTorrents resurfaces online',
  'http://www.theverge.com/2016/7/22/12255426/kickasstorrents-alternate-sites-spring-up'),
 ('How Toby Spribille Overturned 150 Years of Biology about Lichens',
  'http://www.theatlantic.com/science/archive/2016/07/how-a-guy-from-a-montana-trailer-park-upturned-150-years-of-biology/491702/?single_page=true'),
 ('Lead the front-end engineering team at Pachyderm (YC W15)',
  'item?id=12144859'),
 ('Search the DNC email database', 'https://wikileaks.org/dnc-emails/'),
 (u'Why I\xe2\x80\x99m Suing the US Government',
  'https://www.bunniestudios.com/blog/?p=4782'),
 ('A practical security guide for web developers',
  'https://github.com/FallibleInc/security-guide-for-developers'),
 ('WeWork evicted a startup after it published a negative blog post about it',
  'http://qz.com/739685/wework-evicted-a-startup-after-it-published-a-negative-blog-post-about-the-co-working-space/'),
 ('The Ideology Is Not the Movement',
  'http://slatestarcodex.com/2016/04/04/the-ideology-is-not-the-movement/'),
 (u'Worque \xe2\x80\x93 CLI written in Ruby to manage and push your daily notes to Slack',
  'https://github.com/huynhquancam/worque'),
 ('How long should we wait for an overdue earthquake on the San Andreas?',
  'http://rocksandwater.net/blog/2016/07/wrightwood-recurrence/'),
 ('I got arrested in Kazakhstan and represented myself in court',
  'https://medium.com/@peretzp/i-got-arrested-in-kazakhstan-and-represented-myself-in-court-d3764fb738f1#.e2fu9nw2w'),
 ('Reddit is still in turmoil',
  'https://techcrunch.com/2016/07/21/reddit-is-still-in-turmoil/'),
 ('Master Plan, Part Deux',
  'https://www.tesla.com/blog/master-plan-part-deux'),
 ('Life is Strange is now on Linux (Square Enix game)',
  'http://boilingsteam.com/life-is-strange-a-groundhog-day-simulator/'),
 (u'LLVM Programmer\xe2\x80\x99s Manual',
  'http://llvm.org/docs/ProgrammersManual.html'),
 ('Ask HN: Anyone else having no email deliver with SendGrid?',
  'item?id=12142728'),
 ('More', 'news?p=2')]

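Great! Because the HN front page is updated continuously, rerunning the code above (starting from the HTTP request) will give you a different list over time. Note that the last entry, "More", is the link to the next page rather than an article.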
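
Some of the extracted links, such as 'item?id=12145879', are relative to the HN site. A minimal sketch of how they could be turned into absolute URLs with the standard library's urljoin follows; BASE_URL and the three sample tuples are copied from the request and output above, and dropping the "More" entry by its title is just one simple assumption about how to filter it.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# The URL we requested; relative links are resolved against it.
BASE_URL = 'http://news.ycombinator.com/'

# A few (title, href) pairs taken from the output above.
sample_links = [
    ('Opus Interactive Audio Codec v1.1.3 released', 'http://opus-codec.com/'),
    ('Ask HN: Best monitoring system?', 'item?id=12145879'),
    ('More', 'news?p=2'),
]

# Absolute URLs pass through unchanged; relative ones get the HN prefix.
absolute_links = [
    (title, urljoin(BASE_URL, href))
    for title, href in sample_links
    if title != 'More'  # skip the pagination link
]

for title, url in absolute_links:
    print(title, url)
```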

More details on how to use XPath can be found on the W3Schools site:

http://www.w3schools.com/xsl/xpath_syntax.asp

