In [4]:
import requests
from bs4 import BeautifulSoup
In [5]:
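# Download the page to scrape (its URL goes between the quotes)
r = requests.get("")
data = r.text
# Name the parser explicitly; without it BeautifulSoup guesses one and emits a warning
soup = BeautifulSoup(data, "html.parser")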
In [ ]:
print(r.text)
Print just the paragraph elements of the page:
In [10]:
print(soup('p'))
[<p> </p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="Python_for_Scientists small" class="alignleft wp-image-298 size-medium" height="300" src="//" width="200"/></a></p>, <p> </p>, <p>The Python for Scientists and Engineers course, based on my highly successful <a href="">Kickstarter</a>, seeks to teach you advanced Python by building awesome projects.</p>, <p><strong>Practice, not theory</strong></p>, <p>The course will be heavily practical, with little or no theory. The goal is to get you using Python for real world engineering applications. For each topic, we will choose a real case scenario and build a quick solution in Python to solve our problem.</p>, <p>These are the topics we will cover:</p>, <p><strong>Introduction to Python</strong></p>, <p>I will cover the basics of the Python, specifically for the programmers wanting to use Python for engineering.This will be a quick introduction to Python for people who know at least one other programming language.</p>, <p><strong>Image and Video processing</strong></p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="abba_face_detected" class="aligncenter wp-image-10 " height="319" src="//" width="345"/></a></p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="motion2" class="aligncenter wp-image-135 size-thumbnail" height="150" src="//" width="150"/></a></p>, <p> </p>, <p><strong>Audio</strong></p>, <p>Create a sine wave, find its frequency, simple filtering</p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="noisy4" class="aligncenter wp-image-136 " height="407" src="//" width="545"/></a></p>, <p><strong>Analysis and plotting with Numpy, Scipy and Matplotlib</strong></p>, <p>Learn how to work with and graph scientific data</p>, <p> </p>, <p><img alt="audacity3" class="aligncenter wp-image-16" height="299" src="//" width="715"/></p>, <p> </p>, <p><strong>Machine Learning</strong></p>, <p>Build an Amazon 
like recommendation engine in Python.</p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="Marvin_(HHGG)" class="aligncenter wp-image-17" height="302" src="//" width="184"/></a></p>, <p> </p>, <p><strong>Statistics and data manipulation</strong></p>, <p>The Python pandas library is Python’s answer to R, and used extensively in financial analysis.</p>, <p> </p>, <p><strong>Turn your Raspberry Pi into a web server<br/>
</strong></p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="300px-RaspberryPi" class="aligncenter size-full wp-image-36" height="200" src="//" width="300"/></a></p>, <p>Learn how to control your Pi via a web browser, using your laptop or even iPad:</p>, <p><a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="rpi" class="aligncenter wp-image-138 " height="443" src="//" width="635"/></a></p>, <p> </p>, <p><strong>Buy now: </strong><em><br/>
</em></p>, <p>These options are for individuals. Teams, please contact me.</p>, <p><strong>Option 1.</strong> Get the book, all the code, plus a Virtual Machine to run the examples.</p>, <p>Price: $39<strong><br/>
</strong></p>, <p><strong>Option 2:</strong> Get the above, plus videos of all the courses.</p>, <p>Price: $99</p>, <p><strong>Available </strong>on <a href="">LeanPub</a>.</p>, <p>Want to pay by Paypal? Contact me.</p>, <p><strong>FAQ</strong></p>, <p><em>1. What’s this Virtual Machine I will get?</em></p>, <p>The biggest hassle with projects like these is installing libraries, struggling with version differences, 32/64 bit versions of libraries, etc. You can spend more time installing libraries than running the code. For this reason, I will create a Virtual Machine and do most of the testing there. You can have this VM too. It means you can start coding immediately, without wasting any time installing libraries.</p>, <p class="comment-notes"><span id="email-notes">Your email address will not be published.</span> Required fields are marked <span class="required">*</span></p>, <p class="comment-form-author"><label for="author">Name <span class="required">*</span></label> <input aria-required="true" id="author" name="author" size="30" type="text" value=""/></p>, <p class="comment-form-email"><label for="email">Email <span class="required">*</span></label> <input aria-describedby="email-notes" aria-required="true" id="email" name="email" size="30" type="text" value=""/></p>, <p class="comment-form-url"><label for="url">Website</label> <input id="url" name="url" size="30" type="text" value=""/></p>, <p class="comment-form-comment"><label for="comment">Comment</label> <textarea aria-describedby="form-allowed-tags" aria-required="true" cols="45" id="comment" name="comment" rows="8"></textarea></p>, <p class="form-allowed-tags" id="form-allowed-tags">You may use these <abbr title="HyperText Markup Language">HTML</abbr> tags and attributes: <code><a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> </code></p>, <p class="form-submit">
<input class="submit" id="submit" name="submit" type="submit" value="Post Comment"/>
<input id="comment_post_ID" name="comment_post_ID" type="hidden" value="56"/>
<input id="comment_parent" name="comment_parent" type="hidden" value="0"/>
</p>, <p style="display: none;"><input id="akismet_comment_nonce" name="akismet_comment_nonce" type="hidden" value="debb71cef4"/></p>, <p style="display: none;"><input id="ak_js" name="ak_js" type="hidden" value="242"/></p>, <p>· © 2015 <a href="" rel="bookmark" title="Python For Engineers">Python For Engineers</a> · Designed by <a href="">Themes & Co</a> ·</p>, <p class="pull-right"><a class="back-to-top" href="#">Back to top</a></p>, <p><img alt="Clicky" height="1" src="//" width="1"/></p>]
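Many of those `<p>` elements are empty spacers or only wrap an image. If you just want the readable text, `get_text()` is the tool. Here is a minimal sketch: the HTML string is made up so the cell runs without a network request, and treating an empty string as "no content" is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
html = """
<p> </p>
<p><a href="/big.png"><img alt="cover" src="/small.png"/></a></p>
<p>The course teaches <a href="/ks">advanced</a> Python.</p>
<p><strong>Practice, not theory</strong></p>
"""

soup = BeautifulSoup(html, "html.parser")

# soup('p') is shorthand for soup.find_all('p').
# get_text(" ", strip=True) strips each text piece and joins them with a space.
paragraphs = [p.get_text(" ", strip=True) for p in soup("p")]

# Keep only paragraphs that actually contain text
text_only = [t for t in paragraphs if t]
print(text_only)
```

Passing a separator matters: with plain `get_text(strip=True)` the pieces inside and around the `<a>` tag would be glued together without spaces.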
Print the image elements of the page:
In [8]:
print(soup('img'))
[<img alt="Python_for_Scientists small" class="alignleft wp-image-298 size-medium" height="300" src="//" width="200"/>, <img alt="abba_face_detected" class="aligncenter wp-image-10 " height="319" src="//" width="345"/>, <img alt="motion2" class="aligncenter wp-image-135 size-thumbnail" height="150" src="//" width="150"/>, <img alt="noisy4" class="aligncenter wp-image-136 " height="407" src="//" width="545"/>, <img alt="audacity3" class="aligncenter wp-image-16" height="299" src="//" width="715"/>, <img alt="Marvin_(HHGG)" class="aligncenter wp-image-17" height="302" src="//" width="184"/>, <img alt="300px-RaspberryPi" class="aligncenter size-full wp-image-36" height="200" src="//" width="300"/>, <img alt="rpi" class="aligncenter wp-image-138 " height="443" src="//" width="635"/>, <img alt="Clicky" height="1" src="//" width="1"/>]
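Each `<img>` tag's `alt`, `src`, `width` and `height` are ordinary attributes, so `Tag.get()` reads them and returns `None` when one is missing. A sketch on made-up markup that collects `(alt, src)` pairs and skips 1x1 images like the Clicky counter above; assuming that a 1x1 image is a tracking pixel is a heuristic, not a rule.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the src values on the real page are not shown above
html = """
<img alt="motion2" src="/img/motion2.png" width="150" height="150"/>
<img alt="noisy4" src="/img/noisy4.png" width="545" height="407"/>
<img alt="Clicky" src="/img/pixel.gif" width="1" height="1"/>
"""

soup = BeautifulSoup(html, "html.parser")

images = []
for img in soup("img"):
    # Attribute values are strings; .get() returns None when one is missing
    if img.get("width") == "1" and img.get("height") == "1":
        continue  # assume a 1x1 image is a tracking pixel, not content
    images.append((img.get("alt"), img.get("src")))

print(images)
```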
In [9]:
for link in soup.find_all('a'):
    print(link)
<a class="site-title" href="" title="Python For Engineers | ">Python For Engineers</a>
<a href="">Home</a>
<a href="">Articles</a>
<a href="">Contact</a>
<a href="">Forum</a>
<a class="trail-begin" href="" rel="home" title="Python For Engineers">Home</a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="Python_for_Scientists small" class="alignleft wp-image-298 size-medium" height="300" src="//" width="200"/></a>
<a href="">Kickstarter</a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="abba_face_detected" class="aligncenter wp-image-10 " height="319" src="//" width="345"/></a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="motion2" class="aligncenter wp-image-135 size-thumbnail" height="150" src="//" width="150"/></a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="noisy4" class="aligncenter wp-image-136 " height="407" src="//" width="545"/></a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="Marvin_(HHGG)" class="aligncenter wp-image-17" height="302" src="//" width="184"/></a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="300px-RaspberryPi" class="aligncenter size-full wp-image-36" height="200" src="//" width="300"/></a>
<a class="grouped_elements" href="" rel="tc-fancybox-group56"><img alt="rpi" class="aligncenter wp-image-138 " height="443" src="//" width="635"/></a>
<a href="">LeanPub</a>
<a href="/pythonforengineersbook/#respond" id="cancel-comment-reply-link" rel="nofollow" style="display:none;">Cancel reply</a>
<a class="social-icon icon-feed" href="" title="Subscribe to my rss feed"></a>
<a href="" rel="bookmark" title="Python For Engineers">Python For Engineers</a>
<a href="">Themes & Co</a>
<a class="back-to-top" href="#">Back to top</a>
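Several of those links wrap an `<img>` instead of text (the `grouped_elements` lightbox links). One way to separate the two kinds is to check whether `a.find('img')` returns a tag. A minimal sketch on made-up markup, which also shows `find_all` filtering on a class:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the kinds of links printed above
html = """
<a href="/">Home</a>
<a class="grouped_elements" href="/big.png"><img alt="rpi" src="/small.png"/></a>
<a href="/contact">Contact</a>
<a class="back-to-top" href="#">Back to top</a>
"""

soup = BeautifulSoup(html, "html.parser")

text_links = []   # (href, visible text)
image_links = []  # (href, alt text of the wrapped image)
for a in soup.find_all("a"):
    img = a.find("img")
    if img is not None:
        image_links.append((a.get("href"), img.get("alt")))
    else:
        text_links.append((a.get("href"), a.get_text(strip=True)))

# class_ (with the underscore) filters on the class attribute,
# since "class" is a reserved word in Python
lightbox_links = soup.find_all("a", class_="grouped_elements")

print(text_links)
print(image_links)
```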
In [7]:
for link in soup.find_all('a'):
    # .get() returns None instead of raising if a link has no href
    print(link.get('href'))
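`link.get('href')` returns the attribute exactly as written, so relative paths such as `/pythonforengineersbook/#respond` or a bare `#` come back unresolved, and a link without an `href` gives `None`. A minimal sketch that turns such values into absolute, de-duplicated URLs with the standard library's `urllib.parse`. The base URL and the `absolute_links` helper are hypothetical, and dropping `#fragment` parts is a choice you may not want.

```python
from urllib.parse import urljoin, urldefrag


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url, skipping None, empty and fragment-only values."""
    seen = set()
    result = []
    for href in hrefs:
        if not href or href.startswith("#"):
            continue  # no target, or a jump within the same page
        # Resolve relative paths, then drop any #fragment
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Placeholder base URL and hrefs, e.g. collected with link.get('href')
base = "https://example.com/pythonforengineersbook/"
hrefs = ["/", "/pythonforengineersbook/#respond", "#", None, "contact/", "https://leanpub.com/"]
print(absolute_links(base, hrefs))
```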
Content source: shantnu/WebScrapingCourse