Load the content from a website with urllib.request

In this example we use urllib.request to load the content from a website.



In [1]:

    
import urllib.request

Some sites block requests that carry urllib's default 'User-Agent', so we set a custom 'User-Agent' header before loading the content from the remote site.



In [2]:

    
url = 'https://medium.com/tag/machine-learning'
req = urllib.request.Request(url, headers={'User-Agent': 'Magic Browser'})
con = urllib.request.urlopen(req)
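
The request above can fail, for example when the site answers with an error status or cannot be reached. A minimal sketch of the same request with error handling follows. The helper name `fetch` and the 10-second timeout are assumptions for illustration, not part of the original notebook.

```python
import urllib.error
import urllib.request


def fetch(url, user_agent="Magic Browser", timeout=10):
    """Return (status, body) for url, or (None, None) if the request fails.

    Hypothetical helper; the timeout value is an arbitrary choice.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # The with-block closes the connection once the body has been read.
        with urllib.request.urlopen(req, timeout=timeout) as con:
            return con.status, con.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        print("HTTP error:", err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable answer: DNS failure, refused connection, unknown scheme.
        print("Connection problem:", err.reason)
    return None, None
```

HTTPError is a subclass of URLError, so it has to be caught first. A call such as `status, body = fetch('https://medium.com/tag/machine-learning')` would then replace the two lines above.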

Let's check the HTTP status code and the reason phrase.



In [3]:

    
# For urllib responses, msg holds the same reason phrase; reason is the clearer name.
print(con.status, con.reason)

We can read a specific HTTP response header. getheader returns None if the header is not present.



In [4]:

    
con.getheader('Content-Type')









    Out[4]:





'text/html; charset=utf-8'
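
The Content-Type value above also names the character encoding of the body. As a rough sketch, the charset parameter can be pulled out of such a string with the standard library's email parser. The helper name `charset_from_content_type` and the UTF-8 fallback are assumptions for illustration.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset() or default


print(charset_from_content_type("text/html; charset=utf-8"))
```

On a live response, `con.headers.get_content_charset()` should give the same answer directly, since the headers object is built on the same Message class.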

Now we can load the content from the website. read() returns the response body as bytes, so we look at the first 500 bytes.



In [5]:

    
text = con.read()
text[:500]









    Out[5]:





b'<!DOCTYPE html><html xmlns:cc="http://creativecommons.org/ns#"><head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# medium-com: http://ogp.me/ns/fb/medium-com#"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=contain"><title>The most insightful stories about Machine Learning \xe2\x80\x93 Medium</title><link rel="canonical" href="https://medium.com/tag/machine-learning"><link id="feedLink" rel="al'
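
The output above is a bytes object (note the b prefix and escapes such as \xe2\x80\x93). To work with it as text, it has to be decoded with the charset from the Content-Type header. A small sketch follows, using a fragment copied from the output above. The helper name `decode_body` and the choice of errors="replace" are assumptions for illustration.

```python
def decode_body(raw, charset="utf-8"):
    """Decode response bytes into str (hypothetical helper).

    errors="replace" substitutes U+FFFD for undecodable bytes
    instead of raising UnicodeDecodeError.
    """
    return raw.decode(charset, errors="replace")


# Fragment of the <title> element taken from the output above.
raw = b"<title>The most insightful stories about Machine Learning \xe2\x80\x93 Medium</title>"
html = decode_body(raw)
print(html)
```

In the notebook, the same call would be `decode_body(text)`, after which the three bytes \xe2\x80\x93 appear as a single dash character.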


