In [1]:
import urllib.request
Some sites block requests that carry urllib's default 'User-Agent' (Python-urllib/3.x), so we set a custom 'User-Agent' header to load the content from the remote site.
In [2]:
url = 'https://medium.com/tag/machine-learning'
req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"})
con = urllib.request.urlopen(req)
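As a minimal sketch, the header setup can be inspected before any network traffic happens. The helper name build_request is hypothetical and not part of urllib; it only wraps the Request constructor used above.

```python
import urllib.request


def build_request(url, user_agent="Magic Browser"):
    """Return a Request carrying a custom User-Agent header (no network I/O).

    build_request is a hypothetical helper, not a urllib function.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://medium.com/tag/machine-learning")

# Request stores header names in capitalized form, so the key is "User-agent".
print(req.get_header("User-agent"))  # prints: Magic Browser
print(req.get_method())              # prints: GET
```

Building the Request separately from urlopen makes it easy to confirm which headers will be sent before the site is contacted.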
Let's check the HTTP status code and the reason message (for HTTP responses returned by urlopen, con.msg holds the same text as con.reason).
In [3]:
print(con.status, con.msg)
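urlopen raises an exception when the server answers with an error status or cannot be reached at all, so the status check above only runs on success. The sketch below shows one way to handle both cases; fetch_status is a hypothetical helper, and the 10-second timeout is an assumed value.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent="Magic Browser", timeout=10):
    """Return (status, reason); status is None if no HTTP response arrived.

    fetch_status is a hypothetical helper written for HTTP(S) URLs.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as con:
            return con.status, con.reason
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code, err.reason
    except urllib.error.URLError as err:
        # No HTTP response at all (DNS failure, refused connection, bad scheme).
        return None, str(err.reason)


# Example call (needs network access):
# print(fetch_status("https://medium.com/tag/machine-learning"))
```

HTTPError is a subclass of URLError, so it has to be caught first; otherwise the more general handler would swallow the status code.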
We can check whether a specific HTTP response header exists; getheader returns None when the header is absent.
In [4]:
con.getheader('Content-Type')
Out[4]:
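The Content-Type header often carries a charset parameter that says how the body is encoded. A minimal sketch of extracting it with the standard library follows; charset_from_content_type is a hypothetical helper, the header values are made-up samples rather than what this site returned, and falling back to utf-8 is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the lowercased charset named in a Content-Type value.

    Falls back to `default` (an assumed choice) when the header is
    missing or has no charset parameter.
    """
    if content_type is None:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


# Hypothetical header values, for illustration only.
print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # prints: iso-8859-1
print(charset_from_content_type("text/html"))                      # prints: utf-8
print(charset_from_content_type(None))                             # prints: utf-8
```

With a live response, the value passed in would be con.getheader('Content-Type').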
Now we can load the content from the website. read() returns the body as bytes, so the slice below shows the first 500 bytes.
In [5]:
text = con.read()
text[:500]
Out[5]:
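Since read() returns bytes, the body has to be decoded before it can be handled as text. The sketch below assumes UTF-8 as the fallback encoding; decode_body is a hypothetical helper and the sample bytes are invented for illustration.

```python
def decode_body(raw, charset="utf-8"):
    """Decode response bytes to str, falling back to lenient UTF-8.

    decode_body is a hypothetical helper; the UTF-8 fallback is an assumption.
    """
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: replace bad sequences.
        return raw.decode("utf-8", errors="replace")


# Invented sample bytes standing in for con.read().
sample = b"<html><body>caf\xc3\xa9</body></html>"
print(decode_body(sample))                     # prints: <html><body>café</body></html>
print(decode_body(sample, "no-such-charset"))  # prints: <html><body>café</body></html>
```

In the notebook this would be called as decode_body(text) on the bytes read above.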