The requests module lets you easily download files from the web without having to handle the low-level details of the connection yourself.
requests does not come with Python, so it must be installed manually with pip (pip install requests).
In [2]:
# Test the requests module by importing it
import requests
# Download the file; requests.get() returns a Response object that can be queried
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
Response objects can be checked via status codes:
In [3]:
res.status_code
Out[3]:
The request succeeded (a status code of 200 means OK), and the downloaded text is stored in the response's text attribute:
In [5]:
# Print the first 1000 characters
print(res.text[:1000])
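A typical way to check the status is to call the response's raise_for_status() method, which raises an exception (and stops the program if the exception is not handled) when the download failed, for example because the file was not found. It can be combined with boolean tests, or with try and except statements to handle the failure gracefully.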
In [7]:
# Run method on existing response object; won't raise anything because no error
res.raise_for_status()
# An example bad request
badres = requests.get('https://automatetheboringstuff.com/134513135465614561456')
badres.raise_for_status()
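The combination with try and except mentioned earlier can be sketched as follows. This is a minimal illustration rather than part of the original notebook: the helper name check_download is hypothetical, and it assumes the only failure of interest is an HTTP error status (4xx or 5xx), which raise_for_status() reports as requests.exceptions.HTTPError.

```python
import requests


def check_download(response):
    """Return True if the response succeeded, False on an HTTP error status.

    check_download is a hypothetical helper name used only for illustration.
    """
    try:
        # Raises requests.exceptions.HTTPError for 4xx and 5xx status codes
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        # Report the problem instead of letting the program stop
        print('There was a problem: %s' % err)
        return False
    return True
```

Called on the res and badres objects from the cells above, check_download(res) would be expected to return True, while check_download(badres) would print a message and return False instead of stopping the program. When only the success case matters, a plain boolean test such as res.status_code == requests.codes.ok is a simpler alternative.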
Files downloaded in this way must be saved by opening the target file in 'wb' (write-binary) mode, even when the content is plain text, in order to preserve the Unicode encoding of the text. An explanation of Unicode and its relationship to Python can be found here.
To store this file, we therefore need to write it in chunks of bytes to a binary file. A useful tool for this is the response object's iter_content() method, which returns the content one chunk at a time.
In [10]:
# Open/create a file to store the bytes, using a new name
playFile = open('files/RomeoAnd Juliet.txt', 'wb')
# Iteratively write each 100,000-byte 'chunk' of data into this file
for chunk in res.iter_content(100000):
    playFile.write(chunk)
# Close the file to save it
playFile.close()
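The same chunked write can also be wrapped in a small function that uses a with statement, so the file is closed even if a write fails part-way. This is a sketch under stated assumptions, not the notebook's own method: save_response, path, and chunk_size are hypothetical names, and the function expects a Response object such as the one returned by requests.get().

```python
import requests


def save_response(response, path, chunk_size=100000):
    """Write the body of a response to path in binary mode.

    save_response and its parameters are hypothetical names for illustration.
    Returns the total number of bytes written.
    """
    total = 0
    # The with statement closes the file automatically, even on error
    with open(path, 'wb') as out_file:
        for chunk in response.iter_content(chunk_size):
            # write() returns the number of bytes written for this chunk
            total += out_file.write(chunk)
    return total
```

For example, save_response(res, 'files/RomeoAnd Juliet.txt') would write the same file as the cell above. The returned byte count gives a quick way to confirm that something was actually saved.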
The requests module is the preferred way to download files from the web, and its documentation covers a wide variety of use cases.
It is at its best when downloading specific files from specific URLs; it is less suited to logins and other complex interactions with a page. A browser simulator like selenium is often a better choice for such actions.
The requests module is a third-party module for downloading web pages and files.
requests.get() returns a Response object.
The status_code attribute and the raise_for_status() method report the status of the response, which indicates whether the operation succeeded or failed.
The iter_content() method can be used to iteratively write chunks of bytes to a file, in order to save the download locally as a binary file.