The requests module lets you download files and web pages without having to deal with low-level issues such as network errors, connection problems, and data compression.
requests does not come with Python, so it must be installed manually with pip.
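Because requests is a third-party package, it can be handy to check whether it is available before running the rest of the code. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration and is not part of requests or pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed('requests'):
    # Only a reminder is printed here; nothing is installed automatically
    print('requests is missing; install it with: pip install requests')
```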
In [2]:
# Test the requests module by importing it
import requests
# Download the page at a URL; the result is a Response object that can be queried
res = requests.get('https://automatetheboringstuff.com/files/rj.txt')
Response objects can be checked via status codes:
In [3]:
res.status_code
Out[3]:
The request succeeded (a status code of 200 means OK), and the downloaded content is stored in the response object:
In [5]:
# Print the first 1,000 characters
print(res.text[:1000])
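The numeric status code seen earlier can be turned into a readable description with the standard library's http.HTTPStatus enum. This is a small sketch, not something requests requires; describe_status is a hypothetical helper name, and it assumes the code is a standard HTTP status.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a string such as '200 OK' for a standard HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for non-standard codes
    return f'{status.value} {status.phrase}'


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```

With a live response, this could be called as describe_status(res.status_code).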
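A typical way to check the status is to call the raise_for_status() method, which raises an exception if the download failed (for example, because the file was not found) and does nothing if it succeeded. It can be combined with conditional logic and with try and except statements to handle a failed download without crashing the program.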
In [7]:
# Run method on existing response object; won't raise anything because no error
res.raise_for_status()
# An example of a failed request: this page does not exist, so raise_for_status() raises an HTTPError
badres = requests.get('https://automatetheboringstuff.com/134513135465614561456')
badres.raise_for_status()
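The try and except pattern mentioned above might look like the following sketch. fetch_text is a hypothetical helper, and the 10-second timeout is an assumed value, not something the original code sets. It catches requests.exceptions.RequestException, the base class that covers both the HTTP errors raised by raise_for_status() and connection problems.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the page text, or None if the download failed (hypothetical helper)."""
    try:
        res = requests.get(url, timeout=timeout)
        res.raise_for_status()  # raises HTTPError for 4xx and 5xx responses
    except requests.exceptions.RequestException as exc:
        print(f'Download failed: {exc}')
        return None
    return res.text
```

Called with the bad URL from the cell above, this would print a message and return None instead of stopping the program.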
Files downloaded in this way must be written to a file opened in 'wb' (write binary) mode, even when the page is plain text, in order to preserve the Unicode encoding of the text. An explanation of Unicode and its relationship to Python can be found here.
To store this file, we therefore need to write it in 'byte' chunks to a binary file. A useful method to help do this is the response object's iter_content method.
In [10]:
# Open/create a file to store the bytes, using a new name
playFile = open('files/RomeoAnd Juliet.txt', 'wb')
# Iteratively write each 100,000 byte 'chunk' of data into this file
for chunk in res.iter_content(100000):
    playFile.write(chunk)
# Close to save file
playFile.close()
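The same save step can be written with a with statement, which closes the file even if an error occurs partway through. This is a sketch rather than the original notebook's approach; save_response and its parameters are hypothetical names, and it assumes res is a response object that provides iter_content, as above.

```python
import os


def save_response(res, path, chunk_size=100000):
    """Write a response body to path in binary chunks; return bytes written."""
    folder = os.path.dirname(path)
    if folder:
        # Create the target folder if needed (an added convenience)
        os.makedirs(folder, exist_ok=True)
    total = 0
    with open(path, 'wb') as out_file:
        for chunk in res.iter_content(chunk_size):
            # write() returns the number of bytes written in binary mode
            total += out_file.write(chunk)
    return total
```

For example, save_response(res, 'files/RomeoAnd Juliet.txt') would write the downloaded play to the same location as the cell above.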
The requests module is a popular choice for downloading files from the web, and its documentation covers a wide variety of use cases.
It is best at downloading specific files from specific URLs; interactive logins and other complex browser actions are harder to handle with it. A browser automation tool like Selenium is often better suited to such tasks.
The requests module is a third-party module for downloading web pages and files.
requests.get() returns a Response object.
The status_code attribute and the raise_for_status() method report the status of the response, which indicates whether the download succeeded or failed.
The iter_content() method can be used to write the content to a file in byte chunks, in order to save it locally in binary mode.