Getting Started with the Python Requests Library

1 urllib2 vs. Requests

A GET request to https://api.github.com/, first with urllib2 and then with Requests.


In [4]:
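# urllib2 exists only in Python 2; requests is imported here just for its status-code constants
import urllib2
import requests
import json

gh_url = 'https://api.github.com'
gh_user = 'gaufung'
gh_pw = 'your-token'  # placeholder; GitHub now expects a personal access token, not the account password
req = urllib2.Request(gh_url)

# Register the credentials for any realm under gh_url
password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, gh_user, gh_pw)

# Build an opener that answers HTTP Basic auth challenges
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)

# Make this opener the default for urllib2.urlopen
urllib2.install_opener(opener)

handler = urllib2.urlopen(req)

if handler.getcode() == requests.codes.ok:
    text = handler.read()
    d_text = json.loads(text)
    for k, v in d_text.items():
        print k, v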


issues_url https://api.github.com/issues
current_user_repositories_url https://api.github.com/user/repos{?type,page,per_page,sort}
rate_limit_url https://api.github.com/rate_limit
repository_search_url https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}
user_organizations_url https://api.github.com/user/orgs
commit_search_url https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}
repository_url https://api.github.com/repos/{owner}/{repo}
emojis_url https://api.github.com/emojis
hub_url https://api.github.com/hub
keys_url https://api.github.com/user/keys
following_url https://api.github.com/user/following{/target}
emails_url https://api.github.com/user/emails
authorizations_url https://api.github.com/authorizations
code_search_url https://api.github.com/search/code?q={query}{&page,per_page,sort,order}
followers_url https://api.github.com/user/followers
public_gists_url https://api.github.com/gists/public
organization_url https://api.github.com/orgs/{org}
gists_url https://api.github.com/gists{/gist_id}
feeds_url https://api.github.com/feeds
user_search_url https://api.github.com/search/users?q={query}{&page,per_page,sort,order}
user_url https://api.github.com/users/{user}
events_url https://api.github.com/events
organization_repositories_url https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}
current_user_url https://api.github.com/user
issue_search_url https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}
notifications_url https://api.github.com/notifications
starred_url https://api.github.com/user/starred{/owner}{/repo}
starred_gists_url https://api.github.com/gists/starred
current_user_authorizations_html_url https://github.com/settings/connections/applications{/client_id}
user_repositories_url https://api.github.com/users/{user}/repos{?type,page,per_page,sort}
team_url https://api.github.com/teams

In [7]:
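import requests

gh_url = 'https://api.github.com'
gh_user = 'gaufung'
gh_pw = 'your-token'  # placeholder; GitHub now expects a personal access token, not the account password
# auth=(user, password) enables HTTP Basic authentication
r = requests.get(gh_url, auth=(gh_user, gh_pw))
if r.status_code == requests.codes.ok:
    for k, v in r.json().items():
        print k, v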
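
To see what auth=(gh_user, gh_pw) amounts to on the wire, the sketch below builds the header used by HTTP Basic authentication with only the standard library. It is written for Python 3. The helper name basic_auth_header and the credentials are made up for illustration, and this is not the requests implementation itself.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header value for HTTP Basic auth (hypothetical helper)."""
    # Basic auth joins user and password with a colon, then base64-encodes the bytes.
    token = base64.b64encode('{}:{}'.format(user, password).encode('utf-8'))
    return 'Basic ' + token.decode('ascii')


# Hypothetical credentials, for illustration only.
print(basic_auth_header('alice', 'secret'))  # Basic YWxpY2U6c2VjcmV0
```

Base64 is an encoding, not encryption, so credentials sent this way are only protected when the URL uses HTTPS.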

2 Basic Usage


In [12]:
import requests
cs_url = 'http://httpbin.org'
r = requests.get("%s/%s" % (cs_url, 'get'))
r = requests.post("%s/%s" % (cs_url, 'post'))
r = requests.put("%s/%s" % (cs_url, 'put'))
r = requests.delete("%s/%s" % (cs_url, 'delete'))
r = requests.patch("%s/%s" % (cs_url, 'patch'))
r = requests.options("%s/%s" % (cs_url, 'get'))

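3 Passing Parameters in the URL

Parameters are often passed in the query string of a URL, for example:

https://encrypted.google.com/search?q=hello

The general form is:

<scheme>://<host>/<path>?<key1>=<value1>&<key2>=<value2>

Every HTTP method function in requests accepts a keyword argument named params. It takes a Python dictionary and encodes it into the query-string format shown above.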


In [18]:
import requests
cs_url = 'https://www.so.com/s'
param = {'ie': 'utf-8', 'q': 'query'}
r = requests.get(cs_url, params=param)
print r.url


https://www.so.com/s?q=query&ie=utf-8
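
The encoding step can be reproduced with the standard library alone. The sketch below is a Python 3 approximation of how a params dictionary becomes a query string, not the requests code path itself. The helper name build_url, the 'hello world' value, and the tag parameter are made up for illustration.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a query string (hypothetical helper)."""
    # doseq=True turns a list value into one key=value pair per item.
    return base_url + '?' + urlencode(params, doseq=True)


print(build_url('https://www.so.com/s', {'ie': 'utf-8', 'q': 'query'}))
# https://www.so.com/s?ie=utf-8&q=query

# A value containing a space, and a key with several values (made-up parameters).
print(build_url('https://www.so.com/s', {'q': 'hello world', 'tag': ['a', 'b']}))
# https://www.so.com/s?q=hello+world&tag=a&tag=b
```

On Python 3.7 and later the pairs follow the dictionary's insertion order, which is why the order here differs from the Python 2 output above.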

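4 Setting a Timeout

Timeouts in requests are given in seconds. For example, passing timeout=5 to a request makes it stop waiting for a response after 5 seconds. Without a timeout argument, requests waits indefinitely.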


In [22]:
import requests
cs_url = 'https://www.zhihu.com'
r = requests.get(cs_url, timeout=100)
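
When the timeout expires, requests raises an exception rather than returning a response. The Python 3 sketch below shows one way to handle that. The helper name get_or_none, its default values, and the optional session argument are assumptions made for illustration.

```python
import requests


def get_or_none(url, timeout=(3.05, 10), session=None):
    """Return the response, or None if the request timed out (hypothetical helper)."""
    http = session if session is not None else requests
    try:
        # A (connect, read) tuple sets the two timeouts separately, in seconds;
        # a single number applies the same value to both.
        return http.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Base class of both ConnectTimeout and ReadTimeout.
        return None
```

A call such as get_or_none('https://www.zhihu.com', timeout=5) then returns either a Response or None, so the caller can decide whether to retry.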

5 Request Headers


In [23]:
import requests

cs_url = 'http://httpbin.org/get'
r = requests.get(cs_url)
print r.content


{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.11.1"
  }, 
  "origin": "117.136.68.150", 
  "url": "http://httpbin.org/get"
}

The two fields we usually care about most are User-Agent and Accept-Encoding. To change them, pass a dictionary with the desired values as the headers argument.


In [24]:
import requests

my_headers = {'User-Agent': 'From Liam Huang', 'Accept-Encoding': 'gzip'}
cs_url = 'http://httpbin.org/get'
r = requests.get(cs_url, headers=my_headers)
print r.content


{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip", 
    "Host": "httpbin.org", 
    "User-Agent": "From Liam Huang"
  }, 
  "origin": "117.136.68.150", 
  "url": "http://httpbin.org/get"
}
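
When the same headers are needed on many requests, one option is to set them once on a Session instead of passing headers each time. The Python 3 sketch below prepares a request without sending it, so the merged headers can be inspected offline. The X-Demo header is a made-up name used only for illustration.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it sends.
session.headers.update({'User-Agent': 'From Liam Huang', 'Accept-Encoding': 'gzip'})

# prepare_request builds the outgoing request without sending it.
request = requests.Request('GET', 'http://httpbin.org/get', headers={'X-Demo': '1'})
prepared = session.prepare_request(request)

for name in ('User-Agent', 'Accept-Encoding', 'X-Demo'):
    print(name + ': ' + prepared.headers[name])
```

Per-request headers are merged on top of the session's headers, so a single call can still override a session-level value.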

6 Response Headers


In [25]:
import requests

cs_url = 'http://httpbin.org/get'
r = requests.get(cs_url)
print r.headers


{'Content-Length': '239', 'Server': 'nginx', 'Connection': 'keep-alive', 'Access-Control-Allow-Credentials': 'true', 'Date': 'Fri, 06 Jan 2017 07:29:47 GMT', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/json'}
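
Although r.headers prints like a plain dictionary, it is a case-insensitive mapping, so header names can be looked up in any capitalization. The Python 3 sketch below builds the same kind of mapping by hand, using a few values copied from the output above, to show the lookup behavior without a network call.

```python
from requests.structures import CaseInsensitiveDict

# Sample values copied from the response headers printed above.
headers = CaseInsensitiveDict({
    'Content-Length': '239',
    'Content-Type': 'application/json',
    'Server': 'nginx',
})

print(headers['content-type'])               # application/json
print(int(headers.get('CONTENT-LENGTH')))    # 239
print(headers.get('X-Missing', 'not sent'))  # not sent
```

Header values are always strings, which is why Content-Length is converted with int() before being used as a number.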

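7 Response Content

Bandwidth on the internet has long been limited, so much of the data sent over the network is compressed. When the response to a request sent through requests is compressed with gzip or deflate, requests decompresses it for us automatically. Response.content returns the response body as bytes.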


In [26]:
import requests

cs_url = 'https://www.zhihu.com'
r = requests.get(cs_url)

if r.status_code == requests.codes.ok:
    print r.content
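
The automatic decompression described above can be imitated with the standard library. The Python 3 sketch below plays both ends of the exchange: a server compressing a body with gzip, and a client undoing it. The sample body is made up, and requests performs the client side for you.

```python
import gzip

# Stand-in for a page body; a real one would come from the server.
original = b'<html><body>hello</body></html>' * 20

# What a server does when it answers with Content-Encoding: gzip.
compressed = gzip.compress(original)

# What the client has to undo before the body is usable.
decoded = gzip.decompress(compressed)

print(len(compressed) < len(original))  # True
print(decoded == original)              # True
```

With requests, r.content already holds the decompressed bytes, so no extra step is needed in user code.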

If the response body is binary data rather than text (an image, for example), it has to be decoded accordingly:


In [27]:
import requests
from PIL import Image
from io import BytesIO

cs_url = 'http://liam0205.me/uploads/avatar/avatar-2.jpg'
r = requests.get(cs_url)

if r.status_code == requests.codes.ok:
    # r.content is raw bytes, so wrap it in a bytes buffer for PIL
    Image.open(BytesIO(r.content)).show()

Decoding as text: Response.text returns the body decoded to a string, using the encoding in r.encoding.


In [29]:
import requests

cs_url = 'https://www.zhihu.com'
r = requests.get(cs_url, auth=('user@example.com', 'your-password'))  # placeholder credentials

if r.status_code == requests.codes.ok:
    print r.text
else:
    print 'bad request'


bad request
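
The difference between r.content and r.text comes down to one decoding step. The Python 3 sketch below uses a made-up two-character body in place of a real response to show what that step does, and what happens when the wrong encoding is assumed.

```python
# Stand-in for r.content: the raw bytes of a UTF-8 encoded body (made-up text).
content = '知乎'.encode('utf-8')

# Decoding with the right encoding recovers the text.
text = content.decode('utf-8')
print(len(content), len(text))  # 6 2

# Decoding with the wrong encoding does not fail here; it silently garbles the text.
garbled = content.decode('latin-1')
print(garbled == text)  # False
```

If r.text looks garbled, setting r.encoding (for example r.encoding = 'utf-8') before reading r.text tells requests which encoding to use.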

8 Deserializing JSON Data


In [30]:
import requests

cs_url = 'http://ip.taobao.com/service/getIpInfo.php'
my_param = {'ip': '8.8.8.8'}

r = requests.get(cs_url, params=my_param)

# r.json() parses the JSON body into Python dicts, lists and strings
print r.json()['data']['country'].encode('utf-8')


美国
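
What r.json() returns can be explored offline with the standard json module. The Python 3 sketch below parses a made-up body shaped like the one used above (a "data" object with a "country" field; the "ip" field and the "city" lookup are assumptions for illustration) and shows two common failure modes.

```python
import json

# Made-up response body; non-ASCII text often arrives as \uXXXX escapes.
body = '{"data": {"country": "\\u7f8e\\u56fd", "ip": "8.8.8.8"}}'

# json.loads turns JSON objects into dicts and JSON strings into str.
parsed = json.loads(body)
country = parsed['data']['country']
print(country == '美国')  # True

# .get avoids a KeyError when an optional field is absent.
print(parsed['data'].get('city', 'unknown'))  # unknown

# Invalid JSON raises an exception that is a subclass of ValueError.
try:
    json.loads('<html>not json</html>')
except ValueError as exc:
    print('not JSON:', type(exc).__name__)  # not JSON: JSONDecodeError
```

r.json() also raises an error when the body is not valid JSON, so checking r.status_code first is a reasonable precaution.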
