Can we identify different types of text documents based on the frequency of their words? Can we identify different authors, styles, or disciplines like medical versus information technology?
We can start by counting the occurrences of words in a document. To do so, all words should be converted to a single case (e.g. lower case), and all punctuation characters should be removed.
Our program reads a (plain) text file, isolates individual words, and computes their frequencies in the document.
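As a minimal sketch of the counting idea described above, the snippet below normalizes a string and tallies its words. The function name `word_frequencies` and the sample sentence are illustrative assumptions, not part of the original program, and only ASCII punctuation is handled.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its count.

    Hypothetical helper for illustration; handles ASCII punctuation only.
    """
    # Replace every punctuation character with a space, so that words
    # separated only by punctuation (e.g. "sat.The") are not fused together.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    return Counter(words)


sample = "The cat sat. The dog sat, too!"
freq = word_frequencies(sample)
print(freq["the"], freq["sat"], freq["cat"])
```

One consequence of this simple approach is that contractions such as "don't" are split into two tokens ("don" and "t"); whether that matters depends on the kind of comparison you want to make between documents.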
The following steps outline the process:
In [43]:
from urllib.request import urlopen
# from urllib.request import *
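A short sketch of how `urlopen` might be used once imported: it returns a file-like object whose `read()` method yields bytes, which must be decoded before any text processing. To keep the example runnable without a network connection, it assumes a `data:` URL with made-up inline content; a real document would be fetched from an `http://` or `https://` address instead, and its encoding may differ from the UTF-8 assumed here.

```python
from urllib.request import urlopen

# A "data:" URL carries its content inline, so no network access is needed.
# The URL and its content are assumptions chosen purely for illustration.
url = "data:text/plain;charset=utf-8,Hello%20World%2C%20hello%20again"

with urlopen(url) as response:
    raw = response.read()      # the body arrives as bytes, not str

text = raw.decode("utf-8")     # decode before splitting into words
print(text)
```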
In [45]:
# To display the help text, import the module itself rather than a single function from it.
import urllib.request
help(urllib.request)
Help on module urllib.request in urllib:
NAME
urllib.request - An extensible library for opening URLs using a variety of protocols
DESCRIPTION
The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.
The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
urlopen(url, data=None) -- Basic usage is the same as original
urllib. pass the url and optionally data to post to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of URL. Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the argument is a subclass of the default
handler, the argument will be installed instead of the default.
install_opener -- Installs a new opener as the default opener.
objects of interest:
OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.
Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.
BaseHandler --
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='geheim$parole')
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
CLASSES
builtins.object
AbstractBasicAuthHandler
HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler)
ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler)
AbstractDigestAuthHandler
BaseHandler
DataHandler
FTPHandler
CacheFTPHandler
FileHandler
HTTPCookieProcessor
HTTPDefaultErrorHandler
HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler)
HTTPErrorProcessor
HTTPRedirectHandler
ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler)
ProxyHandler
UnknownHandler
HTTPPasswordMgr
HTTPPasswordMgrWithDefaultRealm
OpenerDirector
Request
URLopener
FancyURLopener
AbstractHTTPHandler(BaseHandler)
HTTPHandler
HTTPSHandler
class AbstractBasicAuthHandler(builtins.object)
| Methods defined here:
|
| __init__(self, password_mgr=None)
|
| http_error_auth_reqed(self, authreq, host, req, headers)
|
| retry_http_basic_auth(self, host, req, realm)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\...
class AbstractDigestAuthHandler(builtins.object)
| Methods defined here:
|
| __init__(self, passwd=None)
|
| get_algorithm_impls(self, algorithm)
|
| get_authorization(self, req, chal)
|
| get_cnonce(self, nonce)
|
| get_entity_digest(self, data, chal)
|
| http_error_auth_reqed(self, auth_header, host, req, headers)
|
| reset_retry_count(self)
|
| retry_http_digest_auth(self, req, auth)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class BaseHandler(builtins.object)
| Methods defined here:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| handler_order = 500
class CacheFTPHandler(FTPHandler)
| Method resolution order:
| CacheFTPHandler
| FTPHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| __init__(self)
| # XXX would be nice to have pluggable cache strategies
| # XXX this stuff is definitely not thread safe
|
| check_cache(self)
|
| clear_cache(self)
|
| connect_ftp(self, user, passwd, host, port, dirs, timeout)
|
| setMaxConns(self, m)
|
| setTimeout(self, t)
|
| ----------------------------------------------------------------------
| Methods inherited from FTPHandler:
|
| ftp_open(self, req)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class DataHandler(BaseHandler)
| Method resolution order:
| DataHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| data_open(self, req)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class FTPHandler(BaseHandler)
| Method resolution order:
| FTPHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| connect_ftp(self, user, passwd, host, port, dirs, timeout)
|
| ftp_open(self, req)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class FancyURLopener(URLopener)
| Derived class with handlers for errors we can handle (perhaps).
|
| Method resolution order:
| FancyURLopener
| URLopener
| builtins.object
|
| Methods defined here:
|
| __init__(self, *args, **kwargs)
|
| get_user_passwd(self, host, realm, clear_cache=0)
|
| http_error_301(self, url, fp, errcode, errmsg, headers, data=None)
| Error 301 -- also relocated (permanently).
|
| http_error_302(self, url, fp, errcode, errmsg, headers, data=None)
| Error 302 -- relocated (temporarily).
|
| http_error_303(self, url, fp, errcode, errmsg, headers, data=None)
| Error 303 -- also relocated (essentially identical to 302).
|
| http_error_307(self, url, fp, errcode, errmsg, headers, data=None)
| Error 307 -- relocated, but turn POST into error.
|
| http_error_401(self, url, fp, errcode, errmsg, headers, data=None, retry=False)
| Error 401 -- authentication required.
| This function supports Basic authentication only.
|
| http_error_407(self, url, fp, errcode, errmsg, headers, data=None, retry=False)
| Error 407 -- proxy authentication required.
| This function supports Basic authentication only.
|
| http_error_default(self, url, fp, errcode, errmsg, headers)
| Default error handling -- don't raise an exception.
|
| prompt_user_passwd(self, host, realm)
| Override this in a GUI environment!
|
| redirect_internal(self, url, fp, errcode, errmsg, headers, data)
|
| retry_http_basic_auth(self, url, realm, data=None)
|
| retry_https_basic_auth(self, url, realm, data=None)
|
| retry_proxy_http_basic_auth(self, url, realm, data=None)
|
| retry_proxy_https_basic_auth(self, url, realm, data=None)
|
| ----------------------------------------------------------------------
| Methods inherited from URLopener:
|
| __del__(self)
|
| addheader(self, *args)
| Add a header to be used by the HTTP interface only
| e.g. u.addheader('Accept', 'sound/basic')
|
| cleanup(self)
|
| close(self)
|
| http_error(self, url, fp, errcode, errmsg, headers, data=None)
| Handle http errors.
|
| Derived class can override this, or provide specific handlers
| named http_error_DDD where DDD is the 3-digit error code.
|
| open(self, fullurl, data=None)
| Use URLopener().open(file) instead of open(file, 'r').
|
| open_data(self, url, data=None)
| Use "data" URL.
|
| open_file(self, url)
| Use local file or FTP depending on form of URL.
|
| open_ftp(self, url)
| Use FTP protocol.
|
| open_http(self, url, data=None)
| Use HTTP protocol.
|
| open_https(self, url, data=None)
| Use HTTPS protocol.
|
| open_local_file(self, url)
| Use local file.
|
| open_unknown(self, fullurl, data=None)
| Overridable interface to open unknown URL type.
|
| open_unknown_proxy(self, proxy, fullurl, data=None)
| Overridable interface to open unknown URL type.
|
| retrieve(self, url, filename=None, reporthook=None, data=None)
| retrieve(url) returns (filename, headers) for a local object
| or (tempfilename, headers) for a remote object.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from URLopener:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from URLopener:
|
| version = 'Python-urllib/3.4'
class FileHandler(BaseHandler)
| Method resolution order:
| FileHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| file_open(self, req)
| # Use local file or FTP depending on form of URL
|
| get_names(self)
|
| open_local_file(self, req)
| # not entirely sure what the rules are here
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| names = None
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler)
| Method resolution order:
| HTTPBasicAuthHandler
| AbstractBasicAuthHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_error_401(self, req, fp, code, msg, headers)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| auth_header = 'Authorization'
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractBasicAuthHandler:
|
| __init__(self, password_mgr=None)
|
| http_error_auth_reqed(self, authreq, host, req, headers)
|
| retry_http_basic_auth(self, host, req, realm)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from AbstractBasicAuthHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from AbstractBasicAuthHandler:
|
| rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\...
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPCookieProcessor(BaseHandler)
| Method resolution order:
| HTTPCookieProcessor
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| __init__(self, cookiejar=None)
|
| http_request(self, request)
|
| http_response(self, request, response)
|
| https_request = http_request(self, request)
|
| https_response = http_response(self, request, response)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPDefaultErrorHandler(BaseHandler)
| Method resolution order:
| HTTPDefaultErrorHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_error_default(self, req, fp, code, msg, hdrs)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler)
| An authentication protocol defined by RFC 2069
|
| Digest authentication improves on basic authentication because it
| does not transmit passwords in the clear.
|
| Method resolution order:
| HTTPDigestAuthHandler
| BaseHandler
| AbstractDigestAuthHandler
| builtins.object
|
| Methods defined here:
|
| http_error_401(self, req, fp, code, msg, headers)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| auth_header = 'Authorization'
|
| handler_order = 490
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractDigestAuthHandler:
|
| __init__(self, passwd=None)
|
| get_algorithm_impls(self, algorithm)
|
| get_authorization(self, req, chal)
|
| get_cnonce(self, nonce)
|
| get_entity_digest(self, data, chal)
|
| http_error_auth_reqed(self, auth_header, host, req, headers)
|
| reset_retry_count(self)
|
| retry_http_digest_auth(self, req, auth)
class HTTPErrorProcessor(BaseHandler)
| Process HTTP error responses.
|
| Method resolution order:
| HTTPErrorProcessor
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_response(self, request, response)
|
| https_response = http_response(self, request, response)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| handler_order = 1000
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class HTTPHandler(AbstractHTTPHandler)
| Method resolution order:
| HTTPHandler
| AbstractHTTPHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_open(self, req)
|
| http_request = do_request_(self, request)
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractHTTPHandler:
|
| __init__(self, debuglevel=0)
|
| do_open(self, http_class, req, **http_conn_args)
| Return an HTTPResponse object for the request, using http_class.
|
| http_class must implement the HTTPConnection API from http.client.
|
| do_request_(self, request)
|
| set_http_debuglevel(self, level)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPPasswordMgr(builtins.object)
| Methods defined here:
|
| __init__(self)
|
| add_password(self, realm, uri, user, passwd)
|
| find_user_password(self, realm, authuri)
|
| is_suburi(self, base, test)
| Check if test is below base in a URI tree
|
| Both args must be URIs in reduced form.
|
| reduce_uri(self, uri, default_port=True)
| Accept authority or URI and extract only the authority and path.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr)
| Method resolution order:
| HTTPPasswordMgrWithDefaultRealm
| HTTPPasswordMgr
| builtins.object
|
| Methods defined here:
|
| find_user_password(self, realm, authuri)
|
| ----------------------------------------------------------------------
| Methods inherited from HTTPPasswordMgr:
|
| __init__(self)
|
| add_password(self, realm, uri, user, passwd)
|
| is_suburi(self, base, test)
| Check if test is below base in a URI tree
|
| Both args must be URIs in reduced form.
|
| reduce_uri(self, uri, default_port=True)
| Accept authority or URI and extract only the authority and path.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from HTTPPasswordMgr:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class HTTPRedirectHandler(BaseHandler)
| Method resolution order:
| HTTPRedirectHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_error_301 = http_error_302(self, req, fp, code, msg, headers)
|
| http_error_302(self, req, fp, code, msg, headers)
| # Implementation note: To avoid the server sending us into an
| # infinite loop, the request object needs to track what URLs we
| # have already seen. Do this by adding a handler-specific
| # attribute to the Request object.
|
| http_error_303 = http_error_302(self, req, fp, code, msg, headers)
|
| http_error_307 = http_error_302(self, req, fp, code, msg, headers)
|
| redirect_request(self, req, fp, code, msg, headers, newurl)
| Return a Request or None in response to a redirect.
|
| This is called by the http_error_30x methods when a
| redirection response is received. If a redirection should
| take place, return a new Request to allow http_error_30x to
| perform the redirect. Otherwise, raise HTTPError if no-one
| else should try to handle this url. Return None if you can't
| but another Handler might.
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| inf_msg = 'The HTTP server returned a redirect error that w...n infini...
|
| max_redirections = 10
|
| max_repeats = 4
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class HTTPSHandler(AbstractHTTPHandler)
| Method resolution order:
| HTTPSHandler
| AbstractHTTPHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| __init__(self, debuglevel=0, context=None, check_hostname=None)
|
| https_open(self, req)
|
| https_request = do_request_(self, request)
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractHTTPHandler:
|
| do_open(self, http_class, req, **http_conn_args)
| Return an HTTPResponse object for the request, using http_class.
|
| http_class must implement the HTTPConnection API from http.client.
|
| do_request_(self, request)
|
| set_http_debuglevel(self, level)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class OpenerDirector(builtins.object)
| Methods defined here:
|
| __init__(self)
|
| add_handler(self, handler)
|
| close(self)
|
| error(self, proto, *args)
|
| open(self, fullurl, data=None, timeout=<object object at 0x7f4cd4cc8130>)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler)
| Method resolution order:
| ProxyBasicAuthHandler
| AbstractBasicAuthHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| http_error_407(self, req, fp, code, msg, headers)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| auth_header = 'Proxy-authorization'
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractBasicAuthHandler:
|
| __init__(self, password_mgr=None)
|
| http_error_auth_reqed(self, authreq, host, req, headers)
|
| retry_http_basic_auth(self, host, req, realm)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from AbstractBasicAuthHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from AbstractBasicAuthHandler:
|
| rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\...
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler)
| Method resolution order:
| ProxyDigestAuthHandler
| BaseHandler
| AbstractDigestAuthHandler
| builtins.object
|
| Methods defined here:
|
| http_error_407(self, req, fp, code, msg, headers)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| auth_header = 'Proxy-Authorization'
|
| handler_order = 490
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Methods inherited from AbstractDigestAuthHandler:
|
| __init__(self, passwd=None)
|
| get_algorithm_impls(self, algorithm)
|
| get_authorization(self, req, chal)
|
| get_cnonce(self, nonce)
|
| get_entity_digest(self, data, chal)
|
| http_error_auth_reqed(self, auth_header, host, req, headers)
|
| reset_retry_count(self)
|
| retry_http_digest_auth(self, req, auth)
class ProxyHandler(BaseHandler)
| Method resolution order:
| ProxyHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| __init__(self, proxies=None)
|
| proxy_open(self, req, proxy, type)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| handler_order = 100
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class Request(builtins.object)
| Methods defined here:
|
| __init__(self, url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
|
| add_header(self, key, val)
|
| add_unredirected_header(self, key, val)
|
| get_full_url(self)
|
| get_header(self, header_name, default=None)
|
| get_method(self)
| Return a string indicating the HTTP request method.
|
| has_header(self, header_name)
|
| has_proxy(self)
|
| header_items(self)
|
| remove_header(self, header_name)
|
| set_proxy(self, host, type)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| data
|
| full_url
class URLopener(builtins.object)
| Class to open URLs.
| This is a class rather than just a subroutine because we may need
| more than one set of global protocol-specific options.
| Note -- this is a base class for those who don't want the
| automatic handling of errors type 302 (relocated) and 401
| (authorization needed).
|
| Methods defined here:
|
| __del__(self)
|
| __init__(self, proxies=None, **x509)
| # Constructor
|
| addheader(self, *args)
| Add a header to be used by the HTTP interface only
| e.g. u.addheader('Accept', 'sound/basic')
|
| cleanup(self)
|
| close(self)
|
| http_error(self, url, fp, errcode, errmsg, headers, data=None)
| Handle http errors.
|
| Derived class can override this, or provide specific handlers
| named http_error_DDD where DDD is the 3-digit error code.
|
| http_error_default(self, url, fp, errcode, errmsg, headers)
| Default error handler: close the connection and raise OSError.
|
| open(self, fullurl, data=None)
| Use URLopener().open(file) instead of open(file, 'r').
|
| open_data(self, url, data=None)
| Use "data" URL.
|
| open_file(self, url)
| Use local file or FTP depending on form of URL.
|
| open_ftp(self, url)
| Use FTP protocol.
|
| open_http(self, url, data=None)
| Use HTTP protocol.
|
| open_https(self, url, data=None)
| Use HTTPS protocol.
|
| open_local_file(self, url)
| Use local file.
|
| open_unknown(self, fullurl, data=None)
| Overridable interface to open unknown URL type.
|
| open_unknown_proxy(self, proxy, fullurl, data=None)
| Overridable interface to open unknown URL type.
|
| retrieve(self, url, filename=None, reporthook=None, data=None)
| retrieve(url) returns (filename, headers) for a local object
| or (tempfilename, headers) for a remote object.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| version = 'Python-urllib/3.4'
class UnknownHandler(BaseHandler)
| Method resolution order:
| UnknownHandler
| BaseHandler
| builtins.object
|
| Methods defined here:
|
| unknown_open(self, req)
|
| ----------------------------------------------------------------------
| Methods inherited from BaseHandler:
|
| __lt__(self, other)
|
| add_parent(self, parent)
|
| close(self)
|
| ----------------------------------------------------------------------
| Data descriptors inherited from BaseHandler:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes inherited from BaseHandler:
|
| handler_order = 500
FUNCTIONS
build_opener(*handlers)
Create an opener object from a list of handlers.
The opener will use several default handlers, including support
for HTTP, FTP and when applicable HTTPS.
If any of the handlers passed as arguments are subclasses of the
default handlers, the default handlers will not be used.
getproxies = getproxies_environment()
Return a dictionary of scheme -> proxy server URL mappings.
Scan the environment for variables named <scheme>_proxy;
this seems to be the standard convention. If you need a
different way, you can pass a proxies dictionary to the
[Fancy]URLopener constructor.
install_opener(opener)
pathname2url(pathname)
OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use.
url2pathname(pathname)
OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use.
urlcleanup()
urlopen(url, data=None, timeout=<object object at 0x7f4cd4cc8130>, *, cafile=None, capath=None, cadefault=False, context=None)
urlretrieve(url, filename=None, reporthook=None, data=None)
Retrieve a URL into a temporary location on disk.
Requires a URL argument. If a filename is passed, it is used as
the temporary file location. The reporthook argument should be
a callable that accepts a block number, a read size, and the
total file size of the URL target. The data argument should be
valid URL encoded data.
If a filename is passed and the URL points to a local resource,
the result is a copy from local file to new file.
Returns a tuple containing the path to the newly created
data file as well as the resulting HTTPMessage object.
DATA
__all__ = ['Request', 'OpenerDirector', 'BaseHandler', 'HTTPDefaultErr...
VERSION
3.4
FILE
/usr/lib64/python3.4/urllib/request.py
In [46]:
help(urlopen)
Help on function urlopen in module urllib.request:
urlopen(url, data=None, timeout=<object object at 0x7f4cd4cc8130>, *, cafile=None, capath=None, cadefault=False, context=None)
For example, load the collection of Shakespeare's works and print a few lines. (The first 244 lines of this particular document are copyright information and should be skipped.)
In [12]:
with urlopen('http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt') as src:
    txt = src.readlines()
for t in txt[244:250]:
    print(t.decode())
1609
THE SONNETS
by William Shakespeare
Load everything at once:
In [73]:
data = urlopen('http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt').read().decode()
data[0:100]
Out[73]:
'This is the 100th Etext file presented by Project Gutenberg, and\nis presented in cooperation with Wo'
Note: there is a difference between read and readlines. read loads the entire content into a single bytes object, whereas readlines returns a list with one entry per line, split at the new-line character(s), so that we can iterate over the lines.
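The difference can be tried out without a network connection. The sketch below uses io.BytesIO as a stand-in for the stream returned by urlopen; the sample bytes are made up for illustration.

```python
import io

# Made-up sample data standing in for a downloaded file.
raw = b"1609\n\nTHE SONNETS\n\nby William Shakespeare\n"

# read() returns the whole stream as one bytes object.
whole = io.BytesIO(raw).read()

# readlines() returns a list of bytes objects, one per line,
# each still ending with its new-line character.
lines = io.BytesIO(raw).readlines()

print(type(whole), len(whole))
print(len(lines), lines[2])
```

Joining the list from readlines gives back exactly what read returns, so the choice mainly depends on whether the program wants to work line by line.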
In [73]:
with open('textfiles/shakespeare.txt', 'r') as src:
    txt = src.readlines()
for t in txt[0:10]:
    print(t)  # Note: we don't need to decode the string
1609
THE SONNETS
by William Shakespeare
1
From fairest creatures we desire increase,
Read everything at once...
In [76]:
txt = open('textfiles/shakespeare.txt', 'r').read()
txt[0:100]
Out[76]:
'1609\n\nTHE SONNETS\n\nby William Shakespeare\n\n\n\n 1\n From fairest creatures we desi'
In [79]:
import zlib
from hdfs import InsecureClient
client = InsecureClient('http://backend-0-0:50070')
In [80]:
with client.read('/user/pmolnar/data/20news/20news-bydate-test/talk.politics.mideast/77239.gz') as reader:
    txt = zlib.decompress(reader.read(), 16+zlib.MAX_WBITS).decode()
txt[0:100]
Out[80]:
'From: oaf@zurich.ai.mit.edu (Oded Feingold)\nSubject: Re: To All My Friends on T.P.M., I send Greetin'
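The second argument, 16+zlib.MAX_WBITS, tells zlib.decompress to expect the gzip container format (header and trailer) rather than a raw zlib stream. A minimal round trip, using made-up sample text instead of the HDFS file, could look like this:

```python
import gzip
import zlib

# Made-up sample text; the notebook reads the compressed bytes from HDFS instead.
original = "This is an outrage! I don't even own a dog.\n"
compressed = gzip.compress(original.encode())

# wbits = 16 + MAX_WBITS selects the gzip container format.
restored = zlib.decompress(compressed, 16 + zlib.MAX_WBITS).decode()

# gzip.decompress is an equivalent, more direct call for the same data.
restored_too = gzip.decompress(compressed).decode()

print(restored == original, restored_too == original)
```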
In [81]:
txt.split('\n')
Out[81]:
['From: oaf@zurich.ai.mit.edu (Oded Feingold)',
'Subject: Re: To All My Friends on T.P.M., I send Greetings',
'Organization: M.I.T. Artificial Intelligence Lab.',
'Lines: 1',
'Reply-To: oaf@zurich.ai.mit.edu',
'NNTP-Posting-Host: klosters.ai.mit.edu',
"In-reply-to: szljubi@chip.ucdavis.edu's message of Thu, 6 May 1993 22:47:00 GMT",
'',
"This is an outrage! I don't even own a dog.",
'']
In order to read all the text files in a directory, we first have to get the list of files and then iterate through it.
In [83]:
dir_list = client.list('/user/pmolnar/data/20news/20news-bydate-test/talk.politics.mideast/')
dir_list[0:10]
Out[83]:
['76355.gz',
'76366.gz',
'76367.gz',
'76368.gz',
'76369.gz',
'76370.gz',
'76372.gz',
'76373.gz',
'76374.gz',
'76375.gz']
In [87]:
text_docs = []
for f in dir_list:
    with client.read('/user/pmolnar/data/20news/20news-bydate-test/talk.politics.mideast/%s' % f) as reader:
        txt = zlib.decompress(reader.read(), 16+zlib.MAX_WBITS).decode()
    text_docs.append(txt)
print("Read %d text files." % len(text_docs))
Read 376 text files.
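The same list-then-iterate pattern works on a local directory. The sketch below does not use the hdfs client; it assumes the .gz files sit on the local file system, and the helper name read_gz_dir and the two sample documents are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path

def read_gz_dir(directory):
    """Return the decoded text of every .gz file in directory, sorted by name."""
    docs = []
    for path in sorted(Path(directory).glob("*.gz")):
        # Mode "rt" decompresses and decodes to str in one step.
        with gzip.open(path, "rt", encoding="utf-8") as reader:
            docs.append(reader.read())
    return docs

# Build a throwaway directory with two made-up documents to try it out.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("1.gz", "first document\n"), ("2.gz", "second document\n")]:
        with gzip.open(Path(tmp) / name, "wt", encoding="utf-8") as writer:
            writer.write(body)
    local_docs = read_gz_dir(tmp)

print("Read %d text files." % len(local_docs))
```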
In [86]:
text_docs[1:3]
Out[86]:
['From: rj3s@Virginia.EDU ("Get thee to a nunnery.....")\nSubject: Re: Deir Yassin\nOrganization: University of Virginia\nLines: 65\n\nhm@cs.brown.edu writes:\n> In article <martinb.735590895@brise.ERE.UMontreal.CA> aurag@ERE.UMontreal.CA (Aurag Hassan) writes:\n> \n> Are you trying to say that there were no massacres in Deir Yassin\n> or in Sabra and Shatila? If so then let me tell you some good jokes:\n> \n> There is not and was not any such thing like jewish killing in WWII\n> \n> Palestinians just did what Davidians did for fourty years and more.\n> \n> In fact no one was killed in any war at any time or any place.\n> \n> People die that is all. No one gets killed.\n> \n> Maybe also vietamiese didn\'t die in Vietnam war killed by american\n> napalm they were just pyromaniacs and that\'s all.\n> \n> Maybe jews just liked gas chambers and no one forced them to get in there.they \n> may be thought it was like snifing cocaine. No?\n> \n> What do you think of this ? Isn\'t it stupid to say so?\n> Well it is as stupid as what you said .Next time you want to lie do it\n> intelligently.\n> \n> Sincerely yours.\n> \n> Hassan\n> \n> Arab civilians did die at Dir yassin. But there was no massacre. First\n> of all, the village housed many *armed* troops. Secondly, the Irgun\n> and Stern fighters had absolutely no intentions of killing civilians.\n> The village was attacked only for its military significance. In fact,\n> a warning was given to the occupants of the village to leave before\n> the attack was to begin.\n> \n> By all rational standards, Dir Yassin was not a massacre. The killing\n> was unintentional. The village housed Arab snipers and Arab troops.\n> Thus it was attacked for its military significance. It was not\n> attacked with intentions of killing any civilians.\n> \n> To even compare Dir Yassin, in which some 120 or so Arabs died, to the\n> Holocaust is absurd. The Irgun did not want to kill any civilians. 
The\n> village had almost 1000 inhabitants, most of whom survived.\n> \n> Harry.\nThis is such Bullshit. Deir Yassin was an unprovoked attack on\nthe part of the Jews, and a massacre defines it best in my\nopinion. The village of Deir Yassin had had a pact with the\nJews, a peace pact, but the Irgun purposely broke this\nagreement in order to scare off the Palestinians. I might\ngrant that this village housed armed Arabs [I doubt it] but\nnothing in the archives and available literature indicates that\nthis was a motivating force amongst the Irgun. The Deir Yassin\nMASSACRE was part of an over all strategy to intimidate the\nPalestinians to flee the Jewish Homeland.,...and contrary to\nyour belief, many civilians were killed. Deir Yassin was later\nadvertized by the very Jews who perpetrated it because it was\nuseful in getting many Palestinians to leave. The Palestinians\nwere rightfully scared off, because they did not want another\nDeir Yassin. \n\tI\'m not necessarily condemning the Israelites here;\natrocities were aslo committed on the part of the Arabs.\nIsraelophiles should just be careful in thinking that they are\nand were the good guys in the middle east. Both Arab and Jew\nsuck equally.\n',
'From: ohayon@jcpltyo.JCPL.CO.JP (Tsiel Ohayon)\nSubject: Re: rejoinder. Questions to Israelis\nOrganization: James Capel Pacific Limited, Tokyo Japan\nLines: 31\n\nIn article <1993Apr26.211905.28317@freenet.carleton.ca> aa229@Freenet.carleton.ca (Steve Birnbaum) writes:\n\n[SB] Oh yeah, Israel was really ready to "expand its borders" on the holiest day\n[SB] of the year (Yom Kippur) when the Arabs attacked in 1973. Oh wait, you\n[SB] chose to omit that war...perhaps because it 100% supports the exact \n[SB] OPPOSITE to the point you are trying to make? I don\'t think that it\'s\n[SB] because it was the war that hit Israel the hardest. Also, in 1967 it was\n[SB] Egypt, not Israel who kicked out the UN force. In 1948 it was the Arabs\n[SB] who refused to accept the existance of Israel BASED ON THE BORDERS SET\n[SB] BY THE UNITED NATIONS. In 1956, Egypt closed off the Red Sea to Israeli\n[SB] shipping, a clear antagonistic act. And in 1982 the attack was a response\n[SB] to years of constant shelling by terrorist organizations from the Golan\n\t\t\t\t\t\t\t ^^^^^^^^^^^^^^^^\n[SB] Heights. Children were being murdered all the time by terrorists and Israel\n^^^^^^^^^^^^\n[SB] finally retaliated. Nowhere do I see a war that Israel started so that \n[SB] the borders could be expanded.\n\nI agree with all you write except that Terrorist orgs. were not shelling\nIsrael from the Golan Heights in 1982, but rather from Lebanon. The Golan\nHeights have been held by Israel since 1967, and therefore the PLO could\nnot have been shelling Israel from there, unless there is something I am\nnot aware of.\n\n\nTsiel\n-- \n----8<--------------------------------------------------------------->8------\nTsiel:ohayon@jcpl.co.jp\t | If you do not receive this E-mail, please let me\nEmployer may not have same | know as soon as possible, if possible.\nopinions, if any ! | Two percent of zero is almost nothing.\n']
In [93]:
import string
help(string)
Help on module string:
NAME
string - A collection of string constants.
DESCRIPTION
Public module variables:
whitespace -- a string containing all ASCII whitespace
ascii_lowercase -- a string containing all ASCII lowercase letters
ascii_uppercase -- a string containing all ASCII uppercase letters
ascii_letters -- a string containing all ASCII letters
digits -- a string containing all ASCII decimal digits
hexdigits -- a string containing all ASCII hexadecimal digits
octdigits -- a string containing all ASCII octal digits
punctuation -- a string containing all ASCII punctuation characters
printable -- a string containing all ASCII characters considered printable
CLASSES
builtins.object
Formatter
Template
class Formatter(builtins.object)
| Methods defined here:
|
| check_unused_args(self, used_args, args, kwargs)
|
| convert_field(self, value, conversion)
|
| format(self, format_string, *args, **kwargs)
|
| format_field(self, value, format_spec)
|
| get_field(self, field_name, args, kwargs)
| # given a field_name, find the object it references.
| # field_name: the field being looked up, e.g. "0.name"
| # or "lookup[3]"
| # used_args: a set of which args have been used
| # args, kwargs: as passed in to vformat
|
| get_value(self, key, args, kwargs)
|
| parse(self, format_string)
| # returns an iterable that contains tuples of the form:
| # (literal_text, field_name, format_spec, conversion)
| # literal_text can be zero length
| # field_name can be None, in which case there's no
| # object to format and output
| # if field_name is not None, it is looked up, formatted
| # with format_spec and conversion and then used
|
| vformat(self, format_string, args, kwargs)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
class Template(builtins.object)
| A string class for supporting $-substitutions.
|
| Methods defined here:
|
| __init__(self, template)
|
| safe_substitute(self, *args, **kws)
|
| substitute(self, *args, **kws)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| delimiter = '$'
|
| flags = 2
|
| idpattern = '[_a-z][_a-z0-9]*'
|
| pattern = re.compile('\n \\$(?:\n (?P<escaped>\\$)..._a-z][_a-...
FUNCTIONS
capwords(s, sep=None)
capwords(s [,sep]) -> string
Split the argument into words using split, capitalize each
word using capitalize, and join the capitalized words using
join. If the optional second argument sep is absent or None,
runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise
sep is used to split and join the words.
DATA
ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
digits = '0123456789'
hexdigits = '0123456789abcdefABCDEF'
octdigits = '01234567'
printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTU...
punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
whitespace = ' \t\n\r\x0b\x0c'
FILE
/usr/lib64/python3.4/string.py
In [96]:
txt = open("textfiles/shakespeare.txt").read()
txt[0:100]
Out[96]:
'1609\n\nTHE SONNETS\n\nby William Shakespeare\n\n\n\n 1\n From fairest creatures we desi'
In [97]:
txt = txt.lower()
In [98]:
for c in '.;!\'" ':
    txt = txt.replace(c, '\n')
txt[0:100]
Out[98]:
'1609\n\nthe\nsonnets\n\nby\nwilliam\nshakespeare\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n1\n\n\nfrom\nfairest\ncreatures\nwe\ndesi'
In [100]:
word_list = txt.split('\n')
word_list[0:10]
Out[100]:
['1609', '', 'the', 'sonnets', '', 'by', 'william', 'shakespeare', '', '']
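Replacing characters with new-lines leaves empty strings in the list and only covers the characters named in the loop. One alternative, which is not the approach used in the cells above, is to delete every character in string.punctuation with str.translate and then split on whitespace. The sample text here is a short excerpt used only for illustration.

```python
import string

sample = "From fairest creatures we desire increase,\nThat thereby beauty's rose might never die,"

# Translation table that deletes every ASCII punctuation character.
drop_punctuation = str.maketrans("", "", string.punctuation)

# split() without arguments splits on any run of whitespace,
# so no empty strings end up in the result.
words = sample.lower().translate(drop_punctuation).split()
print(words[0:6])
```

Note that this variant turns "beauty's" into "beautys", whereas replacing the apostrophe with a separator would yield "beauty" and "s"; which behavior is preferable depends on the analysis.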
In [ ]:
help(list)
In [ ]:
help(tuple)
In [98]:
# Example
a = []
a.append('a')
a.append('z')
a += ['b', 'x', 'c']
a.sort()
a[0:2]
Out[98]:
['a', 'b']
In [ ]:
help(dict)
In [19]:
f = { 'one': 1, 'two': 2}
f['a'] = 0
In [20]:
f
Out[20]:
{'a': 0, 'one': 1, 'two': 2}
In [22]:
f['one']
Out[22]:
1
In [23]:
f.keys()
Out[23]:
dict_keys(['one', 'two', 'a'])
In [24]:
f.values()
Out[24]:
dict_values([1, 2, 0])
In [25]:
Ω = 17
In [26]:
Ω
Out[26]:
17
In [55]:
'a' in f.keys()
Out[55]:
True
In [56]:
f['b']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-56-6202e5beb3a1> in <module>()
----> 1 f['b']
KeyError: 'b'
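Looking up a missing key raises KeyError, as shown above. The following sketch lists three common ways to deal with that, using the same dictionary f; the default value 0 is an arbitrary choice for illustration.

```python
f = {'one': 1, 'two': 2, 'a': 0}

# Membership test on the dictionary itself; calling keys() is not required.
has_b = 'b' in f

# get() returns a default instead of raising KeyError.
value_b = f.get('b', 0)

# try/except is useful when a missing key is an exceptional case.
try:
    f['b']
except KeyError as err:
    missing = err.args[0]

print(has_b, value_b, missing)
```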
In [30]:
l2 = [3,4,1,45,7,234,123]
l2.sort()
l2
Out[30]:
[1, 3, 4, 7, 45, 123, 234]
In [35]:
l = [(3,'a'), (9, 'z'), (1, 'y'), (1, 'b'), (5, 'd'), (7, 'x')]
l
Out[35]:
[(3, 'a'), (9, 'z'), (1, 'y'), (1, 'b'), (5, 'd'), (7, 'x')]
In [37]:
def take_first(x):
    return x[0]

l.sort(key=take_first)
l
Out[37]:
[(1, 'b'), (1, 'y'), (3, 'a'), (5, 'd'), (7, 'x'), (9, 'z')]
In [92]:
l.sort(key=lambda x: x[0], reverse=True)
l
Out[92]:
[(7, 'x'), (5, 'd'), (3, 'a'), (1, 'b')]
In [87]:
sorted(l, key=lambda x: x[0], reverse=True)
Out[87]:
[(5, 'd'), (3, 'a'), (1, 'b')]
In [77]:
l
Out[77]:
[(3, 'a'), (1, 'b'), (5, 'd')]
In [41]:
l3 = [10, 110, 12, 1203]
l3.sort(key=lambda x: str(x))
l3
Out[41]:
[10, 110, 12, 1203]
In [82]:
help(sorted)
Help on built-in function sorted in module builtins:
sorted(iterable, key=None, reverse=False)
Return a new list containing all items from the iterable in ascending order.
A custom key function can be supplied to customise the sort order, and the
reverse flag can be set to request the result in descending order.
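The practical difference between sorted and list.sort is that sorted returns a new list, while list.sort reorders the existing list in place and returns None. A small sketch with a three-element list (the variable names are chosen for this example only):

```python
pairs = [(3, 'a'), (1, 'b'), (5, 'd')]

# sorted() builds a new list and leaves the original untouched.
descending = sorted(pairs, key=lambda x: x[0], reverse=True)
snapshot = list(pairs)

# list.sort() reorders the list in place and returns None.
result = pairs.sort(key=lambda x: x[0])

print(descending)
print(snapshot)
print(pairs, result)
```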
In [ ]:
# curl http://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt | tail -n +245 | tr 'A-Z' 'a-z'| tr ' .?:,;' '\n' | sort | uniq -c | sort -rn | more
In [1]:
txt = open('textfiles/shakespeare.txt', 'r').read()
txt[0:100]
Out[1]:
'1609\n\nTHE SONNETS\n\nby William Shakespeare\n\n\n\n 1\n From fairest creatures we desi'
In [22]:
txt2 = txt.replace(',', '\n').replace('.', '\n').replace('?', '\n').replace('!', '\n').replace('\'', '\n').replace('"', '\n').lower()
txt2[0:100]
Out[22]:
'1609\n\nthe sonnets\n\nby william shakespeare\n\n\n\n 1\n from fairest creatures we desi'
In [23]:
wordlist = txt2.split()
wordlist.sort()
results = []
current_word = wordlist[0]
current_counter = 1
# In the sorted list equal words are adjacent, so we count the length of each run.
for w in wordlist[1:]:
    if w != current_word:
        results.append((current_word, current_counter))
        current_word = w
        current_counter = 1
    else:
        current_counter += 1
# Store the count of the final run.
results.append((current_word, current_counter))
In [25]:
results[0:10]
Out[25]:
[('&', 3),
('&c', 18),
('(1)', 218),
('(2)', 218),
('(a', 3),
('(alack', 1),
('(all', 4),
('(although', 2),
('(always', 1),
('(as', 17)]
In [27]:
results.sort(key=lambda x: x[1], reverse=True)
In [28]:
results[0:10]
Out[28]:
[('the', 27531),
('and', 26658),
('i', 22430),
('to', 18937),
('of', 18103),
('a', 14554),
('you', 13475),
('my', 12474),
('that', 11457),
('in', 11010)]
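The run-counting loop over the sorted word list can also be written with itertools.groupby, which groups adjacent equal items. This is an alternative sketch, not the code used above, and the sample sentence is made up.

```python
from itertools import groupby

wordlist = "the quick fox and the lazy dog and the cat".split()
wordlist.sort()

# groupby yields (word, iterator over the run of equal adjacent words).
results = [(word, sum(1 for _ in run)) for word, run in groupby(wordlist)]
results.sort(key=lambda x: x[1], reverse=True)
print(results[0:2])
```

As with the explicit loop, the list has to be sorted first; groupby only merges items that are next to each other.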
In [33]:
wordlist = txt2.split()
reshash = {}
for w in wordlist:
    if w in reshash:
        reshash[w] += 1
    else:
        reshash[w] = 1
results = [(k, reshash[k]) for k in reshash.keys()]
results.sort(key=lambda x: x[1], reverse=True)
results[0:10]
Out[33]:
[('the', 27531),
('and', 26658),
('i', 22430),
('to', 18937),
('of', 18103),
('a', 14554),
('you', 13475),
('my', 12474),
('that', 11457),
('in', 11010)]
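The standard library offers collections.Counter for exactly this kind of dictionary-based counting. The sketch below applies it to a made-up sample sentence rather than the Shakespeare text.

```python
from collections import Counter

sample = "the quick fox and the lazy dog and the cat"

# Counter is a dict subclass that maps each word to its number of occurrences.
counts = Counter(sample.split())

# most_common(n) returns the n most frequent (word, count) pairs.
top = counts.most_common(2)
print(top)
```

A Counter returns 0 for words it has not seen, so no explicit membership test is needed while counting.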
In [31]:
Out[31]:
[('misuse', 8),
('julia', 153),
('legacy', 5),
('unhand', 1),
('nine-', 1),
('long-ingraffed', 1),
('substances', 2),
('profound;', 1),
('austerely', 2),
('executed', 18)]
Content source: squishbug/DataScienceProgramming