request.py · 99.1 KB · 2777 lines
"""An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below).  It opens the URL and returns the results as file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work.  Each Handler implements a particular protocol or
option.  The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL.  For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns.  The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.

urlopen(url, data=None) -- Basic usage is the same as original
urllib.  pass the url and optionally data to post to an HTTP URL, and
get a file-like object back.  One difference is that you can also pass
a Request instance instead of URL.  Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers.  Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate.  If one of the argument is a subclass of the default
handler, the argument will be installed instead of the default.

install_opener -- Installs a new opener as the default opener.

objects of interest:

OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.

Request -- An object that encapsulates the state of a request.  The
state can be as simple as the URL.  It can also include extra HTTP
headers, e.g. a User-Agent.

BaseHandler --

internals:
BaseHandler and parent
_call_chain conventions

Example usage:

import urllib.request

# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('https://www.python.org/')
"""

# XXX issues:
# If an authentication error handler that tries to perform
# authentication for some reason but fails, how should the error be
# signalled?  The client needs to know the HTTP error code.  But if
# the handler knows that the problem was, e.g., that it didn't know
# that hash algo that requested in the challenge, it would be good to
# pass that information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation

# Possible extensions:
# complex proxies  XXX not sure what exactly was meant by this
# abstract factory for opener

import base64
import bisect
import email
import hashlib
import http.client
import io
import os
import posixpath
import re
import socket
import string
import sys
import time
import tempfile
import contextlib
import warnings


from urllib.error import URLError, HTTPError, ContentTooShortError
from urllib.parse import (
    urlparse, urlsplit, urljoin, unwrap, quote, unquote,
    _splittype, _splithost, _splitport, _splituser, _splitpasswd,
    _splitattr, _splitquery, _splitvalue, _splittag, _to_bytes,
    unquote_to_bytes, urlunparse)
from urllib.response import addinfourl, addclosehook

# check for SSL
try:
    import ssl
except ImportError:
    _have_ssl = False
else:
    _have_ssl = True

__all__ = [
    # Classes
    'Request', 'OpenerDirector', 'BaseHandler', 'HTTPDefaultErrorHandler',
    'HTTPRedirectHandler', 'HTTPCookieProcessor', 'ProxyHandler',
    'HTTPPasswordMgr', 'HTTPPasswordMgrWithDefaultRealm',
    'HTTPPasswordMgrWithPriorAuth', 'AbstractBasicAuthHandler',
    'HTTPBasicAuthHandler', 'ProxyBasicAuthHandler', 'AbstractDigestAuthHandler',
    'HTTPDigestAuthHandler', 'ProxyDigestAuthHandler', 'HTTPHandler',
    'FileHandler', 'FTPHandler', 'CacheFTPHandler', 'DataHandler',
    'UnknownHandler', 'HTTPErrorProcessor',
    # Functions
    'urlopen', 'install_opener', 'build_opener',
    'pathname2url', 'url2pathname', 'getproxies',
    # Legacy interface
    'urlretrieve', 'urlcleanup', 'URLopener', 'FancyURLopener',
]

# used in User-Agent header sent
__version__ = '%d.%d' % sys.version_info[:2]

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
            *, cafile=None, capath=None, cadefault=False, context=None):
    '''Open the URL url, which can be either a string or a Request object.

    *data* must be an object specifying additional data to be sent to
    the server, or None if no such data is needed.  See Request for
    details.

    urllib.request module uses HTTP/1.1 and includes a "Connection:close"
    header in its HTTP requests.

    The optional *timeout* parameter specifies a timeout in seconds for
    blocking operations like the connection attempt (if not specified, the
    global default timeout setting will be used). This only works for HTTP,
    HTTPS and FTP connections.

    If *context* is specified, it must be a ssl.SSLContext instance describing
    the various SSL options. See HTTPSConnection for more details.

    The optional *cafile* and *capath* parameters specify a set of trusted CA
    certificates for HTTPS requests. cafile should point to a single file
    containing a bundle of CA certificates, whereas capath should point to a
    directory of hashed certificate files. More information can be found in
    ssl.SSLContext.load_verify_locations().

    The *cadefault* parameter is ignored.


    This function always returns an object which can work as a
    context manager and has the properties url, headers, and status.
    See urllib.response.addinfourl for more detail on these properties.

    For HTTP and HTTPS URLs, this function returns a http.client.HTTPResponse
    object slightly modified. In addition to the three new methods above, the
    msg attribute contains the same information as the reason attribute ---
    the reason phrase returned by the server --- instead of the response
    headers as it is specified in the documentation for HTTPResponse.

    For FTP, file, and data URLs and requests explicitly handled by legacy
    URLopener and FancyURLopener classes, this function returns a
    urllib.response.addinfourl object.

    Note that None may be returned if no handler handles the request (though
    the default installed global OpenerDirector uses UnknownHandler to ensure
    this never happens).

    In addition, if proxy settings are detected (for example, when a *_proxy
    environment variable like http_proxy is set), ProxyHandler is default
    installed and makes sure the requests are handled through the proxy.

    '''
    global _opener
    if cafile or capath or cadefault:
        import warnings
        warnings.warn("cafile, capath and cadefault are deprecated, use a "
                      "custom context instead.", DeprecationWarning, 2)
        if context is not None:
            raise ValueError(
                "You can't pass both context and any of cafile, capath, and "
                "cadefault"
            )
        if not _have_ssl:
            raise ValueError('SSL support not available')
        context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                             cafile=cafile,
                                             capath=capath)
        https_handler = HTTPSHandler(context=context)
        opener = build_opener(https_handler)
    elif context:
        https_handler = HTTPSHandler(context=context)
        opener = build_opener(https_handler)
    elif _opener is None:
        _opener = opener = build_opener()
    else:
        opener = _opener
    return opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """
    url_type, path = _splittype(url)

    with contextlib.closing(urlopen(url, data)) as fp:
        headers = fp.info()

        # Just return the local path and the "headers" for file://
        # URLs. No sense in performing a copy unless requested.
        if url_type == "file" and not filename:
            return os.path.normpath(path), headers

        # Handle temporary file setup.
        if filename:
            tfp = open(filename, 'wb')
        else:
            tfp = tempfile.NamedTemporaryFile(delete=False)
            filename = tfp.name
            _url_tempfiles.append(filename)

        with tfp:
            result = filename, headers
            bs = 1024*8
            size = -1
            read = 0
            blocknum = 0
            if "content-length" in headers:
                size = int(headers["Content-Length"])

            if reporthook:
                reporthook(blocknum, bs, size)

            while True:
                block = fp.read(bs)
                if not block:
                    break
                read += len(block)
                tfp.write(block)
                blocknum += 1
                if reporthook:
                    reporthook(blocknum, bs, size)

    if size >= 0 and read < size:
        raise ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, size), result)

    return result

def urlcleanup():
    """Clean up temporary files from urlretrieve calls."""
    for temp_file in _url_tempfiles:
        try:
            os.unlink(temp_file)
        except OSError:
            pass

    del _url_tempfiles[:]
    global _opener
    if _opener:
        _opener = None

# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$", re.ASCII)
def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.

    """
    url = request.full_url
    host = urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()

class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False,
                 method=None):
        self.full_url = url
        self.headers = {}
        self.unredirected_hdrs = {}
        self._data = None
        self.data = data
        self._tunnel_host = None
        for key, value in headers.items():
            self.add_header(key, value)
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable
        if method:
            self.method = method

    @property
    def full_url(self):
        if self.fragment:
            return '{}#{}'.format(self._full_url, self.fragment)
        return self._full_url

    @full_url.setter
    def full_url(self, url):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self._full_url = unwrap(url)
        self._full_url, self.fragment = _splittag(self._full_url)
        self._parse()

    @full_url.deleter
    def full_url(self):
        self._full_url = None
        self.fragment = None
        self.selector = ''

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, data):
        if data != self._data:
            self._data = data
            # issue 16464
            # if we change data we need to remove content-length header
            # (cause it's most probably calculated for previous value)
            if self.has_header("Content-length"):
                self.remove_header("Content-length")

    @data.deleter
    def data(self):
        self.data = None

    def _parse(self):
        self.type, rest = _splittype(self._full_url)
        if self.type is None:
            raise ValueError("unknown url type: %r" % self.full_url)
        self.host, self.selector = _splithost(rest)
        if self.host:
            self.host = unquote(self.host)

    def get_method(self):
        """Return a string indicating the HTTP request method."""
        default_method = "POST" if self.data is not None else "GET"
        return getattr(self, 'method', default_method)

    def get_full_url(self):
        return self.full_url

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type= type
            self.selector = self.full_url
        self.host = host

    def has_proxy(self):
        return self.selector == self.full_url

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def remove_header(self, header_name):
        self.headers.pop(header_name, None)
        self.unredirected_hdrs.pop(header_name, None)

    def header_items(self):
        hdrs = {**self.unredirected_hdrs, **self.headers}
        return list(hdrs.items())

class OpenerDirector:
    def __init__(self):
        client_version = "Python-urllib/%s" % __version__
        self.addheaders = [('User-agent', client_version)]
        # self.handlers is retained only for backward compatibility
        self.handlers = []
        # manage the individual handlers
        self.handle_open = {}
        self.handle_error = {}
        self.process_response = {}
        self.process_request = {}

    def add_handler(self, handler):
        if not hasattr(handler, "add_parent"):
            raise TypeError("expected BaseHandler instance, got %r" %
                            type(handler))

        added = False
        for meth in dir(handler):
            if meth in ["redirect_request", "do_open", "proxy_open"]:
                # oops, coincidental match
                continue

            i = meth.find("_")
            protocol = meth[:i]
            condition = meth[i+1:]

            if condition.startswith("error"):
                j = condition.find("_") + i + 1
                kind = meth[j+1:]
                try:
                    kind = int(kind)
                except ValueError:
                    pass
                lookup = self.handle_error.get(protocol, {})
                self.handle_error[protocol] = lookup
            elif condition == "open":
                kind = protocol
                lookup = self.handle_open
            elif condition == "response":
                kind = protocol
                lookup = self.process_response
            elif condition == "request":
                kind = protocol
                lookup = self.process_request
            else:
                continue

            handlers = lookup.setdefault(kind, [])
            if handlers:
                bisect.insort(handlers, handler)
            else:
                handlers.append(handler)
            added = True

        if added:
            bisect.insort(self.handlers, handler)
            handler.add_parent(self)

    def close(self):
        # Only exists for backwards compatibility.
        pass

    def _call_chain(self, chain, kind, meth_name, *args):
        # Handlers raise an exception if no one else should try to handle
        # the request, or return None if they can't but another handler
        # could.  Otherwise, they return the response.
        handlers = chain.get(kind, ())
        for handler in handlers:
            func = getattr(handler, meth_name)
            result = func(*args)
            if result is not None:
                return result

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, str):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.data = data

        req.timeout = timeout
        protocol = req.type

        # pre-process request
        meth_name = protocol+"_request"
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)

        sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
        response = self._open(req, data)

        # post-process response
        meth_name = protocol+"_response"
        for processor in self.process_response.get(protocol, []):
            meth = getattr(processor, meth_name)
            response = meth(req, response)

        return response

    def _open(self, req, data=None):
        result = self._call_chain(self.handle_open, 'default',
                                  'default_open', req)
        if result:
            return result

        protocol = req.type
        result = self._call_chain(self.handle_open, protocol, protocol +
                                  '_open', req)
        if result:
            return result

        return self._call_chain(self.handle_open, 'unknown',
                                'unknown_open', req)

    def error(self, proto, *args):
        if proto in ('http', 'https'):
            # XXX http[s] protocols are special-cased
            dict = self.handle_error['http'] # https is not different than http
            proto = args[2]  # YUCK!
            meth_name = 'http_error_%s' % proto
            http_err = 1
            orig_args = args
        else:
            dict = self.handle_error
            meth_name = proto + '_error'
            http_err = 0
        args = (dict, proto, meth_name) + args
        result = self._call_chain(*args)
        if result:
            return result

        if http_err:
            args = (dict, 'default', 'http_error_default') + orig_args
            return self._call_chain(*args)

# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both

def build_opener(*handlers):
    """Create an opener object from a list of handlers.

    The opener will use several default handlers, including support
    for HTTP, FTP and when applicable HTTPS.

    If any of the handlers passed as arguments are subclasses of the
    default handlers, the default handlers will not be used.
    """
    opener = OpenerDirector()
    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
                       FTPHandler, FileHandler, HTTPErrorProcessor,
                       DataHandler]
    if hasattr(http.client, "HTTPSConnection"):
        default_classes.append(HTTPSHandler)
    skip = set()
    for klass in default_classes:
        for check in handlers:
            if isinstance(check, type):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    for klass in skip:
        default_classes.remove(klass)

    for klass in default_classes:
        opener.add_handler(klass())

    for h in handlers:
        if isinstance(h, type):
            h = h()
        opener.add_handler(h)
    return opener

class BaseHandler:
    handler_order = 500

    def add_parent(self, parent):
        self.parent = parent

    def close(self):
        # Only exists for backwards compatibility
        pass

    def __lt__(self, other):
        if not hasattr(other, "handler_order"):
            # Try to preserve the old behavior of having custom classes
            # inserted after default ones (works only for custom user
            # classes which are not aware of handler_order).
            return True
        return self.handler_order < other.handler_order


class HTTPErrorProcessor(BaseHandler):
    """Process HTTP error responses."""
    handler_order = 1000  # after all other processing

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # According to RFC 2616, "2xx" code indicates that the client's
        # request was successfully received, understood, and accepted.
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)

        return response

    https_response = http_response

class HTTPDefaultErrorHandler(BaseHandler):
    def http_error_default(self, req, fp, code, msg, hdrs):
        raise HTTPError(req.full_url, code, msg, hdrs, fp)

class HTTPRedirectHandler(BaseHandler):
    # maximum number of redirections to any single URL
    # this is needed because of the state that cookies introduce
    max_repeats = 4
    # maximum total number of redirections (regardless of URL) before
    # assuming we're in a loop
    max_redirections = 10

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        """Return a Request or None in response to a redirect.

        This is called by the http_error_30x methods when a
        redirection response is received.  If a redirection should
        take place, return a new Request to allow http_error_30x to
        perform the redirect.  Otherwise, raise HTTPError if no-one
        else should try to handle this url.  Return None if you can't
        but another Handler might.
        """
        m = req.get_method()
        if (not (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST")):
            raise HTTPError(req.full_url, code, msg, headers, fp)

        # Strictly (according to RFC 2616), 301 or 302 in response to
        # a POST MUST NOT cause a redirection without confirmation
        # from the user (of urllib.request, in this case).  In practice,
        # essentially all clients do redirect in this case, so we do
        # the same.

        # Be conciliant with URIs containing a space.  This is mainly
        # redundant with the more complete encoding done in http_error_302(),
        # but it is kept for compatibility with other callers.
        newurl = newurl.replace(' ', '%20')

        CONTENT_HEADERS = ("content-length", "content-type")
        newheaders = {k: v for k, v in req.headers.items()
                      if k.lower() not in CONTENT_HEADERS}
        return Request(newurl,
                       headers=newheaders,
                       origin_req_host=req.origin_req_host,
                       unverifiable=True)

    # Implementation note: To avoid the server sending us into an
    # infinite loop, the request object needs to track what URLs we
    # have already seen.  Do this by adding a handler-specific
    # attribute to the Request object.
    def http_error_302(self, req, fp, code, msg, headers):
        # Some servers (incorrectly) return multiple Location headers
        # (so probably same goes for URI).  Use first header.
        if "location" in headers:
            newurl = headers["location"]
        elif "uri" in headers:
            newurl = headers["uri"]
        else:
            return

        # fix a possible malformed URL
        urlparts = urlparse(newurl)

        # For security reasons we don't allow redirection to anything other
        # than http, https or ftp.

        if urlparts.scheme not in ('http', 'https', 'ftp', ''):
            raise HTTPError(
                newurl, code,
                "%s - Redirection to url '%s' is not allowed" % (msg, newurl),
                headers, fp)

        if not urlparts.path and urlparts.netloc:
            urlparts = list(urlparts)
            urlparts[2] = "/"
        newurl = urlunparse(urlparts)

        # http.client.parse_headers() decodes as ISO-8859-1.  Recover the
        # original bytes and percent-encode non-ASCII bytes, and any special
        # characters such as the space.
        newurl = quote(
            newurl, encoding="iso-8859-1", safe=string.punctuation)
        newurl = urljoin(req.full_url, newurl)

        # XXX Probably want to forget about the state of the current
        # request, although that might interact poorly with other
        # handlers that also use handler-specific request attributes
        new = self.redirect_request(req, fp, code, msg, headers, newurl)
        if new is None:
            return

        # loop detection
        # .redirect_dict has a key url if url was previously visited.
        if hasattr(req, 'redirect_dict'):
            visited = new.redirect_dict = req.redirect_dict
            if (visited.get(newurl, 0) >= self.max_repeats or
                len(visited) >= self.max_redirections):
                raise HTTPError(req.full_url, code,
                                self.inf_msg + msg, headers, fp)
        else:
            visited = new.redirect_dict = req.redirect_dict = {}
        visited[newurl] = visited.get(newurl, 0) + 1

        # Don't close the fp until we are sure that we won't use it
        # with HTTPError.
        fp.read()
        fp.close()

        return self.parent.open(new, timeout=req.timeout)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

    inf_msg = "The HTTP server returned a redirect error that would " \
              "lead to an infinite loop.\n" \
              "The last 30x error message was:\n"


def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

    If a URL is supplied, it must have an authority (host:port) component.
    According to RFC 3986, having an authority component means the URL must
    have two slashes after the scheme.
    """
    scheme, r_scheme = _splittype(proxy)
    if not r_scheme.startswith("/"):
        # authority
        scheme = None
        authority = proxy
    else:
        # URL
        if not r_scheme.startswith("//"):
            raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
        # and 3.3.), path is empty or starts with '/'
        if '@' in r_scheme:
            host_separator = r_scheme.find('@')
            end = r_scheme.find("/", host_separator)
        else:
            end = r_scheme.find("/", 2)
        if end == -1:
            end = None
        authority = r_scheme[2:end]
    userinfo, hostport = _splituser(authority)
    if userinfo is not None:
        user, password = _splitpasswd(userinfo)
    else:
        user = password = None
    return scheme, user, password, hostport

class ProxyHandler(BaseHandler):
    # Proxies must be in front
    handler_order = 100

    def __init__(self, proxies=None):
        if proxies is None:
            proxies = getproxies()
        assert hasattr(proxies, 'keys'), "proxies must be a mapping"
        self.proxies = proxies
        for type, url in proxies.items():
            type = type.lower()
            setattr(self, '%s_open' % type,
                    lambda r, proxy=url, type=type, meth=self.proxy_open:
                        meth(r, proxy, type))

    def proxy_open(self, req, proxy, type):
        orig_type = req.type
        proxy_type, user, password, hostport = _parse_proxy(proxy)
        if proxy_type is None:
            proxy_type = orig_type

        if req.host and proxy_bypass(req.host):
            return None

        if user and password:
            user_pass = '%s:%s' % (unquote(user),
                                   unquote(password))
            creds = base64.b64encode(user_pass.encode()).decode("ascii")
            req.add_header('Proxy-authorization', 'Basic ' + creds)
        hostport = unquote(hostport)
        req.set_proxy(hostport, proxy_type)
        if orig_type == proxy_type or orig_type == 'https':
            # let other handlers take care of it
            return None
        else:
            # need to start over, because the other handlers don't
            # grok the proxy's URL type
            # e.g. if we have a constructor arg proxies like so:
            # {'http': 'ftp://proxy.example.com'}, we may end up turning
            # a request for http://acme.example.com/a into one for
            # ftp://proxy.example.com/a
            return self.parent.open(req, timeout=req.timeout)

class HTTPPasswordMgr:

    def __init__(self):
        self.passwd = {}

    def add_password(self, realm, uri, user, passwd):
        # uri could be a single URI or a sequence
        if isinstance(uri, str):
            uri = [uri]
        if realm not in self.passwd:
            self.passwd[realm] = {}
        for default_port in True, False:
            reduced_uri = tuple(
                self.reduce_uri(u, default_port) for u in uri)
            self.passwd[realm][reduced_uri] = (user, passwd)

    def find_user_password(self, realm, authuri):
        domains = self.passwd.get(realm, {})
        for default_port in True, False:
            reduced_authuri = self.reduce_uri(authuri, default_port)
            for uris, authinfo in domains.items():
                for uri in uris:
                    if self.is_suburi(uri, reduced_authuri):
                        return authinfo
        return None, None

    def reduce_uri(self, uri, default_port=True):
        """Accept authority or URI and extract only the authority and path."""
        # note HTTP URLs do not have a userinfo component
        parts = urlsplit(uri)
        if parts[1]:
            # URI
            scheme = parts[0]
            authority = parts[1]
            path = parts[2] or '/'
        else:
            # host or host:port
            scheme = None
            authority = uri
            path = '/'
        host, port = _splitport(authority)
        if default_port and port is None and scheme is not None:
            dport = {"http": 80,
                     "https": 443,
                     }.get(scheme)
            if dport is not None:
                authority = "%s:%d" % (host, dport)
        return authority, path

    def is_suburi(self, base, test):
        """Check if test is below base in a URI tree

        Both args must be URIs in reduced form.
        """
        if base == test:
            return True
        if base[0] != test[0]:
            return False
        prefix = base[1]
        if prefix[-1:] != '/':
            prefix += '/'
        return test[1].startswith(prefix)


class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):

    def find_user_password(self, realm, authuri):
        user, password = HTTPPasswordMgr.find_user_password(self, realm,
                                                            authuri)
        if user is not None:
            return user, password
        return HTTPPasswordMgr.find_user_password(self, None, authuri)


class HTTPPasswordMgrWithPriorAuth(HTTPPasswordMgrWithDefaultRealm):

    def __init__(self, *args, **kwargs):
        self.authenticated = {}
        super().__init__(*args, **kwargs)

    def add_password(self, realm, uri, user, passwd, is_authenticated=False):
        self.update_authenticated(uri, is_authenticated)
        # Add a default for prior auth requests
        if realm is not None:
            super().add_password(None, uri, user, passwd)
        super().add_password(realm, uri, user, passwd)

    def update_authenticated(self, uri, is_authenticated=False):
        # uri could be a single URI or a sequence
        if isinstance(uri, str):
            uri = [uri]

        for default_port in True, False:
            for u in uri:
                reduced_uri = self.reduce_uri(u, default_port)
                self.authenticated[reduced_uri] = is_authenticated

    def is_authenticated(self, authuri):
        for default_port in True, False:
            reduced_authuri = self.reduce_uri(authuri, default_port)
            for uri in self.authenticated:
                if self.is_suburi(uri, reduced_authuri):
                    return self.authenticated[uri]


class AbstractBasicAuthHandler:

    # XXX this allows for multiple auth-schemes, but will stupidly pick
    # the last one with a realm specified.

    # allow for double- and single-quoted realm values
    # (single quotes are a violation of the RFC, but appear in the wild)
    rx = re.compile('(?:^|,)'   # start of the string or ','
                    '[ \t]*'    # optional whitespaces
                    '([^ \t,]+)' # scheme like "Basic"
                    '[ \t]+'    # mandatory whitespaces
                    # realm=xxx
                    # realm='xxx'
                    # realm="xxx"
                    'realm=(["\']?)([^"\']*)\\2',
                    re.I)

    # XXX could pre-emptively send auth info already accepted (RFC 2617,
    # end of section 2, and section 1.2 immediately after "credentials"
    # production).

    def __init__(self, password_mgr=None):
        if password_mgr is None:
            password_mgr = HTTPPasswordMgr()
        self.passwd = password_mgr
        self.add_password = self.passwd.add_password

    def _parse_realm(self, header):
        # parse WWW-Authenticate header: accept multiple challenges per header
        found_challenge = False
        for mo in AbstractBasicAuthHandler.rx.finditer(header):
            scheme, quote, realm = mo.groups()
            if quote not in ['"', "'"]:
                warnings.warn("Basic Auth Realm was unquoted",
                              UserWarning, 3)

            yield (scheme, realm)

            found_challenge = True

        if not found_challenge:
            if header:
                scheme = header.split()[0]
            else:
                scheme = ''
            yield (scheme, None)

    def http_error_auth_reqed(self, authreq, host, req, headers):
        # host may be an authority (without userinfo) or a URL with an
        # authority
        headers = headers.get_all(authreq)
        if not headers:
            # no header found
            return

        unsupported = None
        for header in headers:
            for scheme, realm in self._parse_realm(header):
                if scheme.lower() != 'basic':
                    unsupported = scheme
                    continue

                if realm is not None:
                    # Use the first matching Basic challenge.
                    # Ignore following challenges even if they use the Basic
                    # scheme.
                    return self.retry_http_basic_auth(host, req, realm)

        if unsupported is not None:
            raise ValueError("AbstractBasicAuthHandler does not "
                             "support the following scheme: %r"
                             % (scheme,))

    def retry_http_basic_auth(self, host, req, realm):
        user, pw = self.passwd.find_user_password(realm, host)
        if pw is not None:
            raw = "%s:%s" % (user, pw)
            auth = "Basic " + base64.b64encode(raw.encode()).decode("ascii")
            if req.get_header(self.auth_header, None) == auth:
                return None
            req.add_unredirected_header(self.auth_header, auth)
            return self.parent.open(req, timeout=req.timeout)
        else:
            return None

    def http_request(self, req):
        if (not hasattr(self.passwd, 'is_authenticated') or
           not self.passwd.is_authenticated(req.full_url)):
            return req

        if not req.has_header('Authorization'):
            user, passwd = self.passwd.find_user_password(None, req.full_url)
            credentials = '{0}:{1}'.format(user, passwd).encode()
            auth_str = base64.standard_b64encode(credentials).decode()
            req.add_unredirected_header('Authorization',
                                        'Basic {}'.format(auth_str.strip()))
        return req

    def http_response(self, req, response):
        if hasattr(self.passwd, 'is_authenticated'):
            if 200 <= response.code < 300:
                self.passwd.update_authenticated(req.full_url, True)
            else:
                self.passwd.update_authenticated(req.full_url, False)
        return response

    https_request = http_request
    https_response = http_response



class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Authorization'

    def http_error_401(self, req, fp, code, msg, headers):
        url = req.full_url
        response = self.http_error_auth_reqed('www-authenticate',
                                          url, req, headers)
        return response


class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):

    auth_header = 'Proxy-authorization'

    def http_error_407(self, req, fp, code, msg, headers):
        # http_error_auth_reqed requires that there is no userinfo component in
        # authority.  Assume there isn't one, since urllib.request does not (and
        # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
        # userinfo.
        authority = req.host
        response = self.http_error_auth_reqed('proxy-authenticate',
                                          authority, req, headers)
        return response


# Return n random bytes.
_randombytes = os.urandom


class AbstractDigestAuthHandler:
    # Digest authentication is specified in RFC 2617.

    # XXX The client does not inspect the Authentication-Info header
    # in a successful response.

    # XXX It should be possible to test this implementation against
    # a mock server that just generates a static set of challenges.

    # XXX qop="auth-int" support is shaky

    def __init__(self, passwd=None):
        if passwd is None:
            passwd = HTTPPasswordMgr()
        self.passwd = passwd
        self.add_password = self.passwd.add_password
        self.retried = 0
        self.nonce_count = 0
        self.last_nonce = None

    def reset_retry_count(self):
        self.retried = 0

    def http_error_auth_reqed(self, auth_header, host, req, headers):
        authreq = headers.get(auth_header, None)
        if self.retried > 5:
            # Don't fail endlessly - if we failed once, we'll probably
            # fail a second time. Hm. Unless the Password Manager is
            # prompting for the information. Crap. This isn't great
            # but it's better than the current 'repeat until recursion
            # depth exceeded' approach <wink>
            raise HTTPError(req.full_url, 401, "digest auth failed",
                            headers, None)
        else:
            self.retried += 1
        if authreq:
            scheme = authreq.split()[0]
            if scheme.lower() == 'digest':
                return self.retry_http_digest_auth(req, authreq)
            elif scheme.lower() != 'basic':
                raise ValueError("AbstractDigestAuthHandler does not support"
                                 " the following scheme: '%s'" % scheme)

    def retry_http_digest_auth(self, req, auth):
        token, challenge = auth.split(' ', 1)
        chal = parse_keqv_list(filter(None, parse_http_list(challenge)))
        auth = self.get_authorization(req, chal)
        if auth:
            auth_val = 'Digest %s' % auth
            if req.headers.get(self.auth_header, None) == auth_val:
                return None
            req.add_unredirected_header(self.auth_header, auth_val)
            resp = self.parent.open(req, timeout=req.timeout)
            return resp

    def get_cnonce(self, nonce):
        # The cnonce-value is an opaque
        # quoted string value provided by the client and used by both client
        # and server to avoid chosen plaintext attacks, to provide mutual
        # authentication, and to provide some message integrity protection.
        # This isn't a fabulous effort, but it's probably Good Enough.
        s = "%s:%s:%s:" % (self.nonce_count, nonce, time.ctime())
        b = s.encode("ascii") + _randombytes(8)
        dig = hashlib.sha1(b).hexdigest()
        return dig[:16]

    def get_authorization(self, req, chal):
        try:
            realm = chal['realm']
            nonce = chal['nonce']
            qop = chal.get('qop')
            algorithm = chal.get('algorithm', 'MD5')
            # mod_digest doesn't send an opaque, even though it isn't
            # supposed to be optional
            opaque = chal.get('opaque', None)
        except KeyError:
            return None

        H, KD = self.get_algorithm_impls(algorithm)
        if H is None:
            return None

        user, pw = self.passwd.find_user_password(realm, req.full_url)
        if user is None:
            return None

        # XXX not implemented yet
        if req.data is not None:
            entdig = self.get_entity_digest(req.data, chal)
        else:
            entdig = None

        A1 = "%s:%s:%s" % (user, realm, pw)
        A2 = "%s:%s" % (req.get_method(),
                        # XXX selector: what about proxies and full urls
                        req.selector)
        # NOTE: As per RFC 2617, when the server sends "auth,auth-int", the
        #     client may answer with either `auth` or `auth-int`. We use `auth`.
        if qop is None:
            respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
        elif 'auth' in qop.split(','):
            if nonce == self.last_nonce:
                self.nonce_count += 1
            else:
                self.nonce_count = 1
                self.last_nonce = nonce
            ncvalue = '%08x' % self.nonce_count
            cnonce = self.get_cnonce(nonce)
            noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, 'auth', H(A2))
            respdig = KD(H(A1), noncebit)
        else:
            # XXX handle auth-int.
            raise URLError("qop '%s' is not supported." % qop)

        # XXX should the partial digests be encoded too?

        base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
               'response="%s"' % (user, realm, nonce, req.selector,
                                  respdig)
        if opaque:
            base += ', opaque="%s"' % opaque
        if entdig:
            base += ', digest="%s"' % entdig
        base += ', algorithm="%s"' % algorithm
        if qop:
            base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
        return base

    def get_algorithm_impls(self, algorithm):
        # lambdas assume digest modules are imported at the top level
        if algorithm == 'MD5':
            H = lambda x: hashlib.md5(x.encode("ascii")).hexdigest()
        elif algorithm == 'SHA':
            H = lambda x: hashlib.sha1(x.encode("ascii")).hexdigest()
        # XXX MD5-sess
        else:
            raise ValueError("Unsupported digest authentication "
                             "algorithm %r" % algorithm)
        KD = lambda s, d: H("%s:%s" % (s, d))
        return H, KD

    def get_entity_digest(self, data, chal):
        # XXX not implemented yet
        return None


class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
    """An authentication protocol defined by RFC 2069

    Digest authentication improves on basic authentication because it
    does not transmit passwords in the clear.
    """

    auth_header = 'Authorization'
    handler_order = 490  # before Basic auth

    def http_error_401(self, req, fp, code, msg, headers):
        host = urlparse(req.full_url)[1]
        retry = self.http_error_auth_reqed('www-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry


class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):

    auth_header = 'Proxy-Authorization'
    handler_order = 490  # before Basic auth

    def http_error_407(self, req, fp, code, msg, headers):
        host = req.host
        retry = self.http_error_auth_reqed('proxy-authenticate',
                                           host, req, headers)
        self.reset_retry_count()
        return retry

class AbstractHTTPHandler(BaseHandler):

    def __init__(self, debuglevel=0):
        self._debuglevel = debuglevel

    def set_http_debuglevel(self, level):
        self._debuglevel = level

    def _get_content_length(self, request):
        return http.client.HTTPConnection._get_content_length(
            request.data,
            request.get_method())

    def do_request_(self, request):
        host = request.host
        if not host:
            raise URLError('no host given')

        if request.data is not None:  # POST
            data = request.data
            if isinstance(data, str):
                msg = "POST data should be bytes, an iterable of bytes, " \
                      "or a file object. It cannot be of type str."
                raise TypeError(msg)
            if not request.has_header('Content-type'):
                request.add_unredirected_header(
                    'Content-type',
                    'application/x-www-form-urlencoded')
            if (not request.has_header('Content-length')
                    and not request.has_header('Transfer-encoding')):
                content_length = self._get_content_length(request)
                if content_length is not None:
                    request.add_unredirected_header(
                            'Content-length', str(content_length))
                else:
                    request.add_unredirected_header(
                            'Transfer-encoding', 'chunked')

        sel_host = host
        if request.has_proxy():
            scheme, sel = _splittype(request.selector)
            sel_host, sel_path = _splithost(sel)
        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)
        for name, value in self.parent.addheaders:
            name = name.capitalize()
            if not request.has_header(name):
                request.add_unredirected_header(name, value)

        return request

    def do_open(self, http_class, req, **http_conn_args):
        """Return an HTTPResponse object for the request, using http_class.

        http_class must implement the HTTPConnection API from http.client.
        """
        host = req.host
        if not host:
            raise URLError('no host given')

        # will parse host:port
        h = http_class(host, timeout=req.timeout, **http_conn_args)
        h.set_debuglevel(self._debuglevel)

        headers = dict(req.unredirected_hdrs)
        headers.update({k: v for k, v in req.headers.items()
                        if k not in headers})

    # TODO(jhylton): Should this be redesigned to handle1323        # persistent connections?1324 1325        # We want to make an HTTP/1.1 request, but the addinfourl1326        # class isn't prepared to deal with a persistent connection.1327        # It will try to read all remaining data from the socket,1328        # which will block while the server waits for the next request.1329        # So make sure the connection gets closed after the (only)1330        # request.1331        headers["Connection"] = "close"1332        headers = {name.title(): val for name, val in headers.items()}1333 1334        if req._tunnel_host:1335            tunnel_headers = {}1336            proxy_auth_hdr = "Proxy-Authorization"1337            if proxy_auth_hdr in headers:1338                tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]1339                # Proxy-Authorization should not be sent to origin1340                # server.1341                del headers[proxy_auth_hdr]1342            h.set_tunnel(req._tunnel_host, headers=tunnel_headers)1343 1344        try:1345            try:1346                h.request(req.get_method(), req.selector, req.data, headers,1347                          encode_chunked=req.has_header('Transfer-encoding'))1348            except OSError as err: # timeout error1349                raise URLError(err)1350            r = h.getresponse()1351        except:1352            h.close()1353            raise1354 1355        # If the server does not send us a 'Connection: close' header,1356        # HTTPConnection assumes the socket should be left open. Manually1357        # mark the socket to be closed when this response object goes away.1358        if h.sock:1359            h.sock.close()1360            h.sock = None1361 1362        r.url = req.get_full_url()1363        # This line replaces the .msg attribute of the HTTPResponse1364        # with .headers, because urllib clients expect the response to1365        # have the reason in .msg.  
It would be good to mark this1366        # attribute is deprecated and get then to use info() or1367        # .headers.1368        r.msg = r.reason1369        return r1370 1371 1372class HTTPHandler(AbstractHTTPHandler):1373 1374    def http_open(self, req):1375        return self.do_open(http.client.HTTPConnection, req)1376 1377    http_request = AbstractHTTPHandler.do_request_1378 1379if hasattr(http.client, 'HTTPSConnection'):1380 1381    class HTTPSHandler(AbstractHTTPHandler):1382 1383        def __init__(self, debuglevel=0, context=None, check_hostname=None):1384            AbstractHTTPHandler.__init__(self, debuglevel)1385            self._context = context1386            self._check_hostname = check_hostname1387 1388        def https_open(self, req):1389            return self.do_open(http.client.HTTPSConnection, req,1390                context=self._context, check_hostname=self._check_hostname)1391 1392        https_request = AbstractHTTPHandler.do_request_1393 1394    __all__.append('HTTPSHandler')1395 1396class HTTPCookieProcessor(BaseHandler):1397    def __init__(self, cookiejar=None):1398        import http.cookiejar1399        if cookiejar is None:1400            cookiejar = http.cookiejar.CookieJar()1401        self.cookiejar = cookiejar1402 1403    def http_request(self, request):1404        self.cookiejar.add_cookie_header(request)1405        return request1406 1407    def http_response(self, request, response):1408        self.cookiejar.extract_cookies(response, request)1409        return response1410 1411    https_request = http_request1412    https_response = http_response1413 1414class UnknownHandler(BaseHandler):1415    def unknown_open(self, req):1416        type = req.type1417        raise URLError('unknown url type: %s' % type)1418 1419def parse_keqv_list(l):1420    """Parse list of key=value strings where keys are not duplicated."""1421    parsed = {}1422    for elt in l:1423        k, v = elt.split('=', 1)1424        if v[0] == '"' and 
v[-1] == '"':1425            v = v[1:-1]1426        parsed[k] = v1427    return parsed1428 1429def parse_http_list(s):1430    """Parse lists as described by RFC 2068 Section 2.1431 1432    In particular, parse comma-separated lists where the elements of1433    the list may include quoted-strings.  A quoted-string could1434    contain a comma.  A non-quoted string could have quotes in the1435    middle.  Neither commas nor quotes count if they are escaped.1436    Only double-quotes count, not single-quotes.1437    """1438    res = []1439    part = ''1440 1441    escape = quote = False1442    for cur in s:1443        if escape:1444            part += cur1445            escape = False1446            continue1447        if quote:1448            if cur == '\\':1449                escape = True1450                continue1451            elif cur == '"':1452                quote = False1453            part += cur1454            continue1455 1456        if cur == ',':1457            res.append(part)1458            part = ''1459            continue1460 1461        if cur == '"':1462            quote = True1463 1464        part += cur1465 1466    # append last part1467    if part:1468        res.append(part)1469 1470    return [part.strip() for part in res]1471 1472class FileHandler(BaseHandler):1473    # Use local file or FTP depending on form of URL1474    def file_open(self, req):1475        url = req.selector1476        if url[:2] == '//' and url[2:3] != '/' and (req.host and1477                req.host != 'localhost'):1478            if not req.host in self.get_names():1479                raise URLError("file:// scheme is supported only on localhost")1480        else:1481            return self.open_local_file(req)1482 1483    # names for the localhost1484    names = None1485    def get_names(self):1486        if FileHandler.names is None:1487            try:1488                FileHandler.names = tuple(1489                    socket.gethostbyname_ex('localhost')[2] 
+1490                    socket.gethostbyname_ex(socket.gethostname())[2])1491            except socket.gaierror:1492                FileHandler.names = (socket.gethostbyname('localhost'),)1493        return FileHandler.names1494 1495    # not entirely sure what the rules are here1496    def open_local_file(self, req):1497        import email.utils1498        import mimetypes1499        host = req.host1500        filename = req.selector1501        localfile = url2pathname(filename)1502        try:1503            stats = os.stat(localfile)1504            size = stats.st_size1505            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)1506            mtype = mimetypes.guess_type(filename)[0]1507            headers = email.message_from_string(1508                'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %1509                (mtype or 'text/plain', size, modified))1510            if host:1511                host, port = _splitport(host)1512            if not host or \1513                (not port and _safe_gethostbyname(host) in self.get_names()):1514                if host:1515                    origurl = 'file://' + host + filename1516                else:1517                    origurl = 'file://' + filename1518                return addinfourl(open(localfile, 'rb'), headers, origurl)1519        except OSError as exp:1520            raise URLError(exp)1521        raise URLError('file not on local host')1522 1523def _safe_gethostbyname(host):1524    try:1525        return socket.gethostbyname(host)1526    except socket.gaierror:1527        return None1528 1529class FTPHandler(BaseHandler):1530    def ftp_open(self, req):1531        import ftplib1532        import mimetypes1533        host = req.host1534        if not host:1535            raise URLError('ftp error: no host given')1536        host, port = _splitport(host)1537        if port is None:1538            port = ftplib.FTP_PORT1539        else:1540            port = 
int(port)1541 1542        # username/password handling1543        user, host = _splituser(host)1544        if user:1545            user, passwd = _splitpasswd(user)1546        else:1547            passwd = None1548        host = unquote(host)1549        user = user or ''1550        passwd = passwd or ''1551 1552        try:1553            host = socket.gethostbyname(host)1554        except OSError as msg:1555            raise URLError(msg)1556        path, attrs = _splitattr(req.selector)1557        dirs = path.split('/')1558        dirs = list(map(unquote, dirs))1559        dirs, file = dirs[:-1], dirs[-1]1560        if dirs and not dirs[0]:1561            dirs = dirs[1:]1562        try:1563            fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)1564            type = file and 'I' or 'D'1565            for attr in attrs:1566                attr, value = _splitvalue(attr)1567                if attr.lower() == 'type' and \1568                   value in ('a', 'A', 'i', 'I', 'd', 'D'):1569                    type = value.upper()1570            fp, retrlen = fw.retrfile(file, type)1571            headers = ""1572            mtype = mimetypes.guess_type(req.full_url)[0]1573            if mtype:1574                headers += "Content-type: %s\n" % mtype1575            if retrlen is not None and retrlen >= 0:1576                headers += "Content-length: %d\n" % retrlen1577            headers = email.message_from_string(headers)1578            return addinfourl(fp, headers, req.full_url)1579        except ftplib.all_errors as exp:1580            exc = URLError('ftp error: %r' % exp)1581            raise exc.with_traceback(sys.exc_info()[2])1582 1583    def connect_ftp(self, user, passwd, host, port, dirs, timeout):1584        return ftpwrapper(user, passwd, host, port, dirs, timeout,1585                          persistent=False)1586 1587class CacheFTPHandler(FTPHandler):1588    # XXX would be nice to have pluggable cache strategies1589    # XXX 
this stuff is definitely not thread safe1590    def __init__(self):1591        self.cache = {}1592        self.timeout = {}1593        self.soonest = 01594        self.delay = 601595        self.max_conns = 161596 1597    def setTimeout(self, t):1598        self.delay = t1599 1600    def setMaxConns(self, m):1601        self.max_conns = m1602 1603    def connect_ftp(self, user, passwd, host, port, dirs, timeout):1604        key = user, host, port, '/'.join(dirs), timeout1605        if key in self.cache:1606            self.timeout[key] = time.time() + self.delay1607        else:1608            self.cache[key] = ftpwrapper(user, passwd, host, port,1609                                         dirs, timeout)1610            self.timeout[key] = time.time() + self.delay1611        self.check_cache()1612        return self.cache[key]1613 1614    def check_cache(self):1615        # first check for old ones1616        t = time.time()1617        if self.soonest <= t:1618            for k, v in list(self.timeout.items()):1619                if v < t:1620                    self.cache[k].close()1621                    del self.cache[k]1622                    del self.timeout[k]1623        self.soonest = min(list(self.timeout.values()))1624 1625        # then check the size1626        if len(self.cache) == self.max_conns:1627            for k, v in list(self.timeout.items()):1628                if v == self.soonest:1629                    del self.cache[k]1630                    del self.timeout[k]1631                    break1632            self.soonest = min(list(self.timeout.values()))1633 1634    def clear_cache(self):1635        for conn in self.cache.values():1636            conn.close()1637        self.cache.clear()1638        self.timeout.clear()1639 1640class DataHandler(BaseHandler):1641    def data_open(self, req):1642        # data URLs as specified in RFC 2397.1643        #1644        # ignores POSTed data1645        #1646        # syntax:1647        # dataurl   := 
"data:" [ mediatype ] [ ";base64" ] "," data1648        # mediatype := [ type "/" subtype ] *( ";" parameter )1649        # data      := *urlchar1650        # parameter := attribute "=" value1651        url = req.full_url1652 1653        scheme, data = url.split(":",1)1654        mediatype, data = data.split(",",1)1655 1656        # Disallow control characters within mediatype.1657        if re.search(r"[\x00-\x1F\x7F]", mediatype):1658            raise ValueError(1659                "Control characters not allowed in data: mediatype")1660 1661        # even base64 encoded data URLs might be quoted so unquote in any case:1662        data = unquote_to_bytes(data)1663        if mediatype.endswith(";base64"):1664            data = base64.decodebytes(data)1665            mediatype = mediatype[:-7]1666 1667        if not mediatype:1668            mediatype = "text/plain;charset=US-ASCII"1669 1670        headers = email.message_from_string("Content-type: %s\nContent-length: %d\n" %1671            (mediatype, len(data)))1672 1673        return addinfourl(io.BytesIO(data), headers, url)1674 1675 1676# Code move from the old urllib module1677 1678MAXFTPCACHE = 10        # Trim the ftp cache beyond this size1679 1680# Helper for non-unix systems1681if os.name == 'nt':1682    from nturl2path import url2pathname, pathname2url1683else:1684    def url2pathname(pathname):1685        """OS-specific conversion from a relative URL of the 'file' scheme1686        to a file system path; not recommended for general use."""1687        return unquote(pathname)1688 1689    def pathname2url(pathname):1690        """OS-specific conversion from a file system path to a relative URL1691        of the 'file' scheme; not recommended for general use."""1692        return quote(pathname)1693 1694 1695ftpcache = {}1696 1697 1698class URLopener:1699    """Class to open URLs.1700    This is a class rather than just a subroutine because we may need1701    more than one set of global protocol-specific 
    options.
    Note -- this is a base class for those who don't want the
    automatic handling of errors type 302 (relocated) and 401
    (authorization needed)."""

    __tempfiles = None

    version = "Python-urllib/%s" % __version__

    # Constructor
    def __init__(self, proxies=None, **x509):
        msg = "%(class)s style of invoking requests is deprecated. " \
              "Use newer urlopen functions/methods" % {'class': self.__class__.__name__}
        warnings.warn(msg, DeprecationWarning, stacklevel=3)
        if proxies is None:
            proxies = getproxies()
        assert hasattr(proxies, 'keys'), "proxies must be a mapping"
        self.proxies = proxies
        self.key_file = x509.get('key_file')
        self.cert_file = x509.get('cert_file')
        self.addheaders = [('User-Agent', self.version), ('Accept', '*/*')]
        self.__tempfiles = []
        self.__unlink = os.unlink # See cleanup()
        self.tempcache = None
        # Undocumented feature: if you assign {} to tempcache,
        # it is used to cache files retrieved with
        # self.retrieve().  This is not enabled by default
        # since it does not work for changing documents (and I
        # haven't got the logic to check expiration headers
        # yet).
        self.ftpcache = ftpcache
        # Undocumented feature: you can use a different
        # ftp cache by assigning to the .ftpcache member;
        # in case you want logically independent URL openers
        # XXX This is not threadsafe.
        # Bah.

    def __del__(self):
        self.close()

    def close(self):
        self.cleanup()

    def cleanup(self):
        # This code sometimes runs when the rest of this module
        # has already been deleted, so it can't use any globals
        # or import anything.
        if self.__tempfiles:
            for file in self.__tempfiles:
                try:
                    self.__unlink(file)
                except OSError:
                    pass
            del self.__tempfiles[:]
        if self.tempcache:
            self.tempcache.clear()

    def addheader(self, *args):
        """Add a header to be used by the HTTP interface only
        e.g. u.addheader('Accept', 'sound/basic')"""
        self.addheaders.append(args)

    # External interface
    def open(self, fullurl, data=None):
        """Use URLopener().open(file) instead of open(file, 'r')."""
        fullurl = unwrap(_to_bytes(fullurl))
        fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")
        if self.tempcache and fullurl in self.tempcache:
            filename, headers = self.tempcache[fullurl]
            fp = open(filename, 'rb')
            return addinfourl(fp, headers, fullurl)
        urltype, url = _splittype(fullurl)
        if not urltype:
            urltype = 'file'
        if urltype in self.proxies:
            proxy = self.proxies[urltype]
            urltype, proxyhost = _splittype(proxy)
            host, selector = _splithost(proxyhost)
            url = (host, fullurl) # Signal special case to open_*()
        else:
            proxy = None
        name = 'open_' + urltype
        self.type = urltype
        name = name.replace('-', '_')
        if not hasattr(self, name) or name == 'open_local_file':
            if proxy:
                return self.open_unknown_proxy(proxy, fullurl, data)
            else:
                return self.open_unknown(fullurl, data)
        try:
            if data is None:
                return getattr(self, name)(url)
            else:
                return getattr(self, name)(url, data)
        except (HTTPError, URLError):
            raise
        except OSError as msg:
            raise OSError('socket error', msg).with_traceback(sys.exc_info()[2])

    def open_unknown(self, fullurl, data=None):
        """Overridable interface to open unknown URL type."""
        type, url = _splittype(fullurl)
        raise OSError('url error', 'unknown url type', type)

    def open_unknown_proxy(self, proxy, fullurl, data=None):
        """Overridable interface to open unknown URL type."""
        type, url = _splittype(fullurl)
        raise OSError('url error', 'invalid proxy for %s' % type, proxy)

    # External interface
    def retrieve(self, url, filename=None, reporthook=None, data=None):
        """retrieve(url) returns (filename, headers) for a local object
        or (tempfilename, headers) for a remote object."""
        url = unwrap(_to_bytes(url))
        if self.tempcache and url in self.tempcache:
            return self.tempcache[url]
        type, url1 = _splittype(url)
        if filename is None and (not type or type == 'file'):
            try:
                fp = self.open_local_file(url1)
                hdrs = fp.info()
                fp.close()
                return url2pathname(_splithost(url1)[1]), hdrs
            except OSError:
                pass
        fp = self.open(url, data)
        try:
            headers = fp.info()
            if filename:
                tfp = open(filename, 'wb')
            else:
                garbage, path = _splittype(url)
                garbage, path = _splithost(path or "")
                path, garbage = _splitquery(path or "")
                path, garbage = _splitattr(path or "")
                suffix = os.path.splitext(path)[1]
                (fd, filename) = tempfile.mkstemp(suffix)
                self.__tempfiles.append(filename)
                tfp = os.fdopen(fd, 'wb')
            try:
                result = filename, headers
                if self.tempcache is not None:
                    self.tempcache[url] = result
                bs = 1024*8
                size = -1
                read = 0
                blocknum = 0
                if "content-length" in headers:
                    size = int(headers["Content-Length"])
                if reporthook:
                    reporthook(blocknum, bs, size)
                while 1:
                    block = fp.read(bs)
                    if not block:
                        break
                    read += len(block)
                    tfp.write(block)
                    blocknum += 1
                    if reporthook:
                        reporthook(blocknum, bs, size)
            finally:
                tfp.close()
        finally:
            fp.close()

        # raise exception if actual size does not match content-length header
        if size >= 0 and read < size:
            raise ContentTooShortError(
                "retrieval incomplete: got only %i out of %i bytes"
                % (read, size), result)

        return result

    # Each method named open_<type> knows how to open that type of URL

    def _open_generic_http(self, connection_factory, url, data):
        """Make an HTTP connection using connection_factory.

        This is an internal method that should be called from
        open_http() or open_https().

        Arguments:
        - connection_factory should take a host name and return an
          HTTPConnection instance.
        - url is the URL to retrieve or a host, relative-path pair.
        - data is payload for a POST request or None.
        """

        user_passwd = None
        proxy_passwd= None
        if isinstance(url, str):
            host, selector = _splithost(url)
            if host:
                user_passwd, host = _splituser(host)
                host = unquote(host)
            realhost = host
        else:
            host, selector = url
            # check whether the proxy contains authorization information
            proxy_passwd, host = _splituser(host)
            # now we proceed with the url we want to obtain
            urltype, rest = _splittype(selector)
            url = rest
            user_passwd = None
            if urltype.lower() != 'http':
                realhost = None
            else:
                realhost, rest = _splithost(rest)
                if realhost:
                    user_passwd, realhost = _splituser(realhost)
                if user_passwd:
                    selector = "%s://%s%s" % (urltype, realhost, rest)
                if proxy_bypass(realhost):
                    host = realhost

        if not host: raise OSError('http error', 'no host given')

        if proxy_passwd:
            proxy_passwd = unquote(proxy_passwd)
            proxy_auth = base64.b64encode(proxy_passwd.encode()).decode('ascii')
        else:
            proxy_auth = None

        if user_passwd:
            user_passwd = unquote(user_passwd)
            auth = base64.b64encode(user_passwd.encode()).decode('ascii')
        else:
            auth = None
        http_conn = connection_factory(host)
        headers = {}
        if proxy_auth:
            headers["Proxy-Authorization"] = "Basic %s" % proxy_auth
        if auth:
            headers["Authorization"] =  "Basic %s" % auth
        if realhost:
            headers["Host"] = realhost

        # Add Connection:close as we don't support persistent connections yet.
        # This helps in closing the socket and avoiding ResourceWarning

        headers["Connection"] = "close"

        for header, value in self.addheaders:
            headers[header] = value

        if data is not None:
            headers["Content-Type"] = "application/x-www-form-urlencoded"
            http_conn.request("POST", selector, data, headers)
        else:
            http_conn.request("GET", selector, headers=headers)

        try:
            response = http_conn.getresponse()
        except http.client.BadStatusLine:
            # something went wrong with the HTTP status line
            raise URLError("http protocol error: bad status line")

        # According to RFC 2616, "2xx" code indicates that the client's
        # request was successfully received, understood, and accepted.
        if 200 <= response.status < 300:
            return addinfourl(response, response.msg, "http:" + url,
                              response.status)
        else:
            return self.http_error(
                url, response.fp,
                response.status, response.reason, response.msg, data)

    def open_http(self, url, data=None):
        """Use HTTP protocol."""
        return self._open_generic_http(http.client.HTTPConnection, url, data)

    def http_error(self, url, fp, errcode, errmsg, headers, data=None):
        """Handle http errors.

        Derived class can override this, or provide specific handlers
        named http_error_DDD where DDD is the 3-digit error code."""
        # First check if there's a specific handler for this error
        name = 'http_error_%d' % errcode
        if hasattr(self, name):
            method = getattr(self, name)
            if data is None:
                result = method(url, fp, errcode, errmsg, headers)
            else:
                result = method(url, fp, errcode, errmsg, headers, data)
            if result: return result
        return self.http_error_default(url, fp, errcode, errmsg, headers)

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handler: close the connection and raise OSError."""
        fp.close()
        raise HTTPError(url, errcode, errmsg, headers, None)

    if _have_ssl:
        def _https_connection(self, host):
            return http.client.HTTPSConnection(host,
                                           key_file=self.key_file,
                                           cert_file=self.cert_file)

        def open_https(self, url, data=None):
            """Use HTTPS protocol."""
            return self._open_generic_http(self._https_connection, url, data)

    def open_file(self, url):
        """Use local file or FTP depending on form of URL."""
        if not isinstance(url, str):
            raise URLError('file error: proxy support for file protocol currently not implemented')
        if url[:2] == '//' and url[2:3] != '/' and url[2:12].lower() != 'localhost/':
            raise ValueError("file:// scheme is supported only on localhost")
        else:
            return self.open_local_file(url)

    def open_local_file(self, url):
        """Use local file."""
        import email.utils
        import mimetypes
        host, file = _splithost(url)
        localname = url2pathname(file)
        try:
            stats = os.stat(localname)
        except OSError as e:
            raise URLError(e.strerror, e.filename)
        size = stats.st_size
        modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
        mtype = mimetypes.guess_type(url)[0]
        headers = email.message_from_string(
            'Content-Type: %s\nContent-Length: %d\nLast-modified: %s\n' %
            (mtype or 'text/plain', size, modified))
        if not host:
            urlfile = file
            if file[:1] == '/':
                urlfile = 'file://' + file
            return addinfourl(open(localname, 'rb'), headers, urlfile)
        host, port = _splitport(host)
        if (not port
           and socket.gethostbyname(host) in ((localhost(),) + thishost())):
            urlfile = file
            if file[:1] == '/':
                urlfile = 'file://' + file
            elif file[:2] == './':
                raise ValueError("local file url may start with / or file:. Unknown url of type: %s" % url)
            return addinfourl(open(localname, 'rb'), headers, urlfile)
        raise URLError('local file error: not on local host')

    def open_ftp(self, url):
        """Use FTP protocol."""
        if not isinstance(url, str):
            raise URLError('ftp error: proxy support for ftp protocol currently not implemented')
        import mimetypes
        host, path = _splithost(url)
        if not host: raise URLError('ftp error: no host given')
        host, port = _splitport(host)
        user, host = _splituser(host)
        if user: user, passwd = _splitpasswd(user)
        else: passwd = None
        host = unquote(host)
        user = unquote(user or '')
        passwd = unquote(passwd or '')
        host = socket.gethostbyname(host)
        if not port:
            import ftplib
            port = ftplib.FTP_PORT
        else:
            port = int(port)
        path, attrs = _splitattr(path)
        path = unquote(path)
        dirs = path.split('/')
        dirs, file = dirs[:-1], dirs[-1]
        if dirs and not dirs[0]: dirs = dirs[1:]
        if dirs and not dirs[0]: dirs[0] = '/'
        key = user, host, port, '/'.join(dirs)
        # XXX thread unsafe!
        if len(self.ftpcache) > MAXFTPCACHE:
            # Prune the cache, rather arbitrarily
            for k in list(self.ftpcache):
                if k != key:
                    v = self.ftpcache[k]
                    del self.ftpcache[k]
                    v.close()
        try:
            if key not in self.ftpcache:
                self.ftpcache[key] = \
                    ftpwrapper(user, passwd, host, port, dirs)
            if not file: type = 'D'
            else: type = 'I'
            for attr in attrs:
                attr, value = _splitvalue(attr)
                if attr.lower() == 'type' and \
                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
                    type = value.upper()
            (fp, retrlen) = self.ftpcache[key].retrfile(file, type)
            mtype = mimetypes.guess_type("ftp:" + url)[0]
            headers = ""
            if mtype:
                headers += "Content-Type: %s\n" % mtype
            if retrlen is not None and retrlen >= 0:
                headers += "Content-Length: %d\n" % retrlen
            headers = email.message_from_string(headers)
            return addinfourl(fp, headers, "ftp:" + url)
        except ftperrors() as exp:
            raise URLError('ftp error %r' % exp).with_traceback(sys.exc_info()[2])

    def open_data(self, url, data=None):
        """Use "data" URL."""
        if not isinstance(url, str):
            raise URLError('data error: proxy support for data protocol currently not implemented')
        # ignore POSTed data
        #
        # syntax of data URLs:
        # dataurl   := "data:" [ mediatype ] [ ";base64" ] "," data
        # mediatype := [ type "/" subtype ] *( ";" parameter )
        # data      := *urlchar
        # parameter := attribute "=" value
        try:
            [type, data] = url.split(',', 1)
        except ValueError:
            raise OSError('data error', 'bad data URL')
        if not type:
            type = 'text/plain;charset=US-ASCII'
        semi = type.rfind(';')
        if semi >= 0 and '=' not in type[semi:]:
            encoding = type[semi+1:]
            type = type[:semi]
        else:
            encoding = ''
        msg = []
        msg.append('Date: %s'%time.strftime('%a, %d %b %Y %H:%M:%S GMT',
                                            time.gmtime(time.time())))
        msg.append('Content-type: %s' % type)
        if encoding == 'base64':
            # XXX is this encoding/decoding ok?
            data = base64.decodebytes(data.encode('ascii')).decode('latin-1')
        else:
            data = unquote(data)
        msg.append('Content-Length: %d' % len(data))
        msg.append('')
        msg.append(data)
        msg = '\n'.join(msg)
        headers = email.message_from_string(msg)
        f = io.StringIO(msg)
        #f.fileno = None     # needed for addinfourl
        return addinfourl(f, headers, url)


class FancyURLopener(URLopener):
    """Derived class with handlers for errors we can handle (perhaps)."""

    def __init__(self, *args, **kwargs):
        URLopener.__init__(self, *args, **kwargs)
        self.auth_cache = {}
        self.tries = 0
        self.maxtries = 10

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url, errcode)

    def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
        """Error 302 -- relocated (temporarily)."""
        self.tries += 1
        try:
            if self.maxtries and self.tries >= self.maxtries:
                if hasattr(self, "http_error_500"):
                    meth = self.http_error_500
                else:
                    meth = self.http_error_default
                return meth(url, fp, 500,
                            "Internal Server Error: Redirect Recursion",
                            headers)
            result = self.redirect_internal(url, fp, errcode, errmsg,
                                            headers, data)
            return result
        finally:
            self.tries = 0

    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        fp.close()

        # In case the server sent a relative URL, join with original:
        newurl = urljoin(self.type + ":" + url, newurl)

        urlparts = urlparse(newurl)

        # For security reasons, we don't allow redirection to anything other
        # than http, https and ftp.

        # We are using newer HTTPError with older redirect_internal method
        # This older method will get deprecated in 3.3

        if urlparts.scheme not in ('http', 'https', 'ftp', ''):
            raise HTTPError(newurl, errcode,
                            errmsg +
                            " Redirection to url '%s' is not allowed."
                            % newurl,
                            headers, fp)

        return self.open(newurl)

    def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
        """Error 301 -- also relocated (permanently)."""
        return self.http_error_302(url, fp, errcode, errmsg, headers, data)

    def http_error_303(self, url, fp, errcode, errmsg, headers, data=None):
        """Error 303 -- also relocated (essentially identical to 302)."""
        return self.http_error_302(url, fp, errcode, errmsg, headers, data)

    def http_error_307(self, url, fp, errcode, errmsg, headers, data=None):
        """Error 307 -- relocated, but turn POST into error."""
        if data is None:
            return self.http_error_302(url, fp, errcode, errmsg, headers, data)
        else:
            return self.http_error_default(url, fp, errcode, errmsg, headers)

    def http_error_401(self, url, fp, errcode, errmsg, headers, data=None,
            retry=False):
        """Error 401 -- authentication required.
        This function supports Basic authentication only."""
        if 'www-authenticate' not in headers:
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        stuff = headers['www-authenticate']
        match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
        if not match:
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        scheme, realm = match.groups()
        if scheme.lower() != 'basic':
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        if not retry:
            URLopener.http_error_default(self, url, fp, errcode, errmsg,
                    headers)
        name = 'retry_' + self.type + '_basic_auth'
        if data is None:
            return getattr(self,name)(url, realm)
        else:
            return getattr(self,name)(url, realm, data)

    def http_error_407(self, url, fp, errcode, errmsg, headers, data=None,
            retry=False):
        """Error 407 -- proxy authentication required.
        This function supports Basic authentication only."""
        if 'proxy-authenticate' not in headers:
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        stuff = headers['proxy-authenticate']
        match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
        if not match:
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        scheme, realm = match.groups()
        if scheme.lower() != 'basic':
            URLopener.http_error_default(self, url, fp,
                                         errcode, errmsg, headers)
        if not retry:
            URLopener.http_error_default(self, url, fp, errcode, errmsg,
                    headers)
        name = 'retry_proxy_' + self.type + '_basic_auth'
        if data is None:
            return getattr(self,name)(url, realm)
        else:
            return getattr(self,name)(url, realm, data)

    def retry_proxy_http_basic_auth(self, url, realm, data=None):
        host, selector = _splithost(url)
        newurl = 'http://' + host + selector
        proxy = self.proxies['http']
        urltype, proxyhost = _splittype(proxy)
        proxyhost, proxyselector = _splithost(proxyhost)
        i = proxyhost.find('@') + 1
        proxyhost = proxyhost[i:]
        user, passwd = self.get_user_passwd(proxyhost, realm, i)
        if not (user or passwd): return None
        proxyhost = "%s:%s@%s" % (quote(user, safe=''),
                                  quote(passwd, safe=''), proxyhost)
        self.proxies['http'] = 'http://' + proxyhost + proxyselector
        if data is None:
            return self.open(newurl)
        else:
            return self.open(newurl, data)

    def retry_proxy_https_basic_auth(self, url, realm, data=None):
        host, selector = _splithost(url)
        newurl = 'https://' + host + selector
        proxy = self.proxies['https']
        urltype, proxyhost = _splittype(proxy)
        proxyhost, proxyselector = _splithost(proxyhost)
        i = proxyhost.find('@') + 1
        proxyhost = proxyhost[i:]
        user, passwd = self.get_user_passwd(proxyhost, realm, i)
        if not (user or passwd): return None
        proxyhost = "%s:%s@%s" % (quote(user, safe=''),
                                  quote(passwd, safe=''), proxyhost)
        self.proxies['https'] = 'https://' + proxyhost + proxyselector
        if data is None:
            return self.open(newurl)
        else:
            return self.open(newurl, data)

    def retry_http_basic_auth(self, url, realm, data=None):
        host, selector = _splithost(url)
        i = host.find('@') + 1
        host = host[i:]
        user, passwd = self.get_user_passwd(host, realm, i)
        if not (user or passwd): return None
        host = "%s:%s@%s" % (quote(user, safe=''),
                             quote(passwd, safe=''), host)
        newurl = 'http://' + host + selector
        if data is None:
            return self.open(newurl)
        else:
            return self.open(newurl, data)

    def retry_https_basic_auth(self, url, realm, data=None):
        host, selector = _splithost(url)
        i = host.find('@') + 1
        host = host[i:]
        user, passwd = self.get_user_passwd(host, realm, i)
        if not (user or passwd): return None
        host = "%s:%s@%s" % (quote(user, safe=''),
                             quote(passwd, safe=''), host)
        newurl = 'https://' + host + selector
        if data is None:
            return self.open(newurl)
        else:
            return self.open(newurl, data)

    def get_user_passwd(self, host, realm, clear_cache=0):
        key = realm + '@' + host.lower()
        if key in self.auth_cache:
            if clear_cache:
                del self.auth_cache[key]
            else:
                return self.auth_cache[key]
        user, passwd = self.prompt_user_passwd(host, realm)
        if user or passwd: self.auth_cache[key] = (user, passwd)
        return user, passwd

    def prompt_user_passwd(self, host, realm):
        """Override this in a GUI environment!"""
        import getpass
        try:
            user = input("Enter username for %s at %s: " % (realm, host))
            passwd = getpass.getpass("Enter password for %s in %s at %s: " %
                (user, realm, host))
            return user, passwd
        except KeyboardInterrupt:
            print()
            return None, None


# Utility functions

_localhost = None
def localhost():
    """Return the IP address of the magic hostname 'localhost'."""
    global _localhost
    if _localhost is None:
        _localhost = socket.gethostbyname('localhost')
    return _localhost

_thishost = None
def thishost():
    """Return the IP addresses of the current host."""
    global _thishost
    if _thishost is None:
        try:
            _thishost = tuple(socket.gethostbyname_ex(socket.gethostname())[2])
        except socket.gaierror:
            _thishost = tuple(socket.gethostbyname_ex('localhost')[2])
    return _thishost

_ftperrors = None
def ftperrors():
    """Return the set of errors raised by the FTP class."""
    global _ftperrors
    if _ftperrors is None:
        import ftplib
        _ftperrors = ftplib.all_errors
    return _ftperrors

_noheaders = None
def noheaders():
    """Return an empty email Message object."""
    global _noheaders
    if _noheaders is None:
        _noheaders = email.message_from_string("")
    return _noheaders


# Utility classes

class ftpwrapper:
    """Class used by open_ftp() for cache of open FTP connections."""

    def __init__(self, user, passwd, host, port, dirs, timeout=None,
                 persistent=True):
        self.user = user
        self.passwd = passwd
        self.host = host
        self.port = port
        self.dirs = dirs
        self.timeout = timeout
        self.refcount = 0
        self.keepalive = persistent
        try:
            self.init()
        except:
            self.close()
            raise

    def init(self):
        import ftplib
        self.busy = 0
        self.ftp = ftplib.FTP()
        self.ftp.connect(self.host, self.port, self.timeout)
        self.ftp.login(self.user, self.passwd)
        _target = '/'.join(self.dirs)
        self.ftp.cwd(_target)

    def retrfile(self, file, type):
        import ftplib
        self.endtransfer()
        if type in ('d', 'D'): cmd = 'TYPE A'; isdir = 1
        else: cmd = 'TYPE ' + type; isdir = 0
        try:
            self.ftp.voidcmd(cmd)
        except ftplib.all_errors:
            self.init()
            self.ftp.voidcmd(cmd)
        conn = None
        if file and not isdir:
            # Try to retrieve as a file
            try:
                cmd = 'RETR ' + file
                conn, retrlen = self.ftp.ntransfercmd(cmd)
            except ftplib.error_perm as reason:
                if str(reason)[:3] != '550':
                    raise URLError('ftp error: %r' % reason).with_traceback(
                        sys.exc_info()[2])
        if not conn:
            # Set transfer mode to ASCII!
            self.ftp.voidcmd('TYPE A')
            # Try a directory listing. Verify that directory exists.
            if file:
                pwd = self.ftp.pwd()
                try:
                    try:
                        self.ftp.cwd(file)
                    except ftplib.error_perm as reason:
                        raise URLError('ftp error: %r' % reason) from reason
                finally:
                    self.ftp.cwd(pwd)
                cmd = 'LIST ' + file
            else:
                cmd = 'LIST'
            conn, retrlen = self.ftp.ntransfercmd(cmd)
        self.busy = 1

        ftpobj = addclosehook(conn.makefile('rb'), self.file_close)
        self.refcount += 1
        conn.close()
        # Pass back both a suitably decorated object and a retrieval length
        return (ftpobj, retrlen)

    def endtransfer(self):
        self.busy = 0

    def close(self):
        self.keepalive = False
        if self.refcount <= 0:
            self.real_close()

    def file_close(self):
        self.endtransfer()
        self.refcount -= 1
        if self.refcount <= 0 and not self.keepalive:
            self.real_close()

    def real_close(self):
        self.endtransfer()
        try:
            self.ftp.close()
        except ftperrors():
            pass

# Proxy handling
def getproxies_environment():
    """Return a dictionary of scheme -> proxy server URL mappings.

    Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention.
If you need a2496    different way, you can pass a proxies dictionary to the2497    [Fancy]URLopener constructor.2498 2499    """2500    proxies = {}2501    # in order to prefer lowercase variables, process environment in2502    # two passes: first matches any, second pass matches lowercase only2503    for name, value in os.environ.items():2504        name = name.lower()2505        if value and name[-6:] == '_proxy':2506            proxies[name[:-6]] = value2507    # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY2508    # (non-all-lowercase) as it may be set from the web server by a "Proxy:"2509    # header from the client2510    # If "proxy" is lowercase, it will still be used thanks to the next block2511    if 'REQUEST_METHOD' in os.environ:2512        proxies.pop('http', None)2513    for name, value in os.environ.items():2514        if name[-6:] == '_proxy':2515            name = name.lower()2516            if value:2517                proxies[name[:-6]] = value2518            else:2519                proxies.pop(name[:-6], None)2520    return proxies2521 2522def proxy_bypass_environment(host, proxies=None):2523    """Test if proxies should not be used for a particular host.2524 2525    Checks the proxy dict for the value of no_proxy, which should2526    be a list of comma separated DNS suffixes, or '*' for all hosts.2527 2528    """2529    if proxies is None:2530        proxies = getproxies_environment()2531    # don't bypass, if no_proxy isn't specified2532    try:2533        no_proxy = proxies['no']2534    except KeyError:2535        return False2536    # '*' is special case for always bypass2537    if no_proxy == '*':2538        return True2539    host = host.lower()2540    # strip port off host2541    hostonly, port = _splitport(host)2542    # check if the host ends with any of the DNS suffixes2543    for name in no_proxy.split(','):2544        name = name.strip()2545        if name:2546            name = name.lstrip('.')  # ignore 
leading dots2547            name = name.lower()2548            if hostonly == name or host == name:2549                return True2550            name = '.' + name2551            if hostonly.endswith(name) or host.endswith(name):2552                return True2553    # otherwise, don't bypass2554    return False2555 2556 2557# This code tests an OSX specific data structure but is testable on all2558# platforms2559def _proxy_bypass_macosx_sysconf(host, proxy_settings):2560    """2561    Return True iff this host shouldn't be accessed using a proxy2562 2563    This function uses the MacOSX framework SystemConfiguration2564    to fetch the proxy information.2565 2566    proxy_settings come from _scproxy._get_proxy_settings or get mocked ie:2567    { 'exclude_simple': bool,2568      'exceptions': ['foo.bar', '*.bar.com', '127.0.0.1', '10.1', '10.0/16']2569    }2570    """2571    from fnmatch import fnmatch2572    from ipaddress import AddressValueError, IPv4Address2573 2574    hostonly, port = _splitport(host)2575 2576    def ip2num(ipAddr):2577        parts = ipAddr.split('.')2578        parts = list(map(int, parts))2579        if len(parts) != 4:2580            parts = (parts + [0, 0, 0, 0])[:4]2581        return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]2582 2583    # Check for simple host names:2584    if '.' 
not in host:2585        if proxy_settings['exclude_simple']:2586            return True2587 2588    hostIP = None2589    try:2590        hostIP = int(IPv4Address(hostonly))2591    except AddressValueError:2592        pass2593 2594    for value in proxy_settings.get('exceptions', ()):2595        # Items in the list are strings like these: *.local, 169.254/162596        if not value: continue2597 2598        m = re.match(r"(\d+(?:\.\d+)*)(/\d+)?", value)2599        if m is not None and hostIP is not None:2600            base = ip2num(m.group(1))2601            mask = m.group(2)2602            if mask is None:2603                mask = 8 * (m.group(1).count('.') + 1)2604            else:2605                mask = int(mask[1:])2606 2607            if mask < 0 or mask > 32:2608                # System libraries ignore invalid prefix lengths2609                continue2610 2611            mask = 32 - mask2612 2613            if (hostIP >> mask) == (base >> mask):2614                return True2615 2616        elif fnmatch(host, value):2617            return True2618 2619    return False2620 2621 2622# Same as _proxy_bypass_macosx_sysconf, testable on all platforms2623def _proxy_bypass_winreg_override(host, override):2624    """Return True if the host should bypass the proxy server.2625 2626    The proxy override list is obtained from the Windows2627    Internet settings proxy override registry value.2628 2629    An example of a proxy override value is:2630    "www.example.com;*.example.net; 192.168.0.1"2631    """2632    from fnmatch import fnmatch2633 2634    host, _ = _splitport(host)2635    proxy_override = override.split(';')2636    for test in proxy_override:2637        test = test.strip()2638        # "<local>" should bypass the proxy server for all intranet addresses2639        if test == '<local>':2640            if '.' 
not in host:2641                return True2642        elif fnmatch(host, test):2643            return True2644    return False2645 2646 2647if sys.platform == 'darwin':2648    from _scproxy import _get_proxy_settings, _get_proxies2649 2650    def proxy_bypass_macosx_sysconf(host):2651        proxy_settings = _get_proxy_settings()2652        return _proxy_bypass_macosx_sysconf(host, proxy_settings)2653 2654    def getproxies_macosx_sysconf():2655        """Return a dictionary of scheme -> proxy server URL mappings.2656 2657        This function uses the MacOSX framework SystemConfiguration2658        to fetch the proxy information.2659        """2660        return _get_proxies()2661 2662 2663 2664    def proxy_bypass(host):2665        """Return True, if host should be bypassed.2666 2667        Checks proxy settings gathered from the environment, if specified,2668        or from the MacOSX framework SystemConfiguration.2669 2670        """2671        proxies = getproxies_environment()2672        if proxies:2673            return proxy_bypass_environment(host, proxies)2674        else:2675            return proxy_bypass_macosx_sysconf(host)2676 2677    def getproxies():2678        return getproxies_environment() or getproxies_macosx_sysconf()2679 2680 2681elif os.name == 'nt':2682    def getproxies_registry():2683        """Return a dictionary of scheme -> proxy server URL mappings.2684 2685        Win32 uses the registry to store proxies.2686 2687        """2688        proxies = {}2689        try:2690            import winreg2691        except ImportError:2692            # Std module, so should be around - but you never know!2693            return proxies2694        try:2695            internetSettings = winreg.OpenKey(winreg.HKEY_CURRENT_USER,2696                r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')2697            proxyEnable = winreg.QueryValueEx(internetSettings,2698                                               'ProxyEnable')[0]2699     
       if proxyEnable:2700                # Returned as Unicode but problems if not converted to ASCII2701                proxyServer = str(winreg.QueryValueEx(internetSettings,2702                                                       'ProxyServer')[0])2703                if '=' not in proxyServer and ';' not in proxyServer:2704                    # Use one setting for all protocols.2705                    proxyServer = 'http={0};https={0};ftp={0}'.format(proxyServer)2706                for p in proxyServer.split(';'):2707                    protocol, address = p.split('=', 1)2708                    # See if address has a type:// prefix2709                    if not re.match('(?:[^/:]+)://', address):2710                        # Add type:// prefix to address without specifying type2711                        if protocol in ('http', 'https', 'ftp'):2712                            # The default proxy type of Windows is HTTP2713                            address = 'http://' + address2714                        elif protocol == 'socks':2715                            address = 'socks://' + address2716                    proxies[protocol] = address2717                # Use SOCKS proxy for HTTP(S) protocols2718                if proxies.get('socks'):2719                    # The default SOCKS proxy type of Windows is SOCKS42720                    address = re.sub(r'^socks://', 'socks4://', proxies['socks'])2721                    proxies['http'] = proxies.get('http') or address2722                    proxies['https'] = proxies.get('https') or address2723            internetSettings.Close()2724        except (OSError, ValueError, TypeError):2725            # Either registry key not found etc, or the value in an2726            # unexpected format.2727            # proxies already set up to be empty so nothing to do2728            pass2729        return proxies2730 2731    def getproxies():2732        """Return a dictionary of scheme -> proxy server URL mappings.2733 
2734        Returns settings gathered from the environment, if specified,2735        or the registry.2736 2737        """2738        return getproxies_environment() or getproxies_registry()2739 2740    def proxy_bypass_registry(host):2741        try:2742            import winreg2743        except ImportError:2744            # Std modules, so should be around - but you never know!2745            return False2746        try:2747            internetSettings = winreg.OpenKey(winreg.HKEY_CURRENT_USER,2748                r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')2749            proxyEnable = winreg.QueryValueEx(internetSettings,2750                                               'ProxyEnable')[0]2751            proxyOverride = str(winreg.QueryValueEx(internetSettings,2752                                                     'ProxyOverride')[0])2753            # ^^^^ Returned as Unicode but problems if not converted to ASCII2754        except OSError:2755            return False2756        if not proxyEnable or not proxyOverride:2757            return False2758        return _proxy_bypass_winreg_override(host, proxyOverride)2759 2760    def proxy_bypass(host):2761        """Return True, if host should be bypassed.2762 2763        Checks proxy settings gathered from the environment, if specified,2764        or the registry.2765 2766        """2767        proxies = getproxies_environment()2768        if proxies:2769            return proxy_bypass_environment(host, proxies)2770        else:2771            return proxy_bypass_registry(host)2772 2773else:2774    # By default use environment variables2775    getproxies = getproxies_environment2776    proxy_bypass = proxy_bypass_environment2777
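
# Usage sketch (not part of the module): a minimal illustration of how the
# environment-based proxy helpers above might be exercised from client code.
# The host names, proxy URLs and the names bypass_results, env_proxies and
# cgi_proxies are hypothetical and chosen only for this example. It assumes a
# POSIX-like platform where environment variable names are case-sensitive.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment, proxy_bypass_environment

# Hypothetical proxy mapping; the 'no' entry plays the role of no_proxy.
proxies = {"http": "http://proxy.example:3128",
           "no": "example.com, .internal"}

# Exact names and DNS suffixes bypass the proxy; a port on the host is ignored.
hosts = ("example.com", "www.example.com:8080", "db.internal", "notexample.com")
bypass_results = {h: proxy_bypass_environment(h, proxies) for h in hosts}

# When both spellings are set, the lowercase variable is preferred.
with mock.patch.dict(os.environ,
                     {"HTTP_PROXY": "http://upper.example:1",
                      "http_proxy": "http://lower.example:2"},
                     clear=True):
    env_proxies = getproxies_environment()

# Under CGI (REQUEST_METHOD set), an uppercase HTTP_PROXY is discarded.
with mock.patch.dict(os.environ,
                     {"HTTP_PROXY": "http://upper.example:1",
                      "REQUEST_METHOD": "GET"},
                     clear=True):
    cgi_proxies = getproxies_environment()
```

# Passing an explicit proxies dict to proxy_bypass_environment() keeps the
# first part of the sketch independent of the real environment.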