# pickletools.py (91.3 KB, 2891 lines)
'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here.  Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
'''

import codecs
import io
import pickle
import re
import sys

__all__ = ['dis', 'genops', 'optimize']

bytes_types = pickle.bytes_types

# Other ideas:
#
# - A pickle verifier:  read a pickle and check it exhaustively for
#   well-formedness.  dis() does a lot of this already.
#
# - A protocol identifier:  examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#   dis() already prints this info at the end.
#
# - A pickle optimizer:  for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive.  Or lots of times a PUT is generated that's never accessed
#   by a later GET.


# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
# called an unpickling machine).  It's a sequence of opcodes, interpreted by the
# PM, building an arbitrarily complex Python object.
#
# For the most part, the PM is very simple:  there are no looping, testing, or
# conditional instructions, no arithmetic and no function calls.  Opcodes are
# executed once each, from first to last, until a STOP opcode is reached.
#
# The PM has two data areas, "the stack" and "the memo".
#
# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
# integer object on the stack, whose value is gotten from a decimal string
# literal immediately following the INT opcode in the pickle bytestream.  Other
# opcodes take Python objects off the stack.  The result of unpickling is
# whatever object is left on the stack when the final STOP opcode is executed.
#
# The memo is simply an array of objects, or it can be implemented as a dict
# mapping little integers to objects.  The memo serves as the PM's "long term
# memory", and the little integers indexing the memo are akin to variable
# names.  Some opcodes copy the top stack object into the memo at a given
# index (the stack is not popped), and others push a memo object at a given
# index onto the stack again.
#
# At heart, that's all the PM has.  Subtleties arise for these reasons:
#
# + Object identity.  Objects can be arbitrarily complex, and subobjects
#   may be shared (for example, the list [a, a] refers to the same object a
#   twice).  It can be vital that unpickling recreate an isomorphic object
#   graph, faithfully reproducing sharing.
#
# + Recursive objects.  For example, after "L = []; L.append(L)", L is a
#   list, and L[0] is the same list.  This is related to the object identity
#   point, and some sequences of pickle opcodes are subtle in order to
#   get the right result in all cases.
#
# + Things pickle doesn't know everything about.  Examples of things pickle
#   does know everything about are Python's builtin scalar and container
#   types, like ints and tuples.  They generally have opcodes dedicated to
#   them.  For things like module references and instances of user-defined
#   classes, pickle's knowledge is limited.  Historically, many enhancements
#   have been made to the pickle protocol in order to do a better (faster,
#   and/or more compact) job on those.
#
# + Backward compatibility and micro-optimization.  As explained below,
#   pickle opcodes never go away, not even when better ways to do a thing
#   get invented.  The repertoire of the PM just keeps growing over time.
#   For example, protocol 0 had two opcodes for building Python integers (INT
#   and LONG), protocol 1 added three more for more-efficient pickling of short
#   integers, and protocol 2 added two more for more-efficient pickling of
#   long integers (before protocol 2, the only ways to pickle a Python long
#   took time quadratic in the number of digits, for both pickling and
#   unpickling).  "Opcode bloat" isn't so much a subtlety as a source of
#   wearying complication.
#
#
# Pickle protocols:
#
# For compatibility, the meaning of a pickle opcode never changes.  Instead new
# pickle opcodes get added, and each version's unpickler can handle all the
# pickle opcodes in all protocol versions to date.  So old pickles continue to
# be readable forever.  The pickler can generally be told to restrict itself to
# the subset of opcodes available under previous protocol versions too, so that
# users can create pickles under the current version readable by older
# versions.  However, a pickle written with protocol 0 or 1 does not contain
# its version number embedded within it.  If an older unpickler tries to read
# a pickle using a later protocol, the result is most likely an exception due
# to seeing an unknown (in the older unpickler) opcode.
#
# The original pickle used what's now called "protocol 0", and what was called
# "text mode" before Python 2.3.  The entire pickle bytestream is made up of
# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
# That's why it was called text mode.  Protocol 0 is small and elegant, but
# sometimes painfully inefficient.
#
# The second major set of additions is now called "protocol 1", and was called
# "binary mode" before Python 2.3.  This added many opcodes with arguments
# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
# bytes.  Binary mode pickles can be substantially smaller than equivalent
# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
# int as 4 bytes following the opcode, which is cheaper to unpickle than the
# (perhaps) 11-character decimal string attached to INT.  Protocol 1 also added
# a number of opcodes that operate on many stack elements at once (like APPENDS
# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
#
# The third major set of additions came in Python 2.3, and is called "protocol
# 2".  This added:
#
# - A better way to pickle instances of new-style classes (NEWOBJ).
#
# - A way for a pickle to identify its protocol (PROTO).
#
# - Time- and space-efficient pickling of long ints (LONG{1,4}).
#
# - Shortcuts for small tuples (TUPLE{1,2,3}).
#
# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
#
# - The "extension registry", a vector of popular objects that can be pushed
#   efficiently by index (EXT{1,2,4}).  This is akin to the memo and GET, but
#   the registry contents are predefined (there's nothing akin to the memo's
#   PUT).
#
# Another independent change with Python 2.3 is the abandonment of any
# pretense that it might be safe to load pickles received from untrusted
# parties -- no sufficient security analysis has been done to guarantee
# this and there isn't a use case that warrants the expense of such an
# analysis.
#
# To this end, all tests for __safe_for_unpickling__ or for
# copyreg.safe_constructors are removed from the unpickling code.
# References to these variables in the descriptions below are to be seen
# as describing unpickling in Python 2.2 and before.


# Meta-rule:  Descriptions are stored in instances of descriptor objects,
# with plain constructors.  No meta-language is defined from which
# descriptors could be constructed.
# If you want, e.g., XML, write a little
# program to generate XML from the objects.

##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream.  An argument is of a specific type, described by an instance
# of ArgumentDescriptor.  These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.

# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1

# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1  = -2   # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4  = -3   # num bytes is 4-byte signed little-endian int
TAKEN_FROM_ARGUMENT4U = -4   # num bytes is 4-byte unsigned little-endian int
TAKEN_FROM_ARGUMENT8U = -5   # num bytes is 8-byte unsigned little-endian int

class ArgumentDescriptor(object):
    __slots__ = (
        # name of descriptor record, also a module global name; a string
        'name',

        # length of argument, in bytes; an int; UP_TO_NEWLINE and
        # TAKEN_FROM_ARGUMENT{1,4,4U,8U} are negative values for
        # variable-length cases
        'n',

        # a function taking a file-like object, reading this kind of argument
        # from the object at the current position, advancing the current
        # position by n bytes, and returning the value of the argument
        'reader',

        # human-readable docs for this arg descriptor; a string
        'doc',
    )

    def __init__(self, name, n, reader, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(n, int) and (n >= 0 or
                                       n in (UP_TO_NEWLINE,
                                             TAKEN_FROM_ARGUMENT1,
                                             TAKEN_FROM_ARGUMENT4,
                                             TAKEN_FROM_ARGUMENT4U,
                                             TAKEN_FROM_ARGUMENT8U))
        self.n = n

        self.reader = reader

        assert isinstance(doc, str)
        self.doc = doc

from struct import unpack as _unpack

def read_uint1(f):
    r"""
    >>> import io
    >>> read_uint1(io.BytesIO(b'\xff'))
    255
    """

    data = f.read(1)
    if data:
        return data[0]
    raise ValueError("not enough data in stream to read uint1")

uint1 = ArgumentDescriptor(
            name='uint1',
            n=1,
            reader=read_uint1,
            doc="One-byte unsigned integer.")


def read_uint2(f):
    r"""
    >>> import io
    >>> read_uint2(io.BytesIO(b'\xff\x00'))
    255
    >>> read_uint2(io.BytesIO(b'\xff\xff'))
    65535
    """

    data = f.read(2)
    if len(data) == 2:
        return _unpack("<H", data)[0]
    raise ValueError("not enough data in stream to read uint2")

uint2 = ArgumentDescriptor(
            name='uint2',
            n=2,
            reader=read_uint2,
            doc="Two-byte unsigned integer, little-endian.")


def read_int4(f):
    r"""
    >>> import io
    >>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<i", data)[0]
    raise ValueError("not enough data in stream to read int4")

int4 = ArgumentDescriptor(
           name='int4',
           n=4,
           reader=read_int4,
           doc="Four-byte signed integer, little-endian, 2's complement.")


def read_uint4(f):
    r"""
    >>> import io
    >>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
    255
    >>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
    True
    """

    data = f.read(4)
    if len(data) == 4:
        return _unpack("<I", data)[0]
    raise ValueError("not enough data in stream to read uint4")

uint4 = ArgumentDescriptor(
            name='uint4',
            n=4,
            reader=read_uint4,
            doc="Four-byte unsigned integer, little-endian.")


def read_uint8(f):
    r"""
    >>> import io
    >>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
    255
    >>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
    True
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack("<Q", data)[0]
    raise ValueError("not enough data in stream to read uint8")

uint8 = ArgumentDescriptor(
            name='uint8',
            n=8,
            reader=read_uint8,
            doc="Eight-byte unsigned integer, little-endian.")


def read_stringnl(f, decode=True, stripquotes=True):
    r"""
    >>> import io
    >>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
    'abcd'

    >>> read_stringnl(io.BytesIO(b"\n"))
    Traceback (most recent call last):
    ...
    ValueError: no string quotes around b''

    >>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
    ''

    >>> read_stringnl(io.BytesIO(b"''\n"))
    ''

    >>> read_stringnl(io.BytesIO(b'"abcd"'))
    Traceback (most recent call last):
    ...
    ValueError: no newline found when trying to read stringnl

    Embedded escapes are undone in the result.
    >>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
    'a\n\\b\x00c\td'
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read stringnl")
    data = data[:-1]    # lose the newline

    if stripquotes:
        for q in (b'"', b"'"):
            if data.startswith(q):
                if not data.endswith(q):
                    raise ValueError("string quote %r not found at both "
                                     "ends of %r" % (q, data))
                data = data[1:-1]
                break
        else:
            raise ValueError("no string quotes around %r" % data)

    if decode:
        data = codecs.escape_decode(data)[0].decode("ascii")
    return data

stringnl = ArgumentDescriptor(
               name='stringnl',
               n=UP_TO_NEWLINE,
               reader=read_stringnl,
               doc="""A newline-terminated string.

                   This is a repr-style string, with embedded escapes, and
                   bracketing quotes.
                   """)

def read_stringnl_noescape(f):
    return read_stringnl(f, stripquotes=False)

stringnl_noescape = ArgumentDescriptor(
                        name='stringnl_noescape',
                        n=UP_TO_NEWLINE,
                        reader=read_stringnl_noescape,
                        doc="""A newline-terminated string.

                        This is a str-style string, without embedded escapes,
                        or bracketing quotes.
                        It should consist solely of
                        printable ASCII characters.
                        """)

def read_stringnl_noescape_pair(f):
    r"""
    >>> import io
    >>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
    'Queue Empty'
    """

    return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))

stringnl_noescape_pair = ArgumentDescriptor(
                             name='stringnl_noescape_pair',
                             n=UP_TO_NEWLINE,
                             reader=read_stringnl_noescape_pair,
                             doc="""A pair of newline-terminated strings.

                             These are str-style strings, without embedded
                             escapes, or bracketing quotes.  They should
                             consist solely of printable ASCII characters.
                             The pair is returned as a single string, with
                             a single blank separating the two strings.
                             """)


def read_string1(f):
    r"""
    >>> import io
    >>> read_string1(io.BytesIO(b"\x00"))
    ''
    >>> read_string1(io.BytesIO(b"\x03abcdef"))
    'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string1, but only %d remain" %
                     (n, len(data)))

string1 = ArgumentDescriptor(
              name="string1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_string1,
              doc="""A counted string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes in the string, and the second argument is that many
              bytes.
              """)


def read_string4(f):
    r"""
    >>> import io
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    ''
    >>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    'abc'
    >>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a string4, but only 6 remain
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("string4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data.decode("latin-1")
    raise ValueError("expected %d bytes in a string4, but only %d remain" %
                     (n, len(data)))

string4 = ArgumentDescriptor(
              name="string4",
              n=TAKEN_FROM_ARGUMENT4,
              reader=read_string4,
              doc="""A counted string.

              The first argument is a 4-byte little-endian signed int giving
              the number of bytes in the string, and the second argument is
              that many bytes.
              """)


def read_bytes1(f):
    r"""
    >>> import io
    >>> read_bytes1(io.BytesIO(b"\x00"))
    b''
    >>> read_bytes1(io.BytesIO(b"\x03abcdef"))
    b'abc'
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
                     (n, len(data)))

bytes1 = ArgumentDescriptor(
              name="bytes1",
              n=TAKEN_FROM_ARGUMENT1,
              reader=read_bytes1,
              doc="""A counted bytes string.

              The first argument is a 1-byte unsigned int giving the number
              of bytes, and the second argument is that many bytes.
              """)


def read_bytes4(f):
    r"""
    >>> import io
    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
    b''
    >>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
    b'abc'
    >>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
    Traceback (most recent call last):
    ...
    ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
    """

    n = read_uint4(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
                     (n, len(data)))

bytes4 = ArgumentDescriptor(
              name="bytes4",
              n=TAKEN_FROM_ARGUMENT4U,
              reader=read_bytes4,
              doc="""A counted bytes string.

              The first argument is a 4-byte little-endian unsigned int giving
              the number of bytes, and the second argument is that many bytes.
              """)


def read_bytes8(f):
    r"""
    >>> import io, struct, sys
    >>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
    b''
    >>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
    b'abc'
    >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
    >>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef"))  #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: expected ... bytes in a bytes8, but only 6 remain
    """

    n = read_uint8(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return data
    raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
                     (n, len(data)))

bytes8 = ArgumentDescriptor(
              name="bytes8",
              n=TAKEN_FROM_ARGUMENT8U,
              reader=read_bytes8,
              doc="""A counted bytes string.

              The first argument is an 8-byte little-endian unsigned int giving
              the number of bytes, and the second argument is that many bytes.
              """)


def read_bytearray8(f):
    r"""
    >>> import io, struct, sys
    >>> read_bytearray8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
    bytearray(b'')
    >>> read_bytearray8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
    bytearray(b'abc')
    >>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
    >>> read_bytearray8(io.BytesIO(bigsize8 + b"abcdef"))  #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: expected ... bytes in a bytearray8, but only 6 remain
    """

    n = read_uint8(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("bytearray8 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return bytearray(data)
    raise ValueError("expected %d bytes in a bytearray8, but only %d remain" %
                     (n, len(data)))

bytearray8 = ArgumentDescriptor(
              name="bytearray8",
              n=TAKEN_FROM_ARGUMENT8U,
              reader=read_bytearray8,
              doc="""A counted bytearray.

              The first argument is an 8-byte little-endian unsigned int giving
              the number of bytes, and the second argument is that many bytes.
              """)

def read_unicodestringnl(f):
    r"""
    >>> import io
    >>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
    True
    """

    data = f.readline()
    if not data.endswith(b'\n'):
        raise ValueError("no newline found when trying to read "
                         "unicodestringnl")
    data = data[:-1]    # lose the newline
    return str(data, 'raw-unicode-escape')

unicodestringnl = ArgumentDescriptor(
                      name='unicodestringnl',
                      n=UP_TO_NEWLINE,
                      reader=read_unicodestringnl,
                      doc="""A newline-terminated Unicode string.

                      This is raw-unicode-escape encoded, so consists of
                      printable ASCII characters, and may contain embedded
                      escape sequences.
                      """)


def read_unicodestring1(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc)])  # 1-byte length
    >>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
    """

    n = read_uint1(f)
    assert n >= 0
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring1, but only %d "
                     "remain" % (n, len(data)))

unicodestring1 = ArgumentDescriptor(
                    name="unicodestring1",
                    n=TAKEN_FROM_ARGUMENT1,
                    reader=read_unicodestring1,
                    doc="""A counted Unicode string.

                    The first argument is a 1-byte unsigned int giving
                    the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)


def read_unicodestring4(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc), 0, 0, 0])  # little-endian 4-byte length
    >>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
    """

    n = read_uint4(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring4, but only %d "
                     "remain" % (n, len(data)))

unicodestring4 = ArgumentDescriptor(
                    name="unicodestring4",
                    n=TAKEN_FROM_ARGUMENT4U,
                    reader=read_unicodestring4,
                    doc="""A counted Unicode string.

                    The first argument is a 4-byte little-endian unsigned int
                    giving the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)


def read_unicodestring8(f):
    r"""
    >>> import io
    >>> s = 'abcd\uabcd'
    >>> enc = s.encode('utf-8')
    >>> enc
    b'abcd\xea\xaf\x8d'
    >>> n = bytes([len(enc)]) + b'\0' * 7  # little-endian 8-byte length
    >>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
    >>> s == t
    True

    >>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
    Traceback (most recent call last):
    ...
    ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
    """

    n = read_uint8(f)
    assert n >= 0
    if n > sys.maxsize:
        raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
    data = f.read(n)
    if len(data) == n:
        return str(data, 'utf-8', 'surrogatepass')
    raise ValueError("expected %d bytes in a unicodestring8, but only %d "
                     "remain" % (n, len(data)))

unicodestring8 = ArgumentDescriptor(
                    name="unicodestring8",
                    n=TAKEN_FROM_ARGUMENT8U,
                    reader=read_unicodestring8,
                    doc="""A counted Unicode string.

                    The first argument is an 8-byte little-endian unsigned int
                    giving the number of bytes in the string, and the second
                    argument -- the UTF-8 encoding of the Unicode string --
                    contains that many bytes.
                    """)

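
# Illustrative sketch, not part of the original module.  The three counted
# Unicode readers above appear to share one wire layout: an unsigned
# little-endian length prefix (1, 4 or 8 bytes) followed by that many bytes
# of UTF-8.  The helper name _demo_counted_unicode_readers is hypothetical,
# and it imports the installed pickletools module (assumed to be Python 3.4
# or later, where all three descriptors exist) so the snippet also runs on
# its own.  It is only defined here, never called by the module.
def _demo_counted_unicode_readers(text='abcd\uabcd'):
    import io
    import struct
    import pickletools

    payload = text.encode('utf-8', 'surrogatepass')
    results = []
    # Pair each descriptor with the struct format of its length prefix.
    for desc, fmt in ((pickletools.unicodestring1, '<B'),
                      (pickletools.unicodestring4, '<I'),
                      (pickletools.unicodestring8, '<Q')):
        stream = io.BytesIO(struct.pack(fmt, len(payload)) + payload + b'junk')
        value = desc.reader(stream)
        # The reader consumes only the prefix and the payload, so the
        # trailing b'junk' is still unread afterwards.
        results.append((desc.name, value, stream.read()))
    return results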

def read_decimalnl_short(f):
    r"""
    >>> import io
    >>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
    1234

    >>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
    Traceback (most recent call last):
    ...
    ValueError: invalid literal for int() with base 10: b'1234L'
    """

    s = read_stringnl(f, decode=False, stripquotes=False)

    # There's a hack for True and False here.
    if s == b"00":
        return False
    elif s == b"01":
        return True

    return int(s)

def read_decimalnl_long(f):
    r"""
    >>> import io

    >>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
    1234

    >>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
    123456789012345678901234
    """

    s = read_stringnl(f, decode=False, stripquotes=False)
    if s[-1:] == b'L':
        s = s[:-1]
    return int(s)


decimalnl_short = ArgumentDescriptor(
                      name='decimalnl_short',
                      n=UP_TO_NEWLINE,
                      reader=read_decimalnl_short,
                      doc="""A newline-terminated decimal integer literal.

                          This never has a trailing 'L', and the integer fit
                          in a short Python int on the box where the pickle
                          was written -- but there's no guarantee it will fit
                          in a short Python int on the box where the pickle
                          is read.
                          """)

decimalnl_long = ArgumentDescriptor(
                     name='decimalnl_long',
                     n=UP_TO_NEWLINE,
                     reader=read_decimalnl_long,
                     doc="""A newline-terminated decimal integer literal.

                         This has a trailing 'L', and can represent integers
                         of any size.
                         """)


def read_floatnl(f):
    r"""
    >>> import io
    >>> read_floatnl(io.BytesIO(b"-1.25\n6"))
    -1.25
    """
    s = read_stringnl(f, decode=False, stripquotes=False)
    return float(s)

floatnl = ArgumentDescriptor(
              name='floatnl',
              n=UP_TO_NEWLINE,
              reader=read_floatnl,
              doc="""A newline-terminated decimal floating literal.

              In general this requires 17 significant digits for roundtrip
              identity, and pickling then unpickling infinities, NaNs, and
              minus zero doesn't work across boxes, or on some boxes even
              on itself (e.g., Windows can't read the strings it produces
              for infinities or NaNs).
              """)

def read_float8(f):
    r"""
    >>> import io, struct
    >>> raw = struct.pack(">d", -1.25)
    >>> raw
    b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
    >>> read_float8(io.BytesIO(raw + b"\n"))
    -1.25
    """

    data = f.read(8)
    if len(data) == 8:
        return _unpack(">d", data)[0]
    raise ValueError("not enough data in stream to read float8")


float8 = ArgumentDescriptor(
             name='float8',
             n=8,
             reader=read_float8,
             doc="""An 8-byte binary representation of a float, big-endian.

             The format is unique to Python, and shared with the struct
             module (format string '>d') "in theory" (the struct and pickle
             implementations don't share the code -- they should).  It's
             strongly related to the IEEE-754 double format, and, in normal
             cases, is in fact identical to the big-endian 754 double format.
             On other boxes the dynamic range is limited to that of a 754
             double, and "add a half and chop" rounding is used to reduce
             the precision to 53 bits.  However, even on a 754 box,
             infinities, NaNs, and minus zero may not be handled correctly
             (may not survive roundtrip pickling intact).
             """)

# Protocol 2 formats

from pickle import decode_long

def read_long1(f):
    r"""
    >>> import io
    >>> read_long1(io.BytesIO(b"\x00"))
    0
    >>> read_long1(io.BytesIO(b"\x02\xff\x00"))
    255
    >>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
    32767
    >>> read_long1(io.BytesIO(b"\x02\x00\xff"))
    -256
    >>> read_long1(io.BytesIO(b"\x02\x00\x80"))
    -32768
    """

    n = read_uint1(f)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long(data)

long1 = ArgumentDescriptor(
    name="long1",
    n=TAKEN_FROM_ARGUMENT1,
    reader=read_long1,
    doc="""A binary long, little-endian, using 1-byte size.

    This first reads one byte as an unsigned size, then reads that
    many bytes and interprets them as a little-endian 2's-complement long.
    If the size is 0, that's taken as a shortcut for the int 0.
    """)

def read_long4(f):
    r"""
    >>> import io
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
    255
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
    32767
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
    -256
    >>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
    -32768
    >>> read_long4(io.BytesIO(b"\x00\x00\x00\x00"))
    0
    """

    n = read_int4(f)
    if n < 0:
        raise ValueError("long4 byte count < 0: %d" % n)
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long4")
    return decode_long(data)

long4 = ArgumentDescriptor(
    name="long4",
    n=TAKEN_FROM_ARGUMENT4,
    reader=read_long4,
    doc="""A binary representation of a long, little-endian.

    This first reads four bytes as a signed size (but requires the
    size to be >= 0), then reads that many bytes and interprets them
    as a little-endian 2's-complement long.  If the size is 0, that's taken
    as a shortcut for the int 0, although LONG1 should really be used
    then instead (and in any case where # of bytes < 256).
    """)


##############################################################################
# Object descriptors.  The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.

class StackObject(object):
    __slots__ = (
        # name of descriptor record, for info only
        'name',

        # type of object, or tuple of type objects (meaning the object can
        # be of any type in the tuple)
        'obtype',

        # human-readable docs for this kind of stack object; a string
        'doc',
    )

    def __init__(self, name, obtype, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(obtype, type) or isinstance(obtype, tuple)
        if isinstance(obtype, tuple):
            for contained in obtype:
                assert isinstance(contained, type)
        self.obtype = obtype

        assert isinstance(doc, str)
        self.doc = doc

    def __repr__(self):
        return self.name


pyint = pylong = StackObject(
    name='int',
    obtype=int,
    doc="A Python integer object.")

pyinteger_or_bool = StackObject(
    name='int_or_bool',
    obtype=(int, bool),
    doc="A Python integer or boolean object.")

pybool = StackObject(
    name='bool',
    obtype=bool,
    doc="A Python boolean object.")

pyfloat = StackObject(
    name='float',
    obtype=float,
    doc="A Python float object.")

pybytes_or_str = pystring = StackObject(
    name='bytes_or_str',
    obtype=(bytes, str),
    doc="A Python bytes or (Unicode) string object.")

pybytes = StackObject(
    name='bytes',
    obtype=bytes,
    doc="A Python bytes object.")

pybytearray = StackObject(
    name='bytearray',
    obtype=bytearray,
    doc="A Python bytearray object.")

pyunicode = StackObject(
    name='str',
    obtype=str,
    doc="A Python (Unicode) string object.")

pynone = StackObject(
    name="None",
    obtype=type(None),
    doc="The Python None object.")

pytuple = StackObject(
    name="tuple",
    obtype=tuple,
    doc="A Python tuple object.")

pylist = StackObject(
    name="list",
    obtype=list,
    doc="A Python list object.")

pydict = StackObject(
    name="dict",
    obtype=dict,
    doc="A Python dict object.")

pyset = StackObject(
    name="set",
    obtype=set,
    doc="A Python set object.")

pyfrozenset = StackObject(
    name="frozenset",
    obtype=frozenset,
    doc="A Python frozenset object.")

pybuffer = StackObject(
    name='buffer',
    obtype=object,
    doc="A Python buffer-like object.")

anyobject = StackObject(
    name='any',
    obtype=object,
    doc="Any kind of object whatsoever.")

markobject = StackObject(
    name="mark",
    obtype=StackObject,
    doc="""'The mark' is a unique object.

Opcodes that operate on a variable number of objects
generally don't embed the count of objects in the opcode,
or pull it off the stack.
Instead the MARK opcode is used
to push a special marker object on the stack, and then
some other opcodes grab all the objects from the top of
the stack down to (but not including) the topmost marker
object.
""")

stackslice = StackObject(
    name="stackslice",
    obtype=StackObject,
    doc="""An object representing a contiguous slice of the stack.

This is used in conjunction with markobject, to represent all
of the stack following the topmost markobject.  For example,
the POP_MARK opcode changes the stack from

    [..., markobject, stackslice]
to
    [...]

No matter how many objects are on the stack after the topmost
markobject, POP_MARK gets rid of all of them (including the
topmost markobject too).
""")

##############################################################################
# Descriptors for pickle opcodes.

class OpcodeInfo(object):

    __slots__ = (
        # symbolic name of opcode; a string
        'name',

        # the code used in a bytestream to represent the opcode; a
        # one-character string
        'code',

        # If the opcode has an argument embedded in the byte string, an
        # instance of ArgumentDescriptor specifying its type.  Note that
        # arg.reader(s) can be used to read and decode the argument from
        # the bytestream s, and arg.doc documents the format of the raw
        # argument bytes.
        # If the opcode doesn't have an argument embedded
        # in the bytestream, arg should be None.
        'arg',

        # what the stack looks like before this opcode runs; a list
        'stack_before',

        # what the stack looks like after this opcode runs; a list
        'stack_after',

        # the protocol number in which this opcode was introduced; an int
        'proto',

        # human-readable docs for this opcode; a string
        'doc',
    )

    def __init__(self, name, code, arg,
                 stack_before, stack_after, proto, doc):
        assert isinstance(name, str)
        self.name = name

        assert isinstance(code, str)
        assert len(code) == 1
        self.code = code

        assert arg is None or isinstance(arg, ArgumentDescriptor)
        self.arg = arg

        assert isinstance(stack_before, list)
        for x in stack_before:
            assert isinstance(x, StackObject)
        self.stack_before = stack_before

        assert isinstance(stack_after, list)
        for x in stack_after:
            assert isinstance(x, StackObject)
        self.stack_after = stack_after

        assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
        self.proto = proto

        assert isinstance(doc, str)
        self.doc = doc

I = OpcodeInfo
opcodes = [

    # Ways to spell integers.

    I(name='INT',
      code='I',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[pyinteger_or_bool],
      proto=0,
      doc="""Push an integer or bool.

      The argument is a newline-terminated decimal literal string.

      The intent may have been that this always fit in a short Python int,
      but INT can be generated in pickles written on a 64-bit box that
      require a Python long on a 32-bit
      box.  The difference between this
      and LONG then is that INT skips a trailing 'L', and produces a short
      int whenever possible.

      Another difference arises because, when bool was introduced as a
      distinct type in 2.3, builtin names True and False were also added to
      2.2.2, mapping to ints 1 and 0.  For compatibility in both directions,
      True gets pickled as INT + "01\\n", and False as INT + "00\\n".
      Leading zeroes are never produced for a genuine integer.  The 2.3
      (and later) unpicklers special-case these and return bool instead;
      earlier unpicklers ignore the leading "0" and return the int.
      """),

    I(name='BININT',
      code='J',
      arg=int4,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a four-byte signed integer.

      This handles the full range of Python (short) integers on a 32-bit
      box, directly as binary bytes (1 for the opcode and 4 for the integer).
      If the integer is non-negative and fits in 1 or 2 bytes, pickling via
      BININT1 or BININT2 saves space.
      """),

    I(name='BININT1',
      code='K',
      arg=uint1,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a one-byte unsigned integer.

      This is a space optimization for pickling very small non-negative ints,
      in range(256).
      """),

    I(name='BININT2',
      code='M',
      arg=uint2,
      stack_before=[],
      stack_after=[pyint],
      proto=1,
      doc="""Push a two-byte unsigned integer.

      This is a space optimization for pickling small positive ints, in
      range(256, 2**16).
      Integers in range(256) can also be pickled via
      BININT2, but BININT1 instead saves a byte.
      """),

    I(name='LONG',
      code='L',
      arg=decimalnl_long,
      stack_before=[],
      stack_after=[pyint],
      proto=0,
      doc="""Push a long integer.

      The same as INT, except that the literal ends with 'L', and always
      unpickles to a Python long.  There doesn't seem to be a real purpose
      to the trailing 'L'.

      Note that LONG takes time quadratic in the number of digits when
      unpickling (this is simply due to the nature of decimal->binary
      conversion).  Proto 2 added linear-time (in C; still quadratic-time
      in Python) LONG1 and LONG4 opcodes.
      """),

    I(name="LONG1",
      code='\x8a',
      arg=long1,
      stack_before=[],
      stack_after=[pyint],
      proto=2,
      doc="""Long integer using one-byte length.

      A more efficient encoding of a Python long; the long1 encoding
      says it all."""),

    I(name="LONG4",
      code='\x8b',
      arg=long4,
      stack_before=[],
      stack_after=[pyint],
      proto=2,
      doc="""Long integer using four-byte length.

      A more efficient encoding of a Python long; the long4 encoding
      says it all."""),

    # Ways to spell strings (8-bit, not Unicode).

    I(name='STRING',
      code='S',
      arg=stringnl,
      stack_before=[],
      stack_after=[pybytes_or_str],
      proto=0,
      doc="""Push a Python string object.

      The argument is a repr-style string, with bracketing quote characters,
      and perhaps embedded escapes.  The argument extends until the next
      newline character.  These are usually decoded into a str instance
      using the encoding given to the Unpickler constructor, or the default,
      'ASCII'.
      If the encoding given was 'bytes' however, they will be
      decoded as a bytes object instead.
      """),

    I(name='BINSTRING',
      code='T',
      arg=string4,
      stack_before=[],
      stack_after=[pybytes_or_str],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 4-byte little-endian
      signed int giving the number of bytes in the string, and the
      second is that many bytes, which are taken literally as the string
      content.  These are usually decoded into a str instance using the
      encoding given to the Unpickler constructor, or the default,
      'ASCII'.  If the encoding given was 'bytes' however, they will be
      decoded as a bytes object instead.
      """),

    I(name='SHORT_BINSTRING',
      code='U',
      arg=string1,
      stack_before=[],
      stack_after=[pybytes_or_str],
      proto=1,
      doc="""Push a Python string object.

      There are two arguments: the first is a 1-byte unsigned int giving
      the number of bytes in the string, and the second is that many
      bytes, which are taken literally as the string content.  These are
      usually decoded into a str instance using the encoding given to
      the Unpickler constructor, or the default, 'ASCII'.
      If the encoding given was 'bytes' however, they will be decoded
      as a bytes object instead.
      """),

    # Bytes (protocol 3 and higher)

    I(name='BINBYTES',
      code='B',
      arg=bytes4,
      stack_before=[],
      stack_after=[pybytes],
      proto=3,
      doc="""Push a Python bytes object.

      There are two arguments:  the first is a 4-byte little-endian unsigned int
      giving the number of bytes, and the second is that many bytes, which are
      taken literally as the bytes content.
      """),

    I(name='SHORT_BINBYTES',
      code='C',
      arg=bytes1,
      stack_before=[],
      stack_after=[pybytes],
      proto=3,
      doc="""Push a Python bytes object.

      There are two arguments:  the first is a 1-byte unsigned int giving
      the number of bytes, and the second is that many bytes, which are taken
      literally as the bytes content.
      """),

    I(name='BINBYTES8',
      code='\x8e',
      arg=bytes8,
      stack_before=[],
      stack_after=[pybytes],
      proto=4,
      doc="""Push a Python bytes object.

      There are two arguments:  the first is an 8-byte unsigned int giving
      the number of bytes, and the second is that many bytes, which are
      taken literally as the bytes content.
      """),

    # Bytearray (protocol 5 and higher)

    I(name='BYTEARRAY8',
      code='\x96',
      arg=bytearray8,
      stack_before=[],
      stack_after=[pybytearray],
      proto=5,
      doc="""Push a Python bytearray object.

      There are two arguments:  the first is an 8-byte unsigned int giving
      the number of bytes in the bytearray, and the second is that many bytes,
      which are taken literally as the bytearray content.
      """),

    # Out-of-band buffer (protocol 5 and higher)

    I(name='NEXT_BUFFER',
      code='\x97',
      arg=None,
      stack_before=[],
      stack_after=[pybuffer],
      proto=5,
      doc="Push an out-of-band buffer object."),

    I(name='READONLY_BUFFER',
      code='\x98',
      arg=None,
      stack_before=[pybuffer],
      stack_after=[pybuffer],
      proto=5,
      doc="Make an out-of-band buffer object read-only."),

    # Ways to spell None.

    I(name='NONE',
      code='N',
      arg=None,
      stack_before=[],
      stack_after=[pynone],
      proto=0,
      doc="Push None on the stack."),

    # Ways to spell bools, starting with proto 2.  See INT for how this was
    # done before proto 2.

    I(name='NEWTRUE',
      code='\x88',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="Push True onto the stack."),

    I(name='NEWFALSE',
      code='\x89',
      arg=None,
      stack_before=[],
      stack_after=[pybool],
      proto=2,
      doc="Push False onto the stack."),

    # Ways to spell Unicode strings.

    I(name='UNICODE',
      code='V',
      arg=unicodestringnl,
      stack_before=[],
      stack_after=[pyunicode],
      proto=0,  # this may be pure-text, but it's a later addition
      doc="""Push a Python Unicode string object.

      The argument is a raw-unicode-escape encoding of a Unicode string,
      and so may contain embedded escape sequences.
      The argument extends until the next newline character.
      """),

    I(name='SHORT_BINUNICODE',
      code='\x8c',
      arg=unicodestring1,
      stack_before=[],
      stack_after=[pyunicode],
      proto=4,
      doc="""Push a Python Unicode string object.

      There are two arguments:  the first is a 1-byte unsigned int
      giving the number of bytes in the string.  The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.

      There are two arguments:  the first is a 4-byte little-endian unsigned int
      giving the number of bytes in the string.  The second is that many
      bytes, and is the UTF-8 encoding of the Unicode string.
      """),

    I(name='BINUNICODE8',
      code='\x8d',
      arg=unicodestring8,
      stack_before=[],
      stack_after=[pyunicode],
      proto=4,
      doc="""Push a Python Unicode string object.

      There are two arguments:  the first is an 8-byte little-endian unsigned
      int giving the number of bytes in the string.
      The second is that many bytes, and is the UTF-8 encoding of the
      Unicode string.
      """),

    # Ways to spell floats.

    I(name='FLOAT',
      code='F',
      arg=floatnl,
      stack_before=[],
      stack_after=[pyfloat],
      proto=0,
      doc="""Newline-terminated decimal float literal.

      The argument is repr(a_float), and in general requires 17 significant
      digits for roundtrip conversion to be an identity (this is so for
      IEEE-754 double precision values, which is what Python float maps to
      on most boxes).

      In general, FLOAT cannot be used to transport infinities, NaNs, or
      minus zero across boxes (or even on a single box, if the platform C
      library can't read the strings it produces for such things -- Windows
      is like that), but may do less damage than BINFLOAT on boxes with
      greater precision or dynamic range than IEEE-754 double.
      """),

    I(name='BINFLOAT',
      code='G',
      arg=float8,
      stack_before=[],
      stack_after=[pyfloat],
      proto=1,
      doc="""Float stored in binary form, with 8 bytes of data.

      This generally requires less than half the space of FLOAT encoding.
      In general, BINFLOAT cannot be used to transport infinities, NaNs, or
      minus zero, raises an exception if the exponent exceeds the range of
      an IEEE-754 double, and retains no more than 53 bits of precision (if
      there are more than that, "add a half and chop" rounding is used to
      cut it back to 53 significant bits).
      """),

    # Ways to build lists.

    I(name='EMPTY_LIST',
      code=']',
      arg=None,
      stack_before=[],
      stack_after=[pylist],
      proto=1,
      doc="Push an empty list."),

    I(name='APPEND',
      code='a',
      arg=None,
      stack_before=[pylist, anyobject],
      stack_after=[pylist],
      proto=0,
      doc="""Append an object to a list.

      Stack before:  ... pylist anyobject
      Stack after:   ... pylist+[anyobject]

      although pylist is really extended in-place.
      """),

    I(name='APPENDS',
      code='e',
      arg=None,
      stack_before=[pylist, markobject, stackslice],
      stack_after=[pylist],
      proto=1,
      doc="""Extend a list by a slice of stack objects.

      Stack before:  ... pylist markobject stackslice
      Stack after:   ... pylist+stackslice

      although pylist is really extended in-place.
      """),

    I(name='LIST',
      code='l',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pylist],
      proto=0,
      doc="""Build a list out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python list, which single list object replaces all of the
      stack from the topmost markobject onward.  For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... [1, 2, 3, 'abc']
      """),

    # Ways to build tuples.

    I(name='EMPTY_TUPLE',
      code=')',
      arg=None,
      stack_before=[],
      stack_after=[pytuple],
      proto=1,
      doc="Push an empty tuple."),

    I(name='TUPLE',
      code='t',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pytuple],
      proto=0,
      doc="""Build a tuple out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python tuple, which single tuple object replaces all of the
      stack from the topmost markobject onward.  For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... (1, 2, 3, 'abc')
      """),

    I(name='TUPLE1',
      code='\x85',
      arg=None,
      stack_before=[anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a one-tuple out of the topmost item on the stack.

      This code pops one value off the stack and pushes a tuple of
      length 1 whose one item is that value back onto it.  In other
      words:

          stack[-1] = tuple(stack[-1:])
      """),

    I(name='TUPLE2',
      code='\x86',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a two-tuple out of the top two items on the stack.

      This code pops two values off the stack and pushes a tuple of
      length 2 whose items are those values back onto it.  In other
      words:

          stack[-2:] = [tuple(stack[-2:])]
      """),

    I(name='TUPLE3',
      code='\x87',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[pytuple],
      proto=2,
      doc="""Build a three-tuple out of the top three items on the stack.

      This code pops three values off the stack and pushes a tuple of
      length 3 whose items are those values back onto it.
      In other words:

          stack[-3:] = [tuple(stack[-3:])]
      """),

    # Ways to build dicts.

    I(name='EMPTY_DICT',
      code='}',
      arg=None,
      stack_before=[],
      stack_after=[pydict],
      proto=1,
      doc="Push an empty dict."),

    I(name='DICT',
      code='d',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pydict],
      proto=0,
      doc="""Build a dict out of the topmost stack slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python dict, which single dict object replaces all of the
      stack from the topmost markobject onward.  The stack slice alternates
      key, value, key, value, ....  For example,

      Stack before: ... markobject 1 2 3 'abc'
      Stack after:  ... {1: 2, 3: 'abc'}
      """),

    I(name='SETITEM',
      code='s',
      arg=None,
      stack_before=[pydict, anyobject, anyobject],
      stack_after=[pydict],
      proto=0,
      doc="""Add a key+value pair to an existing dict.

      Stack before:  ... pydict key value
      Stack after:   ... pydict

      where pydict has been modified via pydict[key] = value.
      """),

    I(name='SETITEMS',
      code='u',
      arg=None,
      stack_before=[pydict, markobject, stackslice],
      stack_after=[pydict],
      proto=1,
      doc="""Add an arbitrary number of key+value pairs to an existing dict.

      The slice of the stack following the topmost markobject is taken as
      an alternating sequence of keys and values, added to the dict
      immediately under the topmost markobject.  Everything at and after the
      topmost markobject is popped, leaving the mutated dict at the top
      of the stack.

      Stack before:  ... pydict markobject key_1 value_1 ... key_n value_n
      Stack after:   ... pydict

      where pydict has been modified via pydict[key_i] = value_i for i in
      1, 2, ..., n, and in that order.
      """),

    # Ways to build sets

    I(name='EMPTY_SET',
      code='\x8f',
      arg=None,
      stack_before=[],
      stack_after=[pyset],
      proto=4,
      doc="Push an empty set."),

    I(name='ADDITEMS',
      code='\x90',
      arg=None,
      stack_before=[pyset, markobject, stackslice],
      stack_after=[pyset],
      proto=4,
      doc="""Add an arbitrary number of items to an existing set.

      The slice of the stack following the topmost markobject is taken as
      a sequence of items, added to the set immediately under the topmost
      markobject.  Everything at and after the topmost markobject is popped,
      leaving the mutated set at the top of the stack.

      Stack before:  ... pyset markobject item_1 ... item_n
      Stack after:   ... pyset

      where pyset has been modified via pyset.add(item_i) for i in
      1, 2, ..., n, and in that order.
      """),

    # Way to build frozensets

    I(name='FROZENSET',
      code='\x91',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[pyfrozenset],
      proto=4,
      doc="""Build a frozenset out of the topmost slice, after markobject.

      All the stack entries following the topmost markobject are placed into
      a single Python frozenset, which single frozenset object replaces all
      of the stack from the topmost markobject onward.  For example,

      Stack before: ... markobject 1 2 3
      Stack after:  ... frozenset({1, 2, 3})
      """),

    # Stack manipulation.

    I(name='POP',
      code='0',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="Discard the top stack item, shrinking the stack by one item."),

    I(name='DUP',
      code='2',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject, anyobject],
      proto=0,
      doc="Push the top stack item onto the stack again, duplicating it."),

    I(name='MARK',
      code='(',
      arg=None,
      stack_before=[],
      stack_after=[markobject],
      proto=0,
      doc="""Push markobject onto the stack.

      markobject is a unique object, used by other opcodes to identify a
      region of the stack containing a variable number of objects for them
      to work on.  See markobject.doc for more detail.
      """),

    I(name='POP_MARK',
      code='1',
      arg=None,
      stack_before=[markobject, stackslice],
      stack_after=[],
      proto=1,
      doc="""Pop all the stack objects at and above the topmost markobject.

      When an opcode using a variable number of stack objects is done,
      POP_MARK is used to remove those objects, and to remove the markobject
      that delimited their starting position on the stack.
      """),

    # Memo manipulation.  There are really only two operations (get and put),
    # each in all-text, "short binary", and "long binary" flavors.

    I(name='GET',
      code='g',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the newline-terminated
      decimal string following.
      BINGET and LONG_BINGET are space-optimized versions.
      """),

    I(name='BINGET',
      code='h',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 1-byte unsigned
      integer following.
      """),

    I(name='LONG_BINGET',
      code='j',
      arg=uint4,
      stack_before=[],
      stack_after=[anyobject],
      proto=1,
      doc="""Read an object from the memo and push it on the stack.

      The index of the memo object to push is given by the 4-byte unsigned
      little-endian integer following.
      """),

    I(name='PUT',
      code='p',
      arg=decimalnl_short,
      stack_before=[],
      stack_after=[],
      proto=0,
      doc="""Store the stack top into the memo.  The stack is not popped.

      The index of the memo location to write into is given by the newline-
      terminated decimal string following.  BINPUT and LONG_BINPUT are
      space-optimized versions.
      """),

    I(name='BINPUT',
      code='q',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo.  The stack is not popped.

      The index of the memo location to write into is given by the 1-byte
      unsigned integer following.
      """),

    I(name='LONG_BINPUT',
      code='r',
      arg=uint4,
      stack_before=[],
      stack_after=[],
      proto=1,
      doc="""Store the stack top into the memo.
      The stack is not popped.

      The index of the memo location to write into is given by the 4-byte
      unsigned little-endian integer following.
      """),

    I(name='MEMOIZE',
      code='\x94',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=4,
      doc="""Store the stack top into the memo.  The stack is not popped.

      The index of the memo location to write is the number of
      elements currently present in the memo.
      """),

    # Access the extension registry (predefined objects).  Akin to the GET
    # family.

    I(name='EXT1',
      code='\x82',
      arg=uint1,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      This code and the similar EXT2 and EXT4 allow using a registry
      of popular objects that are pickled by name, typically classes.
      It is envisioned that through a global negotiation and
      registration process, third parties can set up a mapping between
      ints and object names.

      In order to guarantee pickle interchangeability, the extension
      code registry ought to be global, although a range of codes may
      be reserved for private use.

      EXT1 has a 1-byte integer argument.  This is used to index into the
      extension registry, and the object at that index is pushed on the stack.
      """),

    I(name='EXT2',
      code='\x83',
      arg=uint2,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1.  EXT2 has a two-byte integer argument.
      """),

    I(name='EXT4',
      code='\x84',
      arg=int4,
      stack_before=[],
      stack_after=[anyobject],
      proto=2,
      doc="""Extension code.

      See EXT1.
      EXT4 has a four-byte integer argument.
      """),

    # Push a class object, or module function, on the stack, via its module
    # and name.

    I(name='GLOBAL',
      code='c',
      arg=stringnl_noescape_pair,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push a global object (module.attr) on the stack.

      Two newline-terminated strings follow the GLOBAL opcode.  The first is
      taken as a module name, and the second as a class name.  The class
      object module.class is pushed on the stack.  More accurately, the
      object returned by self.find_class(module, class) is pushed on the
      stack, so unpickling subclasses can override this form of lookup.
      """),

    I(name='STACK_GLOBAL',
      code='\x93',
      arg=None,
      stack_before=[pyunicode, pyunicode],
      stack_after=[anyobject],
      proto=4,
      doc="""Push a global object (module.attr) on the stack.
      """),

    # Ways to build objects of classes pickle doesn't know about directly
    # (user-defined classes).  I despair of documenting this accurately
    # and comprehensibly -- you really have to read the pickle code to
    # find all the special cases.

    I(name='REDUCE',
      code='R',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object built from a callable and an argument tuple.

      The opcode is named to remind of the __reduce__() method.

      Stack before: ... callable pytuple
      Stack after:  ... callable(*pytuple)

      The callable and the argument tuple are the first two items returned
      by a __reduce__ method.
      Applying the callable to the argtuple is
      supposed to reproduce the original object, or at least get it started.
      If the __reduce__ method returns a 3-tuple, the last component is an
      argument to be passed to the object's __setstate__, and then the REDUCE
      opcode is followed by code to create setstate's argument, and then a
      BUILD opcode to apply __setstate__ to that argument.

      If not isinstance(callable, type), REDUCE complains unless the
      callable has been registered with the copyreg module's
      safe_constructors dict, or the callable has a magic
      '__safe_for_unpickling__' attribute with a true value.  I'm not sure
      why it does this, but I've sure seen this complaint often enough when
      I didn't want to <wink>.
      """),

    I(name='BUILD',
      code='b',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=0,
      doc="""Finish building an object, via __setstate__ or dict update.

      Stack before: ... anyobject argument
      Stack after:  ... anyobject

      where anyobject may have been mutated, as follows:

      If the object has a __setstate__ method,

          anyobject.__setstate__(argument)

      is called.

      Else the argument must be a dict, the object must have a __dict__, and
      the object is updated via

          anyobject.__dict__.update(argument)
      """),

    I(name='INST',
      code='i',
      arg=stringnl_noescape_pair,
      stack_before=[markobject, stackslice],
      stack_after=[anyobject],
      proto=0,
      doc="""Build a class instance.

      This is the protocol 0 version of protocol 1's OBJ opcode.
      INST is followed by two newline-terminated strings, giving a
      module and class name, just as for the GLOBAL opcode (and see
      GLOBAL for more details about that).  self.find_class(module, name)
      is used to get a class object.

      In addition, all the objects on the stack following the topmost
      markobject are gathered into a tuple and popped (along with the
      topmost markobject), just as for the TUPLE opcode.

      Now it gets complicated.  If all of these are true:

        + The argtuple is empty (markobject was at the top of the stack
          at the start).

        + The class object does not have a __getinitargs__ attribute.

      then we want to create an old-style class instance without invoking
      its __init__() method (pickle has waffled on this over the years; not
      calling __init__() is current wisdom).  In this case, an instance of
      an old-style dummy class is created, and then we try to rebind its
      __class__ attribute to the desired class object.
      If this succeeds,
      the new instance object is pushed on the stack, and we're done.

      Else (the argtuple is not empty, it's not an old-style class object,
      or the class object does have a __getinitargs__ attribute), the code
      first insists that the class object have a __safe_for_unpickling__
      attribute.  Unlike as for the __safe_for_unpickling__ check in REDUCE,
      it doesn't matter whether this attribute has a true or false value, it
      only matters whether it exists (XXX this is a bug).  If
      __safe_for_unpickling__ doesn't exist, UnpicklingError is raised.

      Else (the class object does have a __safe_for_unpickling__ attr),
      the class object obtained from INST's arguments is applied to the
      argtuple obtained from the stack, and the resulting instance object
      is pushed on the stack.

      NOTE:  checks for __safe_for_unpickling__ went away in Python 2.3.
      NOTE:  the distinction between old-style and new-style classes does
             not make sense in Python 3.
      """),

    I(name='OBJ',
      code='o',
      arg=None,
      stack_before=[markobject, anyobject, stackslice],
      stack_after=[anyobject],
      proto=1,
      doc="""Build a class instance.

      This is the protocol 1 version of protocol 0's INST opcode, and is
      very much like it.  The major difference is that the class object
      is taken off the stack, allowing it to be retrieved from the memo
      repeatedly if several instances of the same class are created.  This
      can be much more efficient (in both time and space) than repeatedly
      embedding the module and class names in INST opcodes.

      Unlike INST, OBJ takes no arguments from the opcode stream.  Instead
      the class object is taken off the stack, immediately above the
      topmost markobject:

      Stack before: ... markobject classobject stackslice
      Stack after:  ... new_instance_object

      As for INST, the remainder of the stack above the markobject is
      gathered into an argument tuple, and then the logic seems identical,
      except that no __safe_for_unpickling__ check is done (XXX this is
      a bug).  See INST for the gory details.

      NOTE:  In Python 2.3, INST and OBJ are identical except for how they
      get the class object.  That was always the intent; the implementations
      had diverged for accidental reasons.
      """),

    I(name='NEWOBJ',
      code='\x81',
      arg=None,
      stack_before=[anyobject, anyobject],
      stack_after=[anyobject],
      proto=2,
      doc="""Build an object instance.

      The stack before should be thought of as containing a class
      object followed by an argument tuple (the tuple being the stack
      top).  Call these cls and args.  They are popped off the stack,
      and the value returned by cls.__new__(cls, *args) is pushed back
      onto the stack.
      """),

    I(name='NEWOBJ_EX',
      code='\x92',
      arg=None,
      stack_before=[anyobject, anyobject, anyobject],
      stack_after=[anyobject],
      proto=4,
      doc="""Build an object instance.

      The stack before should be thought of as containing a class
      object followed by an argument tuple and by a keyword argument dict
      (the dict being the stack top).  Call these cls, args, and kwargs.
      They are popped off the stack, and the value returned by
      cls.__new__(cls, *args, **kwargs) is pushed back onto the stack.
      """),

    # Machine control.

    I(name='PROTO',
      code='\x80',
      arg=uint1,
      stack_before=[],
      stack_after=[],
      proto=2,
      doc="""Protocol version indicator.

      For protocol 2 and above, a pickle must start with this opcode.
      The argument is the protocol version, an int in range(2, 256).
      """),

    I(name='STOP',
      code='.',
      arg=None,
      stack_before=[anyobject],
      stack_after=[],
      proto=0,
      doc="""Stop the unpickling machine.

      Every pickle ends with this opcode.  The object at the top of the stack
      is popped, and that's the result of unpickling.  The stack should be
      empty then.
      """),

    # Framing support.

    I(name='FRAME',
      code='\x95',
      arg=uint8,
      stack_before=[],
      stack_after=[],
      proto=4,
      doc="""Indicate the beginning of a new frame.

      The unpickler may use this opcode to safely prefetch data from its
      underlying stream.
      """),

    # Ways to deal with persistent IDs.

    I(name='PERSID',
      code='P',
      arg=stringnl_noescape,
      stack_before=[],
      stack_after=[anyobject],
      proto=0,
      doc="""Push an object identified by a persistent ID.

      The pickle module doesn't define what a persistent ID means.  PERSID's
      argument is a newline-terminated str-style (no embedded escapes, no
      bracketing quote characters) string, which *is* "the persistent ID".
      The unpickler passes this string to self.persistent_load().  Whatever
      object that returns is pushed on the stack.
      There is no implementation
      of persistent_load() in Python's unpickler:  it must be supplied by an
      unpickler subclass.
      """),

    I(name='BINPERSID',
      code='Q',
      arg=None,
      stack_before=[anyobject],
      stack_after=[anyobject],
      proto=1,
      doc="""Push an object identified by a persistent ID.

      Like PERSID, except the persistent ID is popped off the stack (instead
      of being a string embedded in the opcode bytestream).  The persistent
      ID is passed to self.persistent_load(), and whatever object that
      returns is pushed on the stack.  See PERSID for more detail.
      """),
]
del I

# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}

for i, d in enumerate(opcodes):
    if d.name in name2i:
        raise ValueError("repeated name %r at indices %d and %d" %
                         (d.name, name2i[d.name], i))
    if d.code in code2i:
        raise ValueError("repeated code %r at indices %d and %d" %
                         (d.code, code2i[d.code], i))

    name2i[d.name] = i
    code2i[d.code] = i

del name2i, code2i, i, d

##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.

code2op = {}
for d in opcodes:
    code2op[d.code] = d
del d

def assure_pickle_consistency(verbose=False):

    copy = code2op.copy()
    for name in pickle.__all__:
        if not re.match("[A-Z][A-Z0-9_]+$", name):
            if verbose:
                print("skipping %r: it doesn't look like an opcode name" % name)
            continue
        picklecode = getattr(pickle, name)
        if not isinstance(picklecode, bytes) or len(picklecode) != 1:
            if verbose:
                print(("skipping %r: value %r doesn't look like a pickle "
                       "code" % (name, picklecode)))
            continue
        picklecode = picklecode.decode("latin-1")
        if picklecode in copy:
            if verbose:
                print("checking name %r w/ code %r for consistency" % (
                      name, picklecode))
            d = copy[picklecode]
            if d.name != name:
                raise ValueError("for pickle code %r, pickle.py uses name %r "
                                 "but we're using name %r" % (picklecode,
                                                              name,
                                                              d.name))
            # Forget this one.  Any left over in copy at the end are a problem
            # of a different kind.
            del copy[picklecode]
        else:
            raise ValueError("pickle.py appears to have a pickle opcode with "
                             "name %r and code %r, but we don't" %
                             (name, picklecode))
    if copy:
        msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
        for code, d in copy.items():
            msg.append("    name %r with code %r" % (d.name, code))
        raise ValueError("\n".join(msg))

assure_pickle_consistency()
del assure_pickle_consistency

##############################################################################
# A pickle opcode generator.

def _genops(data, yield_end_pos=False):
    if isinstance(data, bytes_types):
        data = io.BytesIO(data)

    if hasattr(data, "tell"):
        getpos = data.tell
    else:
        getpos = lambda: None

    while True:
        pos = getpos()
        code = data.read(1)
        opcode = code2op.get(code.decode("latin-1"))
        if opcode is None:
            if code == b"":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 "<unknown>" if pos is None else pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(data)
        if yield_end_pos:
            yield opcode, arg, pos, getpos()
        else:
            yield opcode, arg, pos
        if code == b'.':
            assert opcode.name == 'STOP'
            break

def genops(pickle):
    """Generate all the opcodes in a pickle.

    'pickle' is a file-like object, or string, containing the pickle.

    Each opcode in the pickle is generated, from the current pickle position,
    stopping after a STOP opcode is delivered.  A triple is generated for
    each opcode:

        opcode, arg, pos

    opcode is an OpcodeInfo record, describing the current opcode.

    If the opcode has an argument embedded in the pickle, arg is its decoded
    value, as a Python object.  If the opcode doesn't have an argument, arg
    is None.

    If the pickle has a tell() method, pos was the value of pickle.tell()
    before reading the current opcode.  If the pickle is a bytes object,
    it's wrapped in a BytesIO object, and the latter's tell() result is
    used.
    Else (the pickle doesn't have a tell(), and it's not obvious how
    to query its current position) pos is None.
    """
    return _genops(pickle)

##############################################################################
# A pickle optimizer.

def optimize(p):
    'Optimize a pickle string by removing unused PUT opcodes'
    put = 'PUT'
    get = 'GET'
    oldids = set()          # set of all PUT ids
    newids = {}             # maps each id used by a GET opcode to its new id
    opcodes = []            # (op, idx) or (pos, end_pos)
    proto = 0
    protoheader = b''
    for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
        if 'PUT' in opcode.name:
            oldids.add(arg)
            opcodes.append((put, arg))
        elif opcode.name == 'MEMOIZE':
            idx = len(oldids)
            oldids.add(idx)
            opcodes.append((put, idx))
        elif 'FRAME' in opcode.name:
            pass
        elif 'GET' in opcode.name:
            if opcode.proto > proto:
                proto = opcode.proto
            newids[arg] = None
            opcodes.append((get, arg))
        elif opcode.name == 'PROTO':
            if arg > proto:
                proto = arg
            if pos == 0:
                protoheader = p[pos:end_pos]
            else:
                opcodes.append((pos, end_pos))
        else:
            opcodes.append((pos, end_pos))
    del oldids

    # Copy the opcodes except for PUTS without a corresponding GET
    out = io.BytesIO()
    # Write the PROTO header before any framing
    out.write(protoheader)
    pickler = pickle._Pickler(out, proto)
    if proto >= 4:
        pickler.framer.start_framing()
    idx = 0
    for op, arg in opcodes:
        frameless = False
        if op is put:
            if arg not in newids:
                continue
            data = pickler.put(idx)
            newids[arg] = idx
            idx += 1
        elif op is get:
            data = pickler.get(newids[arg])
        else:
            data = p[op:arg]
            frameless = len(data) > pickler.framer._FRAME_SIZE_TARGET
        pickler.framer.commit_frame(force=frameless)
        if frameless:
            pickler.framer.file_write(data)
        else:
            pickler.write(data)
    pickler.framer.end_framing()
    return out.getvalue()

##############################################################################
# A symbolic pickle disassembler.

def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
    """Produce a symbolic disassembly of a pickle.

    'pickle' is a file-like object, or string, containing a (at least one)
    pickle.  The pickle is disassembled from the current position, through
    the first STOP opcode encountered.

    Optional arg 'out' is a file-like object to which the disassembly is
    printed.  It defaults to sys.stdout.

    Optional arg 'memo' is a Python dict, used as the pickle's memo.  It
    may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
    Passing the same memo object to another dis() call then allows disassembly
    to proceed across multiple pickles that were all created by the same
    pickler with the same memo.  Ordinarily you don't need to worry about this.

    Optional arg 'indentlevel' is the number of blanks by which to indent
    a new MARK level.  It defaults to 4.

    Optional arg 'annotate', if nonzero, instructs dis() to add a short
    description of the opcode on each line of disassembled output.
    The value given to 'annotate' must be an integer and is used as a
    hint for the column where annotation should start.
    The default value is 0, meaning no annotations.

    In addition to printing the disassembly, some sanity checks are made:

    + All embedded opcode arguments "make sense".

    + Explicit and implicit pop operations have enough items on the stack.

    + When an opcode implicitly refers to a markobject, a markobject is
      actually on the stack.

    + A memo entry isn't referenced before it's defined.

    + The markobject isn't stored in the memo.

    + A memo entry isn't redefined.
    """

    # Most of the hair here is for sanity checks, but most of it is needed
    # anyway to detect when a protocol 0 POP takes a MARK off the stack
    # (which in turn is needed to indent MARK blocks correctly).

    stack = []          # crude emulation of unpickler stack
    if memo is None:
        memo = {}       # crude emulation of unpickler memo
    maxproto = -1       # max protocol number seen
    markstack = []      # bytecode positions of MARK opcodes
    indentchunk = ' ' * indentlevel
    errormsg = None
    annocol = annotate  # column hint for annotations
    for opcode, arg, pos in genops(pickle):
        if pos is not None:
            print("%5d:" % pos, end=' ', file=out)

        line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
                              indentchunk * len(markstack),
                              opcode.name)

        maxproto = max(maxproto, opcode.proto)
        before = opcode.stack_before    # don't mutate
        after = opcode.stack_after      # don't mutate
        numtopop = len(before)

        # See whether a MARK should be popped.
        markmsg = None
        if markobject in before or (opcode.name == "POP" and
                                    stack and
                                    stack[-1] is markobject):
            assert markobject not in after
            if __debug__:
                if markobject in before:
                    assert before[-1] is stackslice
            if markstack:
                markpos = markstack.pop()
                if markpos is None:
                    markmsg = "(MARK at unknown opcode offset)"
                else:
                    markmsg = "(MARK at %d)" % markpos
                # Pop everything at and after the topmost markobject.
                while stack[-1] is not markobject:
                    stack.pop()
                stack.pop()
                # Stop later code from popping too much.
                try:
                    numtopop = before.index(markobject)
                except ValueError:
                    assert opcode.name == "POP"
                    numtopop = 0
            else:
                errormsg = markmsg = "no MARK exists on stack"

        # Check for correct memo usage.
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
            if opcode.name == "MEMOIZE":
                memo_idx = len(memo)
                markmsg = "(as %d)" % memo_idx
            else:
                assert arg is not None
                memo_idx = arg
            if memo_idx in memo:
                # Report memo_idx rather than arg:  arg is None for MEMOIZE.
                errormsg = "memo key %r already defined" % memo_idx
            elif not stack:
                errormsg = "stack is empty -- can't store into memo"
            elif stack[-1] is markobject:
                errormsg = "can't store markobject in the memo"
            else:
                memo[memo_idx] = stack[-1]
        elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
            if arg in memo:
                assert len(after) == 1
                after = [memo[arg]]     # for better stack emulation
            else:
                errormsg = "memo key %r has never been stored into" % arg

        if arg is not None or markmsg:
            # make a mild effort to align arguments
            line += ' ' * (10 - len(opcode.name))
            if arg is not None:
                line += ' ' + repr(arg)
            if markmsg:
                line += ' ' + markmsg
        if annotate:
            line += ' ' * (annocol - len(line))
            # make a mild effort to align annotations
            annocol = len(line)
            if annocol > 50:
                annocol = annotate
            line += ' ' + opcode.doc.split('\n', 1)[0]
        print(line, file=out)

        if errormsg:
            # Note that we delayed complaining until the offending opcode
            # was printed.
            raise ValueError(errormsg)

        # Emulate the stack effects.
        if len(stack) < numtopop:
            raise ValueError("tries to pop %d items from stack with "
                             "only %d items" % (numtopop, len(stack)))
        if numtopop:
            del stack[-numtopop:]
        if markobject in after:
            assert markobject not in before
            markstack.append(pos)

        stack.extend(after)

    print("highest protocol among opcodes =", maxproto, file=out)
    if stack:
        raise ValueError("stack not empty after STOP: %r" % stack)

# For use in the doctest, simply as an example of a class to pickle.
class _Example:
    def __init__(self, value):
        self.value = value

_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {b'abc': "def"}]
>>> pkl0 = pickle.dumps(x, 0)
>>> dis(pkl0)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: I    INT        1
    8: a    APPEND
    9: I    INT        2
   12: a    APPEND
   13: (    MARK
   14: I        INT        3
   17: I        INT        4
   20: t        TUPLE      (MARK at 13)
   21: p    PUT        1
   24: a    APPEND
   25: (    MARK
   26: d        DICT       (MARK at 25)
   27: p    PUT        2
   30: c    GLOBAL     '_codecs encode'
   46: p    PUT        3
   49: (    MARK
   50: V        UNICODE    'abc'
   55: p        PUT        4
   58: V        UNICODE    'latin1'
   66: p        PUT        5
   69: t        TUPLE      (MARK at 49)
   70: p    PUT        6
   73: R    REDUCE
   74: p    PUT        7
   77: V    UNICODE    'def'
   82: p    PUT        8
   85: s    SETITEM
   86: a    APPEND
   87: .    STOP
highest protocol among opcodes = 0

Try again with a "binary" pickle.

>>> pkl1 = pickle.dumps(x, 1)
>>> dis(pkl1)
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: K        BININT1    1
    6: K        BININT1    2
    8: (        MARK
    9: K            BININT1    3
   11: K            BININT1    4
   13: t            TUPLE      (MARK at 8)
   14: q        BINPUT     1
   16: }        EMPTY_DICT
   17: q        BINPUT     2
   19: c        GLOBAL     '_codecs encode'
   35: q        BINPUT     3
   37: (        MARK
   38: X            BINUNICODE 'abc'
   46: q            BINPUT     4
   48: X            BINUNICODE 'latin1'
   59: q            BINPUT     5
   61: t            TUPLE      (MARK at 37)
   62: q        BINPUT     6
   64: R        REDUCE
   65: q        BINPUT     7
   67: X        BINUNICODE 'def'
   75: q        BINPUT     8
   77: s        SETITEM
   78: e        APPENDS    (MARK at 3)
   79: .    STOP
highest protocol among opcodes = 1

Exercise the INST/OBJ/BUILD family.

>>> import pickletools
>>> dis(pickle.dumps(pickletools.dis, 0))
    0: c    GLOBAL     'pickletools dis'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0

>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: c    GLOBAL     'copy_reg _reconstructor'
   30: p    PUT        1
   33: (    MARK
   34: c        GLOBAL     'pickletools _Example'
   56: p        PUT        2
   59: c        GLOBAL     '__builtin__ object'
   79: p        PUT        3
   82: N        NONE
   83: t        TUPLE      (MARK at 33)
   84: p    PUT        4
   87: R    REDUCE
   88: p    PUT        5
   91: (    MARK
   92: d        DICT       (MARK at 91)
   93: p    PUT        6
   96: V    UNICODE    'value'
  103: p    PUT        7
  106: I    INT        42
  110: s    SETITEM
  111: b    BUILD
  112: a    APPEND
  113: g    GET        5
  116: a    APPEND
  117: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(x, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: c        GLOBAL     'copy_reg _reconstructor'
   29: q        BINPUT     1
   31: (        MARK
   32: c            GLOBAL     'pickletools _Example'
   54: q            BINPUT     2
   56: c            GLOBAL     '__builtin__ object'
   76: q            BINPUT     3
   78: N            NONE
   79: t            TUPLE      (MARK at 31)
   80: q        BINPUT     4
   82: R        REDUCE
   83: q        BINPUT     5
   85: }        EMPTY_DICT
   86: q        BINPUT     6
   88: X        BINUNICODE 'value'
   98: q        BINPUT     7
  100: K        BININT1    42
  102: s        SETITEM
  103: b        BUILD
  104: h        BINGET     5
  106: e        APPENDS    (MARK at 3)
  107: .    STOP
highest protocol among opcodes = 1

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: g        GET        0
    9: t        TUPLE      (MARK at 5)
   10: p    PUT        1
   13: a    APPEND
   14: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(L, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: h        BINGET     0
    6: t        TUPLE      (MARK at 3)
    7: q    BINPUT     1
    9: a    APPEND
   10: .    STOP
highest protocol among opcodes = 1

Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.

>>> dis(pickle.dumps(T, 0))
    0: (    MARK
    1: (        MARK
    2: l            LIST       (MARK at 1)
    3: p        PUT        0
    6: (        MARK
    7: g            GET        0
   10: t            TUPLE      (MARK at 6)
   11: p        PUT        1
   14: a        APPEND
   15: 0        POP
   16: 0        POP        (MARK at 0)
   17: g    GET        1
   20: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(T, 1))
    0: (    MARK
    1: ]        EMPTY_LIST
    2: q        BINPUT     0
    4: (        MARK
    5: h            BINGET     0
    7: t            TUPLE      (MARK at 4)
    8: q        BINPUT     1
   10: a        APPEND
   11: 1        POP_MARK   (MARK at 0)
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 1

Try protocol 2.

>>> dis(pickle.dumps(L, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: .    STOP
highest protocol among opcodes = 2

>>> dis(pickle.dumps(T, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: 0    POP
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 2

Try protocol 3 with annotations:

>>> dis(pickle.dumps(T, 3), annotate=1)
    0: \x80 PROTO      3 Protocol version indicator.
    2: ]    EMPTY_LIST   Push an empty list.
    3: q    BINPUT     0 Store the stack top into the memo.  The stack is not popped.
    5: h    BINGET     0 Read an object from the memo and push it on the stack.
    7: \x85 TUPLE1       Build a one-tuple out of the topmost item on the stack.
    8: q    BINPUT     1 Store the stack top into the memo.  The stack is not popped.
   10: a    APPEND       Append an object to a list.
   11: 0    POP          Discard the top stack item, shrinking the stack by one item.
   12: h    BINGET     1 Read an object from the memo and push it on the stack.
   14: .    STOP         Stop the unpickling machine.
highest protocol among opcodes = 2

"""

_memo_test = r"""
>>> import pickle
>>> import io
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
0
>>> memo = {}
>>> dis(f, memo=memo)
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: K        BININT1    1
    8: K        BININT1    2
   10: K        BININT1    3
   12: e        APPENDS    (MARK at 5)
   13: .    STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
   14: \x80 PROTO      2
   16: h    BINGET     0
   18: .    STOP
highest protocol among opcodes = 2
"""

__test__ = {'disassembler_test': _dis_test,
            'disassembler_memo_test': _memo_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(
        description='disassemble one or more pickle files')
    parser.add_argument(
        'pickle_file', type=argparse.FileType('br'),
        nargs='*', help='the pickle file')
    parser.add_argument(
        '-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
        help='the file where the output should be written')
    parser.add_argument(
        '-m', '--memo', action='store_true',
        help='preserve memo between disassemblies')
    parser.add_argument(
        '-l', '--indentlevel', default=4, type=int,
        help='the number of blanks by which to indent a new MARK level')
    parser.add_argument(
        '-a', '--annotate', action='store_true',
        help='annotate each line with a short opcode description')
    parser.add_argument(
        '-p', '--preamble', default="==> {name} <==",
        help='if more than one pickle 
file is specified, print this before'2868        ' each disassembly')2869    parser.add_argument(2870        '-t', '--test', action='store_true',2871        help='run self-test suite')2872    parser.add_argument(2873        '-v', action='store_true',2874        help='run verbosely; only affects self-test run')2875    args = parser.parse_args()2876    if args.test:2877        _test()2878    else:2879        annotate = 30 if args.annotate else 02880        if not args.pickle_file:2881            parser.print_help()2882        elif len(args.pickle_file) == 1:2883            dis(args.pickle_file[0], args.output, None,2884                args.indentlevel, annotate)2885        else:2886            memo = {} if args.memo else None2887            for f in args.pickle_file:2888                preamble = args.preamble.format(name=f.name)2889                args.output.write(preamble + '\n')2890                dis(f, args.output, memo, args.indentlevel, annotate)2891