Merging the py3k-pep3137 branch back into the py3k branch.

No detailed change log; just check out the change log for the py3k-pep3137 branch.

The most obvious changes:

- str8 renamed to bytes (PyString at the C level);
- bytes renamed to buffer (PyBytes at the C level);
- PyString and PyUnicode are no longer compatible.

I.e. we now have an immutable bytes type and a mutable bytes type.

The behavior of PyString was modified quite a bit, to make it more bytes-like. Some changes are still on the to-do list.
pablogsal · Nov 6, 2007 · 98297ee
1 parent a19f80c
commit 98297ee
Showing 148 changed files with 2,528 additions and 3,512 deletions.
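
The split described in the commit message (one immutable and one mutable byte type) can be illustrated roughly as follows. This is a sketch, not part of the commit: it assumes a released Python 3 interpreter, where the mutable type that this commit calls ``buffer`` is spelled ``bytearray``.

```python
# Assumption: run on a released Python 3, where the mutable type called
# "buffer" in this commit is available as "bytearray".
immutable = bytes(b"abc")
mutable = bytearray(b"abc")

# The mutable type supports item assignment and in-place growth.
mutable[0] = ord("x")
mutable.extend(b"de")

# The immutable type rejects item assignment with a TypeError.
try:
    immutable[0] = ord("x")
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Only the immutable type is hashable, so only it can serve as a dict key.
lookup = {immutable: "ok"}
```
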
diff --git a/Doc/library/array.rst b/Doc/library/array.rst
@@ -56,8 +56,9 @@ The module defines the following type:
 .. function:: array(typecode[, initializer])
 
    Return a new array whose items are restricted by *typecode*, and initialized
-   from the optional *initializer* value, which must be a list, string, or iterable
-   over elements of the appropriate type.
+   from the optional *initializer* value, which must be a list, object
+   supporting the buffer interface, or iterable over elements of the 
+   appropriate type.
 
    If given a list or string, the initializer is passed to the new array's
    :meth:`fromlist`, :meth:`fromstring`, or :meth:`fromunicode` method (see below)
@@ -69,6 +70,10 @@ The module defines the following type:
 
    Obsolete alias for :func:`array`.
 
+.. data:: typecodes
+
+   A string with all available type codes.
+
 Array objects support the ordinary sequence operations of indexing, slicing,
 concatenation, and multiplication.  When using slice assignment, the assigned
 value must be an array object with the same type code; in all other cases,

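The revised ``array()`` wording and the new ``typecodes`` entry can be tried out with a short sketch like the one below. It assumes a released Python 3 interpreter; the variable names are illustrative only.

```python
import array

# A bytes object supports the buffer interface, so it can initialize an array.
from_bytes = array.array("B", b"\x01\x02\x03")

# Any iterable over elements of the appropriate type also works.
from_iterable = array.array("B", range(4))

# array.typecodes is a string listing every available type code.
all_codes = array.typecodes
```
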
diff --git a/Doc/library/exceptions.rst b/Doc/library/exceptions.rst
@@ -405,7 +405,11 @@ module for more information.
 
    Base class for warnings related to Unicode.
 
-The class hierarchy for built-in exceptions is:
+.. exception:: BytesWarning
+
+   Base class for warnings related to :class:`bytes` and :class:`buffer`.
 
 
+The class hierarchy for built-in exceptions is:
+
 .. literalinclude:: ../../Lib/test/exception_hierarchy.txt
diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst
@@ -118,31 +118,44 @@ available.  They are listed here in alphabetical order.
    .. index:: pair: Boolean; type
 
 
-.. function:: bytes([arg[, encoding[, errors]]])
+.. function:: buffer([arg[, encoding[, errors]]])
 
-   Return a new array of bytes.  The :class:`bytes` type is a mutable sequence
+   Return a new array of bytes.  The :class:`buffer` type is a mutable sequence
    of integers in the range 0 <= x < 256.  It has most of the usual methods of
-   mutable sequences, described in :ref:`typesseq-mutable`, as well as a few
-   methods borrowed from strings, described in :ref:`bytes-methods`.
+   mutable sequences, described in :ref:`typesseq-mutable`, as well as most methods
+   that the :class:`str` type has, see :ref:`bytes-methods`.
 
    The optional *arg* parameter can be used to initialize the array in a few
    different ways:
 
    * If it is a *string*, you must also give the *encoding* (and optionally,
-     *errors*) parameters; :func:`bytes` then acts like :meth:`str.encode`.
+     *errors*) parameters; :func:`buffer` then converts the Unicode string to
+     bytes using :meth:`str.encode`.
 
    * If it is an *integer*, the array will have that size and will be
      initialized with null bytes.
 
    * If it is an object conforming to the *buffer* interface, a read-only buffer
      of the object will be used to initialize the bytes array.
 
-   * If it is an *iterable*, it must be an iterable of integers in the range 0
-     <= x < 256, which are used as the initial contents of the array.
+   * If it is an *iterable*, it must be an iterable of integers in the range
+     ``0 <= x < 256``, which are used as the initial contents of the array.
 
    Without an argument, an array of size 0 is created.
 
 
+.. function:: bytes([arg[, encoding[, errors]]])
+
+   Return a new "bytes" object, which is an immutable sequence of integers in
+   the range ``0 <= x < 256``.  :class:`bytes` is an immutable version of
+   :class:`buffer` -- it has the same non-mutating methods and the same indexing
+   and slicing behavior.
+
+   Accordingly, constructor arguments are interpreted as for :func:`buffer`.
+
+   Bytes objects can also be created with literals, see :ref:`strings`.
+
+
 .. function:: chr(i)
 
    Return the string of one character whose Unicode codepoint is the integer

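The constructor forms listed in the ``functions.rst`` hunk above can be sketched as follows. The sketch assumes a released Python 3 interpreter and uses ``bytearray`` as a stand-in for the type this diff calls ``buffer``; that spelling is an assumption, not something this commit defines.

```python
# Assumption: "bytearray" stands in for the type this diff calls "buffer".
from_text = bytearray("caf\u00e9", "utf-8")  # string plus required encoding
zero_filled = bytearray(3)                   # integer: that many null bytes
from_buffer = bytearray(b"abc")              # object with the buffer interface
from_ints = bytearray([104, 105])            # iterable of ints in 0 <= x < 256
empty = bytearray()                          # no argument: size 0

# bytes accepts the same constructor arguments but yields an immutable object.
frozen = bytes([104, 105])

# A string without an encoding is rejected.
try:
    bytearray("abc")
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True
```
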
diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
@@ -1313,9 +1313,11 @@ Bytes and Buffer Methods
 
 Bytes and buffer objects, being "strings of bytes", have all methods found on
 strings, with the exception of :func:`encode`, :func:`format` and
-:func:`isidentifier`, which do not make sense with these types.  Wherever one of
-these methods needs to interpret the bytes as characters (e.g. the :func:`is...`
-methods), the ASCII character set is assumed.
+:func:`isidentifier`, which do not make sense with these types.  For converting
+the objects to strings, they have a :func:`decode` method.
+
+Wherever one of these methods needs to interpret the bytes as characters
+(e.g. the :func:`is...` methods), the ASCII character set is assumed.
 
 .. note::
 

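A brief sketch of the behavior described in the ``stdtypes.rst`` hunk above: string-like methods on bytes, the ASCII assumption in the ``is...`` methods, and ``decode`` as the route back to ``str``. It assumes a released Python 3 interpreter.

```python
data = b"Hello World"

# String-like methods are available and return bytes.
upper = data.upper()
parts = data.split()

# Character classification assumes ASCII: byte 0xE9 is an accented letter
# in Latin-1, but it is not treated as alphabetic here.
ascii_alpha = b"abc".isalpha()
latin1_alpha = b"\xe9".isalpha()

# decode() converts bytes to str; bytes objects have no encode() method.
text = b"caf\xc3\xa9".decode("utf-8")
has_encode = hasattr(data, "encode")
```
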
diff --git a/Doc/library/warnings.rst b/Doc/library/warnings.rst
@@ -80,6 +80,10 @@ following warnings category classes are currently defined:
 | :exc:`UnicodeWarning`            | Base category for warnings related to         |
 |                                  | Unicode.                                      |
 +----------------------------------+-----------------------------------------------+
+| :exc:`BytesWarning`              | Base category for warnings related to         |
+|                                  | :class:`bytes` and :class:`buffer`.           |
++----------------------------------+-----------------------------------------------+
+
 
 While these are technically built-in exceptions, they are documented here,
 because conceptually they belong to the warnings mechanism.

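The new ``BytesWarning`` category behaves like any other warning category under the ``warnings`` filters. The sketch below triggers it by hand purely for illustration; as far as released Python 3 goes, the interpreter itself issues it only when bytes warnings are enabled (the ``-b`` command-line option), which is outside what this hunk shows.

```python
import warnings

# BytesWarning is a built-in warning category derived from Warning.
is_warning_category = issubclass(BytesWarning, Warning)

# Turn the category into an error inside this block, then raise it by hand.
with warnings.catch_warnings():
    warnings.simplefilter("error", BytesWarning)
    try:
        warnings.warn("str() on a bytes instance", BytesWarning)
        escalated = False
    except BytesWarning:
        escalated = True
```
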
diff --git a/Doc/whatsnew/3.0.rst b/Doc/whatsnew/3.0.rst
@@ -131,11 +131,6 @@ changes to rarely used features.)
   that if a file is opened using an incorrect mode or encoding, I/O
   will likely fail.
 
-* Bytes aren't hashable, and don't support certain operations like
-  ``b.lower()``, ``b.strip()`` or ``b.split()``.
-  For the latter two, use ``b.strip(b" \t\r\n\f")`` or
-  ``b.split(b" \t\r\n\f")``.
-
 * ``map()`` and ``filter()`` return iterators.  A quick fix is e.g.
   ``list(map(...))``, but a better fix is often to use a list
   comprehension (especially when the original code uses ``lambda``).
@@ -158,13 +153,11 @@ Strings and Bytes
 * There is only one string type; its name is ``str`` but its behavior
   and implementation are more like ``unicode`` in 2.x.
 
-* PEP 358: There is a new type, ``bytes``, to represent binary data
+* PEP 3137: There is a new type, ``bytes``, to represent binary data
   (and encoded text, which is treated as binary data until you decide
   to decode it).  The ``str`` and ``bytes`` types cannot be mixed; you
   must always explicitly convert between them, using the ``.encode()``
-  (str -> bytes) or ``.decode()`` (bytes -> str) methods.  Comparing a
-  bytes and a str instance for equality raises a TypeError; this
-  catches common mistakes.
+  (str -> bytes) or ``.decode()`` (bytes -> str) methods.
 
 * PEP 3112: Bytes literals.  E.g. b"abc".
 

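The ``whatsnew`` changes above can be summarized in a small sketch: explicit conversion between ``str`` and ``bytes``, no implicit mixing, and bytes that are hashable and support ``strip()``, ``split()`` and ``lower()`` without arguments. It assumes a released Python 3 interpreter started without the bytes-warning options.

```python
text = "abc"
data = text.encode("utf-8")        # str -> bytes
round_trip = data.decode("utf-8")  # bytes -> str

# Mixing the two types in an operation raises TypeError.
try:
    text + data
    mixing_rejected = False
except TypeError:
    mixing_rejected = True

# Equality between bytes and str no longer raises; it is simply False
# (assuming the interpreter was not started with bytes warnings as errors).
equal = data == text

# bytes objects are hashable and support no-argument strip()/split()/lower().
cleaned = b"  Ab Cd \n".strip().lower().split()
usable_as_key = {data: 1}[b"abc"]
```
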
diff --git a/Include/abstract.h b/Include/abstract.h
@@ -259,7 +259,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/
 	 string representation on success, NULL on failure.  This is
 	 the equivalent of the Python expression: repr(o).
 
-	 Called by the repr() built-in function and by reverse quotes.
+	 Called by the repr() built-in function.
 
        */
 
@@ -271,20 +271,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/
 	 string representation on success, NULL on failure.  This is
 	 the equivalent of the Python expression: str(o).)
 
-	 Called by the str() built-in function and by the print
-	 statement.
-
-       */
-
-     /* Implemented elsewhere:
-
-     PyObject *PyObject_Unicode(PyObject *o);
-
-	 Compute the unicode representation of object, o.  Returns the
-	 unicode representation on success, NULL on failure.  This is
-	 the equivalent of the Python expression: unistr(o).)
-
-	 Called by the unistr() built-in function.
+	 Called by the str() and print() built-in functions.
 
        */
 

diff --git a/Include/object.h b/Include/object.h
@@ -431,10 +431,8 @@ PyAPI_FUNC(int) PyObject_Print(PyObject *, FILE *, int);
 PyAPI_FUNC(void) _Py_BreakPoint(void);
 PyAPI_FUNC(void) _PyObject_Dump(PyObject *);
 PyAPI_FUNC(PyObject *) PyObject_Repr(PyObject *);
-PyAPI_FUNC(PyObject *) PyObject_ReprStr8(PyObject *);
-PyAPI_FUNC(PyObject *) _PyObject_Str(PyObject *);
 PyAPI_FUNC(PyObject *) PyObject_Str(PyObject *);
-PyAPI_FUNC(PyObject *) PyObject_Unicode(PyObject *);
+#define PyObject_Unicode PyObject_Str /* Compatibility */
 PyAPI_FUNC(int) PyObject_Compare(PyObject *, PyObject *);
 PyAPI_FUNC(PyObject *) PyObject_RichCompare(PyObject *, PyObject *, int);
 PyAPI_FUNC(int) PyObject_RichCompareBool(PyObject *, PyObject *, int);
@@ -478,7 +476,7 @@ PyAPI_FUNC(long) _Py_HashDouble(double);
 PyAPI_FUNC(long) _Py_HashPointer(void*);
 
 /* Helper for passing objects to printf and the like */
-#define PyObject_REPR(obj) PyString_AS_STRING(PyObject_ReprStr8(obj))
+#define PyObject_REPR(obj) PyUnicode_AsString(PyObject_Repr(obj))
 
 /* Flag bits for printing: */
 #define Py_PRINT_RAW	1	/* No string quotes etc. */

diff --git a/Include/opcode.h b/Include/opcode.h
@@ -65,7 +65,7 @@ extern "C" {
 
 #define RETURN_VALUE	83
 #define IMPORT_STAR	84
-#define MAKE_BYTES	85
+
 #define YIELD_VALUE	86
 #define POP_BLOCK	87
 #define END_FINALLY	88

diff --git a/Include/pydebug.h b/Include/pydebug.h
@@ -11,6 +11,7 @@ PyAPI_DATA(int) Py_InteractiveFlag;
 PyAPI_DATA(int) Py_InspectFlag;
 PyAPI_DATA(int) Py_OptimizeFlag;
 PyAPI_DATA(int) Py_NoSiteFlag;
+PyAPI_DATA(int) Py_BytesWarningFlag;
 PyAPI_DATA(int) Py_UseClassExceptionsFlag;
 PyAPI_DATA(int) Py_FrozenFlag;
 PyAPI_DATA(int) Py_TabcheckFlag;

diff --git a/Include/pyerrors.h b/Include/pyerrors.h
@@ -165,6 +165,7 @@ PyAPI_DATA(PyObject *) PyExc_RuntimeWarning;
 PyAPI_DATA(PyObject *) PyExc_FutureWarning;
 PyAPI_DATA(PyObject *) PyExc_ImportWarning;
 PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
+PyAPI_DATA(PyObject *) PyExc_BytesWarning;
 
 
 /* Convenience functions */

diff --git a/Include/stringobject.h b/Include/stringobject.h
@@ -25,26 +25,17 @@ functions should be applied to nil objects.
 */
 
 /* Caching the hash (ob_shash) saves recalculation of a string's hash value.
-   Interning strings (ob_sstate) tries to ensure that only one string
-   object with a given value exists, so equality tests can be one pointer
-   comparison.  This is generally restricted to strings that "look like"
-   Python identifiers, although the sys.intern() function can be used to force
-   interning of any string.
-   Together, these sped the interpreter by up to 20%. */
+   This significantly speeds up dict lookups. */
 
 typedef struct {
     PyObject_VAR_HEAD
     long ob_shash;
-    int ob_sstate;
     char ob_sval[1];
 
     /* Invariants:
      *     ob_sval contains space for 'ob_size+1' elements.
      *     ob_sval[ob_size] == 0.
      *     ob_shash is the hash of the string or -1 if not computed yet.
-     *     ob_sstate != 0 iff the string object is in stringobject.c's
-     *       'interned' dictionary; in this case the two references
-     *       from 'interned' to this object are *not counted* in ob_refcnt.
      */
 } PyStringObject;
 
@@ -74,86 +65,20 @@ PyAPI_FUNC(PyObject *) PyString_DecodeEscape(const char *, Py_ssize_t,
 						   const char *, Py_ssize_t,
 						   const char *);
 
-PyAPI_FUNC(void) PyString_InternInPlace(PyObject **);
-PyAPI_FUNC(void) PyString_InternImmortal(PyObject **);
-PyAPI_FUNC(PyObject *) PyString_InternFromString(const char *);
-PyAPI_FUNC(void) _Py_ReleaseInternedStrings(void);
-
-/* Use only if you know it's a string */
-#define PyString_CHECK_INTERNED(op) (((PyStringObject *)(op))->ob_sstate)
-
 /* Macro, trading safety for speed */
-#define PyString_AS_STRING(op) (assert(PyString_Check(op)),(((PyStringObject *)(op))->ob_sval))
+#define PyString_AS_STRING(op) (assert(PyString_Check(op)), \
+                                (((PyStringObject *)(op))->ob_sval))
 #define PyString_GET_SIZE(op)  (assert(PyString_Check(op)),Py_Size(op))
 
 /* _PyString_Join(sep, x) is like sep.join(x).  sep must be PyStringObject*,
    x must be an iterable object. */
 PyAPI_FUNC(PyObject *) _PyString_Join(PyObject *sep, PyObject *x);
 
-/* --- Generic Codecs ----------------------------------------------------- */
-
-/* Create an object by decoding the encoded string s of the
-   given size. */
-
-PyAPI_FUNC(PyObject*) PyString_Decode(
-    const char *s,              /* encoded string */
-    Py_ssize_t size,            /* size of buffer */
-    const char *encoding,       /* encoding */
-    const char *errors          /* error handling */
-    );
-
-/* Encodes a string object and returns the result as Python
-   object. */
-
-PyAPI_FUNC(PyObject*) PyString_AsEncodedObject(
-    PyObject *str,	 	/* string object */
-    const char *encoding,	/* encoding */
-    const char *errors		/* error handling */
-    );
-
-/* Encodes a string object and returns the result as Python string
-   object.
-
-   If the codec returns an Unicode object, the object is converted
-   back to a string using the default encoding.
-
-   DEPRECATED - use PyString_AsEncodedObject() instead. */
-
-PyAPI_FUNC(PyObject*) PyString_AsEncodedString(
-    PyObject *str,	 	/* string object */
-    const char *encoding,	/* encoding */
-    const char *errors		/* error handling */
-    );
-
-/* Decodes a string object and returns the result as Python
-   object. */
-
-PyAPI_FUNC(PyObject*) PyString_AsDecodedObject(
-    PyObject *str,	 	/* string object */
-    const char *encoding,	/* encoding */
-    const char *errors		/* error handling */
-    );
-
-/* Decodes a string object and returns the result as Python string
-   object.
-
-   If the codec returns an Unicode object, the object is converted
-   back to a string using the default encoding.
-
-   DEPRECATED - use PyString_AsDecodedObject() instead. */
-
-PyAPI_FUNC(PyObject*) PyString_AsDecodedString(
-    PyObject *str,	 	/* string object */
-    const char *encoding,	/* encoding */
-    const char *errors		/* error handling */
-    );
-
 /* Provides access to the internal data buffer and size of a string
    object or the default encoded version of an Unicode object. Passing
    NULL as *len parameter will force the string buffer to be
    0-terminated (passing a string with embedded NULL characters will
    cause an exception).  */
-
 PyAPI_FUNC(int) PyString_AsStringAndSize(
     register PyObject *obj,	/* string or Unicode object */
     register char **s,		/* pointer to buffer variable */
@@ -162,6 +87,12 @@ PyAPI_FUNC(int) PyString_AsStringAndSize(
 				   strings) */
     );
 
+/* Flags used by string formatting */
+#define F_LJUST (1<<0)
+#define F_SIGN	(1<<1)
+#define F_BLANK (1<<2)
+#define F_ALT	(1<<3)
+#define F_ZERO	(1<<4)
 
 #ifdef __cplusplus
 }

diff --git a/Lib/_abcoll.py b/Lib/_abcoll.py
@@ -489,7 +489,7 @@ def count(self, value):
 
 Sequence.register(tuple)
 Sequence.register(str)
-Sequence.register(str8)
+Sequence.register(bytes)
 Sequence.register(memoryview)
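
The effect of registering ``bytes`` as a ``Sequence`` can be checked with ``isinstance``. The sketch assumes a released Python 3 interpreter, where these abstract base classes are importable from ``collections.abc`` and the mutable type of this commit is named ``bytearray``; neither spelling comes from this diff.

```python
from collections.abc import MutableSequence, Sequence

# bytes is registered as a Sequence, alongside tuple and str.
bytes_is_sequence = isinstance(b"abc", Sequence)

# Assumption: "bytearray" is the released name of this commit's mutable type.
mutable_is_mutable_sequence = isinstance(bytearray(b"abc"), MutableSequence)

# The immutable type is not a MutableSequence.
bytes_is_mutable_sequence = isinstance(b"abc", MutableSequence)
```
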