Commit

Merging the py3k-pep3137 branch back into the py3k branch.
No detailed change log; just check out the change log for the py3k-pep3137
branch.  The most obvious changes:

  - str8 renamed to bytes (PyString at the C level);
  - bytes renamed to buffer (PyBytes at the C level);
  - PyString and PyUnicode are no longer compatible.

I.e. we now have an immutable bytes type and a mutable bytes type.

The behavior of PyString was modified quite a bit, to make it more
bytes-like.  Some changes are still on the to-do list.
gvanrossum committed Nov 6, 2007
1 parent a19f80c commit 98297ee
Showing 148 changed files with 2,528 additions and 3,512 deletions.
9 changes: 7 additions & 2 deletions Doc/library/array.rst
@@ -56,8 +56,9 @@ The module defines the following type:
.. function:: array(typecode[, initializer])

Return a new array whose items are restricted by *typecode*, and initialized
from the optional *initializer* value, which must be a list, string, or iterable
over elements of the appropriate type.
from the optional *initializer* value, which must be a list, object
supporting the buffer interface, or iterable over elements of the
appropriate type.

If given a list or string, the initializer is passed to the new array's
:meth:`fromlist`, :meth:`fromstring`, or :meth:`fromunicode` method (see below)
@@ -69,6 +70,10 @@ The module defines the following type:

Obsolete alias for :func:`array`.

.. data:: typecodes

A string with all available type codes.

Array objects support the ordinary sequence operations of indexing, slicing,
concatenation, and multiplication. When using slice assignment, the assigned
value must be an array object with the same type code; in all other cases,
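The revised initializer rule and the new `typecodes` attribute can be sketched as follows. This is a minimal illustration run against released Python 3, where the mutable type this commit calls `buffer` is named `bytearray`; the variable names are made up for the example.

```python
from array import array, typecodes

# Any object supporting the buffer interface can serve as the initializer.
from_bytes = array("B", b"abc")
from_mutable = array("B", bytearray(b"abc"))
# A list of suitable integers still works as before.
from_list = array("B", [97, 98, 99])

# All three build the same array of unsigned bytes.
print(from_bytes.tolist())
print(from_bytes == from_mutable == from_list)

# typecodes is a plain string; each character is one available type code.
print("B" in typecodes)
```

Type code `"B"` is used here because its item size is one byte, so any byte string is a valid initializer.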
6 changes: 5 additions & 1 deletion Doc/library/exceptions.rst
@@ -405,7 +405,11 @@ module for more information.

Base class for warnings related to Unicode.

The class hierarchy for built-in exceptions is:
.. exception:: BytesWarning

Base class for warnings related to :class:`bytes` and :class:`buffer`.


The class hierarchy for built-in exceptions is:

.. literalinclude:: ../../Lib/test/exception_hierarchy.txt
27 changes: 20 additions & 7 deletions Doc/library/functions.rst
@@ -118,31 +118,44 @@ available. They are listed here in alphabetical order.
.. index:: pair: Boolean; type


.. function:: bytes([arg[, encoding[, errors]]])
.. function:: buffer([arg[, encoding[, errors]]])

Return a new array of bytes. The :class:`bytes` type is a mutable sequence
Return a new array of bytes. The :class:`buffer` type is a mutable sequence
of integers in the range 0 <= x < 256. It has most of the usual methods of
mutable sequences, described in :ref:`typesseq-mutable`, as well as a few
methods borrowed from strings, described in :ref:`bytes-methods`.
mutable sequences, described in :ref:`typesseq-mutable`, as well as most methods
that the :class:`str` type has; see :ref:`bytes-methods`.

The optional *arg* parameter can be used to initialize the array in a few
different ways:

* If it is a *string*, you must also give the *encoding* (and optionally,
*errors*) parameters; :func:`bytes` then acts like :meth:`str.encode`.
*errors*) parameters; :func:`buffer` then converts the Unicode string to
bytes using :meth:`str.encode`.

* If it is an *integer*, the array will have that size and will be
initialized with null bytes.

* If it is an object conforming to the *buffer* interface, a read-only buffer
of the object will be used to initialize the bytes array.

* If it is an *iterable*, it must be an iterable of integers in the range 0
<= x < 256, which are used as the initial contents of the array.
* If it is an *iterable*, it must be an iterable of integers in the range
``0 <= x < 256``, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
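The constructor forms listed above can be sketched as follows. This assumes released Python 3, where the mutable type documented here as `buffer` is named `bytearray`; the variable names are illustrative only.

```python
# bytearray is the released name of the mutable type this text calls "buffer".
from_text = bytearray("héllo", "utf-8")   # string: an encoding is required
zeros = bytearray(4)                      # integer: that many null bytes
from_buffer = bytearray(b"abc")           # buffer-interface object: copied
from_ints = bytearray([104, 105])         # iterable of ints in range(256)
empty = bytearray()                       # no argument: size 0

# The object is mutable, so item assignment works.
from_buffer[0] = ord("x")

# A string without an encoding is rejected.
try:
    bytearray("abc")
except TypeError as exc:
    print("rejected:", exc)
```

Integers outside `0 <= x < 256` in the iterable form raise `ValueError` rather than being truncated.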


.. function:: bytes([arg[, encoding[, errors]]])

Return a new "bytes" object, which is an immutable sequence of integers in
the range ``0 <= x < 256``. :class:`bytes` is an immutable version of
:class:`buffer` -- it has the same non-mutating methods and the same indexing
and slicing behavior.

Accordingly, constructor arguments are interpreted as for :func:`buffer`.

Bytes objects can also be created with literals, see :ref:`strings`.
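A short sketch of the immutable counterpart, assuming released Python 3 behavior; the variable names are made up for the example.

```python
data = bytes([104, 105])   # same constructor forms as the mutable type
literal = b"hi"            # bytes literal

print(data == literal)

# Item assignment is rejected because bytes is immutable.
try:
    data[0] = 72
except TypeError as exc:
    print("immutable:", exc)

# Indexing yields an integer; slicing yields another bytes object.
first, head = data[0], data[:1]
print(first, head)
```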


.. function:: chr(i)

Return the string of one character whose Unicode codepoint is the integer
8 changes: 5 additions & 3 deletions Doc/library/stdtypes.rst
@@ -1313,9 +1313,11 @@ Bytes and Buffer Methods

Bytes and buffer objects, being "strings of bytes", have all methods found on
strings, with the exception of :func:`encode`, :func:`format` and
:func:`isidentifier`, which do not make sense with these types. Wherever one of
these methods needs to interpret the bytes as characters (e.g. the :func:`is...`
methods), the ASCII character set is assumed.
:func:`isidentifier`, which do not make sense with these types. For converting
the objects to strings, they have a :func:`decode` method.

Wherever one of these methods needs to interpret the bytes as characters
(e.g. the :func:`is...` methods), the ASCII character set is assumed.
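The behavior described above can be checked with a small sketch. It assumes released Python 3, where `buffer` is spelled `bytearray`; the name `missing` is hypothetical.

```python
raw = b"Hello World"

print(raw.upper())          # string-like methods return bytes
print(raw.split())          # the pieces of a split are bytes as well
print(raw.decode("ascii"))  # decode() converts bytes to str

# Character classification assumes ASCII: byte 0xE9 is "é" in Latin-1,
# but it is not treated as alphabetic here.
print(b"\xe9".isalpha())

# Collect the str methods named above that are absent from both types.
missing = [
    name
    for name in ("encode", "format", "isidentifier")
    if not hasattr(bytes, name) and not hasattr(bytearray, name)
]
print(missing)
```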

.. note::

4 changes: 4 additions & 0 deletions Doc/library/warnings.rst
@@ -80,6 +80,10 @@ following warnings category classes are currently defined:
| :exc:`UnicodeWarning` | Base category for warnings related to |
| | Unicode. |
+----------------------------------+-----------------------------------------------+
| :exc:`BytesWarning` | Base category for warnings related to |
| | :class:`bytes` and :class:`buffer`. |
+----------------------------------+-----------------------------------------------+


While these are technically built-in exceptions, they are documented here,
because conceptually they belong to the warnings mechanism.
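As a rough illustration of how the new category behaves with the standard filters, the sketch below issues `BytesWarning` by hand. In released Python 3 the interpreter emits it on its own only when started with `-b` (and turns it into an error with `-bb`), which this example does not rely on. The warning messages are invented for the example.

```python
import warnings

# BytesWarning is an ordinary warning category.
print(issubclass(BytesWarning, Warning))

# Record the warning instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", BytesWarning)
    warnings.warn("str() on a bytes instance", BytesWarning)

print(caught[0].category.__name__)

# Escalate the category to an exception with the "error" filter action.
with warnings.catch_warnings():
    warnings.simplefilter("error", BytesWarning)
    try:
        warnings.warn("comparison between bytes and str", BytesWarning)
    except BytesWarning as exc:
        escalated = str(exc)

print(escalated)
```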
11 changes: 2 additions & 9 deletions Doc/whatsnew/3.0.rst
@@ -131,11 +131,6 @@ changes to rarely used features.)
that if a file is opened using an incorrect mode or encoding, I/O
will likely fail.

* Bytes aren't hashable, and don't support certain operations like
``b.lower()``, ``b.strip()`` or ``b.split()``.
For the latter two, use ``b.strip(b" \t\r\n\f")`` or
``b.split(b" \t\r\n\f")``.

* ``map()`` and ``filter()`` return iterators. A quick fix is e.g.
``list(map(...))``, but a better fix is often to use a list
comprehension (especially when the original code uses ``lambda``).
@@ -158,13 +153,11 @@ Strings and Bytes
* There is only one string type; its name is ``str`` but its behavior
and implementation are more like ``unicode`` in 2.x.

* PEP 358: There is a new type, ``bytes``, to represent binary data
* PEP 3137: There is a new type, ``bytes``, to represent binary data
(and encoded text, which is treated as binary data until you decide
to decode it). The ``str`` and ``bytes`` types cannot be mixed; you
must always explicitly convert between them, using the ``.encode()``
(str -> bytes) or ``.decode()`` (bytes -> str) methods. Comparing a
bytes and a str instance for equality raises a TypeError; this
catches common mistakes.
(str -> bytes) or ``.decode()`` (bytes -> str) methods.

* PEP 3112: Bytes literals. E.g. b"abc".
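The explicit conversion rule can be sketched as follows, assuming released Python 3 run without the `-b` option. The names `mixed_ok` and `same` are hypothetical.

```python
text = "naïve"
encoded = text.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

# Mixing the two types in one operation raises TypeError.
try:
    text + encoded
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True

# An equality test between str and bytes does not raise; it is just False.
same = ("abc" == b"abc")

print(encoded, decoded, mixed_ok, same)
```

The equality result matches the diff above, which drops the sentence claiming that such a comparison raises `TypeError`.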

17 changes: 2 additions & 15 deletions Include/abstract.h
Original file line number Diff line number Diff line change
Expand Up @@ -259,7 +259,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/
string representation on success, NULL on failure. This is
the equivalent of the Python expression: repr(o).
Called by the repr() built-in function and by reverse quotes.
Called by the repr() built-in function.
*/

Expand All @@ -271,20 +271,7 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx*/
string representation on success, NULL on failure. This is
the equivalent of the Python expression: str(o).)
Called by the str() built-in function and by the print
statement.
*/

/* Implemented elsewhere:
PyObject *PyObject_Unicode(PyObject *o);
Compute the unicode representation of object, o. Returns the
unicode representation on success, NULL on failure. This is
the equivalent of the Python expression: unistr(o).)
Called by the unistr() built-in function.
Called by the str() and print() built-in functions.
*/

6 changes: 2 additions & 4 deletions Include/object.h
@@ -431,10 +431,8 @@ PyAPI_FUNC(int) PyObject_Print(PyObject *, FILE *, int);
PyAPI_FUNC(void) _Py_BreakPoint(void);
PyAPI_FUNC(void) _PyObject_Dump(PyObject *);
PyAPI_FUNC(PyObject *) PyObject_Repr(PyObject *);
PyAPI_FUNC(PyObject *) PyObject_ReprStr8(PyObject *);
PyAPI_FUNC(PyObject *) _PyObject_Str(PyObject *);
PyAPI_FUNC(PyObject *) PyObject_Str(PyObject *);
PyAPI_FUNC(PyObject *) PyObject_Unicode(PyObject *);
#define PyObject_Unicode PyObject_Str /* Compatibility */
PyAPI_FUNC(int) PyObject_Compare(PyObject *, PyObject *);
PyAPI_FUNC(PyObject *) PyObject_RichCompare(PyObject *, PyObject *, int);
PyAPI_FUNC(int) PyObject_RichCompareBool(PyObject *, PyObject *, int);
@@ -478,7 +476,7 @@ PyAPI_FUNC(long) _Py_HashDouble(double);
PyAPI_FUNC(long) _Py_HashPointer(void*);

/* Helper for passing objects to printf and the like */
#define PyObject_REPR(obj) PyString_AS_STRING(PyObject_ReprStr8(obj))
#define PyObject_REPR(obj) PyUnicode_AsString(PyObject_Repr(obj))

/* Flag bits for printing: */
#define Py_PRINT_RAW 1 /* No string quotes etc. */
2 changes: 1 addition & 1 deletion Include/opcode.h
@@ -65,7 +65,7 @@ extern "C" {

#define RETURN_VALUE 83
#define IMPORT_STAR 84
#define MAKE_BYTES 85

#define YIELD_VALUE 86
#define POP_BLOCK 87
#define END_FINALLY 88
1 change: 1 addition & 0 deletions Include/pydebug.h
@@ -11,6 +11,7 @@ PyAPI_DATA(int) Py_InteractiveFlag;
PyAPI_DATA(int) Py_InspectFlag;
PyAPI_DATA(int) Py_OptimizeFlag;
PyAPI_DATA(int) Py_NoSiteFlag;
PyAPI_DATA(int) Py_BytesWarningFlag;
PyAPI_DATA(int) Py_UseClassExceptionsFlag;
PyAPI_DATA(int) Py_FrozenFlag;
PyAPI_DATA(int) Py_TabcheckFlag;
1 change: 1 addition & 0 deletions Include/pyerrors.h
@@ -165,6 +165,7 @@ PyAPI_DATA(PyObject *) PyExc_RuntimeWarning;
PyAPI_DATA(PyObject *) PyExc_FutureWarning;
PyAPI_DATA(PyObject *) PyExc_ImportWarning;
PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
PyAPI_DATA(PyObject *) PyExc_BytesWarning;


/* Convenience functions */
87 changes: 9 additions & 78 deletions Include/stringobject.h
@@ -25,26 +25,17 @@ functions should be applied to nil objects.
*/

/* Caching the hash (ob_shash) saves recalculation of a string's hash value.
Interning strings (ob_sstate) tries to ensure that only one string
object with a given value exists, so equality tests can be one pointer
comparison. This is generally restricted to strings that "look like"
Python identifiers, although the sys.intern() function can be used to force
interning of any string.
Together, these sped the interpreter by up to 20%. */
This significantly speeds up dict lookups. */

typedef struct {
PyObject_VAR_HEAD
long ob_shash;
int ob_sstate;
char ob_sval[1];

/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
* ob_sstate != 0 iff the string object is in stringobject.c's
* 'interned' dictionary; in this case the two references
* from 'interned' to this object are *not counted* in ob_refcnt.
*/
} PyStringObject;

@@ -74,86 +65,20 @@ PyAPI_FUNC(PyObject *) PyString_DecodeEscape(const char *, Py_ssize_t,
const char *, Py_ssize_t,
const char *);

PyAPI_FUNC(void) PyString_InternInPlace(PyObject **);
PyAPI_FUNC(void) PyString_InternImmortal(PyObject **);
PyAPI_FUNC(PyObject *) PyString_InternFromString(const char *);
PyAPI_FUNC(void) _Py_ReleaseInternedStrings(void);

/* Use only if you know it's a string */
#define PyString_CHECK_INTERNED(op) (((PyStringObject *)(op))->ob_sstate)

/* Macro, trading safety for speed */
#define PyString_AS_STRING(op) (assert(PyString_Check(op)),(((PyStringObject *)(op))->ob_sval))
#define PyString_AS_STRING(op) (assert(PyString_Check(op)), \
(((PyStringObject *)(op))->ob_sval))
#define PyString_GET_SIZE(op) (assert(PyString_Check(op)),Py_Size(op))

/* _PyString_Join(sep, x) is like sep.join(x). sep must be PyStringObject*,
x must be an iterable object. */
PyAPI_FUNC(PyObject *) _PyString_Join(PyObject *sep, PyObject *x);

/* --- Generic Codecs ----------------------------------------------------- */

/* Create an object by decoding the encoded string s of the
given size. */

PyAPI_FUNC(PyObject*) PyString_Decode(
const char *s, /* encoded string */
Py_ssize_t size, /* size of buffer */
const char *encoding, /* encoding */
const char *errors /* error handling */
);

/* Encodes a string object and returns the result as Python
object. */

PyAPI_FUNC(PyObject*) PyString_AsEncodedObject(
PyObject *str, /* string object */
const char *encoding, /* encoding */
const char *errors /* error handling */
);

/* Encodes a string object and returns the result as Python string
object.
If the codec returns an Unicode object, the object is converted
back to a string using the default encoding.
DEPRECATED - use PyString_AsEncodedObject() instead. */

PyAPI_FUNC(PyObject*) PyString_AsEncodedString(
PyObject *str, /* string object */
const char *encoding, /* encoding */
const char *errors /* error handling */
);

/* Decodes a string object and returns the result as Python
object. */

PyAPI_FUNC(PyObject*) PyString_AsDecodedObject(
PyObject *str, /* string object */
const char *encoding, /* encoding */
const char *errors /* error handling */
);

/* Decodes a string object and returns the result as Python string
object.
If the codec returns an Unicode object, the object is converted
back to a string using the default encoding.
DEPRECATED - use PyString_AsDecodedObject() instead. */

PyAPI_FUNC(PyObject*) PyString_AsDecodedString(
PyObject *str, /* string object */
const char *encoding, /* encoding */
const char *errors /* error handling */
);

/* Provides access to the internal data buffer and size of a string
object or the default encoded version of an Unicode object. Passing
NULL as *len parameter will force the string buffer to be
0-terminated (passing a string with embedded NULL characters will
cause an exception). */

PyAPI_FUNC(int) PyString_AsStringAndSize(
register PyObject *obj, /* string or Unicode object */
register char **s, /* pointer to buffer variable */
@@ -162,6 +87,12 @@ PyAPI_FUNC(int) PyString_AsStringAndSize(
strings) */
);

/* Flags used by string formatting */
#define F_LJUST (1<<0)
#define F_SIGN (1<<1)
#define F_BLANK (1<<2)
#define F_ALT (1<<3)
#define F_ZERO (1<<4)

#ifdef __cplusplus
}
2 changes: 1 addition & 1 deletion Lib/_abcoll.py
@@ -489,7 +489,7 @@ def count(self, value):

Sequence.register(tuple)
Sequence.register(str)
Sequence.register(str8)
Sequence.register(bytes)
Sequence.register(memoryview)

