Issue #15984: Merge PyUnicode doc from 3.5

This commit is contained in:
Martin Panter 2016-04-15 02:27:11 +00:00
commit cda80940ed
35 changed files with 64 additions and 70 deletions

View file

@ -285,7 +285,7 @@ an error value).
.. c:function:: int PyErr_WarnEx(PyObject *category, const char *message, Py_ssize_t stack_level)
Issue a warning message. The *category* argument is a warning category (see
below) or *NULL*; the *message* argument is an UTF-8 encoded string. *stack_level* is a
below) or *NULL*; the *message* argument is a UTF-8 encoded string. *stack_level* is a
positive number giving a number of stack frames; the warning will be issued from
the currently executing line of code in that stack frame. A *stack_level* of 1
is the function calling :c:func:`PyErr_WarnEx`, 2 is the function above that,
@ -617,7 +617,7 @@ The following functions are used to create and modify Unicode exceptions from C.
.. c:function:: PyObject* PyUnicodeTranslateError_Create(const Py_UNICODE *object, Py_ssize_t length, Py_ssize_t start, Py_ssize_t end, const char *reason)
Create a :class:`UnicodeTranslateError` object with the attributes *object*,
*length*, *start*, *end* and *reason*. *reason* is an UTF-8 encoded string.
*length*, *start*, *end* and *reason*. *reason* is a UTF-8 encoded string.
.. c:function:: PyObject* PyUnicodeDecodeError_GetEncoding(PyObject *exc)
PyObject* PyUnicodeEncodeError_GetEncoding(PyObject *exc)

View file

@ -72,7 +72,7 @@ Importing Modules
.. c:function:: PyObject* PyImport_ImportModuleLevel(const char *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level)
Similar to :c:func:`PyImport_ImportModuleLevelObject`, but the name is an
Similar to :c:func:`PyImport_ImportModuleLevelObject`, but the name is a
UTF-8 encoded string instead of a Unicode object.
.. versionchanged:: 3.3

View file

@ -49,7 +49,7 @@ Method Objects
.. index:: object: method
Methods are bound function objects. Methods are always bound to an instance of
an user-defined class. Unbound methods (methods bound to a class object) are
a user-defined class. Unbound methods (methods bound to a class object) are
no longer available.

View file

@ -50,7 +50,7 @@ Module Objects
.. c:function:: PyObject* PyModule_New(const char *name)
Similar to :c:func:`PyImport_NewObject`, but the name is an UTF-8 encoded
Similar to :c:func:`PyImport_NewObject`, but the name is a UTF-8 encoded
string instead of a Unicode object.

View file

@ -423,7 +423,7 @@ APIs:
.. c:function:: PyObject *PyUnicode_FromString(const char *u)
Create a Unicode object from an UTF-8 encoded null-terminated char buffer
Create a Unicode object from a UTF-8 encoded null-terminated char buffer
*u*.
@ -556,14 +556,13 @@ APIs:
.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
const char *encoding, const char *errors)
Coerce an encoded object *obj* to an Unicode object and return a reference with
incremented refcount.
Decode an encoded object *obj* to a Unicode object.
:class:`bytes`, :class:`bytearray` and other
:term:`bytes-like objects <bytes-like object>`
are decoded according to the given *encoding* and using the error handling
defined by *errors*. Both can be *NULL* to have the interface use the default
values (see the next section for details).
values (see :ref:`builtincodecs` for details).
All other objects, including Unicode objects, cause a :exc:`TypeError` to be
set.
@ -745,8 +744,11 @@ Extension modules can continue using them, as they will not be removed in Python
.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
throughout the interpreter whenever coercion to Unicode is needed.
Copy an instance of a Unicode subtype to a new true Unicode object if
necessary. If *obj* is already a true Unicode object (not a subtype),
return the reference with incremented refcount.
Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
Locale Encoding
@ -1224,7 +1226,7 @@ These are the UTF-16 codec APIs:
If *Py_UNICODE_WIDE* is defined, a single :c:type:`Py_UNICODE` value may get
represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
values is interpreted as an UCS-2 character.
values is interpreted as a UCS-2 character.
Return *NULL* if an exception was raised by the codec.

View file

@ -64,7 +64,7 @@ you can do so via the :func:`~urllib.request.urlretrieve` function::
html = open(local_filename)
Many uses of urllib will be that simple (note that instead of an 'http:' URL we
could have used an URL starting with 'ftp:', 'file:', etc.). However, it's the
could have used a URL starting with 'ftp:', 'file:', etc.). However, it's the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.

View file

@ -242,7 +242,7 @@ for a tournament. The numbers below are *k*, not ``a[k]``::
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``. In an usual
In the tree above, each cell *k* is topping ``2*k+1`` and ``2*k+2``. In a usual
binary tournament we see in sports, each cell is the winner over the two cells
it tops, and we can trace the winner down the tree to see all opponents s/he
had. However, in many computer applications of such tournaments, we do not need

View file

@ -874,7 +874,7 @@ supports sending logging messages to a Web server, using either ``GET`` or
.. method:: emit(record)
Sends the record to the Web server as an URL-encoded dictionary. The
Sends the record to the Web server as a URL-encoded dictionary. The
:meth:`mapLogRecord` method is used to convert the record to the
dictionary to be sent.

View file

@ -1762,7 +1762,7 @@ features:
``os.path.join(os.path.dirname(path), result)``.
If the *path* is a string object, the result will also be a string object,
and the call may raise an UnicodeDecodeError. If the *path* is a bytes
and the call may raise a UnicodeDecodeError. If the *path* is a bytes
object, the result will be a bytes object.
This function can also support :ref:`paths relative to directory descriptors

View file

@ -12,7 +12,7 @@ those found in Perl.
Both patterns and strings to be searched can be Unicode strings as well as
8-bit strings. However, Unicode strings and 8-bit strings cannot be mixed:
that is, you cannot match an Unicode string with a byte pattern or
that is, you cannot match a Unicode string with a byte pattern or
vice-versa; similarly, when asking for a substitution, the replacement
string must be of the same type as both the pattern and the search string.

View file

@ -701,7 +701,7 @@ the widget option ``displaycolumns``. The tree widget can also display column
headings. Columns may be accessed by number or symbolic names listed in the
widget option columns. See `Column Identifiers`_.
Each item is identified by an unique name. The widget will generate item IDs
Each item is identified by a unique name. The widget will generate item IDs
if they are not supplied by the caller. There is a distinguished root item,
named ``{}``. The root item itself is not displayed; its children appear at the
top level of the hierarchy.

View file

@ -33,7 +33,7 @@ browsers are not available on Unix, the controlling process will launch a new
browser and wait.
The script :program:`webbrowser` can be used as a command-line interface for the
module. It accepts an URL as the argument. It accepts the following optional
module. It accepts a URL as the argument. It accepts the following optional
parameters: ``-n`` opens the URL in a new browser window, if possible;
``-t`` opens the URL in a new browser page ("tab"). The options are,
naturally, mutually exclusive. Usage example::

View file

@ -978,7 +978,7 @@ QName Objects
to get proper namespace handling on output. *text_or_uri* is a string
containing the QName value, in the form {uri}local, or, if the tag argument
is given, the URI part of a QName. If *tag* is given, the first argument is
interpreted as an URI, and this argument is interpreted as a local name.
interpreted as a URI, and this argument is interpreted as a local name.
:class:`QName` instances are opaque.

View file

@ -98,7 +98,7 @@ The :class:`XMLReader` interface supports the following methods:
Process an input source, producing SAX events. The *source* object can be a
system identifier (a string identifying the input source -- typically a file
name or an URL), a file-like object, or an :class:`InputSource` object. When
name or a URL), a file-like object, or an :class:`InputSource` object. When
:meth:`parse` returns, the input is completely processed, and the parser object
can be discarded or reset.

View file

@ -420,7 +420,7 @@ by providing an invalid URI::
import xmlrpc.client
# create a ServerProxy with an URI that doesn't respond to XMLRPC requests
# create a ServerProxy with a URI that doesn't respond to XMLRPC requests
proxy = xmlrpc.client.ServerProxy("http://google.com/")
try:

View file

@ -89,7 +89,7 @@ PyAPI_FUNC(PyObject *) _PyBytes_Join(PyObject *sep, PyObject *x);
#endif
/* Provides access to the internal data buffer and size of a string
object or the default encoded version of an Unicode object. Passing
object or the default encoded version of a Unicode object. Passing
NULL as *len parameter will force the string buffer to be
0-terminated (passing a string with embedded NULL characters will
cause an exception). */

View file

@ -823,7 +823,7 @@ PyAPI_FUNC(int) PyUnicode_WriteChar(
PyAPI_FUNC(Py_UNICODE) PyUnicode_GetMax(void);
#endif
/* Resize an Unicode object. The length is the number of characters, except
/* Resize a Unicode object. The length is the number of characters, except
if the kind of the string is PyUnicode_WCHAR_KIND: in this case, the length
is the number of Py_UNICODE characters.
@ -844,17 +844,13 @@ PyAPI_FUNC(int) PyUnicode_Resize(
Py_ssize_t length /* New length */
);
/* Coerce obj to an Unicode object and return a reference with
*incremented* refcount.
/* Decode obj to an Unicode object.
Coercion is done in the following way:
bytes, bytearray and other bytes-like objects are decoded according to the
given encoding and error handler. The encoding and error handler can be
NULL to have the interface use UTF-8 and "strict".
1. bytes, bytearray and other bytes-like objects are decoded
under the assumptions that they contain data using the UTF-8
encoding. Decoding is done in "strict" mode.
2. All other objects (including Unicode objects) raise an
exception.
All other objects (including Unicode objects) raise an exception.
The API returns NULL in case of an error. The caller is responsible
for decref'ing the returned objects.
@ -867,13 +863,9 @@ PyAPI_FUNC(PyObject*) PyUnicode_FromEncodedObject(
const char *errors /* error handling */
);
/* Coerce obj to an Unicode object and return a reference with
*incremented* refcount.
Unicode objects are passed back as-is (subclasses are converted to
true Unicode objects), all other objects are delegated to
PyUnicode_FromEncodedObject(obj, NULL, "strict") which results in
using UTF-8 encoding as basis for decoding the object.
/* Copy an instance of a Unicode subtype to a new true Unicode object if
necessary. If obj is already a true Unicode object (not a subtype), return
the reference with *incremented* refcount.
The API returns NULL in case of an error. The caller is responsible
for decref'ing the returned objects.
@ -998,7 +990,7 @@ _PyUnicodeWriter_WriteLatin1String(_PyUnicodeWriter *writer,
Py_ssize_t len /* length in bytes */
);
/* Get the value of the writer as an Unicode string. Clear the
/* Get the value of the writer as a Unicode string. Clear the
buffer of the writer. Raise an exception and return NULL
on error. */
PyAPI_FUNC(PyObject *)

View file

@ -197,7 +197,7 @@ def _unregister_wait_cb(self, fut):
#
# If the IocpProactor already received the event, it's safe to call
# _unregister() because we kept a reference to the Overlapped object
# which is used as an unique key.
# which is used as a unique key.
self._proactor._unregister(self._ov)
self._proactor = None

View file

@ -87,7 +87,7 @@ def formataddr(pair, charset='utf-8'):
'utf-8'.
"""
name, address = pair
# The address MUST (per RFC) be ascii, so raise an UnicodeError if it isn't.
# The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't.
address.encode('ascii')
if name:
try:

View file

@ -54,7 +54,7 @@
In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
an usual binary tournament we see in sports, each cell is the winner
a usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all opponents s/he had. However, in many computer applications
of such tournaments, we do not need to trace the history of a winner.

View file

@ -165,7 +165,7 @@ class NNTPDataError(NNTPError):
# Helper function(s)
def decode_header(header_str):
"""Takes an unicode string representing a munged header value
"""Takes a unicode string representing a munged header value
and decodes it as a (possibly non-ASCII) readable value."""
parts = []
for v, enc in _email_decode_header(header_str):
@ -420,7 +420,7 @@ def _putline(self, line):
def _putcmd(self, line):
"""Internal: send one command to the server (through _putline()).
The `line` must be an unicode string."""
The `line` must be a unicode string."""
if self.debugging: print('*cmd*', repr(line))
line = line.encode(self.encoding, self.errors)
self._putline(line)
@ -445,7 +445,7 @@ def _getline(self, strip_crlf=True):
def _getresp(self):
"""Internal: get a response from the server.
Raise various errors if the response indicates an error.
Returns an unicode string."""
Returns a unicode string."""
resp = self._getline()
if self.debugging: print('*resp*', repr(resp))
resp = resp.decode(self.encoding, self.errors)
@ -462,7 +462,7 @@ def _getlongresp(self, file=None):
"""Internal: get a response plus following text from the server.
Raise various errors if the response indicates an error.
Returns a (response, lines) tuple where `response` is an unicode
Returns a (response, lines) tuple where `response` is a unicode
string and `lines` is a list of bytes objects.
If `file` is a file-like object, it must be open in binary mode.
"""

View file

@ -177,7 +177,7 @@ def splitunc(p):
Return a 2-tuple (unc, rest); either part may be empty.
If unc is not empty, it has the form '//host/mount' (or similar
using backslashes). unc+rest is always the input path.
Paths containing drive letters never have an UNC part.
Paths containing drive letters never have a UNC part.
"""
import warnings
warnings.warn("ntpath.splitunc is deprecated, use ntpath.splitdrive instead",

View file

@ -2401,7 +2401,7 @@ def test_decode_unicode(self):
self.assertRaises(TypeError, decoder, "xxx")
def test_unicode_escape(self):
# Escape-decoding an unicode string is supported ang gives the same
# Escape-decoding a unicode string is supported ang gives the same
# result as decoding the equivalent ASCII bytes string.
self.assertEqual(codecs.unicode_escape_decode(r"\u1234"), ("\u1234", 6))
self.assertEqual(codecs.unicode_escape_decode(br"\u1234"), ("\u1234", 6))

View file

@ -67,7 +67,7 @@ def test_init_segfault(self):
_multibytecodec.MultibyteStreamWriter, None)
def test_decode_unicode(self):
# Trying to decode an unicode string should raise a TypeError
# Trying to decode a unicode string should raise a TypeError
for enc in ALL_CJKENCODINGS:
self.assertRaises(TypeError, codecs.getdecoder(enc), "")
@ -160,7 +160,7 @@ def test_iso2022(self):
self.assertEqual(decoder.decode(b'B@$'), '\u4e16')
def test_decode_unicode(self):
# Trying to decode an unicode string should raise a TypeError
# Trying to decode a unicode string should raise a TypeError
for enc in ALL_CJKENCODINGS:
decoder = codecs.getincrementaldecoder(enc)()
self.assertRaises(TypeError, decoder.decode, "")

View file

@ -609,7 +609,7 @@ def handle_XOVER(self, message_spec):
"\t\t6683\t16"
"\t"
"\n"
# An UTF-8 overview line from fr.comp.lang.python
# A UTF-8 overview line from fr.comp.lang.python
"59\tRe: Message d'erreur incompréhensible (par moi)"
"\tEric Brunel <eric.brunel@pragmadev.nospam.com>"
"\tWed, 15 Sep 2010 18:09:15 +0200"

View file

@ -1774,7 +1774,7 @@ def to_bytestring(self, seq):
def assertCorrectUTF8Decoding(self, seq, res, err):
"""
Check that an invalid UTF-8 sequence raises an UnicodeDecodeError when
Check that an invalid UTF-8 sequence raises a UnicodeDecodeError when
'strict' is used, returns res when 'replace' is used, and that doesn't
return anything when 'ignore' is used.
"""

View file

@ -175,7 +175,7 @@ def test_speech128(self):
self.assertEqual(zlib.decompress(ob), data)
def test_incomplete_stream(self):
# An useful error message is given
# A useful error message is given
x = zlib.compress(HAMLET_SCENE)
self.assertRaisesRegex(zlib.error,
"Error -5 while decompressing data: incomplete or truncated stream",

View file

@ -2237,7 +2237,7 @@ Library
attribute which allows setting custom per-pickler reduction functions.
Patch by sbt.
- Issue #14177: marshal.loads() now raises TypeError when given an unicode
- Issue #14177: marshal.loads() now raises TypeError when given a unicode
string. Patch by Guilherme Gonçalves.
- Issue #13550: Remove the debug machinery from the threading module: remove
@ -8304,7 +8304,7 @@ Library
- Issue #1664: Make nntplib IPv6-capable. Patch by Derek Morr.
- Issue #5006: Better handling of unicode byte-order marks (BOM) in the io
library. This means, for example, that opening an UTF-16 text file in
library. This means, for example, that opening a UTF-16 text file in
append mode doesn't add a BOM at the end of the file if the file isn't
empty.
@ -9328,7 +9328,7 @@ Library
- Issue #4756: zipfile.is_zipfile() now supports file-like objects. Patch by
Gabriel Genellina.
- Issue #4574: reading an UTF16-encoded text file crashes if \r on 64-char
- Issue #4574: reading a UTF16-encoded text file crashes if \r on 64-char
boundary.
- Issue #4223: inspect.getsource() will now correctly display source code
@ -9929,7 +9929,7 @@ Extension Modules
exploitation of poor argument checking.
- bsddb code updated to version 4.7.3pre2. This code is the same than
Python 2.6 one, since the intention is to keep an unified 2.x/3.x codebase.
Python 2.6 one, since the intention is to keep a unified 2.x/3.x codebase.
The Python code is automatically translated using "2to3". Please, do not
update this code in Python 3.0 by hand. Update the 2.6 one and then
do "2to3".
@ -18936,7 +18936,7 @@ Core language, builtins, and interpreter
- There is a new Unicode companion to the PyObject_Str() API
called PyObject_Unicode(). It behaves in the same way as the
former, but assures that the returned value is an Unicode object
former, but assures that the returned value is a Unicode object
(applying the usual coercion if necessary).
- The comparison operators support "rich comparison overloading" (PEP

View file

@ -2834,7 +2834,7 @@ Core and Builtins
static type in some cases.
- Issue #15859: PyUnicode_EncodeFSDefault(), PyUnicode_EncodeMBCS() and
PyUnicode_EncodeCodePage() now raise an exception if the object is not an
PyUnicode_EncodeCodePage() now raise an exception if the object is not a
Unicode object. For PyUnicode_EncodeFSDefault(), it was already the case on
platforms other than Windows. Patch written by Campbell Barton.
@ -3925,7 +3925,7 @@ Library
- Issue #21226: Set up modules properly in PyImport_ExecCodeModuleObject
(and friends).
- Issue #21398: Fix an unicode error in the pydoc pager when the documentation
- Issue #21398: Fix a unicode error in the pydoc pager when the documentation
contains characters not encodable to the stdout encoding.
- Issue #16531: ipaddress.IPv4Network and ipaddress.IPv6Network now accept
@ -6929,7 +6929,7 @@ Core and Builtins
- Issue #17173: Remove uses of locale-dependent C functions (isalpha() etc.)
in the interpreter.
- Issue #17137: When an Unicode string is resized, the internal wide character
- Issue #17137: When a Unicode string is resized, the internal wide character
string (wstr) format is now cleared.
- Issue #17043: The unicode-internal decoder no longer read past the end of

View file

@ -550,7 +550,7 @@ representation for a tournament. The numbers below are `k', not a[k]:\n\
\n\
\n\
In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In\n\
an usual binary tournament we see in sports, each cell is the winner\n\
a usual binary tournament we see in sports, each cell is the winner\n\
over the two cells it tops, and we can trace the winner down the tree\n\
to see all opponents s/he had. However, in many computer applications\n\
of such tournaments, we do not need to trace the history of a winner.\n\

View file

@ -1119,7 +1119,7 @@ _Unpickler_SkipConsumed(UnpicklerObject *self)
return 0;
assert(self->peek); /* otherwise we did something wrong */
/* This makes an useless copy... */
/* This makes a useless copy... */
r = PyObject_CallFunction(self->read, "n", consumed);
if (r == NULL)
return -1;

View file

@ -379,7 +379,7 @@ tracemalloc_get_frame(PyFrameObject *pyframe, frame_t *frame)
if (!PyUnicode_Check(filename)) {
#ifdef TRACE_DEBUG
tracemalloc_error("filename is not an unicode string");
tracemalloc_error("filename is not a unicode string");
#endif
return;
}

View file

@ -520,7 +520,7 @@ _PyCode_ConstantKey(PyObject *op)
return key;
}
else {
/* for other types, use the object identifier as an unique identifier
/* for other types, use the object identifier as a unique identifier
* to ensure that they are seen as unequal. */
PyObject *obj_id = PyLong_FromVoidPtr(op);
if (obj_id == NULL)

View file

@ -850,7 +850,7 @@ Py_LOCAL_INLINE(Py_ssize_t) findchar(const void *s, int kind,
}
#ifdef Py_DEBUG
/* Fill the data of an Unicode string with invalid characters to detect bugs
/* Fill the data of a Unicode string with invalid characters to detect bugs
earlier.
_PyUnicode_CheckConsistency(str, 1) detects invalid characters, at least for

View file

@ -403,7 +403,7 @@ wrap_codec_error(const char *operation,
operation, encoding);
}
/* Encode an object (e.g. an Unicode object) using the given encoding
/* Encode an object (e.g. a Unicode object) using the given encoding
and return the resulting encoded object (usually a Python string).
errors is passed to the encoder factory as argument if non-NULL. */
@ -450,7 +450,7 @@ _PyCodec_EncodeInternal(PyObject *object,
}
/* Decode an object (usually a Python string) using the given encoding
and return an equivalent object (e.g. an Unicode object).
and return an equivalent object (e.g. a Unicode object).
errors is passed to the decoder factory as argument if non-NULL. */