Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)

Steve Dower 2016-09-08 10:35:16 -07:00
parent cfbd48bc56
commit cc16be85c0
18 changed files with 618 additions and 836 deletions


@ -802,10 +802,11 @@ File System Encoding
""""""""""""""""""""
To encode and decode file names and other environment strings,
:c:data:`Py_FileSystemEncoding` should be used as the encoding, and
``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
encode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
argument parsing, the ``"O&"`` converter should be used, passing
:c:func:`PyUnicode_FSConverter` as the conversion function:
.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
@ -820,8 +821,9 @@ used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
.. versionchanged:: 3.6
Accepts a :term:`path-like object`.
To decode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
To decode file names to :class:`str` during argument parsing, the ``"O&"``
converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
conversion function:
.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
@ -840,7 +842,7 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
:c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
locale encoding.
@ -854,28 +856,28 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
The :c:func:`Py_DecodeLocale` function.
.. versionchanged:: 3.2
Use ``"strict"`` error handler on Windows.
.. versionchanged:: 3.6
Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
locale encoding.
Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
.. versionchanged:: 3.2
Use ``"strict"`` error handler on Windows.
.. versionchanged:: 3.6
Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return
:c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
:class:`bytes`. Note that the resulting :class:`bytes` object may contain
null bytes.
@ -892,6 +894,8 @@ used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
.. versionadded:: 3.2
.. versionchanged:: 3.6
Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
wchar_t Support
"""""""""""""""


@ -428,25 +428,42 @@ always available.
.. function:: getfilesystemencoding()
Return the name of the encoding used to convert Unicode filenames into
system file names. The result value depends on the operating system:
Return the name of the encoding used to convert between Unicode
filenames and bytes filenames. For best compatibility, str should be
used for filenames in all cases, although representing filenames as bytes
is also supported. Functions accepting or returning filenames should support
either str or bytes and internally convert to the system's preferred
representation.
This encoding is always ASCII-compatible.
:func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that
the correct encoding and errors mode are used.
* On Mac OS X, the encoding is ``'utf-8'``.
* On Unix, the encoding is the user's preference according to the result of
nl_langinfo(CODESET).
* On Unix, the encoding is the locale encoding.
* On Windows NT+, file names are Unicode natively, so no conversion is
performed. :func:`getfilesystemencoding` still returns ``'mbcs'``, as
this is the encoding that applications should use when they explicitly
want to convert Unicode strings to byte strings that are equivalent when
used as file names.
* On Windows 9x, the encoding is ``'mbcs'``.
* On Windows, the encoding may be ``'utf-8'`` or ``'mbcs'``, depending
on user configuration.
.. versionchanged:: 3.2
:func:`getfilesystemencoding` result cannot be ``None`` anymore.
.. versionchanged:: 3.6
Windows is no longer guaranteed to return ``'mbcs'``. See :pep:`529`
and :func:`_enablelegacywindowsfsencoding` for more information.
.. function:: getfilesystemencodeerrors()
Return the name of the error mode used to convert between Unicode filenames
and bytes filenames. The encoding name is returned from
:func:`getfilesystemencoding`.
:func:`os.fsencode` and :func:`os.fsdecode` should be used to ensure that
the correct encoding and errors mode are used.
.. versionadded:: 3.6
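As a rough illustration of the two functions documented above, the following Python sketch prints the active filesystem encoding and error handler, then round-trips a filename through :func:`os.fsencode` and :func:`os.fsdecode`. The helper name ``roundtrip`` and the sample filename are invented for this example, and the printed values depend on the platform and configuration.

```python
import os
import sys


def roundtrip(filename):
    """Encode a str filename to bytes and decode it back (illustrative helper)."""
    # os.fsencode/os.fsdecode apply the filesystem encoding and error handler.
    encoded = os.fsencode(filename)
    decoded = os.fsdecode(encoded)
    return encoded, decoded


if __name__ == "__main__":
    print("encoding:", sys.getfilesystemencoding())
    print("errors:  ", sys.getfilesystemencodeerrors())
    # The private legacy switch only exists on Windows builds.
    print("legacy switch available:",
          hasattr(sys, "_enablelegacywindowsfsencoding"))
    print(roundtrip("example.txt"))
```

Going through ``os.fsencode`` and ``os.fsdecode`` rather than calling ``str.encode`` directly keeps the code independent of whichever encoding and error handler are active.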
.. function:: getrefcount(object)
@ -1138,6 +1155,18 @@ always available.
This function has been added on a provisional basis (see :pep:`411`
for details.) Use it only for debugging purposes.
.. function:: _enablelegacywindowsfsencoding()
Changes the default filesystem encoding and errors mode to 'mbcs' and
'replace' respectively, for consistency with versions of Python prior to 3.6.
This is equivalent to defining the :envvar:`PYTHONLEGACYWINDOWSFSENCODING`
environment variable before launching Python.
Availability: Windows
.. versionadded:: 3.6
See :pep:`529` for more details.
.. data:: stdin
stdout


@ -672,6 +672,20 @@ conflict.
It now has no effect if set to an empty string.
.. envvar:: PYTHONLEGACYWINDOWSFSENCODING
If set to a non-empty string, the default filesystem encoding and errors mode
will revert to their pre-3.6 values of 'mbcs' and 'replace', respectively.
Otherwise, the new defaults 'utf-8' and 'surrogatepass' are used.
This may also be enabled at runtime with
:func:`sys._enablelegacywindowsfsencoding()`.
Availability: Windows
.. versionadded:: 3.6
See :pep:`529` for more details.
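One way to observe the effect of this variable is to start a child interpreter with and without it. The sketch below assumes Python 3.6 or later; the helper name ``child_fs_codec`` is hypothetical. On platforms other than Windows the variable is expected to have no effect, so both calls should report the same values there.

```python
import os
import subprocess
import sys


def child_fs_codec(legacy):
    """Start a child interpreter and return its (encoding, errors) pair.

    When legacy is true, PYTHONLEGACYWINDOWSFSENCODING is set to a
    non-empty string in the child's environment; otherwise it is removed.
    """
    env = dict(os.environ)
    env.pop("PYTHONLEGACYWINDOWSFSENCODING", None)
    if legacy:
        env["PYTHONLEGACYWINDOWSFSENCODING"] = "1"
    code = (
        "import sys; "
        "print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    # Codec and handler names are plain ASCII; split() also drops the newline.
    encoding, errors = result.stdout.decode("ascii").split()
    return encoding, errors


if __name__ == "__main__":
    print("default:", child_fs_codec(legacy=False))
    print("legacy: ", child_fs_codec(legacy=True))
```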
Debug-mode variables
~~~~~~~~~~~~~~~~~~~~


@ -76,6 +76,8 @@ Security improvements:
Windows improvements:
* PEP 529: :ref:`Change Windows filesystem encoding to UTF-8 <pep-529>`
* The ``py.exe`` launcher, when used interactively, no longer prefers
Python 2 over Python 3 when the user doesn't specify a version (via
command line arguments or a config file). Handling of shebang lines
@ -218,6 +220,33 @@ evaluated at run time, and then formatted using the :func:`format` protocol.
See :pep:`498` and the main documentation at :ref:`f-strings`.
.. _pep-529:
PEP 529: Change Windows filesystem encoding to UTF-8
----------------------------------------------------
Representing filesystem paths is best performed with str (Unicode) rather than
bytes. However, there are some situations where using bytes is sufficient and
correct.
Prior to Python 3.6, data loss could result when using bytes paths on Windows.
With this change, using bytes to represent paths is now supported on Windows,
provided those bytes are encoded with the encoding returned by
:func:`sys.getfilesystemencoding()`, which now defaults to ``'utf-8'``.
Applications that do not use str to represent paths should use
:func:`os.fsencode()` and :func:`os.fsdecode()` to ensure their bytes are
correctly encoded. To revert to the previous behaviour, set
:envvar:`PYTHONLEGACYWINDOWSFSENCODING` or call
:func:`sys._enablelegacywindowsfsencoding`.
See :pep:`529` for more information and discussion of code modifications that
may be required.
.. note::
This change is considered experimental for 3.6.0 beta releases. The default
encoding may change before the final release.
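To make the bytes-path support described above concrete, here is a minimal Python sketch that creates a file through the str API and finds it again through the bytes API. The helper names and the file name ``report.txt`` are made up for the example. An ASCII name is used so the sketch behaves the same under any filesystem encoding; the behaviour change in this section matters for names outside the active code page on Windows.

```python
import os
import tempfile


def list_names_as_bytes(directory):
    """List a directory using a bytes path; the returned names are bytes too."""
    return sorted(os.listdir(os.fsencode(directory)))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Create the file through the str API ...
        with open(os.path.join(tmp, "report.txt"), "w") as f:
            f.write("data")
        # ... and find it again through the bytes API.
        names = list_names_as_bytes(tmp)
        # Decode with os.fsdecode so the matching encoding and errors are used.
        return [os.fsdecode(name) for name in names]


if __name__ == "__main__":
    print(demo())
```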
PEP 487: Simpler customization of class creation
------------------------------------------------


@ -23,6 +23,7 @@ PyAPI_FUNC(char *) Py_UniversalNewlineFgets(char *, int, FILE*, PyObject *);
If non-NULL, this is different than the default encoding for strings
*/
PyAPI_DATA(const char *) Py_FileSystemDefaultEncoding;
PyAPI_DATA(const char *) Py_FileSystemDefaultEncodeErrors;
PyAPI_DATA(int) Py_HasFileSystemDefaultEncoding;
/* Internal API


@ -103,10 +103,6 @@ typedef wchar_t Py_UNICODE;
# endif
#endif
#if defined(MS_WINDOWS)
# define HAVE_MBCS
#endif
#ifdef HAVE_WCHAR_H
/* Work around a cosmetic bug in BSDI 4.x wchar.h; thanks to Thomas Wouters */
# ifdef _HAVE_BSDI
@ -1657,7 +1653,7 @@ PyAPI_FUNC(PyObject *) PyUnicode_TranslateCharmap(
);
#endif
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
/* --- MBCS codecs for Windows -------------------------------------------- */
@ -1700,7 +1696,7 @@ PyAPI_FUNC(PyObject*) PyUnicode_EncodeCodePage(
const char *errors /* error handling */
);
#endif /* HAVE_MBCS */
#endif /* MS_WINDOWS */
/* --- Decimal Encoder ---------------------------------------------------- */


@ -851,10 +851,7 @@ def getenvb(key, default=None):
def _fscodec():
encoding = sys.getfilesystemencoding()
if encoding == 'mbcs':
errors = 'strict'
else:
errors = 'surrogateescape'
errors = sys.getfilesystemencodeerrors()
def fsencode(filename):
"""Encode filename (an os.PathLike, bytes, or str) to the filesystem


@ -90,16 +90,6 @@ def ignore_deprecation_warnings(msg_regex, quiet=False):
yield
@contextlib.contextmanager
def bytes_filename_warn(expected):
msg = 'The Windows bytes API has been deprecated'
if os.name == 'nt':
with ignore_deprecation_warnings(msg, quiet=not expected):
yield
else:
yield
class _PathLike(os.PathLike):
def __init__(self, path=""):
@ -342,8 +332,7 @@ def test_stat_attributes_bytes(self):
fname = self.fname.encode(sys.getfilesystemencoding())
except UnicodeEncodeError:
self.skipTest("cannot encode %a for the filesystem" % self.fname)
with bytes_filename_warn(True):
self.check_stat_attributes(fname)
self.check_stat_attributes(fname)
def test_stat_result_pickle(self):
result = os.stat(self.fname)
@ -1032,8 +1021,6 @@ class BytesWalkTests(WalkTests):
def setUp(self):
super().setUp()
self.stack = contextlib.ExitStack()
if os.name == 'nt':
self.stack.enter_context(bytes_filename_warn(False))
def tearDown(self):
self.stack.close()
@ -1640,8 +1627,7 @@ def tearDown(self):
def _test_link(self, file1, file2):
create_file(file1)
with bytes_filename_warn(False):
os.link(file1, file2)
os.link(file1, file2)
with open(file1, "r") as f1, open(file2, "r") as f2:
self.assertTrue(os.path.sameopenfile(f1.fileno(), f2.fileno()))
@ -1934,10 +1920,9 @@ def test_listdir_no_extended_path(self):
self.created_paths)
# bytes
with bytes_filename_warn(False):
self.assertEqual(
sorted(os.listdir(os.fsencode(support.TESTFN))),
[os.fsencode(path) for path in self.created_paths])
self.assertEqual(
sorted(os.listdir(os.fsencode(support.TESTFN))),
[os.fsencode(path) for path in self.created_paths])
def test_listdir_extended_path(self):
"""Test when the path starts with '\\\\?\\'."""
@ -1949,11 +1934,10 @@ def test_listdir_extended_path(self):
self.created_paths)
# bytes
with bytes_filename_warn(False):
path = b'\\\\?\\' + os.fsencode(os.path.abspath(support.TESTFN))
self.assertEqual(
sorted(os.listdir(path)),
[os.fsencode(path) for path in self.created_paths])
path = b'\\\\?\\' + os.fsencode(os.path.abspath(support.TESTFN))
self.assertEqual(
sorted(os.listdir(path)),
[os.fsencode(path) for path in self.created_paths])
@unittest.skipUnless(sys.platform == "win32", "Win32 specific tests")
@ -2028,10 +2012,8 @@ def check_stat(self, link, target):
self.assertNotEqual(os.lstat(link), os.stat(link))
bytes_link = os.fsencode(link)
with bytes_filename_warn(True):
self.assertEqual(os.stat(bytes_link), os.stat(target))
with bytes_filename_warn(True):
self.assertNotEqual(os.lstat(bytes_link), os.stat(bytes_link))
self.assertEqual(os.stat(bytes_link), os.stat(target))
self.assertNotEqual(os.lstat(bytes_link), os.stat(bytes_link))
def test_12084(self):
level1 = os.path.abspath(support.TESTFN)
@ -2589,46 +2571,6 @@ def listxattr(path, *args):
self._check_xattrs(getxattr, setxattr, removexattr, listxattr)
@unittest.skipUnless(sys.platform == "win32", "Win32 specific tests")
class Win32DeprecatedBytesAPI(unittest.TestCase):
def test_deprecated(self):
import nt
filename = os.fsencode(support.TESTFN)
for func, *args in (
(nt._getfullpathname, filename),
(nt._isdir, filename),
(os.access, filename, os.R_OK),
(os.chdir, filename),
(os.chmod, filename, 0o777),
(os.getcwdb,),
(os.link, filename, filename),
(os.listdir, filename),
(os.lstat, filename),
(os.mkdir, filename),
(os.open, filename, os.O_RDONLY),
(os.rename, filename, filename),
(os.rmdir, filename),
(os.startfile, filename),
(os.stat, filename),
(os.unlink, filename),
(os.utime, filename),
):
with bytes_filename_warn(True):
try:
func(*args)
except OSError:
# ignore OSError, we only care about DeprecationWarning
pass
@support.skip_unless_symlink
def test_symlink(self):
self.addCleanup(support.unlink, support.TESTFN)
filename = os.fsencode(support.TESTFN)
with bytes_filename_warn(True):
os.symlink(filename, filename)
@unittest.skipUnless(hasattr(os, 'get_terminal_size'), "requires os.get_terminal_size")
class TermsizeTests(unittest.TestCase):
def test_does_not_crash(self):
@ -2712,16 +2654,7 @@ def test_oserror_filename(self):
(self.bytes_filenames, os.replace, b"dst"),
(self.unicode_filenames, os.rename, "dst"),
(self.unicode_filenames, os.replace, "dst"),
# Issue #16414: Don't test undecodable names with listdir()
# because of a Windows bug.
#
# With the ANSI code page 932, os.listdir(b'\xe7') return an
# empty list (instead of failing), whereas os.listdir(b'\xff')
# raises a FileNotFoundError. It looks like a Windows bug:
# b'\xe7' directory does not exist, FindFirstFileA(b'\xe7')
# fails with ERROR_FILE_NOT_FOUND (2), instead of
# ERROR_PATH_NOT_FOUND (3).
(self.unicode_filenames, os.listdir,),
(self.unicode_filenames, os.listdir, ),
))
else:
funcs.extend((
@ -2762,19 +2695,24 @@ def test_oserror_filename(self):
else:
funcs.append((self.filenames, os.readlink,))
for filenames, func, *func_args in funcs:
for name in filenames:
try:
if isinstance(name, str):
if isinstance(name, (str, bytes)):
func(name, *func_args)
elif isinstance(name, bytes):
with bytes_filename_warn(False):
func(name, *func_args)
else:
with self.assertWarnsRegex(DeprecationWarning, 'should be'):
func(name, *func_args)
except OSError as err:
self.assertIs(err.filename, name)
self.assertIs(err.filename, name, str(func))
except RuntimeError as err:
if sys.platform != 'win32':
raise
# issue27781: undecodable bytes currently raise RuntimeError
# by 3.6.0b4 this will become UnicodeDecodeError or nothing
self.assertIsInstance(err.__context__, UnicodeDecodeError)
else:
self.fail("No exception thrown by {}".format(func))
@ -3086,7 +3024,6 @@ def test_fspath_protocol(self):
entry = self.create_file_entry()
self.assertEqual(os.fspath(entry), os.path.join(self.path, 'file.txt'))
@unittest.skipIf(os.name == "nt", "test requires bytes path support")
def test_fspath_protocol_bytes(self):
bytes_filename = os.fsencode('bytesfile.txt')
bytes_entry = self.create_file_entry(name=bytes_filename)
@ -3158,12 +3095,6 @@ def test_broken_symlink(self):
entry.stat(follow_symlinks=False)
def test_bytes(self):
if os.name == "nt":
# On Windows, os.scandir(bytes) must raise an exception
with bytes_filename_warn(True):
self.assertRaises(TypeError, os.scandir, b'.')
return
self.create_file("file.txt")
path_bytes = os.fsencode(self.path)
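The test changes above drop the ``bytes_filename_warn`` helper because bytes paths are no longer expected to raise ``DeprecationWarning`` on Windows. A small standalone check in the same spirit might look like the following sketch (assuming Python 3.6 or later; ``bytes_api_names`` and ``demo`` are hypothetical names, not part of the test suite).

```python
import os
import tempfile
import warnings


def bytes_api_names(directory):
    """Call several os functions with a bytes path, turning warnings into errors."""
    path = os.fsencode(directory)
    with warnings.catch_warnings():
        # Any DeprecationWarning raised by the calls below becomes an exception.
        warnings.simplefilter("error", DeprecationWarning)
        os.stat(path)
        listed = os.listdir(path)
        with os.scandir(path) as entries:
            scanned = [entry.name for entry in entries]
    return sorted(listed), sorted(scanned)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "file.txt"), "w").close()
        return bytes_api_names(tmp)


if __name__ == "__main__":
    print(demo())
```

Both results contain bytes names because the directory was passed as bytes.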


@ -286,6 +286,8 @@ Build
Windows
-------
- Issue #27781: Change file system encoding on Windows to UTF-8 (PEP 529)
- Issue #27731: Opt-out of MAX_PATH on Windows 10
- Issue #6135: Adds encoding and errors parameters to subprocess.
@ -2632,7 +2634,7 @@ Library
- Issue #24774: Fix docstring in http.server.test. Patch from Chiu-Hsiang Hsu.
- Issue #21159: Improve message in configparser.InterpolationMissingOptionError.
Patch from Łukasz Langa.
Patch from Å?ukasz Langa.
- Issue #20362: Honour TestCase.longMessage correctly in assertRegex.
Patch from Ilia Kurenkov.
@ -4560,7 +4562,7 @@ Library
Based on patch by Martin Panter.
- Issue #17293: uuid.getnode() now determines MAC address on AIX using netstat.
Based on patch by Aivars Kalvāns.
Based on patch by Aivars KalvÄ?ns.
- Issue #22769: Fixed ttk.Treeview.tag_has() when called without arguments.


@ -604,7 +604,7 @@ _codecs_charmap_decode_impl(PyObject *module, Py_buffer *data,
return codec_tuple(decoded, data->len);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
/*[clinic input]
_codecs.mbcs_decode
@ -666,7 +666,7 @@ _codecs_code_page_decode_impl(PyObject *module, int codepage,
return codec_tuple(decoded, consumed);
}
#endif /* HAVE_MBCS */
#endif /* MS_WINDOWS */
/* --- Encoder ------------------------------------------------------------ */
@ -972,7 +972,7 @@ _codecs_charmap_build_impl(PyObject *module, PyObject *map)
return PyUnicode_BuildEncodingMap(map);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
/*[clinic input]
_codecs.mbcs_encode
@ -1021,7 +1021,7 @@ _codecs_code_page_encode_impl(PyObject *module, int code_page, PyObject *str,
PyUnicode_GET_LENGTH(str));
}
#endif /* HAVE_MBCS */
#endif /* MS_WINDOWS */
/* --- Error handler registry --------------------------------------------- */


@ -764,7 +764,7 @@ exit:
return return_value;
}
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_mbcs_decode__doc__,
"mbcs_decode($module, data, errors=None, final=False, /)\n"
@ -801,9 +801,9 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_oem_decode__doc__,
"oem_decode($module, data, errors=None, final=False, /)\n"
@ -840,9 +840,9 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_code_page_decode__doc__,
"code_page_decode($module, codepage, data, errors=None, final=False, /)\n"
@ -880,7 +880,7 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
PyDoc_STRVAR(_codecs_readbuffer_encode__doc__,
"readbuffer_encode($module, data, errors=None, /)\n"
@ -1351,7 +1351,7 @@ exit:
return return_value;
}
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_mbcs_encode__doc__,
"mbcs_encode($module, str, errors=None, /)\n"
@ -1381,9 +1381,9 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_oem_encode__doc__,
"oem_encode($module, str, errors=None, /)\n"
@ -1413,9 +1413,9 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
#if defined(HAVE_MBCS)
#if defined(MS_WINDOWS)
PyDoc_STRVAR(_codecs_code_page_encode__doc__,
"code_page_encode($module, code_page, str, errors=None, /)\n"
@ -1447,7 +1447,7 @@ exit:
return return_value;
}
#endif /* defined(HAVE_MBCS) */
#endif /* defined(MS_WINDOWS) */
PyDoc_STRVAR(_codecs_register_error__doc__,
"register_error($module, errors, handler, /)\n"
@ -1536,4 +1536,4 @@ exit:
#ifndef _CODECS_CODE_PAGE_ENCODE_METHODDEF
#define _CODECS_CODE_PAGE_ENCODE_METHODDEF
#endif /* !defined(_CODECS_CODE_PAGE_ENCODE_METHODDEF) */
/*[clinic end generated code: output=7874e2d559d49368 input=a9049054013a1b77]*/
/*[clinic end generated code: output=ebe313ab417b17bb input=a9049054013a1b77]*/


@ -1649,24 +1649,24 @@ PyDoc_STRVAR(os_execv__doc__,
{"execv", (PyCFunction)os_execv, METH_VARARGS, os_execv__doc__},
static PyObject *
os_execv_impl(PyObject *module, PyObject *path, PyObject *argv);
os_execv_impl(PyObject *module, path_t *path, PyObject *argv);
static PyObject *
os_execv(PyObject *module, PyObject *args)
{
PyObject *return_value = NULL;
PyObject *path = NULL;
path_t path = PATH_T_INITIALIZE("execv", "path", 0, 0);
PyObject *argv;
if (!PyArg_ParseTuple(args, "O&O:execv",
PyUnicode_FSConverter, &path, &argv)) {
path_converter, &path, &argv)) {
goto exit;
}
return_value = os_execv_impl(module, path, argv);
return_value = os_execv_impl(module, &path, argv);
exit:
/* Cleanup for path */
Py_XDECREF(path);
path_cleanup(&path);
return return_value;
}
@ -1719,7 +1719,7 @@ exit:
#endif /* defined(HAVE_EXECV) */
#if defined(HAVE_SPAWNV)
#if (defined(HAVE_SPAWNV) || defined(HAVE_WSPAWNV))
PyDoc_STRVAR(os_spawnv__doc__,
"spawnv($module, mode, path, argv, /)\n"
@ -1738,32 +1738,32 @@ PyDoc_STRVAR(os_spawnv__doc__,
{"spawnv", (PyCFunction)os_spawnv, METH_VARARGS, os_spawnv__doc__},
static PyObject *
os_spawnv_impl(PyObject *module, int mode, PyObject *path, PyObject *argv);
os_spawnv_impl(PyObject *module, int mode, path_t *path, PyObject *argv);
static PyObject *
os_spawnv(PyObject *module, PyObject *args)
{
PyObject *return_value = NULL;
int mode;
PyObject *path = NULL;
path_t path = PATH_T_INITIALIZE("spawnv", "path", 0, 0);
PyObject *argv;
if (!PyArg_ParseTuple(args, "iO&O:spawnv",
&mode, PyUnicode_FSConverter, &path, &argv)) {
&mode, path_converter, &path, &argv)) {
goto exit;
}
return_value = os_spawnv_impl(module, mode, path, argv);
return_value = os_spawnv_impl(module, mode, &path, argv);
exit:
/* Cleanup for path */
Py_XDECREF(path);
path_cleanup(&path);
return return_value;
}
#endif /* defined(HAVE_SPAWNV) */
#endif /* (defined(HAVE_SPAWNV) || defined(HAVE_WSPAWNV)) */
#if defined(HAVE_SPAWNV)
#if (defined(HAVE_SPAWNV) || defined(HAVE_WSPAWNV))
PyDoc_STRVAR(os_spawnve__doc__,
"spawnve($module, mode, path, argv, env, /)\n"
@ -1784,7 +1784,7 @@ PyDoc_STRVAR(os_spawnve__doc__,
{"spawnve", (PyCFunction)os_spawnve, METH_VARARGS, os_spawnve__doc__},
static PyObject *
os_spawnve_impl(PyObject *module, int mode, PyObject *path, PyObject *argv,
os_spawnve_impl(PyObject *module, int mode, path_t *path, PyObject *argv,
PyObject *env);
static PyObject *
@ -1792,24 +1792,24 @@ os_spawnve(PyObject *module, PyObject *args)
{
PyObject *return_value = NULL;
int mode;
PyObject *path = NULL;
path_t path = PATH_T_INITIALIZE("spawnve", "path", 0, 0);
PyObject *argv;
PyObject *env;
if (!PyArg_ParseTuple(args, "iO&OO:spawnve",
&mode, PyUnicode_FSConverter, &path, &argv, &env)) {
&mode, path_converter, &path, &argv, &env)) {
goto exit;
}
return_value = os_spawnve_impl(module, mode, path, argv, env);
return_value = os_spawnve_impl(module, mode, &path, argv, env);
exit:
/* Cleanup for path */
Py_XDECREF(path);
path_cleanup(&path);
return return_value;
}
#endif /* defined(HAVE_SPAWNV) */
#endif /* (defined(HAVE_SPAWNV) || defined(HAVE_WSPAWNV)) */
#if defined(HAVE_FORK1)
@ -4994,6 +4994,60 @@ os_abort(PyObject *module, PyObject *Py_UNUSED(ignored))
return os_abort_impl(module);
}
#if defined(MS_WINDOWS)
PyDoc_STRVAR(os_startfile__doc__,
"startfile($module, /, filepath, operation=None)\n"
"--\n"
"\n"
"startfile(filepath [, operation])\n"
"\n"
"Start a file with its associated application.\n"
"\n"
"When \"operation\" is not specified or \"open\", this acts like\n"
"double-clicking the file in Explorer, or giving the file name as an\n"
"argument to the DOS \"start\" command: the file is opened with whatever\n"
"application (if any) its extension is associated.\n"
"When another \"operation\" is given, it specifies what should be done with\n"
"the file. A typical operation is \"print\".\n"
"\n"
"startfile returns as soon as the associated application is launched.\n"
"There is no option to wait for the application to close, and no way\n"
"to retrieve the application\'s exit status.\n"
"\n"
"The filepath is relative to the current directory. If you want to use\n"
"an absolute path, make sure the first character is not a slash (\"/\");\n"
"the underlying Win32 ShellExecute function doesn\'t work if it is.");
#define OS_STARTFILE_METHODDEF \
{"startfile", (PyCFunction)os_startfile, METH_VARARGS|METH_KEYWORDS, os_startfile__doc__},
static PyObject *
os_startfile_impl(PyObject *module, path_t *filepath, Py_UNICODE *operation);
static PyObject *
os_startfile(PyObject *module, PyObject *args, PyObject *kwargs)
{
PyObject *return_value = NULL;
static char *_keywords[] = {"filepath", "operation", NULL};
path_t filepath = PATH_T_INITIALIZE("startfile", "filepath", 0, 0);
Py_UNICODE *operation = NULL;
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O&|u:startfile", _keywords,
path_converter, &filepath, &operation)) {
goto exit;
}
return_value = os_startfile_impl(module, &filepath, operation);
exit:
/* Cleanup for filepath */
path_cleanup(&filepath);
return return_value;
}
#endif /* defined(MS_WINDOWS) */
#if defined(HAVE_GETLOADAVG)
PyDoc_STRVAR(os_getloadavg__doc__,
@ -6034,6 +6088,10 @@ exit:
#define OS_SYSCONF_METHODDEF
#endif /* !defined(OS_SYSCONF_METHODDEF) */
#ifndef OS_STARTFILE_METHODDEF
#define OS_STARTFILE_METHODDEF
#endif /* !defined(OS_STARTFILE_METHODDEF) */
#ifndef OS_GETLOADAVG_METHODDEF
#define OS_GETLOADAVG_METHODDEF
#endif /* !defined(OS_GETLOADAVG_METHODDEF) */


@ -973,28 +973,28 @@ Overlapped_AcceptEx(OverlappedObject *self, PyObject *args)
static int
parse_address(PyObject *obj, SOCKADDR *Address, int Length)
{
char *Host;
Py_UNICODE *Host;
unsigned short Port;
unsigned long FlowInfo;
unsigned long ScopeId;
memset(Address, 0, Length);
if (PyArg_ParseTuple(obj, "sH", &Host, &Port))
if (PyArg_ParseTuple(obj, "uH", &Host, &Port))
{
Address->sa_family = AF_INET;
if (WSAStringToAddressA(Host, AF_INET, NULL, Address, &Length) < 0) {
if (WSAStringToAddressW(Host, AF_INET, NULL, Address, &Length) < 0) {
SetFromWindowsErr(WSAGetLastError());
return -1;
}
((SOCKADDR_IN*)Address)->sin_port = htons(Port);
return Length;
}
else if (PyArg_ParseTuple(obj, "sHkk", &Host, &Port, &FlowInfo, &ScopeId))
else if (PyArg_ParseTuple(obj, "uHkk", &Host, &Port, &FlowInfo, &ScopeId))
{
PyErr_Clear();
Address->sa_family = AF_INET6;
if (WSAStringToAddressA(Host, AF_INET6, NULL, Address, &Length) < 0) {
if (WSAStringToAddressW(Host, AF_INET6, NULL, Address, &Length) < 0) {
SetFromWindowsErr(WSAGetLastError());
return -1;
}

File diff suppressed because it is too large.


@ -3185,7 +3185,7 @@ PyUnicode_Decode(const char *s,
|| strcmp(lower, "us_ascii") == 0) {
return PyUnicode_DecodeASCII(s, size, errors);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
else if (strcmp(lower, "mbcs") == 0) {
return PyUnicode_DecodeMBCS(s, size, errors);
}
@ -3507,10 +3507,8 @@ PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
PyObject *
PyUnicode_EncodeFSDefault(PyObject *unicode)
{
#ifdef HAVE_MBCS
return PyUnicode_EncodeCodePage(CP_ACP, unicode, NULL);
#elif defined(__APPLE__)
return _PyUnicode_AsUTF8String(unicode, "surrogateescape");
#if defined(__APPLE__)
return _PyUnicode_AsUTF8String(unicode, Py_FileSystemDefaultEncodeErrors);
#else
PyInterpreterState *interp = PyThreadState_GET()->interp;
/* Bootstrap check: if the filesystem codec is implemented in Python, we
@ -3525,10 +3523,10 @@ PyUnicode_EncodeFSDefault(PyObject *unicode)
if (Py_FileSystemDefaultEncoding && interp->fscodec_initialized) {
return PyUnicode_AsEncodedString(unicode,
Py_FileSystemDefaultEncoding,
"surrogateescape");
Py_FileSystemDefaultEncodeErrors);
}
else {
return PyUnicode_EncodeLocale(unicode, "surrogateescape");
return PyUnicode_EncodeLocale(unicode, Py_FileSystemDefaultEncodeErrors);
}
#endif
}
@ -3577,7 +3575,7 @@ PyUnicode_AsEncodedString(PyObject *unicode,
|| strcmp(lower, "us_ascii") == 0) {
return _PyUnicode_AsASCIIString(unicode, errors);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
else if (strcmp(lower, "mbcs") == 0) {
return PyUnicode_EncodeCodePage(CP_ACP, unicode, errors);
}
@ -3813,10 +3811,8 @@ PyUnicode_DecodeFSDefault(const char *s) {
PyObject*
PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
{
#ifdef HAVE_MBCS
return PyUnicode_DecodeMBCS(s, size, NULL);
#elif defined(__APPLE__)
return PyUnicode_DecodeUTF8Stateful(s, size, "surrogateescape", NULL);
#if defined(__APPLE__)
return PyUnicode_DecodeUTF8Stateful(s, size, Py_FileSystemDefaultEncodeErrors, NULL);
#else
PyInterpreterState *interp = PyThreadState_GET()->interp;
/* Bootstrap check: if the filesystem codec is implemented in Python, we
@ -3829,12 +3825,24 @@ PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
cannot only rely on it: check also interp->fscodec_initialized for
subinterpreters. */
if (Py_FileSystemDefaultEncoding && interp->fscodec_initialized) {
return PyUnicode_Decode(s, size,
PyObject *res = PyUnicode_Decode(s, size,
Py_FileSystemDefaultEncoding,
"surrogateescape");
Py_FileSystemDefaultEncodeErrors);
#ifdef MS_WINDOWS
if (!res && PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) {
PyObject *exc, *val, *tb;
PyErr_Fetch(&exc, &val, &tb);
PyErr_Format(PyExc_RuntimeError,
"filesystem path bytes were not correctly encoded with '%s'. " \
"Please report this at http://bugs.python.org/issue27781",
Py_FileSystemDefaultEncoding);
_PyErr_ChainExceptions(exc, val, tb);
}
#endif
return res;
}
else {
return PyUnicode_DecodeLocaleAndSize(s, size, "surrogateescape");
return PyUnicode_DecodeLocaleAndSize(s, size, Py_FileSystemDefaultEncodeErrors);
}
#endif
}
@ -4218,7 +4226,7 @@ make_decode_exception(PyObject **exceptionObject,
Py_CLEAR(*exceptionObject);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
/* error handling callback helper:
build arguments, call the callback and check the arguments,
if no exception occurred, copy the replacement to the output
@ -4332,7 +4340,7 @@ unicode_decode_call_errorhandler_wchar(
Py_XDECREF(restuple);
return -1;
}
#endif /* HAVE_MBCS */
#endif /* MS_WINDOWS */
static int
unicode_decode_call_errorhandler_writer(
@ -7022,7 +7030,7 @@ PyUnicode_AsASCIIString(PyObject *unicode)
return _PyUnicode_AsASCIIString(unicode, NULL);
}
#ifdef HAVE_MBCS
#ifdef MS_WINDOWS
/* --- MBCS codecs for Windows -------------------------------------------- */
@ -7741,7 +7749,7 @@ PyUnicode_AsMBCSString(PyObject *unicode)
#undef NEED_RETRY
#endif /* HAVE_MBCS */
#endif /* MS_WINDOWS */
/* --- Character Mapping Codec -------------------------------------------- */
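The Windows-only branch added to ``PyUnicode_DecodeFSDefaultAndSize`` in this file exists because decoding can fail under the new error handler. The Python sketch below, which is independent of the C code, shows how the three error handlers named in this commit treat bytes that are not valid UTF-8. The function name ``decode_outcomes`` is invented for the illustration.

```python
def decode_outcomes(raw):
    """Decode raw bytes as UTF-8 under each error handler.

    Returns a dict mapping the handler name to the decoded string, or to
    the exception class name when decoding fails.
    """
    outcomes = {}
    for errors in ("surrogateescape", "surrogatepass", "replace"):
        try:
            outcomes[errors] = raw.decode("utf-8", errors)
        except UnicodeDecodeError as exc:
            outcomes[errors] = type(exc).__name__
    return outcomes


if __name__ == "__main__":
    # A byte that can never appear in UTF-8.
    print(decode_outcomes(b"\xff"))
    # A lone surrogate written in UTF-8 form, as surrogatepass would encode it.
    print(decode_outcomes("\udc80".encode("utf-8", "surrogatepass")))
```

``surrogatepass`` round-trips lone surrogates but still rejects arbitrary invalid bytes, whereas ``surrogateescape`` maps each undecodable byte to a lone surrogate.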


@ -21,16 +21,18 @@
Don't forget to modify PyUnicode_DecodeFSDefault() if you touch any of the
values for Py_FileSystemDefaultEncoding!
*/
#ifdef HAVE_MBCS
const char *Py_FileSystemDefaultEncoding = "mbcs";
#if defined(__APPLE__)
const char *Py_FileSystemDefaultEncoding = "utf-8";
int Py_HasFileSystemDefaultEncoding = 1;
#elif defined(__APPLE__)
#elif defined(MS_WINDOWS)
/* may be changed by initfsencoding(), but should never be free()d */
const char *Py_FileSystemDefaultEncoding = "utf-8";
int Py_HasFileSystemDefaultEncoding = 1;
#else
const char *Py_FileSystemDefaultEncoding = NULL; /* set by initfsencoding() */
int Py_HasFileSystemDefaultEncoding = 0;
#endif
const char *Py_FileSystemDefaultEncodeErrors = "surrogateescape";
_Py_IDENTIFIER(__builtins__);
_Py_IDENTIFIER(__dict__);


@ -90,6 +90,9 @@ int Py_NoUserSiteDirectory = 0; /* for -s and site.py */
int Py_UnbufferedStdioFlag = 0; /* Unbuffered binary std{in,out,err} */
int Py_HashRandomizationFlag = 0; /* for -R and PYTHONHASHSEED */
int Py_IsolatedFlag = 0; /* for -I, isolate from user's env */
#ifdef MS_WINDOWS
int Py_LegacyWindowsFSEncodingFlag = 0; /* Uses mbcs instead of utf-8 */
#endif
PyThreadState *_Py_Finalizing = NULL;
@ -321,6 +324,10 @@ _Py_InitializeEx_Private(int install_sigs, int install_importlib)
check its value further. */
if ((p = Py_GETENV("PYTHONHASHSEED")) && *p != '\0')
Py_HashRandomizationFlag = add_flag(Py_HashRandomizationFlag, p);
#ifdef MS_WINDOWS
if ((p = Py_GETENV("PYTHONLEGACYWINDOWSFSENCODING")) && *p != '\0')
Py_LegacyWindowsFSEncodingFlag = add_flag(Py_LegacyWindowsFSEncodingFlag, p);
#endif
_PyRandom_Init();
@ -958,6 +965,18 @@ initfsencoding(PyInterpreterState *interp)
{
PyObject *codec;
#ifdef MS_WINDOWS
if (Py_LegacyWindowsFSEncodingFlag)
{
Py_FileSystemDefaultEncoding = "mbcs";
Py_FileSystemDefaultEncodeErrors = "replace";
}
else
{
Py_FileSystemDefaultEncoding = "utf-8";
Py_FileSystemDefaultEncodeErrors = "surrogatepass";
}
#else
if (Py_FileSystemDefaultEncoding == NULL)
{
Py_FileSystemDefaultEncoding = get_locale_encoding();
@ -968,6 +987,7 @@ initfsencoding(PyInterpreterState *interp)
interp->fscodec_initialized = 1;
return 0;
}
#endif
/* the encoding is mbcs, utf-8 or ascii */
codec = _PyCodec_Lookup(Py_FileSystemDefaultEncoding);
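The branches added to ``initfsencoding()`` above, together with the static defaults in the previous file, can be modelled in a few lines of Python. This is an illustrative model only: the function ``select_fs_codec`` and its parameters are hypothetical, the real decision is made in C at interpreter startup, and ``locale.getpreferredencoding(False)`` is used here as an approximation of the C-level ``get_locale_encoding()``.

```python
import locale
import sys


def select_fs_codec(platform, legacy_windows=False, locale_encoding=None):
    """Return (encoding, errors) following the branches shown in this commit."""
    if platform == "win32":
        if legacy_windows:
            # PYTHONLEGACYWINDOWSFSENCODING or sys._enablelegacywindowsfsencoding()
            return "mbcs", "replace"
        return "utf-8", "surrogatepass"
    if platform == "darwin":
        # Static default compiled in for __APPLE__.
        return "utf-8", "surrogateescape"
    if locale_encoding is None:
        # Approximation of the locale lookup done in C.
        locale_encoding = locale.getpreferredencoding(False)
    return locale_encoding, "surrogateescape"


if __name__ == "__main__":
    print(select_fs_codec(sys.platform))
    print(select_fs_codec("win32", legacy_windows=True))
```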


@ -310,6 +310,23 @@ Return the encoding used to convert Unicode filenames in\n\
operating system filenames."
);
static PyObject *
sys_getfilesystemencodeerrors(PyObject *self)
{
if (Py_FileSystemDefaultEncodeErrors)
return PyUnicode_FromString(Py_FileSystemDefaultEncodeErrors);
PyErr_SetString(PyExc_RuntimeError,
"filesystem encoding is not initialized");
return NULL;
}
PyDoc_STRVAR(getfilesystemencodeerrors_doc,
"getfilesystemencodeerrors() -> string\n\
\n\
Return the error mode used to convert Unicode filenames in\n\
operating system filenames."
);
static PyObject *
sys_intern(PyObject *self, PyObject *args)
{
@ -866,6 +883,24 @@ sys_getwindowsversion(PyObject *self)
#pragma warning(pop)
PyDoc_STRVAR(enablelegacywindowsfsencoding_doc,
"_enablelegacywindowsfsencoding()\n\
\n\
Changes the default filesystem encoding to mbcs:replace for consistency\n\
with earlier versions of Python. See PEP 529 for more information.\n\
\n\
This is equivalent to defining the PYTHONLEGACYWINDOWSFSENCODING \n\
environment variable before launching Python."
);
static PyObject *
sys_enablelegacywindowsfsencoding(PyObject *self)
{
Py_FileSystemDefaultEncoding = "mbcs";
Py_FileSystemDefaultEncodeErrors = "replace";
Py_RETURN_NONE;
}
#endif /* MS_WINDOWS */
#ifdef HAVE_DLOPEN
@ -1225,6 +1260,8 @@ static PyMethodDef sys_methods[] = {
#endif
{"getfilesystemencoding", (PyCFunction)sys_getfilesystemencoding,
METH_NOARGS, getfilesystemencoding_doc},
{ "getfilesystemencodeerrors", (PyCFunction)sys_getfilesystemencodeerrors,
METH_NOARGS, getfilesystemencodeerrors_doc },
#ifdef Py_TRACE_REFS
{"getobjects", _Py_GetObjects, METH_VARARGS},
#endif
@ -1240,6 +1277,8 @@ static PyMethodDef sys_methods[] = {
#ifdef MS_WINDOWS
{"getwindowsversion", (PyCFunction)sys_getwindowsversion, METH_NOARGS,
getwindowsversion_doc},
{"_enablelegacywindowsfsencoding", (PyCFunction)sys_enablelegacywindowsfsencoding,
METH_NOARGS, enablelegacywindowsfsencoding_doc },
#endif /* MS_WINDOWS */
{"intern", sys_intern, METH_VARARGS, intern_doc},
{"is_finalizing", sys_is_finalizing, METH_NOARGS, is_finalizing_doc},
@ -1456,14 +1495,21 @@ version -- the version of this interpreter as a string\n\
version_info -- version information as a named tuple\n\
"
)
#ifdef MS_WINDOWS
#ifdef MS_COREDLL
/* concatenating string here */
PyDoc_STR(
"dllhandle -- [Windows only] integer handle of the Python DLL\n\
winver -- [Windows only] version number of the Python DLL\n\
"
)
#endif /* MS_WINDOWS */
#endif /* MS_COREDLL */
#ifdef MS_WINDOWS
/* concatenating string here */
PyDoc_STR(
"_enablelegacywindowsfsencoding -- [Windows only] \n\
"
)
#endif
PyDoc_STR(
"__stdin__ -- the original stdin; don't touch!\n\
__stdout__ -- the original stdout; don't touch!\n\