Commit graph

1067 commits

Author SHA1 Message Date
Victor Stinner da5727a120
gh-92651: Remove the Include/token.h header file (#92652)
Remove the token.h header file. There was never any public tokenizer
C API. The token.h header file was only designed to be used by Python
internals.

Move Include/token.h to Include/internal/pycore_token.h. Including
this header file now requires that the Py_BUILD_CORE macro is
defined. It no longer checks for the Py_LIMITED_API macro.

Rename functions:

* PyToken_OneChar() => _PyToken_OneChar()
* PyToken_TwoChars() => _PyToken_TwoChars()
* PyToken_ThreeChars() => _PyToken_ThreeChars()
2022-05-11 23:22:50 +02:00
Victor Stinner d716a0dfe2
Use static inline function Py_EnterRecursiveCall() (#91988)
Currently, calling Py_EnterRecursiveCall() and
Py_LeaveRecursiveCall() may use a function call or a static inline
function call, depending if the internal pycore_ceval.h header file
is included or not. Use a different name for the static inline
function to ensure that the static inline function is always used in
Python internals for best performance. Similar approach than
PyThreadState_GET() (function call) and _PyThreadState_GET() (static
inline function).

* Rename _Py_EnterRecursiveCall() to _Py_EnterRecursiveCallTstate()
* Rename _Py_LeaveRecursiveCall() to _Py_LeaveRecursiveCallTstate()
* pycore_ceval.h: Rename Py_EnterRecursiveCall() to
  _Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() and
  _Py_LeaveRecursiveCall()
2022-05-04 13:30:23 +02:00
Serhiy Storchaka 3483299a24
gh-81548: Deprecate octal escape sequences with value larger than 0o377 (GH-91668) 2022-04-30 13:16:27 +03:00
Serhiy Storchaka 43a8bf1ea4
gh-87999: Change warning type for numeric literal followed by keyword (GH-91980)
The warning emitted by the Python parser for a numeric literal
immediately followed by keyword has been changed from deprecation
warning to syntax warning.
2022-04-27 20:15:14 +03:00
Matthieu Dartiailh aa0f056a00
bpo-47212: Improve error messages for un-parenthesized generator expressions (GH-32302) 2022-04-05 14:47:13 +01:00
Christian Heimes 3df0e63aab
bpo-46315: Use fopencookie only on Emscripten 3.x and newer (GH-32266) 2022-04-02 23:11:38 +02:00
Hugo van Kemenade 6881ea936e
bpo-47126: Update to canonical PEP URLs specified by PEP 676 (GH-32124) 2022-03-30 12:00:27 +01:00
Maciej Górski 7b44ade018
bpo-47129: Add more informative messages to f-string syntax errors (32127)
* Add more informative messages to f-string syntax errors

* 📜🤖 Added by blurb_it.

* Fix whitespaces

* Change error message

* Remove the 'else' statement (as sugested in review)

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
2022-03-28 17:08:36 -04:00
Matthew Rahtz e8e737bcf6
bpo-43224: Implement PEP 646 grammar changes (GH-31018)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-03-26 09:55:35 -07:00
Pablo Galindo Salgado 26cca8067b
bpo-47117: Don't crash if we fail to decode characters when the tokenizer buffers are uninitialized (GH-32129)
Automerge-Triggered-By: GH:pablogsal
2022-03-26 09:29:02 -07:00
Christian Heimes 9b889b5bda
bpo-46315: Use fopencookie() to avoid dup() in _PyTokenizer_FindEncodingFilename (GH-32033)
WASI does not have dup() and Emscripten's emulation is slow.
2022-03-22 17:08:51 +01:00
Pablo Galindo Salgado 7d810b6a4e
bpo-46838: Syntax error improvements for function definitions (GH-31590) 2022-03-22 11:38:41 +00:00
Oleg Iarygin 13b0412223
bpo-46920: Remove code that has explainers why it was disabled (GH-31813) 2022-03-14 17:04:22 +01:00
Oleg Iarygin a52f82baf2
bpo-46920: Remove disabled debug code added decades ago and likely unnecessary (GH-31812) 2022-03-14 17:03:21 +01:00
Serhiy Storchaka 090e5c4b94
bpo-46820: Fix a SyntaxError in a numeric literal followed by "not in" (GH-31479)
Fix parsing a numeric literal immediately (without spaces) followed by
"not in" keywords, like in "1not in x". Now the parser only emits
a warning, not a syntax error.
2022-02-22 09:51:51 +02:00
Eric V. Smith ffd9f8ff84
bpo-46762: Fix an assert failure in f-strings where > or < is the last character if the f-string is missing a trailing right brace. (#31365) 2022-02-16 05:54:09 -05:00
Pablo Galindo Salgado e19059ecd8
Don't print rejected tokens when using the debug flags in the parser (GH-31258) 2022-02-10 14:38:27 +00:00
Pablo Galindo Salgado 390459de6d
Allow the parser to avoid nested processing of invalid rules (GH-31252) 2022-02-10 13:12:14 +00:00
Pablo Galindo Salgado b71dc71905
bpo-46707: Avoid potential exponential backtracking in some syntax errors (GH-31241) 2022-02-10 03:37:17 +00:00
Eric Snow 81c72044a1
bpo-46541: Replace core use of _Py_IDENTIFIER() with statically initialized global objects. (gh-30928)
We're no longer using _Py_IDENTIFIER() (or _Py_static_string()) in any core CPython code.  It is still used in a number of non-builtin stdlib modules.

The replacement is: PyUnicodeObject (not pointer) fields under _PyRuntimeState, statically initialized as part of _PyRuntime.  A new _Py_GET_GLOBAL_IDENTIFIER() macro facilitates lookup of the fields (along with _Py_GET_GLOBAL_STRING() for non-identifier strings).

https://bugs.python.org/issue46541#msg411799 explains the rationale for this change.

The core of the change is in:

* (new) Include/internal/pycore_global_strings.h - the declarations for the global strings, along with the macros
* Include/internal/pycore_runtime_init.h - added the static initializers for the global strings
* Include/internal/pycore_global_objects.h - where the struct in pycore_global_strings.h is hooked into _PyRuntimeState
* Tools/scripts/generate_global_objects.py - added generation of the global string declarations and static initializers

I've also added a --check flag to generate_global_objects.py (along with make check-global-objects) to check for unused global strings.  That check is added to the PR CI config.

The remainder of this change updates the core code to use _Py_GET_GLOBAL_IDENTIFIER() instead of _Py_IDENTIFIER() and the related _Py*Id functions (likewise for _Py_GET_GLOBAL_STRING() instead of _Py_static_string()).  This includes adding a few functions where there wasn't already an alternative to _Py*Id(), replacing the _Py_Identifier * parameter with PyObject *.

The following are not changed (yet):

* stop using _Py_IDENTIFIER() in the stdlib modules
* (maybe) get rid of _Py_IDENTIFIER(), etc. entirely -- this may not be doable as at least one package on PyPI using this (private) API
* (maybe) intern the strings during runtime init

https://bugs.python.org/issue46541
2022-02-08 13:39:07 -07:00
Pablo Galindo Salgado 69e10976b2
bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010) 2022-02-08 11:54:37 +00:00
Paul m. p. P 89b13042fc
bpo-14916: use specified tokenizer fd for file input (GH-31006)
@pablogsal, sorry i failed to rebase to main, so i recreated https://github.com/python/cpython/pull/22190#issuecomment-1024633392

> PyRun_InteractiveOne\*() functions allow to explicitily set fd instead of stdin.
but stdin was hardcoded in readline call.

> This patch does not fix target file for prompt unlike original bpo one : prompt fd is unrelated to tokenizer source which could be read only. It is more of a bugfix regarding the docs :  actual documentation say "prompt the user" so one would expect prompt to go on stdout not a file for both PyRun_InteractiveOne\*() and PyRun_InteractiveLoop\*().

Automerge-Triggered-By: GH:pablogsal
2022-02-01 14:33:52 -08:00
Pablo Galindo Salgado a0efc0c196
bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters (GH-30130) 2022-01-25 22:12:14 +00:00
Eric V. Smith 0daf72194b
bpo-46503: Prevent an assert from firing when parsing some invalid \N sequences in f-strings. (GH-30865)
* bpo-46503: Prevent an assert from firing.  Also fix one nearby tiny PEP-7 nit.

* Added blurb.
2022-01-24 21:53:27 -05:00
Pablo Galindo Salgado 650720a0cf
Fix the caret position in some syntax errors in interactive mode (GH-30718) 2022-01-20 15:34:13 +00:00
Pablo Galindo Salgado 8c2fd09f36
bpo-46339: Include clarification on assert in 'get_error_line_from_tokenizer_buffers' (#30545) 2022-01-18 11:13:00 +00:00
Pablo Galindo Salgado cedec19be8
bpo-46339: Fix crash in the parser when computing error text for multi-line f-strings (GH-30529)
Automerge-Triggered-By: GH:pablogsal
2022-01-11 08:30:39 -08:00
Pablo Galindo Salgado 6fa8b2ceee
bpo-46237: Fix the line number of tokenizer errors inside f-strings (GH-30463) 2022-01-08 00:23:40 +00:00
Batuhan Taskaya d382f7ee0b
bpo-46289: Make conversion of FormattedValue not optional on ASDL (GH-30467)
Automerge-Triggered-By: GH:isidentical
2022-01-07 13:05:28 -08:00
Pablo Galindo Salgado 70f415fb8b
bpo-46240: Correct the error for unclosed parentheses when the tokenizer is not finished (GH-30378) 2022-01-04 10:41:22 +00:00
Pablo Galindo Salgado dd6c35761a
bpo-46110: Restore commit e9898bf153
This restores commit e9898bf153 .
2022-01-03 19:54:06 +00:00
Pablo Galindo Salgado 9d35dedc5e
Revert "bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)" (GH-30363)
This reverts commit e9898bf153 temporarily as we want to confirm if this commit is the cause of a slowdown at startup time.
2022-01-03 18:29:18 +00:00
Pablo Galindo Salgado e9898bf153
bpo-46110: Add a recursion check to avoid stack overflow in the PEG parser (GH-30177)
Co-authored-by: Batuhan Taskaya <isidentical@gmail.com>
2021-12-20 15:43:26 +00:00
Irit Katriel d60457a667
bpo-45292: [PEP-654] add except* (GH-29581) 2021-12-14 16:48:15 +00:00
Kumar Aditya 41026c3155
bpo-45855: Replaced deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule (GH-30046) 2021-12-12 10:45:20 +02:00
Pablo Galindo Salgado 4325a766f5
bpo-46054: Fix parsing error when parsing non-utf8 characters in source files (GH-30068) 2021-12-12 07:06:50 +00:00
Weipeng Hong 28179aac79
bpo-42918: Improve build-in function compile() in mode 'single' (GH-29934)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2021-12-11 00:44:26 +01:00
Pablo Galindo Salgado 1c7a1c3be0
bpo-46004: Fix error location for loops with invalid targets (GH-29959) 2021-12-07 13:02:15 +00:00
Victor Stinner 253b7a0a9f
bpo-45866: pegen strips directory of "generated from" header (GH-29777)
"make regen-all" now produces the same output when run from a
directory other than the source tree: when building Python out of the
source tree.
2021-11-26 11:50:34 +01:00
Pablo Galindo Salgado 24c10d2943
bpo-45727: Only trigger the 'did you forgot a comma' error suggestion if inside parentheses (GH-29757) 2021-11-24 22:21:23 +00:00
Pablo Galindo Salgado 4f006a789a
Ensure the str member of the tokenizer is always initialised (GH-29681) 2021-11-21 02:06:39 +00:00
Pablo Galindo Salgado c9c4444d9f
Refactor parser compilation units into specific components (GH-29676) 2021-11-21 01:08:50 +00:00
Pablo Galindo Salgado 81f4e116ef
bpo-45811: Improve error message when source code contains invisible control characters (GH-29654) 2021-11-20 18:28:28 +00:00
Pablo Galindo Salgado 7a1d932528
bpo-45450: Improve syntax error for parenthesized arguments (GH-28906) 2021-11-20 18:27:40 +00:00
Pablo Galindo Salgado 79ff0d1687
bpo-45494: Fix error location in EOF tokenizer errors (GH-29108) 2021-11-20 17:40:59 +00:00
Pablo Galindo Salgado fdcc46d955
bpo-45848: Allow the parser to get error lines from encoded files (GH-29646) 2021-11-20 15:36:07 +01:00
Pablo Galindo Salgado 546cefcda7
bpo-45727: Make the syntax error for missing comma more consistent (GH-29427) 2021-11-19 23:11:57 +00:00
Pablo Galindo Salgado da20d7401d
bpo-45822: Respect PEP 263's coding cookies in the parser even if flags are not provided (GH-29582) 2021-11-16 12:30:47 -08:00
Pablo Galindo Salgado df4ae55e66
bpo-45820: Fix a segfault when the parser fails without reading any input (GH-29580) 2021-11-16 19:51:52 +00:00
Pablo Galindo Salgado 25835c518a
bpo-45738: Fix computation of error location for invalid continuation (GH-29550)
characters in the parser
2021-11-14 01:06:41 +00:00