Commit graph

55 commits

Author SHA1 Message Date
Pablo Galindo
261a452a13
bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050) 2021-03-28 23:48:05 +01:00
Pablo Galindo
cd8dcbc851
bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763) 2021-03-14 04:38:40 +01:00
Pablo Galindo
d6d6371447
bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) 2021-01-19 23:59:33 +00:00
Lysandros Nikolaou
e5fe509054
bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140)
When trying to extract the error line for the error message there
are two distinct cases:

1. The input comes from a file, which means that we can extract the
   error line by using `PyErr_ProgramTextObject` and which we already
   do.
2. The input does not come from a file, at which point we need to get
   the source code from the tokenizer:
   * If the tokenizer's current line number is the same with the line
     of the error, we get the line from `tok->buf` and we're ready.
   * Else, we can extract the error line from the source code in the
     following two ways:
     * If the input comes from a string we have all the input
       in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we
       do not have access to the previous line. That's why a new
       field `tok->stdin_content` is added which holds the whole input for the
       current (multiline) statement or expression. We can then extract the
       error line from `tok->stdin_content` like we do in the string case above.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-01-14 21:36:30 +00:00
Andy Lester
384f3c536d
closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600)
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
        new char * for an allocate string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
2020-02-27 18:44:52 -08:00
Pablo Galindo
f2cf1e3e28
bpo-36623: Clean parser headers and include files (GH-12253)
After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.
2019-04-13 17:05:14 +01:00
Guido van Rossum
495da29225 bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086)
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)



https://bugs.python.org/issue35975
2019-03-07 12:38:08 -08:00
Pablo Galindo
1f24a719e7
bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814)
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.

This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
2019-03-01 15:34:44 -08:00
Guido van Rossum
dcfcd146f8 bpo-35766: Merge typed_ast back into CPython (GH-11645) 2019-01-31 12:40:27 +01:00
Anthony Sottile
995d9b9297 bpo-16806: Fix lineno and col_offset for multi-line string tokens (GH-10021) 2019-01-13 13:05:13 +09:00
Serhiy Storchaka
94cf308ee2
bpo-33306: Improve SyntaxError messages for unbalanced parentheses. (GH-6516) 2018-12-17 17:34:14 +02:00
Victor Stinner
f2ddc6ac93
tokenizer: Remove unused tabs options (#4422)
Remove the following fields from tok_state structure which are now
used unused:

* altwarning: "Issue warning if alternate tabs don't match"
* alterror: "Issue error if alternate tabs don't match"
* alttabsize: "Alternate tab spacing"

Replace alttabsize variable with ALTTABSIZE define.
2017-11-17 01:25:47 -08:00
Jelle Zijlstra
ac317700ce bpo-30406: Make async and await proper keywords (#1669)
Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.
2017-10-05 23:24:46 -04:00
Jim Fasarakis-Hilliard
cf1958af4c Remove obsolete declaration in tokenizer.h (#962) 2017-04-03 19:18:32 +03:00
Yury Selivanov
96ec934e75 Issue #24619: Simplify async/await tokenization.
This commit simplifies async/await tokenization in tokenizer.c,
tokenize.py & lib2to3/tokenize.py.  Previous solution was to keep
a stack of async-def & def blocks, whereas the new approach is just
to remember position of the outermost async-def block.

This change won't bring any parsing performance improvements, but
it makes the code much easier to read and validate.
2015-07-23 15:01:58 +03:00
Yury Selivanov
8fb307cd65 Issue #24619: New approach for tokenizing async/await.
This commit fixes how one-line async-defs and defs are tracked
by tokenizer.  It allows to correctly parse invalid code such
as:

>>> async def f():
...     def g(): pass
...     async = 10

and valid code such as:

>>> async def f():
...     async def g(): pass
...     await z

As a consequence, is is now possible to have one-line
'async def foo(): await ..' functions:

>>> async def foo(): return await bar()
2015-07-22 13:33:45 +03:00
Yury Selivanov
7544508f02 PEP 0492 -- Coroutines with async and await syntax. Issue #24017. 2015-05-11 22:57:16 -04:00
Serhiy Storchaka
c679227e31 Issue #1772673: The type of char* arguments now changed to const char*. 2013-10-19 21:03:34 +03:00
Victor Stinner
fe7c5b5bdf Issue #9319: Include the filename in "Non-UTF8 code ..." syntax error. 2011-04-05 01:48:03 +02:00
Victor Stinner
7f2fee3640 Issue #10785: Store the filename as Unicode in the Python parser. 2011-04-05 00:39:01 +02:00
Georg Brandl
2b15bd810d #10222: fix for overzealous AIX compiler. 2010-10-29 04:54:13 +00:00
Victor Stinner
4c7c8c3023 Issue #9713, #10114: Parser functions (eg. PyParser_ASTFromFile) expects
filenames encoded to the filesystem encoding with surrogateescape error handler
(to support undecodable bytes), instead of UTF-8 in strict mode.
2010-10-16 13:14:10 +00:00
Victor Stinner
22a351aabf Issue #10095: fp_setreadl() doesn't reopen the file, reuse instead the file
descriptor.
2010-10-14 12:04:34 +00:00
Antoine Pitrou
f95a1b3c53 Recorded merge of revisions 81029 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81029 | antoine.pitrou | 2010-05-09 16:46:46 +0200 (dim., 09 mai 2010) | 3 lines

  Untabify C files. Will watch buildbots.
........
2010-05-09 15:52:27 +00:00
Benjamin Peterson
aeaa592516 Merged revisions 76230 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r76230 | benjamin.peterson | 2009-11-12 17:39:44 -0600 (Thu, 12 Nov 2009) | 2 lines

  fix several compile() issues by translating newlines in the tokenizer
........
2009-11-13 00:17:59 +00:00
Benjamin Peterson
f5b52246ed ignore the coding cookie in compile(), exec(), and eval() if the source is a string #4626 2009-03-02 23:31:26 +00:00
Brett Cannon
da78043237 Latin-1 source code was not being properly decoded when passed through
compile(). This was due to left-over special-casing before UTF-8 became the
default source encoding.

Closes issue #3574. Thanks to Victor Stinner for help with the patch.
2008-10-17 03:38:50 +00:00
Guido van Rossum
40d20bcf1f Issue 1267, continued.
Additional patch by Christian Heimes to deal more cleanly with the
FILE* vs file-descriptor issues.
I cleaned up his code a bit, and moved the lseek() call into import.c.
2007-10-22 00:09:51 +00:00
Guido van Rossum
ce3a72aec6 Patch 1267 by Christian Heimes.
Move the initialization of sys.std{in,out,err} and __builtin__.open
to C code.
This solves the problem that "python -S" wouldn't work.
2007-10-19 23:16:50 +00:00
Neil Schemenauer
3f993c3b52 Use an enum for decoding_state. It makes the code a little more
understandable.
2007-09-21 20:50:26 +00:00
Thomas Wouters
89d996e5c2 Merged revisions 57778-58052 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r57820 | georg.brandl | 2007-08-31 08:59:27 +0200 (Fri, 31 Aug 2007) | 2 lines

  Document new shorthand notation for index entries.
........
  r57827 | georg.brandl | 2007-08-31 10:47:51 +0200 (Fri, 31 Aug 2007) | 2 lines

  Fix subitem markup.
........
  r57833 | martin.v.loewis | 2007-08-31 12:01:07 +0200 (Fri, 31 Aug 2007) | 1 line

  Mark registry components as 64-bit on Win64.
........
  r57854 | bill.janssen | 2007-08-31 21:02:23 +0200 (Fri, 31 Aug 2007) | 1 line

  deprecate use of FakeSocket
........
  r57855 | bill.janssen | 2007-08-31 21:02:46 +0200 (Fri, 31 Aug 2007) | 1 line

  remove mentions of socket.ssl in comments
........
  r57856 | bill.janssen | 2007-08-31 21:03:31 +0200 (Fri, 31 Aug 2007) | 1 line

  remove use of non-existent SSLFakeSocket in apparently untested code
........
  r57859 | martin.v.loewis | 2007-09-01 08:36:03 +0200 (Sat, 01 Sep 2007) | 3 lines

  Bug #1737210: Change Manufacturer of Windows installer to PSF.
  Will backport to 2.5.
........
  r57865 | georg.brandl | 2007-09-01 09:51:24 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix RST link (backport from Py3k).
........
  r57876 | georg.brandl | 2007-09-01 17:49:49 +0200 (Sat, 01 Sep 2007) | 2 lines

  Document sets' ">" and "<" operations (backport from py3k).
........
  r57878 | skip.montanaro | 2007-09-01 19:40:03 +0200 (Sat, 01 Sep 2007) | 4 lines

  Added a note and examples to explain that re.split does not split on an
  empty pattern match. (issue 852532).
........
  r57879 | walter.doerwald | 2007-09-01 20:18:09 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix wrong function names.
........
  r57880 | walter.doerwald | 2007-09-01 20:34:05 +0200 (Sat, 01 Sep 2007) | 2 lines

  Fix typo.
........
  r57889 | andrew.kuchling | 2007-09-01 22:31:59 +0200 (Sat, 01 Sep 2007) | 1 line

  Markup fix
........
  r57892 | andrew.kuchling | 2007-09-01 22:43:36 +0200 (Sat, 01 Sep 2007) | 1 line

  Add various items
........
  r57895 | andrew.kuchling | 2007-09-01 23:17:58 +0200 (Sat, 01 Sep 2007) | 1 line

  Wording change
........
  r57896 | andrew.kuchling | 2007-09-01 23:18:31 +0200 (Sat, 01 Sep 2007) | 1 line

  Add more items
........
  r57904 | ronald.oussoren | 2007-09-02 11:46:07 +0200 (Sun, 02 Sep 2007) | 3 lines

  Macosx: this patch ensures that the value of MACOSX_DEPLOYMENT_TARGET used
  by the Makefile is also used at configure-time.
........
  r57925 | georg.brandl | 2007-09-03 09:16:46 +0200 (Mon, 03 Sep 2007) | 2 lines

  Fix #883466: don't allow Unicode as arguments to quopri and uu codecs.
........
  r57936 | matthias.klose | 2007-09-04 01:33:04 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Added support for linking the bsddb module against BerkeleyDB 4.6.x.
........
  r57954 | mark.summerfield | 2007-09-04 10:16:15 +0200 (Tue, 04 Sep 2007) | 3 lines

  Added cross-references plus a note about dict & list shallow copying.
........
  r57958 | martin.v.loewis | 2007-09-04 11:51:57 +0200 (Tue, 04 Sep 2007) | 3 lines

  Document that we rely on the OS to release the crypto
  context. Fixes #1626801.
........
  r57960 | martin.v.loewis | 2007-09-04 15:13:14 +0200 (Tue, 04 Sep 2007) | 3 lines

  Patch #1388440: Add set_completion_display_matches_hook and
  get_completion_type to readline.
........
  r57961 | martin.v.loewis | 2007-09-04 16:19:28 +0200 (Tue, 04 Sep 2007) | 3 lines

  Patch #1031213: Decode source line in SyntaxErrors back to its original
  source encoding. Will backport to 2.5.
........
  r57972 | matthias.klose | 2007-09-04 20:17:36 +0200 (Tue, 04 Sep 2007) | 3 lines

  - Makefile.pre.in(buildbottest): Run an optional script pybuildbot.identify
    to include some information about the build environment.
........
  r57973 | matthias.klose | 2007-09-04 21:05:38 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Makefile.pre.in(buildbottest): Remove whitespace at eol.
........
  r57975 | matthias.klose | 2007-09-04 22:46:02 +0200 (Tue, 04 Sep 2007) | 2 lines

  - Fix libffi configure for hppa*-*-linux* | parisc*-*-linux*.
........
  r57980 | bill.janssen | 2007-09-05 02:46:27 +0200 (Wed, 05 Sep 2007) | 1 line

  SSL certificate distinguished names should be represented by tuples
........
  r57985 | martin.v.loewis | 2007-09-05 08:39:17 +0200 (Wed, 05 Sep 2007) | 3 lines

  Patch #1105: Explain that one needs to build the solution
  to get dependencies right.
........
  r57987 | armin.rigo | 2007-09-05 09:51:21 +0200 (Wed, 05 Sep 2007) | 4 lines

  PyDict_GetItem() returns a borrowed reference.
  There are probably a number of places that are open to attacks
  such as the following one, in bltinmodule.c:min_max().
........
  r57991 | martin.v.loewis | 2007-09-05 13:47:34 +0200 (Wed, 05 Sep 2007) | 3 lines

  Patch #786737: Allow building in a tree of symlinks pointing to
  a readonly source.
........
  r57993 | georg.brandl | 2007-09-05 15:36:44 +0200 (Wed, 05 Sep 2007) | 2 lines

  Backport from Py3k: Bug #1684991: explain lookup semantics for __special__ methods (new-style classes only).
........
  r58004 | armin.rigo | 2007-09-06 10:30:51 +0200 (Thu, 06 Sep 2007) | 4 lines

  Patch #1733973 by peaker:
  ptrace_enter_call() assumes no exception is currently set.
  This assumption is broken when throwing into a generator.
........
  r58006 | armin.rigo | 2007-09-06 11:30:38 +0200 (Thu, 06 Sep 2007) | 4 lines

  PyDict_GetItem() returns a borrowed reference.
  This attack is against ceval.c:IMPORT_NAME, which calls an
  object (__builtin__.__import__) without holding a reference to it.
........
  r58013 | georg.brandl | 2007-09-06 16:49:56 +0200 (Thu, 06 Sep 2007) | 2 lines

  Backport from 3k: #1116: fix reference to old filename.
........
  r58021 | thomas.heller | 2007-09-06 22:26:20 +0200 (Thu, 06 Sep 2007) | 1 line

  Fix typo:  c_float represents to C float type.
........
  r58022 | skip.montanaro | 2007-09-07 00:29:06 +0200 (Fri, 07 Sep 2007) | 3 lines

  If this is correct for py3k branch and it's already in the release25-maint
  branch, seems like it ought to be on the trunk as well.
........
  r58023 | gregory.p.smith | 2007-09-07 00:59:59 +0200 (Fri, 07 Sep 2007) | 4 lines

  Apply the fix from Issue1112 to make this test more robust and keep
  windows happy.
........
  r58031 | brett.cannon | 2007-09-07 05:17:50 +0200 (Fri, 07 Sep 2007) | 4 lines

  Make uuid1 and uuid4 tests conditional on whether ctypes can be imported;
  implementation of either function depends on ctypes but uuid as a whole does
  not.
........
  r58032 | brett.cannon | 2007-09-07 06:18:30 +0200 (Fri, 07 Sep 2007) | 6 lines

  Fix a crasher where Python code managed to infinitely recurse in C code without
  ever going back out to Python code in PyObject_Call().  Required introducing a
  static RuntimeError instance so that normalizing an exception there is no
  reliance on a recursive call that would put the exception system over the
  recursion check itself.
........
  r58034 | thomas.heller | 2007-09-07 08:32:17 +0200 (Fri, 07 Sep 2007) | 1 line

  Add a 'c_longdouble' type to the ctypes module.
........
  r58035 | thomas.heller | 2007-09-07 11:30:40 +0200 (Fri, 07 Sep 2007) | 1 line

  Remove unneeded #include.
........
  r58036 | thomas.heller | 2007-09-07 11:33:24 +0200 (Fri, 07 Sep 2007) | 6 lines

  Backport from py3k branch:

  Add a workaround for a strange bug on win64, when _ctypes is compiled
  with the SDK compiler.  This should fix the failing
  Lib\ctypes\test\test_as_parameter.py test.
........
  r58037 | georg.brandl | 2007-09-07 16:14:40 +0200 (Fri, 07 Sep 2007) | 2 lines

  Fix a wrong indentation for sublists.
........
  r58043 | georg.brandl | 2007-09-07 22:10:49 +0200 (Fri, 07 Sep 2007) | 2 lines

  #1095: ln -f doesn't work portably, fix in Makefile.
........
  r58049 | skip.montanaro | 2007-09-08 02:34:17 +0200 (Sat, 08 Sep 2007) | 1 line

  be explicit about the actual location of the missing file
........
2007-09-08 17:39:28 +00:00
Martin v. Löwis
85bcc66bb4 Convert code from sys.stdin.encoding to UTF-8 in
interactive mode. Fixes #1100.
2007-09-04 09:18:06 +00:00
Martin v. Löwis
49c5da1d88 Patch #1440601: Add col_offset attribute to AST nodes. 2006-03-01 22:49:05 +00:00
Martin v. Löwis
66485ae571 Remove unused field. 2006-03-01 04:04:20 +00:00
Martin v. Löwis
95292d6caa Constify filenames and scripts. Fixes #651362. 2002-12-11 14:04:59 +00:00
Martin v. Löwis
f62a89b1e0 Ignore encoding declarations inside strings. Fixes #603509. 2002-09-03 11:52:44 +00:00
Martin v. Löwis
1ee99d31d9 Make pgen compile with pydebug. Duplicate normalized names, as it may
be longer than the old string.
2002-08-04 20:10:29 +00:00
Martin v. Löwis
00f1e3f5a5 Patch #534304: Implement phase 1 of PEP 263. 2002-08-04 17:29:52 +00:00
Guido van Rossum
8586991099 REMOVED all CWI, CNRI and BeOpen copyright markings.
This should match the situation in the 1.6b1 tree.
2000-09-01 23:29:29 +00:00
Tim Peters
dbd9ba6a6c Nuke all remaining occurrences of Py_PROTO and Py_FPROTO. 2000-07-09 03:09:57 +00:00
Guido van Rossum
ffcc3813d8 Change copyright notice - 2nd try. 2000-06-30 23:58:06 +00:00
Guido van Rossum
fd71b9e9d4 Change copyright notice. 2000-06-30 23:50:40 +00:00
Guido van Rossum
926f13a081 Add checking for inconsistent tab usage 1998-04-09 21:38:06 +00:00
Guido van Rossum
86bea46b3d Another directory quickly renamed. 1997-04-29 21:03:06 +00:00
Guido van Rossum
d266eb460e New permission notice, includes CNRI. 1996-10-25 14:44:06 +00:00
Guido van Rossum
b9f8d6e54d Added 1995 to copyright message. 1995-01-04 19:08:09 +00:00
Guido van Rossum
1d5735e846 Merge back to main trunk 1994-08-30 08:27:36 +00:00
Guido van Rossum
248a50c168 * Grammar: corrected old typo (class instead of 'class')
* dosmodule.c: MSDOS specific stuff from posixmodule.c.
* posixmodule.c: removed all MSDOS specific stuff.
* tokenizer.h, parsetok.h: in prototypes, don't mix named and unnamed
  parameters (MSC doesn't like this).
1993-12-20 12:53:10 +00:00
Guido van Rossum
a3309960a5 * Added support for X11 modules.
* Makefile: change location of FORMS library.
* posixmodule.c: turn #if 0 into #ifdef MSDOS (stuff in unistd.h or not)
* Almost all .h files: added CPP magic to avoid duplicate inclusions and
  to support inclusion from C++.
1993-07-28 09:05:47 +00:00
Guido van Rossum
a849b834f1 * selectmodule.c: fix (another!) two memory leaks -- this time in list2set
* tokenizer.[ch]: allow continuation without \ inside () [] {}.
1993-05-12 11:35:44 +00:00