162 commits

Author SHA1 Message Date
Barney Gale a33ce66dca
GH-87695: Fix OSError from pathlib.Path.glob() (GH-104292)
Fix issue where `pathlib.Path.glob()` raised `OSError` when it encountered
a symlink to an overly long path.
2023-05-10 17:17:08 +00:00
Barney Gale c0ece3dc97
GH-102613: Improve performance of pathlib.Path.rglob() (GH-104244)
Stop de-duplicating results in `_RecursiveWildcardSelector`. A new
`_DoubleRecursiveWildcardSelector` class is introduced which performs
de-duplication, but this is used _only_ for patterns with multiple
non-adjacent `**` segments, such as `path.glob('**/foo/**')`. By avoiding
the use of a set, `PurePath.__hash__()` is not called, and so paths do not
need to be stringified and case-normalised.

Also merge adjacent '**' segments in patterns.
2023-05-07 22:12:50 +01:00
Barney Gale e8d77b03e0
GH-89812: Churn pathlib.Path methods (GH-104243)
Re-arrange `pathlib.Path` methods in source code. No other changes.

The methods are arranged as follows:

1. `stat()` and dependants (`exists()`, `is_dir()`, etc)
2. `open()` and dependants (`read_text()`, `write_bytes()`, etc)
3. `iterdir()` and dependants (`glob()`, `walk()`, etc)
4. All other `Path` methods

This patch prepares the ground for a new `_AbstractPath` class, which will
support the methods in groups 1, 2 and 3 above. By churning the methods
here, subsequent patches will be easier to review and less likely to break
things.
2023-05-07 20:07:07 +01:00
Barney Gale de7f694e3c
GH-103548: Improve performance of pathlib.Path.[is_]absolute() (GH-103549)
Improve performance of `pathlib.Path.absolute()` and `cwd()` by joining paths only when necessary. Also improve
performance of `PurePath.is_absolute()` on Posix by skipping path parsing and normalization.
2023-05-06 18:03:07 +00:00
Barney Gale d00d942149
GH-100479: Add pathlib.PurePath.with_segments() (GH-103975)
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-05-05 19:04:53 +00:00
Barney Gale 8100be5535
GH-81079: Add case_sensitive argument to pathlib.Path.glob() (GH-102710)
This argument allows case-sensitive matching to be enabled on Windows, and
case-insensitive matching to be enabled on Posix.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2023-05-04 16:44:36 +00:00
Barney Gale da1980afcb
GH-104114: Fix pathlib.WindowsPath.glob() use of literal pattern segment case (GH-104116)
We now use `_WildcardSelector` to evaluate literal pattern segments, which
allows us to retrieve the real filesystem case.

This change is necessary in order to implement a *case_sensitive* argument
(see GH-81079) and a *follow_symlinks* argument (see GH-77609).
2023-05-03 17:28:44 +01:00
andrei kulakov af886ffa06
GH-89769: pathlib.Path.glob(): do not follow symlinks when checking for precise match (GH-29655)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
2023-05-03 04:50:10 +01:00
Barney Gale 65a49c6553
GH-104102: Optimize pathlib.Path.glob() handling of ../ pattern segments (GH-104103)
These segments do not require a `stat()` call, as the selector's
`_select_from()` method is called after we've established that the
parent is a directory.
2023-05-02 23:16:04 +00:00
Barney Gale 47770a1e91
GH-104104: Optimize pathlib.Path.glob() by avoiding repeated calls to os.path.normcase() (GH-104105)
Use `re.IGNORECASE` to implement case-insensitive matching. This
restores behaviour from before GH-31691.
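
The general technique can be sketched with the standard library alone. This is an illustration rather than the code from this commit: `compile_segment` is a hypothetical helper, and the use of `fnmatch.translate()` is an assumption about how a glob segment might be turned into a regex.

```python
import fnmatch
import re

def compile_segment(pattern, case_sensitive):
    # Hypothetical helper: translate the glob segment to a regex once and let
    # the regex engine fold case, instead of calling os.path.normcase() on
    # every candidate name.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(fnmatch.translate(pattern), flags).fullmatch

names = ["notes.txt", "README.TXT", "image.png"]

insensitive = compile_segment("*.TXT", case_sensitive=False)
sensitive = compile_segment("*.TXT", case_sensitive=True)

matched_insensitive = [name for name in names if insensitive(name)]
matched_sensitive = [name for name in names if sensitive(name)]
```

The compiled matcher is built once per pattern segment, so the per-name cost is a single regex match.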
2023-05-02 22:51:18 +01:00
Barney Gale 8611e7bf5c
GH-103525: Improve exception message from pathlib.PurePath() (GH-103526)
Check that arguments are strings before calling `os.path.join()`.

Also improve performance of `PurePath(PurePath(...))` while we're in the
area: we now use the *unnormalized* string path of such arguments.

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-05-02 19:08:19 +01:00
Barney Gale 8af8f52d17
GH-78079: Fix UNC device path root normalization in pathlib (GH-102003)
We no longer add a root to device paths such as `//./PhysicalDrive0`,
`//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a
root to incomplete UNC share paths, like `//`, `//a` and `//a/`.

Co-authored-by: Eryk Sun <eryksun@gmail.com>
2023-04-14 21:55:41 +01:00
Barney Gale 2c673d5e93
GH-101362: Omit path anchor from pathlib.PurePath()._parts (GH-102476)
Improve performance of path construction by skipping the addition of the path anchor (`drive + root`) to the internal `_parts` list. Rename this attribute to `_tail` for clarity.
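
The relationship between the anchor and the remaining parts can be seen through the public API. This is a small sketch only: `tail_of` is a hypothetical helper that approximates the private `_tail` attribute using `parts` and `anchor`, and it says nothing about how the attribute is stored internally.

```python
from pathlib import PurePosixPath, PureWindowsPath

def tail_of(path):
    # Hypothetical helper: the public `parts` tuple includes the anchor as its
    # first element when one exists, so drop it to get the tail.
    return path.parts[1:] if path.anchor else path.parts

paths = [
    PureWindowsPath("C:/Users/sam"),
    PurePosixPath("/home/sam"),
    PurePosixPath("docs/readme"),
]

# The anchor is always the drive followed by the root.
anchors_consistent = all(p.anchor == p.drive + p.root for p in paths)
tails = [tail_of(p) for p in paths]
```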
2023-04-09 18:40:03 +01:00
Barney Gale 11c302055a
GH-76846, GH-85281: Call __new__() and __init__() on pathlib subclasses (GH-102789)
Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.

Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.

Co-authored-by: Steve Dower <steve.dower@microsoft.com>
2023-04-03 19:57:11 +01:00
Stanislav Zmiev 713df2c534
GH-89727: Fix pathlib.Path.walk RecursionError on deep trees (GH-100282)
Use a stack to implement `pathlib.Path.walk()` iteratively instead of recursively to avoid hitting recursion limits on deeply nested trees.
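
A minimal sketch of the stack-based approach follows. It is not the CPython implementation: `walk_iterative` is a hypothetical function built on `os.scandir()`, it only walks top-down, and it omits error handling and symlink options.

```python
import os
import tempfile

def walk_iterative(top):
    """Yield (dirpath, dirnames, filenames) top-down without recursion."""
    stack = [top]  # directories still to visit
    while stack:
        dirpath = stack.pop()
        dirnames, filenames = [], []
        with os.scandir(dirpath) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirnames.append(entry.name)
                else:
                    filenames.append(entry.name)
        yield dirpath, dirnames, filenames
        # Push children in reverse so they are popped in listing order.
        for name in reversed(dirnames):
            stack.append(os.path.join(dirpath, name))

# Small demonstration on a temporary tree: root/a/b plus root/a/f.txt.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    open(os.path.join(root, "a", "f.txt"), "w").close()
    seen = {
        os.path.relpath(dirpath, root): (sorted(dirs), sorted(files))
        for dirpath, dirs, files in walk_iterative(root)
    }
```

Because pending directories live in an explicit list rather than on the call stack, the depth of the tree is limited by memory rather than by the interpreter's recursion limit.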

Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
2023-03-22 14:45:25 +00:00
Barney Gale 90f1d77717
GH-80486: Fix handling of NTFS alternate data streams in pathlib (GH-102454)
Co-authored-by: Maor Kleinberger <kmaork@gmail.com>
2023-03-10 17:29:04 +00:00
Barney Gale 6716254e71
GH-101362: Optimise PurePath(PurePath(...)) (GH-101667)
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.

Automerge-Triggered-By: GH:AlexWaygood
2023-03-05 15:50:21 -08:00
Barney Gale 3e60e0213e
GH-101362: Check pathlib.Path flavour compatibility at import time (GH-101664)
This saves a comparison in `pathlib.Path.__new__()` and reduces the time taken to run `Path()` by ~5%.

Automerge-Triggered-By: GH:AlexWaygood
2023-03-05 14:46:45 -08:00
Barney Gale 3572c861d8
GH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (#101665)
This reduces the time taken to run `PurePath("foo")` by ~15%
2023-03-05 22:00:56 +00:00
Barney Gale 072011b3c3
gh-100809: Fix handling of drive-relative paths in pathlib.Path.absolute() (GH-100812)
Resolving the drive independently uses the OS API, which ensures it starts from the current directory on that drive.
2023-02-17 14:08:14 +00:00
Barney Gale d401b20630
gh-101360: Fix anchor matching in pathlib.PureWindowsPath.match() (GH-101363)
Use `fnmatch` to match path and pattern anchors, just as we do for other
path parts. This allows patterns such as `'*:/Users/*'` to be matched.
2023-02-17 14:05:38 +00:00
Barney Gale e5b08ddddf
gh-101000: Add os.path.splitroot() (#101002)
Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-01-27 00:28:27 +00:00
Yurii Karabas 080cb27829
gh-74033: Fix bug when Path takes and ignores **kwargs (GH-19632)
Fix a bug where `Path` takes and ignores `**kwargs` by adding an `__init__` method to the `PurePath` class that accepts only positional arguments.

Automerge-Triggered-By: GH:brettcannon
2023-01-13 16:05:43 -08:00
Barney Gale 7fba99eadb
gh-100562: improve performance of pathlib.Path.absolute() (GH-100563)
Increase performance of the `absolute()` method by calling `os.getcwd()` directly, rather than using the `Path.cwd()` class method. This avoids constructing an extra `Path` object (and the parsing/normalization that comes with it).

Decrease performance of the `cwd()` class method by calling the `Path.absolute()` method, rather than using `os.getcwd()` directly. This involves constructing an extra `Path` object. We do this to maintain a longstanding pattern where `os` functions are called from only one place, which allows them to be more readily replaced by users. As `cwd()` is generally called at most once within user programs, it's a good bargain.

```shell
# before
$ ./python -m timeit -s 'from pathlib import Path; p = Path("foo", "bar")' 'p.absolute()'
50000 loops, best of 5: 9.04 usec per loop
# after
$ ./python -m timeit -s 'from pathlib import Path; p = Path("foo", "bar")' 'p.absolute()'
50000 loops, best of 5: 5.02 usec per loop
```

Automerge-Triggered-By: GH:AlexWaygood
2023-01-05 14:11:50 -08:00
Barney Gale a68e585c8b
gh-68320, gh-88302 - Allow for private pathlib.Path subclassing (GH-31691)
Users may wish to define subclasses of `pathlib.Path` to add or modify
existing methods. Before this change, attempting to instantiate a subclass
raised an exception like:

    AttributeError: type object 'PPath' has no attribute '_flavour'

Previously the `_flavour` attribute was assigned as follows:

    PurePath._flavour        = xxx not set!! xxx
    PurePosixPath._flavour   = _PosixFlavour()
    PureWindowsPath._flavour = _WindowsFlavour()

This change replaces it with a `_pathmod` attribute, set as follows:

    PurePath._pathmod        = os.path
    PurePosixPath._pathmod   = posixpath
    PureWindowsPath._pathmod = ntpath

Functionality from `_PosixFlavour` and `_WindowsFlavour` is moved into
`PurePath` as underscore-prefixed class methods. Flavours are removed.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Eryk Sun <eryksun@gmail.com>
2022-12-23 14:52:23 -08:00
Barney Gale 5a991da329
gh-78707: deprecate passing >1 argument to PurePath.[is_]relative_to() (GH-94469)
This brings `relative_to()` and `is_relative_to()` more in line with other pathlib methods like `rename()` and `symlink_to()`.

Resolves #78707.
2022-12-16 16:14:27 -08:00
Barney Gale ae234fbc5c
gh-99029: Fix handling of PureWindowsPath('C:\<blah>').relative_to('C:') (GH-99031)
`relative_to()` now treats naked drive paths as relative. This brings its
behaviour in line with other parts of pathlib, and with `ntpath.relpath()`,
and so allows us to factor out the pathlib-specific implementation.
2022-11-25 11:15:57 -08:00
Charles Machalow 1b2de89bce
gh-99547: Add isjunction methods for checking if a path is a junction (GH-99548)
2022-11-22 17:19:34 +00:00
Nikita Sobolev 87f5180cd7
gh-98832: Change wording in docstring of pathlib.Path.iterdir (GH-98833)
Found while working on https://github.com/python/cpython/issues/98829

Automerge-Triggered-By: GH:AlexWaygood
2022-11-09 14:05:07 -08:00
Nikita Sobolev e3b9832e57
gh-98884: [pathlib] remove hasattr check for lru_cache (#98885)
2022-11-03 17:14:12 +00:00
domragusa e089f23bbb
gh-84538: add strict argument to pathlib.PurePath.relative_to (GH-19813)
By default, :meth:`pathlib.PurePath.relative_to` doesn't deal with paths that are not a direct prefix of the other, raising an exception in that instance. This change adds a *walk_up* parameter that can be set to allow for using ``..`` to calculate the relative path.

example:
```
>>> p = PurePosixPath('/etc/passwd')
>>> p.relative_to('/etc')
PurePosixPath('passwd')
>>> p.relative_to('/usr')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pathlib.py", line 940, in relative_to
    raise ValueError(error_message.format(str(self), str(formatted)))
ValueError: '/etc/passwd' does not start with '/usr'
>>> p.relative_to('/usr', walk_up=True)
PurePosixPath('../etc/passwd')
```


https://bugs.python.org/issue40358

Automerge-Triggered-By: GH:brettcannon
2022-10-28 16:20:14 -07:00
Barney Gale 187949ebf2
gh-94909: fix joining of absolute and relative Windows paths in pathlib (GH-95450)
Have pathlib use `os.path.join()` to join arguments to the `PurePath` initialiser, which fixes a minor bug when handling relative paths with drives.

Previously:

```python
>>> from pathlib import PureWindowsPath
>>> a = 'C:/a/b'
>>> b = 'C:x/y'
>>> PureWindowsPath(a, b)
PureWindowsPath('C:x/y')
```

Now:

```python
>>> PureWindowsPath(a, b)
PureWindowsPath('C:/a/b/x/y')
```
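
The underlying `os.path.join()` behaviour on Windows-style paths can be checked directly with `ntpath`, which is importable on any platform. This sketch only exercises `ntpath.join()` and `ntpath.normpath()`; the variable names are illustrative.

```python
import ntpath

# A drive-relative second argument on the same drive is appended to the first.
same_drive = ntpath.normpath(ntpath.join("C:/a/b", "C:x/y"))

# A second argument on a different drive replaces the first entirely.
other_drive = ntpath.join("C:/a/b", "D:x/y")
```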
2022-08-12 14:23:41 -07:00
Barney Gale 29650fea96
gh-86943: implement pathlib.WindowsPath.is_mount() (GH-31458)
Have `pathlib.WindowsPath.is_mount()` call `ntpath.ismount()`. Previously it raised `NotImplementedError` unconditionally.


https://bugs.python.org/issue42777
2022-08-05 15:37:44 -07:00
Stanislav Zmiev c1e929858a
gh-90385: Add pathlib.Path.walk() method (GH-92517)
Automerge-Triggered-By: GH:brettcannon
2022-07-22 16:55:46 -07:00
Barney Gale fd4a42d890
gh-82116: add comment explaining use of list(scandir_it) in pathlib. (GH-94939)
Automerge-Triggered-By: GH:brettcannon
2022-07-20 14:34:13 -07:00
Samuel Sloniker afd6a37ad1
gh-93654: Add module docstring to pathlib (GH-94611)
Issue: gh-93654
2022-07-07 12:59:29 -07:00
Barney Gale 2ba0fd5767
gh-81790: support "UNC" device paths in ntpath.splitdrive() (GH-91882)
2022-06-10 16:59:55 +01:00
Barney Gale f32e6b48d1
gh-93156 - fix negative indexing into absolute pathlib.PurePath().parents (GH-93273)
When a `_PathParents` object has a drive or a root, the length of the
object is *one less* than the length of `self._parts`, which resulted
in an off-by-one error when `path.parents[-n]` was fed through to
`self._parts[:-n - 1]`. In particular, `path.parents[-1]` was a malformed
path object with spooky properties.

This is addressed by adding `len(self)` to negative indices.
2022-06-03 14:33:20 -07:00
Serhiy Storchaka 87f849c775
gh-92550: Fix pathlib.Path.rglob() for empty pattern (GH-92604)
2022-05-11 07:43:04 +03:00
Serhiy Storchaka b1c4368824
Revert "gh-92550 - Fix regression in pathlib.Path.rglob() (GH-92583)" (GH-92598)
This reverts commit dcdf250d2d.
2022-05-11 07:14:25 +03:00
Gregory P. Smith 07b34926d3
gh-84131: Remove the deprecated pathlib.Path.link_to method. (#92505)
Co-authored-by: Barney Gale <barney.gale@gmail.com>
2022-05-10 12:31:41 -07:00
Barney Gale dcdf250d2d
gh-92550 - Fix regression in pathlib.Path.rglob() (GH-92583)
We could try to remedy this by taking a slice, but we then run into an issue where the empty string will match altsep on POSIX. That rabbit hole could keep getting deeper.

A proper fix for the original issue involves making pathlib's path normalisation more configurable - in this case we want to retain trailing slashes, but in others we might want to preserve `./` prefixes, or elide `../` segments when we're sure we won't encounter symlinks.

This reverts commit ea2f5bcda1.
2022-05-09 17:12:16 -07:00
Eisuke Kawashima ea2f5bcda1
bpo-22276: Change pathlib.Path.glob not to ignore trailing path separator (GH-10349)
Now pathlib.Path.glob() **only** matches directories when the pattern ends in a path separator.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2022-04-28 12:45:03 -07:00
Barney Gale 06e1701ad3
bpo-46556: emit DeprecationWarning from pathlib.Path.__enter__() (GH-30971)
In Python 3.9, Path.__exit__() was made a no-op and has never been documented.

Co-authored-by: Brett Cannon <brett@python.org>
2022-02-08 13:01:37 -08:00
Nikita Sobolev 7ffe7ba30f
bpo-46483: Remove __class_getitem__ from pathlib.PurePath (GH-30848)
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
2022-02-03 11:25:10 +02:00
Barney Gale 08f8301b21
bpo-43012: remove pathlib._Accessor (GH-25701)
Per Pitrou:

> The original intent for the “accessor” thing was to have a variant that did all accesses under a filesystem tree in a race condition-free way using openat and friends. It turned out to be much too hairy to actually implement, so was entirely abandoned, but the accessor abstraction was left there.

https://discuss.python.org/t/make-pathlib-extensible/3428/2

Accessors are:

- Lacking any internal purpose - '_NormalAccessor' is the only implementation
- Lacking any firm conceptual difference to `Path` objects themselves (inc. subclasses)
- Non-public, i.e. underscore prefixed - '_Accessor' and '_NormalAccessor' 
- Unofficially used to implement customized `Path` objects, but once [bpo-24132]() is addressed there will be a supported route for that.

This patch preserves all existing behaviour.
2022-02-02 04:38:25 -08:00
Barney Gale 18cb2ef46c
bpo-29688: document and test pathlib.Path.absolute() (GH-26153)
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Brian Helba <brian.helba@kitware.com>
2022-01-28 15:40:55 -08:00
Nikita Sobolev 1f715d5bd3
bpo-46483: change PurePath.__class_getitem__ to return GenericAlias (GH-30822)
2022-01-23 17:48:43 +03:00
Barney Gale a1c8841492
bpo-46316: optimize pathlib.Path.iterdir() (GH-30501)
`os.listdir()` doesn't return entries for `.` or `..`, so we don't need to
check for them here.
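
The premise is easy to confirm with a temporary directory. This is a quick check of the documented `os.listdir()` behaviour, not part of the patch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "file.txt"), "w").close()
    # Only real entries are listed; the special entries "." and ".."
    # never appear in the result.
    names = os.listdir(root)
```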
2022-01-20 13:20:00 -06:00
andrei kulakov 8d7644fa64
bpo-45853: Fix misspelling and unused import in pathlib (GH-30292)
2021-12-30 09:45:06 +02:00