The main readdir loop in read_directory_recursive() is replaced with a
new one that checks if cached results of a directory is still valid.
If a file is added or removed from the index, the containing directory
is invalidated (but not its subdirs). If directory's mtime is changed,
the same happens. If a .gitignore is updated, the containing directory
and all subdirs are invalidated recursively. If dir_struct#flags or
other conditions change, the cache is ignored.
If a directory is invalidated, we opendir/readdir/closedir and run the
exclude machinery on that directory listing as usual. If untracked
cache is also enabled, we'll update the cache along the way. If a
directory is validated, we simply pull the untracked listing out from
the cache. The cache also records the list of direct subdirs that we
have to recurse in. Fully excluded directories are seen as "untracked
files".
In the best case when no dirs are invalidated, read_directory()
becomes a series of
stat(dir), open(.gitignore), fstat(), read(), close() and optionally
hash_sha1_file()
For comparison, standard read_directory() is a sequence of
opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the
expensive last_exclude_matching() and closedir().
We already try not to open(.gitignore) if we know it does not exist,
so open/fstat/read/close sequence does not apply to every
directory. The sequence could be reduced further, as noted in
prep_exclude() in another patch. So in theory, the entire best-case
read_directory sequence could be reduced to a series of stat() and
nothing else.
This is not a silver bullet approach. When you compile a C file, for
example, the old .o file is removed and a new one with the same name
created, effectively invalidating the containing directory's cache
(but not its subdirectories). If your build process touches every
directory, this cache adds extra overhead for nothing, so it's a good
idea to separate generated files from tracked files.. Editors may use
the same strategy for saving files. And of course you're out of luck
running your repo on an unsupported filesystem and/or operating system.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure the starting conditions and all global exclude files are
good to go. If not, either disable untracked cache completely, or wipe
out the cache and start fresh.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea is if we can capture all input and (non-rescursive) output of
read_directory_recursive(), and can verify later that all the input is
the same, then the second r_d_r() should produce the same output as in
the first run.
The requirement for this to work is stat info of a directory MUST
change if an entry is added to or removed from that directory (and
should not change often otherwise). If your OS and filesystem do not
meet this requirement, untracked cache is not for you. Most file
systems on *nix should be fine. On Windows, NTFS is fine while FAT may
not be [1] even though FAT on Linux seems to be fine.
The list of input of r_d_r() is in the big comment block in dir.h. In
short, the output of a directory (not counting subdirs) mainly depends
on stat info of the directory in question, all .gitignore leading to
it and the check_only flag when r_d_r() is called recursively. This
patch records all this info (and the output) as r_d_r() runs.
Two hash_sha1_file() are required for $GIT_DIR/info/exclude and
core.excludesfile unless their stat data matches. hash_sha1_file() is
only needed when .gitignore files in the worktree are modified,
otherwise their SHA-1 in index is used (see the previous patch).
We could store stat data for .gitignore files so we don't have to
rehash them if their content is different from index, but I think
.gitignore files are rarely modified, so not worth extra cache data
(and hashing penalty read-cache.c:verify_hdr(), as we will be storing
this as an index extension).
The implication is, if you change .gitignore, you better add it to the
index soon or you lose all the benefit of untracked cache because a
modified .gitignore invalidates all subdirs recursively. This is
especially bad for .gitignore at root.
This cached output is about untracked files only, not ignored files
because the number of tracked files is usually small, so small cache
overhead, while the number of ignored files could go really high
(e.g. *.o files mixing with source code).
[1] "Description of NTFS date and time stamps for files and folders"
http://support.microsoft.com/kb/299648
Helped-by: Torsten Bögershausen <tboegi@web.de>
Helped-by: David Turner <dturner@twopensource.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is not used anywhere yet. But the goal is to compare quickly if a
.gitignore file has changed when we have the SHA-1 of both old (cached
somewhere) and new (from index or a tree) versions.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a segfault in git-status with long paths on Windows,
where PATH_MAX is only 260.
This also fixes the problem of silently ignoring .gitignore if the
full path exceeds PATH_MAX. Now add_excludes_from_file() will report
if it gets ENAMETOOLONG.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no actual nested struct here. Move it out for clarity.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth()
that was suppose to replace match_pathspec()
eventually. match_pathspec() has finally been gone since 6 months
ago. Use the shorter name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how m_p_d() is used. And it usage is:
- match against an index entry (ce_path_match or match_pathspec_depth
in ls-files)
- match against a dir_entry from read_directory (dir_path_match and
match_pathspec_depth in clean.c, which will be converted later)
- resolve-undo (rerere.c and ls-files.c)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-files -k" needs to crawl only the part of the working tree
that may overlap the paths in the index to find killed files, but
shared code with the logic to find all the untracked files, which
made it unnecessarily inefficient.
* jc/ls-files-killed-optim:
dir.c::test_one_path(): work around directory_exists_in_index_icase() breakage
t3010: update to demonstrate "ls-files -k" optimization pitfalls
ls-files -k: a directory only can be killed if the index has a non-directory
dir.c: use the cache_* macro to access the current index
"ls-files -o" and "ls-files -k" both traverse the working tree down
to find either all untracked paths or those that will be "killed"
(removed from the working tree to make room) when the paths recorded
in the index are checked out. It is necessary to traverse the
working tree fully when enumerating all the "other" paths, but when
we are only interested in "killed" paths, we can take advantage of
the fact that paths that do not overlap with entries in the index
can never be killed.
The treat_one_path() helper function, which is called during the
recursive traversal, is the ideal place to implement an
optimization.
When we are looking at a directory P in the working tree, there are
three cases:
(1) P exists in the index. Everything inside the directory P in
the working tree needs to go when P is checked out from the
index.
(2) P does not exist in the index, but there is P/Q in the index.
We know P will stay a directory when we check out the contents
of the index, but we do not know yet if there is a directory
P/Q in the working tree to be killed, so we need to recurse.
(3) P does not exist in the index, and there is no P/Q in the index
to require P to be a directory, either. Only in this case, we
know that everything inside P will not be killed without
recursing.
Note that this helper is called by treat_leading_path() that decides
if we need to traverse only subdirectories of a single common
leading directory, which is essential for this optimization to be
correct. This caller checks each level of the leading path
component from shallower directory to deeper ones, and that is what
allows us to only check if the path appears in the index. If the
call to treat_one_path() weren't there, given a path P/Q/R, the real
traversal may start from directory P/Q/R, even when the index
records P as a regular file, and we would end up having to check if
any leading subpath in P/Q/R, e.g. P, appears in the index.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
:(glob)path differs from plain pathspec that it uses wildmatch with
WM_PATHNAME while the other uses fnmatch without FNM_PATHNAME. The
difference lies in how '*' (and '**') is processed.
With the introduction of :(glob) and :(literal) and their global
options --[no]glob-pathspecs, the user can:
- make everything literal by default via --noglob-pathspecs
--literal-pathspecs cannot be used for this purpose as it
disables _all_ pathspec magic.
- individually turn on globbing with :(glob)
- make everything globbing by default via --glob-pathspecs
- individually turn off globbing with :(literal)
The implication behind this is, there is no way to gain the default
matching behavior (i.e. fnmatch without FNM_PATHNAME). You either get
new globbing or literal. The old fnmatch behavior is considered
deprecated and discouraged to use.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth was created to replace match_pathspec (see
61cf282 (pathspec: add match_pathspec_depth() - 2010-12-15). It took
more than two years, but the replacement finally happens :-)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code now takes advantage of nowildcard_len field.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently to fill a struct pathspec, we do:
const char **paths;
paths = get_pathspec(prefix, argv);
...
init_pathspec(&pathspec, paths);
"paths" can only carry bare strings, which loses information from
command line arguments such as pathspec magic or the prefix part's
length for each argument.
parse_pathspec() is introduced to combine the two calls into one. The
plan is gradually replace all get_pathspec() and init_pathspec() with
parse_pathspec(). get_pathspec() now becomes a thin wrapper of
parse_pathspec().
parse_pathspec() allows the caller to reject the pathspec magics that
it does not support. When a new pathspec magic is introduced, we can
enable it per command after making sure that all underlying code has no
problem with the new magic.
"flags" parameter is currently unused. But it would allow callers to
pass certain instructions to parse_pathspec, for example forcing
literal pathspec when no magic is used.
With the introduction of parse_pathspec, there are now two functions
that can initialize struct pathspec: init_pathspec and
parse_pathspec. Any semantic changes in struct pathspec must be
reflected in both functions. init_pathspec() will be phased out in
favor of parse_pathspec().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git-status --ignored' still scans the work tree twice to collect
untracked and ignored files, respectively.
fill_directory / read_directory already supports collecting untracked and
ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED
flag to enable this has some git-add specific side-effects (e.g. it
doesn't recurse into ignored directories, so listing ignored files with
--untracked=all doesn't work).
The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored
files in dir_struct.entries[] (instead of dir_struct.ignored[] as
DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git.
We don't want to break the existing API, so lets introduce a new flag
DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar
to DIR_COLLECT_FILES, but will recurse into sub-directories based on the
other flags as DIR_SHOW_IGNORED does.
In dir.c::read_directory_recursive, add ignored files to either
dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move
the DIR_COLLECT_IGNORED case here so that filling result lists is in a
common place.
In wt-status.c::wt_status_collect_untracked, use the new flag and read
results from dir_struct.ignored[]. Remove the extra fill_directory call.
builtin/check-ignore.c doesn't call fill_directory, setting the git-add
specific DIR_COLLECT_IGNORED flag has no effect here. Remove for clarity.
Update API documentation to reflect the changes.
Performance: with this patch, 'git-status --ignored' is typically as fast
as 'git-status'.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The is_excluded and is_path_excluded APIs are very similar, except for a
few noteworthy differences:
is_excluded doesn't handle ignored directories, results for paths within
ignored directories are incorrect. This is probably based on the premise
that recursive directory scans should stop at ignored directories, which
is no longer true (in certain cases, read_directory_recursive currently
calls is_excluded *and* is_path_excluded to get correct ignored state).
is_excluded caches parsed .gitignore files of the last directory in struct
dir_struct. If the directory changes, it finds a common parent directory
and is very careful to drop only as much state as necessary. On the other
hand, is_excluded will also read and parse .gitignore files in already
ignored directories, which are completely irrelevant.
is_path_excluded correctly handles ignored directories by checking if any
component in the path is excluded. As it uses is_excluded internally, this
unfortunately forces is_excluded to drop and re-read all .gitignore files,
as there is no common parent directory for the root dir.
is_path_excluded tracks state in a separate struct path_exclude_check,
which is essentially a wrapper of dir_struct with two more fields. However,
as is_path_excluded also modifies dir_struct, it is not possible to e.g.
use multiple path_exclude_check structures with the same dir_struct in
parallel. The additional structure just unnecessarily complicates the API.
Teach is_excluded / prep_exclude about ignored directories: whenever
entering a new directory, first check if the entire directory is excluded.
Remember the excluded state in dir_struct. Don't traverse into already
ignored directories (i.e. don't read irrelevant .gitignore files).
Directories could also be excluded by exclude patterns specified on the
command line or .git/info/exclude, so we cannot simply skip prep_exclude
entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
Move this check to just before actually reading the file.
is_path_excluded is now equivalent to is_excluded, so we can simply
redirect to it (the public API is cleaned up in the next patch).
The performance impact of the additional ignored check per directory is
hardly noticeable when reading directories recursively (e.g. 'git status').
However, performance of git commands using the is_path_excluded API (e.g.
'git ls-files --cached --ignored --exclude-standard') is greatly improved
as this no longer re-reads .gitignore files on each call.
Here's some performance data from the linux and WebKit repos (best of 10
runs on a Debian Linux on SSD, core.preloadIndex=true):
| ls-files -ci | status | status --ignored
| linux | WebKit | linux | WebKit | linux | WebKit
-------+-------+--------+-------+--------+-------+---------
before | 0.506 | 6.539 | 0.212 | 1.555 | 0.323 | 2.541
after | 0.080 | 1.191 | 0.218 | 1.583 | 0.321 | 2.579
gain | 6.325 | 5.490 | 0.972 | 0.982 | 1.006 | 0.985
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new command "git check-ignore" for debugging .gitignore
files.
The variable names may want to get cleaned up but that can be done
in-tree.
* as/check-ignore:
clean.c, ls-files.c: respect encapsulation of exclude_list_groups
t0008: avoid brace expansion
add git-check-ignore sub-command
setup.c: document get_pathspec()
add.c: extract new die_if_path_beyond_symlink() for reuse
add.c: extract check_path_for_gitlink() from treat_gitlinks() for reuse
pathspec.c: rename newly public functions for clarity
add.c: move pathspec matchers into new pathspec.c for reuse
add.c: remove unused argument from validate_pathspec()
dir.c: improve docs for match_pathspec() and match_pathspec_depth()
dir.c: provide clear_directory() for reclaiming dir_struct memory
dir.c: keep track of where patterns came from
dir.c: use a single struct exclude_list per source of excludes
Conflicts:
builtin/ls-files.c
dir.c
Refactor and generally clean up the directory traversal API
implementation.
* as/dir-c-cleanup:
dir.c: rename free_excludes() to clear_exclude_list()
dir.c: refactor is_path_excluded()
dir.c: refactor is_excluded()
dir.c: refactor is_excluded_from_list()
dir.c: rename excluded() to is_excluded()
dir.c: rename excluded_from_list() to is_excluded_from_list()
dir.c: rename path_excluded() to is_path_excluded()
dir.c: rename cryptic 'which' variable to more consistent name
Improve documentation and comments regarding directory traversal API
api-directory-listing.txt: update to match code
Fix a grammatical issue in the description of these functions, and
make it more obvious how and why seen[] can be reused across multiple
invocations.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By the end of a directory traversal, a dir_struct instance will
typically contains pointers to various data structures on the heap.
clear_directory() provides a convenient way to reclaim that memory.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For exclude patterns read in from files, the filename is stored in the
exclude list, and the originating line number is stored in the
individual exclude (counting starting at 1).
For exclude patterns provided on the command line, a string describing
the source of the patterns is stored in the exclude list, and the
sequence number assigned to each exclude pattern is negative, with
counting starting at -1. So for example the 2nd pattern provided via
--exclude would be numbered -2. This allows any future consumers of
that data to easily distinguish between exclude patterns from files
vs. from the CLI.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously each exclude_list could potentially contain patterns
from multiple sources. For example dir->exclude_list[EXC_FILE]
would typically contain patterns from .git/info/exclude and
core.excludesfile, and dir->exclude_list[EXC_DIRS] could contain
patterns from multiple per-directory .gitignore files during
directory traversal (i.e. when dir->exclude_stack was more than
one item deep).
We split these composite exclude_lists up into three groups of
exclude_lists (EXC_CMDL / EXC_DIRS / EXC_FILE as before), so that each
exclude_list now contains patterns from a single source. This will
allow us to cleanly track the origin of each pattern simply by adding
a src field to struct exclude_list, rather than to struct exclude,
which would make memory management of the source string tricky in the
EXC_DIRS case where its contents are dynamically generated.
Similarly, by moving the filebuf member from struct exclude_stack to
struct exclude_list, it allows us to track and subsequently free
memory buffers allocated during the parsing of all exclude files,
rather than only tracking buffers allocated for files in the EXC_DIRS
group.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is clearer to use a 'clear_' prefix for functions which empty
and deallocate the contents of a data structure without freeing
the structure itself, and a 'free_' prefix for functions which
also free the structure itself.
http://article.gmane.org/gmane.comp.version-control.git/206128
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar way to the previous commit, this extracts a new helper
function last_exclude_matching_path() which return the last
exclude_list element which matched, or NULL if no match was found.
is_path_excluded() becomes a wrapper around this, and just returns 0
or 1 depending on whether any matching exclude_list element was found.
This allows callers to find out _why_ a given path was excluded,
rather than just whether it was or not, paving the way for a new git
sub-command which allows users to test their exclude lists from the
command line.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue adopting clearer names for exclude functions. This is_*
naming pattern for functions returning booleans was discussed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue adopting clearer names for exclude functions. This 'is_*'
naming pattern for functions returning booleans was discussed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Also adjust their callers as necessary.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start adopting clearer names for exclude functions. This 'is_*'
naming pattern for functions returning booleans was agreed here:
http://thread.gmane.org/gmane.comp.version-control.git/204661/focus=204924
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'el' is only *slightly* less cryptic, but is already used as the
variable name for a struct exclude_list pointer in numerous other
places, so this reduces the number of cryptic variable names in use by
one :-)
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
traversal API has a few potentially confusing properties. These
comments clarify a few key aspects and will hopefully make it easier
to understand for other newcomers in the future.
Signed-off-by: Adam Spiers <git@adamspiers.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a pattern contains only a single asterisk as wildcard,
e.g. "foo*bar", after literally comparing the leading part "foo" with
the string, we can compare the tail of the string and make sure it
matches "bar", instead of running fnmatch() on "*bar" against the
remainder of the string.
-O2 build on linux-2.6, without the patch:
$ time git rev-list --quiet HEAD -- '*.c'
real 0m40.770s
user 0m40.290s
sys 0m0.256s
With the patch
$ time ~/w/git/git rev-list --quiet HEAD -- '*.c'
real 0m34.288s
user 0m33.997s
sys 0m0.205s
The above command is not supposed to be widely popular. It's chosen
because it exercises pathspec matching a lot. The point is it cuts
down matching time for popular patterns like *.c, which could be used
as pathspec in other places.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.
This patch reuses the core matching functions that are also used by
excluded_from_list. excluded_from_list and path_matches can't be
merged due to differences in exclude and attr, for example:
* "!pattern" syntax is forbidden in .gitattributes. As an attribute
can be unset (i.e. set to a special value "false") or made back to
unspecified (i.e. not even set to "false"), "!pattern attr" is unclear
which one it means.
* we support attaching attributes to directories, but git-core
internally does not currently make use of attributes on
directories.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function can later be reused by attr.c. Also turn to_exclude
field into a flag.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* commit 'f9f6e2c':
exclude: do strcmp as much as possible before fnmatch
dir.c: get rid of the wildcard symbol set in no_wildcard()
Unindent excluded_from_list()
"git ls-files --exclude=t -i" did not consider anything under t/ as
excluded, as it did not pay attention to exclusion of leading paths
while walking the index. Other two users of excluded() are also
updated.
* jc/ls-files-i-dir:
dir.c: make excluded() file scope static
unpack-trees.c: use path_excluded() in check_ok_to_remove()
builtin/add.c: use path_excluded()
path_excluded(): update API to less cache-entry centric
ls-files -i: micro-optimize path_excluded()
ls-files -i: pay attention to exclusion of leading paths
this also avoids calling fnmatch() if the non-wildcard prefix is
longer than basename
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was stupid of me to make the API too much cache-entry specific;
the caller may want to check arbitrary pathname without having a
corresponding cache-entry to see if a path is ignored.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git ls-files --exclude=t/ -i" does not show paths in directory t/
that have been added to the index, but it should.
The excluded() API was designed for callers who walk the tree from
the top, checking each level of the directory hierarchy as it
descends if it is excluded, and not even bothering to recurse into
an excluded directory. This would allow us optimize for a common
case by not having to check if the exclude pattern "foo/" matches
when looking at "foo/bar", because the caller should have noticed
that "foo" is excluded and did not even bother to read "foo/bar"
out of opendir()/readdir() to call it.
The code for "ls-files -i" however walks the index linearly, feeding
paths without checking if the leading directory is already excluded.
Introduce a helper function path_excluded() to let this caller
properly call excluded() check for higher hierarchies as necessary.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the REMOVE_DIR_KEEP_TOPLEVEL flag to remove_dir_recursively() for
deleting everything inside the given directory, but _not_ the given
directory itself.
Note that this does not pass the REMOVE_DIR_KEEP_NESTED_GIT flag, if set,
to the recursive invocations of remove_dir_recursively(). It is likely to
be a a bug that has been present since REMOVE_DIR_KEEP_NESTED_GIT was
introduced (a0f4afb), but this commit keeps the same behaviour for now.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also make common_prefix_len() static as this refactoring makes dir.c
itself the only caller of this helper function.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation from pathspec_prefix (slightly modified) replaces the
current common_prefix, because it also respects glob characters.
Based on a patch by Clemens Buchacher.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Function dir_inside_of() does something similar (correctly), but looks
easier to understand and does not bundle cwd to its business. Given
get_relative_cwd's only user is is_inside_dir, we can kill it for
good.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The same old problem reappears after setup code is reworked. We tend
to assume there is at least one path component in a path and forget
that path can be simply '/'.
Reported-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
match_pathspec_depth() is a clone of match_pathspec() except that it
can take depth limit. Computation is a bit lighter compared to
match_pathspec() because it's usually precomputed and stored in struct
pathspec.
In long term, match_pathspec() and match_one() should be removed in
favor of this function.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to replace pathspec_matches() in builtin/grep.c.
max_depth == -1 means infinite depth. Depth limit is only effective
when pathspec.recursive == 1. When pathspec.recursive == 0, the
behavior depends on match functions: non-recursive for
tree_entry_interesting() and recursive for match_pathspec{,_depth}
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/maint-fix-add-typo-detection:
Revert "excluded_1(): support exclude files in index"
unpack-trees: fix sparse checkout's "unable to match directories"
unpack-trees: move all skip-worktree checks back to unpack_trees()
dir.c: add free_excludes()
cache.h: realign and use (1 << x) form for CE_* constants
Multiple locations within this patch series alter a case sensitive
string comparison call such as strcmp() to be a call to a string
comparison call that selects case comparison based on the global
ignore_case variable. Behaviorally, when core.ignorecase=false, the
*_icase() versions are functionally equivalent to their C runtime
counterparts. When core.ignorecase=true, the *_icase() versions perform
a case insensitive comparison.
Like Linus' earlier ignorecase patch, these may ignore filename
conventions on certain file systems. By isolating filename comparisons
to certain functions, support for those filename conventions may be more
easily met.
Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes it is useful to know if a file or directory will be ignored
before it is added to the work tree. An example is "git submodule add",
where it would be really nice to be able to fail with an appropriate
error message before the submodule is cloned and checked out.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/sparse: (25 commits)
t7002: test for not using external grep on skip-worktree paths
t7002: set test prerequisite "external-grep" if supported
grep: do not do external grep on skip-worktree entries
commit: correctly respect skip-worktree bit
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
tests: rename duplicate t1009
sparse checkout: inhibit empty worktree
Add tests for sparse checkout
read-tree: add --no-sparse-checkout to disable sparse checkout support
unpack-trees(): ignore worktree check outside checkout area
unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
unpack-trees.c: generalize verify_* functions
unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
Introduce "sparse checkout"
dir.c: export excluded_1() and add_excludes_from_file_1()
excluded_1(): support exclude files in index
unpack-trees(): carry skip-worktree bit over in merged_entry()
Read .gitignore from index if it is skip-worktree
Avoid writing to buffer in add_excludes_from_file_1()
...
Conflicts:
.gitignore
Documentation/config.txt
Documentation/git-update-index.txt
Makefile
entry.c
t/t7002-grep.sh
These functions are used to handle .gitignore. They are now exported
so that sparse checkout can reuse.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have an embedded git work tree in your work tree (be it
an orphaned submodule, or an independent checkout of an unrelated
project), "git clean -d -f" blindly descended into it and removed
everything. This is rarely what the user wants.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop the insanity with separate 'path' and 'base' arguments that must
match. We don't need that crazy interface any more, since we cleaned up
handling of 'path' in commit da4b3e8c28.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the users of "read_directory()" actually want a much simpler
interface than the whole complex (but rather powerful) one.
In fact 'git add' had already largely abstracted out the core interface
issues into a private "fill_directory()" function that was largely
applicable almost as-is to a number of callers. Yes, 'git add' wants to
do some extra work of its own, specific to the add semantics, but we can
easily split that out, and use the core as a generic function.
This function does exactly that, and now that much simplified
'fill_directory()' function can be shared with a number of callers,
while also ensuring that the rather more complex calling conventions of
read_directory() are used by fewer call-sites.
This also makes the 'common_prefix()' helper function private to dir.c,
since all callers are now in that file.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By having flags represented as bits in the new member variable 'flags',
it will be easier to use parse_options when dir_struct is involved.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The die() message updated accordingly.
The previous behaviour was to only allow cloning when the destination
directory doesn't exist.
[jc: added trivial tests]
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new inline function is_dot_or_dotdot is used to check if the
directory name is either "." or "..". It returns a non-zero value if
the given string is "." or "..". It's applicable to a lot of Git
source code.
Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions are not used by any other file.
Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The function has two potential users which both managed to get wrong
their implementations (the one in builtin-rm.c one has a memleak, and
builtin-merge-recursive.c scribles over its const argument).
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When we process "foo/" entries in gitignore files on a system
that does not have d_type member in "struct dirent", the earlier
implementation ran lstat(2) separately when matching with
entries that came from the command line, in-tree .gitignore
files, and $GIT_DIR/info/excludes file.
This optimizes it by delaying the lstat(2) call until it becomes
absolutely necessary.
The initial idea for this change was by Jeff King, but I
optimized it further to pass pointers to around.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pattern "foo/" in the exclude list did not match directory
"foo", but a pattern "foo" did. This attempts to extend the
exclude mechanism so that it would while not matching a regular
file or a symbolic link "foo". In order to differentiate a
directory and non directory, this passes down the type of path
being checked to excluded() function.
A downside is that the recursive directory walk may need to run
lstat(2) more often on systems whose "struct dirent" do not give
the type of the entry; earlier it did not have to do so for an
excluded path, but we now need to figure out if a path is a
directory before deciding to exclude it. This is especially bad
because an idea similar to the earlier CE_UPTODATE optimization
to reduce number of lstat(2) calls would by definition not apply
to the codepaths involved, as (1) directories will not be
registered in the index, and (2) excluded paths will not be in
the index anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Operations that walk directories or trees, which potentially need to
consult the .gitignore files, used to always try to open the .gitignore
file every time they entered a new directory, even when they ended up
not needing to call excluded() function to see if a path in the
directory is ignored. This was done by push/pop exclude_per_directory()
functions that managed the data in a stack.
This changes the directory walking API to remove the need to call these
two functions. Instead, the directory walk data structure caches the
data used by excluded() function the last time, and lazily reuses it as
much as possible. Among the data the last check used, the ones from
deeper directories that the path we are checking is outside are
discarded, data from the common leading directories are reused, and then
the directories between the common directory and the directory the path
being checked is in are checked for .gitignore file. This is very
similar to the way gitattributes are handled.
This API change also fixes "ls-files -c -i", which called excluded()
without setting up the gitignore data via the old push/pop functions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are inconsistencies in the way commands currently handle
the core.excludesfile configuration variable. The problem is
the variable is too new to be noticed by anything other than
git-add and git-status.
* git-ls-files does not notice any of the "ignore" files by
default, as it predates the standardized set of ignore files.
The calling scripts established the convention to use
.git/info/exclude, .gitignore, and later core.excludesfile.
* git-add and git-status know about it because they call
add_excludes_from_file() directly with their own notion of
which standard set of ignore files to use. This is just a
stupid duplication of code that need to be updated every time
the definition of the standard set of ignore files is
changed.
* git-read-tree takes --exclude-per-directory=<gitignore>,
not because the flexibility was needed. Again, this was
because the option predates the standardization of the ignore
files.
* git-merge-recursive uses hardcoded per-directory .gitignore
and nothing else. git-clean (scripted version) does not
honor core.* because its call to underlying ls-files does not
know about it. git-clean in C (parked in 'pu') doesn't either.
We probably could change git-ls-files to use the standard set
when no excludes are specified on the command line and ignore
processing was asked, or something like that, but that will be a
change in semantics and might break people's scripts in a subtle
way. I am somewhat reluctant to make such a change.
On the other hand, I think it makes perfect sense to fix
git-read-tree, git-merge-recursive and git-clean to follow the
same rule as other commands. I do not think of a valid use case
to give an exclude-per-directory that is nonstandard to
read-tree command, outside a "negative" test in the t1004 test
script.
This patch is the first step to untangle this mess.
The next step would be to teach read-tree, merge-recursive and
clean (in C) to use setup_standard_excludes().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Try to avoid a lot of work scanning for excluded files,
by caching some more information when setting up the exclusion
data structure.
Speeds up 'git runstatus' on a repository containing the Qt sources by 30% and
reduces the amount of instructions executed (as measured by valgrind) by a
factor of 2.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a function called remove_empty_dir_recursive() buried
in refs.c. Expose a slightly enhanced version in dir.h: it can now
optionally remove a non-empty directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function get_relative_cwd() works just as getcwd(), only that it
takes an absolute path as additional parameter, returning the prefix
of the current working directory relative to the given path. If the
cwd is no subdirectory of the given path, it returns NULL.
is_inside_dir() is just a trivial wrapper over get_relative_cwd().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the code would always set up the excludes, and then manually
pick through the pathspec we were given, assuming that non-added but
existing paths were just ignored. This was mostly correct, but would
erroneously mark a totally empty directory as 'ignored'.
Instead, we now use the collect_ignored option of dir_struct, which
unambiguously tells us whether a path was ignored. This simplifies the
code, and means empty directories are now just not mentioned at all.
Furthermore, we now conditionally ask dir_struct to respect excludes,
depending on whether the '-f' flag has been set. This means we don't have
to pick through the result, checking for an 'ignored' flag; ignored entries
were either added or not in the first place.
We can safely get rid of the special 'ignored' flags to dir_entry, which
were not used anywhere else.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonas Fonseca <fonseca@diku.dk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When set, this option will cause read_directory to keep
track of which entries were ignored. While this shouldn't
effect functionality in most cases, it can make warning
messages to the user much more useful.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the promised cleaned-up version of teaching directory traversal
(ie the "read_directory()" logic) about subprojects. That makes "git add"
understand to add/update subprojects.
It now knows to look at the index file to see if a directory is marked as
a subproject, and use that as information as whether it should be recursed
into or not.
It also generally cleans up the handling of directory entries when
traversing the working tree, by splitting up the decision-making process
into small functions of their own, and adding a fair number of comments.
Finally, it teaches "add_file_to_cache()" that directory names can have
slashes at the end, since the directory traversal adds them to make the
difference between a file and a directory clear (it always did that, but
my previous too-ugly-to-apply subproject patch had a totally different
path for subproject directories and avoided the slash for that case).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The way things are set up, you can now pass a "pathspec" to the
"read_directory()" function. If you pass NULL, it acts exactly
like it used to do (read everything). If you pass a non-NULL
pointer, it will simplify it into a "these are the prefixes
without any special characters", and stop any readdir() early if
the path in question doesn't match any of the prefixes.
NOTE! This does *not* obviate the need for the caller to do the *exact*
pathspec match later. It's a first-level filter on "read_directory()", but
it does not do the full pathspec thing. Maybe it should. But in the
meantime, builtin-add.c really does need to do first
read_directory(dir, .., pathspec);
if (pathspec)
prune_directory(dir, pathspec, baselen);
ie the "prune_directory()" part will do the *exact* pathspec pruning,
while the "read_directory()" will use the pathspec just to do some quick
high-level pruning of the directories it will recurse into.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When '*.ig' is ignored, and you have two files f.ig and d.ig/foo
in the working tree,
$ git add .
correctly ignored f.ig but failed to ignore d.ig/foo. This was
caused by a thinko in an earlier commit 4888c534, when we tried
to allow adding otherwise ignored files.
After reverting that commit, this takes a much simpler approach.
When we have an unmatched pathspec that talks about an existing
pathname, we know it is an ignored path the user tried to add,
so we include it in the set of paths directory walker returned.
This does not let you say "git add -f D" on an ignored directory
D and add everything under D. People can submit a patch to
further allow it if they want to, but I think it is a saner
behaviour to require explicit paths to be spelled out in such a
case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the return value from match_pathspec() so that the
caller can tell cases between exact match, leading pathname
match (i.e. file "foo/bar" matches a pathspec "foo"), or
filename glob match. This can be used to prevent "rm dir" from
removing "dir/file" without explicitly asking for recursive
behaviour with -r flag, for example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This follows up commit ed93b449 where we removed overcautious
"working file will be lost" check.
A new option "--exclude-per-directory=.gitignore" can be used to
tell the "git-read-tree" command that the user does not mind
losing contents in untracked files in the working tree, if they
need to be overwritten by a merge (either a two-way "switch
branches" merge, or a three-way merge).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This creates a new git-runstatus which should do roughly the same thing
as the run_status function from git-commit.sh. Except for color support,
the main focus has been to keep the output identical, so that it can be
verified as correct and then used as a C platform for other improvements to
the status printing code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This moves the code to add the per-directory ignore files for the base
directory into the library routine.
That not only allows us to turn the function push_exclude_per_directory()
static again, it also simplifies the library interface a lot (the caller
no longer needs to worry about any of the per-directory exclude files at
all).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This moves the core directory traversal and filename exclusion logic
into the general git library, making it available for other users
directly.
If we ever want to do "git commit" or "git add" as a built-in (and we
do), we want to be able to handle most of git-ls-files as a library.
NOTE! Not all of git-ls-files is libified by this. The index matching
and pathspec prefix calculation is still in ls-files.c, but this is a
big part of it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>