Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
We perform unchecked additions when computing the size of a
"struct ondisk_untracked_cache". This is unlikely to have an
integer overflow in practice, but we'd like to avoid this
dangerous pattern to make further audits easier.
Note that there's one subtlety here, though. We protect
ourselves against a NULL exclude_per_dir entry in our
source, and avoid calling strlen() on it, keeping "len" at
0. But later, we unconditionally memcpy "len + 1" bytes to
get the trailing NUL byte. If we did have a NULL
exclude_per_dir, we would read from bogus memory.
As it turns out, though, we always create this field
pointing to a string literal, so there's no bug. We can just
get rid of the pointless extra conditional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.
There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.
In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is a pattern "!foo/bar", this patch makes it not exclude
"foo" right away. This gives us a chance to examine "foo" and
re-include "foo/bar".
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Micha Wiedenmann <mw-u2@gmx.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given path "a" and the pattern "a", it's matched. But if we throw path
"a/b" to pattern "a", the code fails to realize that if "a" matches
"a" then "a/b" should also be matched.
When the pattern is matched the first time, we can mark it "sticky", so
that all files and dirs inside the matched path also matches. This is a
simpler solution than modify all match scenarios to fix that.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given the pattern "1/2/3/4" and the path "1/2/3/4/f", the pattern
prefix is "1/2/3/4". We will compare and remove the prefix from both
pattern and path and come to this code
/*
* If the whole pattern did not have a wildcard,
* then our prefix match is all we need; we
* do not need to call fnmatch at all.
*/
if (!patternlen && !namelen)
return 1;
where patternlen is zero (full pattern consumed) and the remaining
path in "name" is "/f". We fail to realize it's matched in this case
and fall back to fnmatch(), which also fails to catch it. Fix it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ignore mechanism saw a few regressions around untracked file
listing and sparse checkout selection areas in 2.7.0; the change
that is responsible for the regression has been reverted.
* nd/exclusion-regression-fix:
Revert "dir.c: don't exclude whole dir prematurely if neg pattern may match"
The "exclude_list" structure has the usual "alloc, nr" pair of
fields to be used by ALLOC_GROW(), but clear_exclude_list() forgot
to reset 'alloc' to 0 when it cleared 'nr' to discard the managed
array.
* nd/dir-exclude-cleanup:
dir.c: clean the entire struct in clear_exclude_list()
It is not a good idea to compare kernel versions and disable
the untracked cache if it changes, as people may upgrade and
still want the untracked cache to work. So let's just
compare work tree locations and kernel name to decide if we
should disable it.
Also storing many locations in the ident field and comparing
to any of them can be dangerous if GIT_WORK_TREE is used with
different values. So let's just store one location, the
location of the current work tree.
The downside is that untracked cache can only be used by one
type of OS for now. Exporting a git repo to different clients
via a network to e.g. Linux and Windows means that only one
can use the untracked cache.
If the location changed in the ident field and we still want
an untracked cache, let's delete the cache and recreate it.
Note that if an untracked cache has been created by a
previous Git version, then the kernel version is stored in
the ident field. As we now compare with just the kernel
name the comparison will fail and the untracked cache will
be disabled until it's recreated.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out code into remove_untracked_cache(), which will be used
in a later commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor out code into new_untracked_cache() and
add_untracked_cache(), which will be used
in later commits.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ignore mechanism saw a few regressions around untracked file
listing and sparse checkout selection areas in 2.7.0; the change
that is responsible for the regression has been reverted.
* nd/exclusion-regression-fix:
Revert "dir.c: don't exclude whole dir prematurely if neg pattern may match"
The "exclude_list" structure has the usual "alloc, nr" pair of
fields to be used by ALLOC_GROW(), but clear_exclude_list() forgot
to reset 'alloc' to 0 when it cleared 'nr'to discard the managed
array.
* nd/dir-exclude-cleanup:
dir.c: clean the entire struct in clear_exclude_list()
This reverts commit 57534ee77d. The
feature added in that commit requires that patterns behave the same way
from anywhere. But some patterns can behave differently depending on
current "working" directory. The conditions to catch and avoid these
patterns are too loose. The untracked listing[1] and sparse-checkout
selection[2] can become incorrect as a result.
[1] http://article.gmane.org/gmane.comp.version-control.git/283520
[2] http://article.gmane.org/gmane.comp.version-control.git/283532
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure "el" can be reuseable again. The problem was el->alloc is
not cleared and may cause segfaults next time because add_exclude()
thinks el->excludes (being NULL) has enough space. Just clear the
entire struct to be safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.
During merges, we free entries that we no longer need in the
destination index. But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.
To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry. This entails some refactoring of code
that expects the cache_entry.
Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.
Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
On a case insensitive filesystems, setting GIT_WORK_TREE variable
using a random cases that does not agree with what the filesystem
thinks confused Git that it wasn't inside the working tree.
* js/icase-wt-detection:
setup: fix "inside work tree" detection on case-insensitive filesystems
Allow a later "!/abc/def" to override an earlier "/abc" that
appears in the same .gitignore file to make it easier to express
"everything in /abc directory is ignored, except for ...".
* nd/ignore-then-not-ignore:
dir.c: don't exclude whole dir prematurely if neg pattern may match
dir.c: make last_exclude_matching_from_list() run til the end
When working with paths in strbufs, we frequently want to
ensure that a directory contains a trailing slash before
appending to it. We can shorten this code (and make the
intent more obvious) by calling strbuf_complete.
Most of these cases are trivially identical conversions, but
there are two things to note:
- in a few cases we did not check that the strbuf is
non-empty (which would lead to an out-of-bounds memory
access). These were generally not triggerable in
practice, either from earlier assertions, or typically
because we would have just fed the strbuf to opendir(),
which would choke on an empty path.
- in a few cases we indexed the buffer with "original_len"
or similar, rather than the current sb->len, and it is
not immediately obvious from the diff that they are the
same. In all of these cases, I manually verified that
the strbuf does not change between the assignment and
the strbuf_complete call.
This does not convert cases which look like:
if (sb->len && !is_dir_sep(sb->buf[sb->len - 1]))
strbuf_addch(sb, '/');
as those are obviously semantically different. Some of these
cases arguably should be doing that, but that is out of
scope for this change, which aims purely for cleanup with no
behavior change (and at least it will make such sites easier
to find and examine in the future, as we can grep for
strbuf_complete).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git has a config variable to indicate that it is operating on a file
system that is case-insensitive: core.ignoreCase. But the
`dir_inside_of()` function did not respect that. As a result, if Git's
idea of the current working directory disagreed in its upper/lower case
with the `GIT_WORK_TREE` variable (e.g. `C:\test` vs `c:\test`) the
user would be greeted by the error message
fatal: git-am cannot be used without a working tree.
when trying to run a rebase.
This fixes https://github.com/git-for-windows/git/issues/402 (reported by
Daniel Harding).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is a pattern "!foo/bar", this patch makes it not exclude "foo"
right away. This gives us a chance to examine "foo" and re-include
"foo/bar".
In order for it to detect that the directory under examination should
not be excluded right away, in other words it is a parent directory of a
negative pattern, the "directory path" of the negative pattern must be
literal. Patterns like "!f?o/bar" can't stop "foo" from being excluded.
Basename matching (i.e. "no slashes in the pattern") or must-be-dir
matching (i.e. "trailing slash in the pattern") does not work well with
this. For example, if we descend in "foo" and are examining "foo/abc",
current code for "foo/" pattern will check if path "foo/abc", not "foo",
is a directory. The same problem with basename matching. These may need
big code reorg to make it work.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next patch adds some post processing to the result value before it's
returned to the caller. Keep all branches reach the end of the function,
so we can do it all in one place.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The experimental untracked-cache feature were buggy when paths with
a few levels of subdirectories are involved.
* dt/untracked-subdir:
untracked cache: fix entry invalidation
untracked-cache: fix subdirectory handling
t7063: use --force-untracked-cache to speed up a bit
untracked-cache: support sparse checkout
The experimental untracked-cache feature were buggy when paths with
a few levels of subdirectories are involved.
* dt/untracked-subdir:
untracked cache: fix entry invalidation
untracked-cache: fix subdirectory handling
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4. Their uses have been reduced.
* jk/git-path:
memoize common git-path "constant" files
get_repo_path: refactor path-allocation
find_hook: keep our own static buffer
refs.c: remove_empty_directories can take a strbuf
refs.c: avoid git_path assignment in lock_ref_sha1_basic
refs.c: avoid repeated git_path calls in rename_tmp_log
refs.c: simplify strbufs in reflog setup and writing
path.c: drop git_path_submodule
refs.c: remove extra git_path calls from read_loose_refs
remote.c: drop extraneous local variable from migrate_file
prefer mkpathdup to mkpath in assignments
prefer git_pathdup to git_path in some possibly-dangerous cases
add_to_alternates_file: don't add duplicate entries
t5700: modernize style
cache.h: complete set of git_path_submodule helpers
cache.h: clarify documentation for git_path, et al
An experimental "untracked cache" feature used uname(2) in a
slightly unportable way.
* cb/uname-in-untracked:
untracked: fix detection of uname(2) failure
First, the current code in untracked_cache_invalidate_path() is wrong
because it can only handle paths "a" or "a/b", not "a/b/c" because
lookup_untracked() only looks for entries directly under the given
directory. In the last case, it will look for the entry "b/c" in
directory "a" instead. This means if you delete or add an entry in a
subdirectory, untracked cache may become out of date because it does not
invalidate properly. This is noticed by David Turner.
The second problem is about invalidation inside a fully untracked/excluded
directory. In this case we may have to invalidate back to root. See the
comment block for detail.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, some calls lookup_untracked would pass a full path. But
lookup_untracked assumes that the portion of the path up to and
including to the untracked_cache_dir has been removed. So
lookup_untracked would be looking in the untracked_cache for 'foo' for
'foo/bar' (instead of just looking for 'bar'). This would cause
untracked cache corruption.
Instead, treat_directory learns to track the base length of the parent
directory, so that only the last path component is passed to
lookup_untracked.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow untracked cache (experimental) to be used when sparse
checkout (experimental) is also in use.
* dt/untracked-sparse:
untracked-cache: support sparse checkout
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An experimental "untracked cache" feature used uname(2) in a
slightly unportable way.
* cb/uname-in-untracked:
untracked: fix detection of uname(2) failure
Remove a check that would disable the untracked cache for sparse
checkouts. Add tests that ensure that the untracked cache works with
sparse checkouts -- specifically considering the case that a file
foo/bar is checked out, but foo/.gitignore is not.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to POSIX specification uname(2) must return -1 on failure
and a non-negative value on success. Although many implementations
do return 0 on success it is valid to return any positive value for
success. In particular, Solaris returns 1.
Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/janitorial:
dir: remove unused variable sb
clean: remove unused variable buf
use file_exists() to check if a file exists in the worktree
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.
* nd/untracked-cache: (24 commits)
git-status.txt: advertisement for untracked cache
untracked cache: guard and disable on system changes
mingw32: add uname()
t7063: tests for untracked cache
update-index: test the system before enabling untracked cache
update-index: manually enable or disable untracked cache
status: enable untracked cache
untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
untracked cache: mark index dirty if untracked cache is updated
untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
untracked cache: avoid racy timestamps
read-cache.c: split racy stat test to a separate function
untracked cache: invalidate at index addition or removal
untracked cache: load from UNTR index extension
untracked cache: save to an index extension
ewah: add convenient wrapper ewah_serialize_strbuf()
untracked cache: don't open non-existent .gitignore
untracked cache: mark what dirs should be recursed/saved
untracked cache: record/validate dir mtime and reuse cached output
untracked cache: make a wrapper around {open,read,close}dir()
...
core.excludesfile (defaulting to $XDG_HOME/git/ignore) is supposed
to be overridden by repository-specific .git/info/exclude file, but
the order was swapped from the beginning. This belatedly fixes it.
* jc/gitignore-precedence:
ignore: info/exclude should trump core.excludesfile
Teach the codepaths that read .gitignore and .gitattributes files
that these files encoded in UTF-8 may have UTF-8 BOM marker at the
beginning; this makes it in line with what we do for configuration
files already.
* cn/bom-in-gitignore:
attr: skip UTF8 BOM at the beginning of the input file
config: use utf8_bom[] from utf.[ch] in git_parse_source()
utf8-bom: introduce skip_utf8_bom() helper
add_excludes_from_file: clarify the bom skipping logic
dir: allow a BOM at the beginning of exclude files
Code clean-up for xdg configuration path support.
* pt/xdg-config-path:
path.c: remove home_config_paths()
git-config: replace use of home_config_paths()
git-commit: replace use of home_config_paths()
credential-store.c: replace home_config_paths() with xdg_config_home()
dir.c: replace home_config_paths() with xdg_config_home()
attr.c: replace home_config_paths() with xdg_config_home()
path.c: implement xdg_config_home()
Since only the xdg excludes file path is required, simplify the code by
replacing use of home_config_paths() with xdg_config_home().
Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the codepaths that read .gitignore and .gitattributes files
that these files encoded in UTF-8 may have UTF-8 BOM marker at the
beginning; this makes it in line with what we do for configuration
files already.
* cn/bom-in-gitignore:
attr: skip UTF8 BOM at the beginning of the input file
config: use utf8_bom[] from utf.[ch] in git_parse_source()
utf8-bom: introduce skip_utf8_bom() helper
add_excludes_from_file: clarify the bom skipping logic
dir: allow a BOM at the beginning of exclude files
$GIT_DIR/info/exclude and core.excludesfile (which falls back to
$XDG_HOME/git/ignore) are both ways to override the ignore pattern
lists given by the project in .gitignore files. The former, which
is per-repository personal preference, should take precedence over
the latter, which is a personal preference default across different
repositories that are accessed from that machine. The existing
documentation also agrees.
However, the precedence order was screwed up between these two from
the very beginning when 896bdfa2 (add: Support specifying an
excludes file with a configuration variable, 2007-02-27) introduced
core.excludesfile variable.
Noticed-by: Yohei Endo <yoheie@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the recent change to ignore the UTF8 BOM at the beginning of
.gitignore files, we now have two codepaths that do such a skipping
(the other one is for reading the configuration files).
Introduce utf8_bom[] constant string and skip_utf8_bom() helper
and teach .gitignore code how to use it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>