Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command. Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.
Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.
Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git check-attr" learned to take an optional tree-ish to read the
.gitattributes file from.
* kn/attr-from-tree:
attr: add flag `--source` to work with tree-ish
t0003: move setup for `--all` into new block
The command `dd bs=101M count=1` is not portable,
e.g. dd shipped with MacOs does not understand the 'M'.
Use `dd bs=1048576 count=101`, which achives the same, instead.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is some setup code which is used by multiple tests being setup in
`attribute test: --all option`. This means when we run "sh
./t0003-attributes.sh --run=setup,<num>" there is a chance of failing
since we missed this setup block.
So to ensure that setups are independent of test logic, move this to a
new setup block.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, start ignoring gitattributes files
that are overly large to protect us against out-of-bounds reads and
writes caused by integer overflows. Unfortunately, we cannot just define
"overly large" in terms of any preexisting limits in the codebase.
Instead, we choose a very conservative limit of 100MB. This is plenty of
room for specifying gitattributes, and incidentally it is also the limit
for blob sizes for GitHub. While we don't want GitHub to dictate limits
here, it is still sensible to use this fact for an informed decision
given that it is hosting a huge set of repositories. Furthermore, over
at GitLab we scanned a subset of repositories for their root-level
attribute files. We found that 80% of them have a gitattributes file
smaller than 100kB, 99.99% have one smaller than 1MB, and only a single
repository had one that was almost 3MB in size. So enforcing a limit of
100MB seems to give us ample of headroom.
With this limit in place we can be reasonably sure that there is no easy
way to exploit the gitattributes file via integer overflows anymore.
Furthermore, it protects us against resource exhaustion caused by
allocating the in-memory data structures required to represent the
parsed attributes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different code paths to read gitattributes: once via a
file, and once via the index. These two paths used to behave differently
because when reading attributes from a file, we used fgets(3P) with a
buffer size of 2kB. Consequentially, we silently truncate line lengths
when lines are longer than that and will then parse the remainder of the
line as a new pattern. It goes without saying that this is entirely
unexpected, but it's even worse that the behaviour depends on how the
gitattributes are parsed.
While this is simply wrong, the silent truncation saves us with the
recently discovered vulnerabilities that can cause out-of-bound writes
or reads with unreasonably long lines due to integer overflows. As the
common path is to read gitattributes via the worktree file instead of
via the index, we can assume that any gitattributes file that had lines
longer than that is already broken anyway. So instead of lifting the
limit here, we can double down on it to fix the vulnerabilities.
Introduce an explicit line length limit of 2kB that is shared across all
paths that read attributes and ignore any line that hits this limit
while printing a warning.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading attributes from a file we use fgets(3P) with a buffer size
of 2048 bytes. This means that as soon as a line exceeds the buffer size
we split it up into multiple parts and parse each of them as a separate
pattern line. This is of course not what the user intended, and even
worse the behaviour is inconsistent with how we read attributes from the
index.
Fix this bug by converting the code to use `strbuf_getline()` instead.
This will indeed read in the whole line, which may theoretically lead to
an out-of-memory situation when the gitattributes file is huge. We're
about to reject any gitattributes files larger than 100MB in the next
commit though, which makes this less of a concern.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those tests that assumed that a .git/info directory would be
created for them when writing .git/info/attributes to explicitly
create the directory by setting "TEST_CREATE_REPO_NO_TEMPLATE=1"
before sourcing test-lib.sh, and using the "--template=" argument to
"git clone".
The change here in here in t7814-grep-recurse-submodules.sh would
continue "succeeding" with only the "TEST_CREATE_REPO_NO_TEMPLATE=1"
part of this change. That's because those tests use
"test_expect_failure", so they'd "pass" without this change, as
"test_expect_failure" by design isn't discerning about what failure
conditions it'll accept.
But as we're fixing these sorts of issues across the test suite let's
fix this one too. This issue was spotted with a local merge with
another topic of mine[1], which introduces a stricter alternative to
"test_expect_failure".
1. https://lore.kernel.org/git/cover-0.7-00000000000-20220318T002951Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pipes ignore error codes of LHS command and thus we should not use them
with Git in tests. As an alternative, use a 'tmp' file to write the Git
output so we can test the exit code.
Signed-off-by: Shubham Mishra <shivam828787@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark various existing tests in t00*.sh that invoke git built-ins with
TEST_PASSES_SANITIZE_LEAK=true as passing when git is compiled with
SANITIZE=leak.
They'll now be listed as running under the
"GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI
target).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The attributes system may sometimes read in-tree files from the
filesystem, and sometimes from the index. In the latter case, we do not
resolve symbolic links (and are not likely to ever start doing so).
Let's open filesystem links with O_NOFOLLOW so that the two cases behave
consistently.
As a bonus, this means that git will not follow such symlinks to read
and parse out-of-tree paths. In some cases this could have security
implications, as a malicious repository can cause Git to open and read
arbitrary files. It could already feed arbitrary content to the parser,
but in certain setups it might be able to exfiltrate data from those
paths (e.g., if an automated service operating on the malicious repo
reveals its stderr to an attacker).
Note that O_NOFOLLOW only prevents following links for the path itself,
not intermediate directories in the path. At first glance, it seems
like
ln -s /some/path in-repo
might still look at "in-repo/.gitattributes", following the symlink to
"/some/path/.gitattributes". However, if "in-repo" is a symbolic link,
then we know that it has no git paths below it, and will never look at
its .gitattributes file.
We will continue to support out-of-tree symbolic links (e.g., in
$GIT_DIR/info/attributes); this just affects in-tree links. When a
symbolic link is encountered, the contents are ignored and a warning is
printed. POSIX specifies ELOOP in this case, so the user would generally
see something like:
warning: unable to access '.gitattributes': Too many levels of symbolic links
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an effort to remove test_must_fail for all invocations not related to
git or test-tool, replace invocations of `test_must_fail attr_check`
with a plain attr_check call with the $expect argument set to the
actual value output by git.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In several places, we used `test_line_count = 0` to check for an empty
file. Although this is correct, it's overkill. Use test_must_be_empty()
instead because it's more suited for this purpose.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We had named the parameters in attr_check() but $2 was being used
instead of $expect. Make all variable accesses in attr_check() use named
variables instead of numbered arguments for clarity.
While we're at it, add variable assignments to the &&-chain. These
aren't ever expected to fail but if a future developer ever adds some
code above the assignments and they could fail in some way, the intact
&&-chain will ensure that the failure is caught.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
2017-01-27), we will always mark an attribute macro (e.g., "binary")
that is specifically queried for as "unspecified", even though listing
_all_ attributes would display it at set. E.g.:
$ echo "* binary" >.gitattributes
$ git check-attr -a file
file: binary: set
file: diff: unset
file: merge: unset
file: text: unset
$ git check-attr binary file
file: binary: unspecified
The problem stems from an incorrect conversion of the optimization from
06a604e670 (attr: avoid heavy work when we know the specified attr is
not defined, 2014-12-28). There we tried in collect_some_attrs() to
avoid even looking at the attr_stack when the user has asked for "foo"
and we know that "foo" did not ever appear in any .gitattributes file.
It used a flag "maybe_real" in each attribute struct, where "real" meant
that the attribute appeared in an actual file (we have to make this
distinction because we also create an attribute struct for any names
that are being queried). But as explained in that commit message, the
meaning of "real" was tangled with some special cases around macros.
When 60a12722ac later refactored the macro code, it dropped maybe_real
entirely. This missed the fact that "maybe_real" could be unset for two
reasons: because of a macro, or because it was never found during
parsing. This had two results:
- the optimization in collect_some_attrs() ceased doing anything
meaningful, since it no longer kept track of "was it found during
parsing"
- worse, it actually kicked in when the caller _did_ ask about a macro
by name, causing us to mark it as unspecified
It should be possible to salvage this optimization, but let's start with
just removing the remnants. It hasn't been doing anything (except
creating bugs) since 60a12722ac, and nobody seems to have noticed the
performance regression. It's more important to fix the correctness
problem clearly first.
I've added two tests here. The second one actually shows off the bug.
The test of "check-attr -a" is not strictly necessary, but we currently
do not test attribute macros much, and the builtin "binary" not at all.
So this increases our general test coverage, as well as making sure we
didn't mess up this related case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test fixes.
* sg/test-must-be-empty:
tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>'
tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null <out>'
tests: use 'test_must_be_empty' instead of 'test ! -s'
tests: use 'test_must_be_empty' instead of '! test -s'
Using 'test_must_be_empty' is shorter and more idiomatic than
>empty &&
test_cmp empty out
as it saves the creation of an empty file. Furthermore, sometimes the
expected empty file doesn't have such a descriptive name like 'empty',
and its creation is far away from the place where it's finally used
for comparison (e.g. in 't7600-merge.sh', where two expected empty
files are created in the 'setup' test, but are used only about 500
lines later).
These cases were found by instrumenting 'test_cmp' to error out the
test script when it's used to compare empty files, and then converted
manually.
Note that even after this patch there still remain a lot of cases
where we use 'test_cmp' to check empty files:
- Sometimes the expected output is not hard-coded in the test, but
'test_cmp' is used to ensure that two similar git commands produce
the same output, and that output happens to be empty, e.g. the
test 'submodule update --merge - ignores --merge for new
submodules' in 't7406-submodule-update.sh'.
- Repetitive common tasks, including preparing the expected results
and running 'test_cmp', are often extracted into a helper
function, and some of this helper's callsites expect no output.
- For the same reason as above, the whole 'test_expect_success'
block is within a helper function, e.g. in 't3070-wildmatch.sh'.
- Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update
(-p)' in 't9400-git-cvsserver-server.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Full pattern must be quoted. So 'pat"t"ern attr' will give exactly
'pat"t"ern', not 'pattern'. Also clarify that leading whitespaces are
not part of the pattern and document comment syntax.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git check-attr" when (trying to) work on a repository with a
working tree did not work well when the working tree was specified
via --work-tree (and obviously with --git-dir).
The command also works in a bare repository but it reads from the
(possibly stale, irrelevant and/or nonexistent) index, which may
need to be fixed to read from HEAD, but that is a completely
separate issue. As a related tangent to this separate issue, we
may want to also fix "check-ignore", which refuses to work in a
bare repository, to also operate in a bare one.
* jc/check-attr-honor-working-tree:
check-attr: move to the top of working tree when in non-bare repository
t0003: do not chdir the whole test process
Lasse Makholm noticed that running "git check-attr" from a place
totally unrelated to $GIT_DIR and $GIT_WORK_TREE does not give
expected results. I think it is because the command does not say it
wants to call setup_work_tree().
We still need to support use cases where only a bare repository is
involved, so unconditionally requiring a working tree would not work
well. Instead, make a call only in a non-bare repository.
We may want to see if we want to do a similar fix in the opposite
direction to check-ignore. The command unconditionally requires a
working tree, but it should be usable in a bare repository just like
check-attr attempts to be.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moving to some other directory and letting the remainder of the test
pieces to expect that they start there is a bad practice. The test
that contains chdir itself may fail (or by mistake skipped via the
GIT_SKIP_TESTS mechanism) in which case the remainder may operate on
files in unexpected places.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before 82dce99 (attr: more matching optimizations from .gitignore,
2012-10-15), .gitattributes did not have any special treatment of a
leading '!'. The docs, however, always said
The rules how the pattern matches paths are the same as in
`.gitignore` files; see linkgit:gitignore[5].
By those rules, leading '!' means pattern negation. So 82dce99
correctly determined that this kind of line makes no sense and should
be disallowed.
However, users who actually had a rule for files starting with a '!'
are in a bad position: before 82dce99 '!' matched that literal
character, so it is conceivable that users have .gitattributes with
such lines in them. After 82dce99 the unescaped version was
disallowed in such a way that git outright refuses to run(!) most
commands in the presence of such a .gitattributes. It therefore
becomes very hard to fix, let alone work with, such repositories.
Let's at least allow the users to fix their repos: change the fatal
error into a warning.
Reported-by: mathstuf@gmail.com
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows pathname patterns in .gitignore and .gitattributes files
with double-asterisks "foo/**/bar" to match any number of directory
hierarchies.
* nd/wildmatch:
wildmatch: replace variable 'special' with better named ones
compat/fnmatch: respect NO_FNMATCH* even on glibc
wildmatch: fix "**" special case
t3070: Disable some failing fnmatch tests
test-wildmatch: avoid Windows path mangling
Support "**" wildcard in .gitignore and .gitattributes
wildmatch: make /**/ match zero or more directories
wildmatch: adjust "**" behavior
wildmatch: fix case-insensitive matching
wildmatch: remove static variable force_lower_case
wildmatch: make wildmatch's return value compatible with fnmatch
t3070: disable unreliable fnmatch tests
Integrate wildmatch to git
wildmatch: follow Git's coding convention
wildmatch: remove unnecessary functions
Import wildmatch from rsync
ctype: support iscntrl, ispunct, isxdigit and isprint
ctype: make sane_ctype[] const array
Conflicts:
Makefile
Start laying the foundation to build the "wildmatch" after we can
agree on its desired semantics.
* nd/attr-match-optim-more:
attr: more matching optimizations from .gitignore
gitignore: make pattern parsing code a separate function
exclude: split pathname matching code into a separate function
exclude: fix a bug in prefix compare optimization
exclude: split basename matching code into a separate function
exclude: stricten a length check in EXC_FLAG_ENDSWITH case
.gitattributes and .gitignore share the same pattern syntax but has
separate matching implementation. Over the years, ignore's
implementation accumulates more optimizations while attr's stays the
same.
This patch reuses the core matching functions that are also used by
excluded_from_list. excluded_from_list and path_matches can't be
merged due to differences in exclude and attr, for example:
* "!pattern" syntax is forbidden in .gitattributes. As an attribute
can be unset (i.e. set to a special value "false") or made back to
unspecified (i.e. not even set to "false"), "!pattern attr" is unclear
which one it means.
* we support attaching attributes to directories, but git-core
internally does not currently make use of attributes on
directories.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Case insensitivity plays a role in several tests and is tested in several
tests. Therefore, move the test from t003 into the test lib and use the
prerequisite in t0003.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we prepare the attribute stack for a lookup on a path,
we start with the cached stack from the previous lookup
(because it is common to do several lookups in the same
directory hierarchy). So the first thing we must do in
preparing the stack is to pop any entries that point to
directories we are no longer interested in.
For example, if our stack contains gitattributes for:
foo/bar/baz
foo/bar
foo
but we want to do a lookup in "foo/bar/bleep", then we want
to pop the top element, but retain the others.
To do this we walk down the stack from the top, popping
elements that do not match our lookup directory. However,
the test do this simply checked strncmp, meaning we would
mistake "foo/bar/baz" as a leading directory of
"foo/bar/baz_plus". We must also check that the character
after our match is '/', meaning we matched the whole path
component.
There are two special cases to consider:
1. The top of our attr stack has the empty path. So we
must not check for '/', but rather special-case the
empty path, which always matches.
2. Typically when matching paths in this way, you would
also need to check for a full string match (i.e., the
character after is '\0'). We don't need to do so in
this case, though, because our path string is actually
just the directory component of the path to a file
(i.e., we know that it terminates with "/", because the
filename comes after that).
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.ignorecase is true, the file globs configured in the
.gitattributes file should be matched case-insensitively against the paths
in the working directory. Let's do so.
Plus, add some tests.
The last set of tests is performed only on a case-insensitive filesystem.
Those tests make sure that git handles the case where the .gitignore file
resides in a subdirectory and the user supplies a path that does not match
the case in the filesystem. In that case^H^H^H^Hsituation, part of the
path supplied by the user is effectively interpreted case-insensitively,
and part of it is dependent on the setting of core.ignorecase. git will
currently only match the portion of the path below the directory holding
the .gitignore file according to the setting of core.ignorecase.
This is also partly future-proofing. Currently, git builds the attr stack
based on the path supplied by the user, so we don't have to do anything
special (like use strcmp_icase) to handle the parts of that path that don't
match the filesystem with respect to case. If git instead built the attr
stack by scanning the repository, then the paths in the origin field would
not necessarily match the paths supplied by the user. If someone makes a
change like that in the future, these tests will notice.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option causes check-attr to consider .gitattributes only from
the index, ignoring .gitattributes from the working tree. This allows
the command to be used in situations where a working tree does not exist.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normalize the path arguments (relative to the working tree root, if
applicable) before looking up their attributes. This requires passing
the prefix down the call chain.
This fixes two test cases for different reasons:
* "unnormalized paths" is fixed because the .gitattribute-file-seeking
code is not confused into reading the top-level file twice.
* "relative paths" is fixed because the canonical pathnames are passed
to get_check_attr() or get_all_attrs(), allowing them to match the
pathname patterns as expected.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the git-check-attr synopsis, if the '--stdin' option is
used then no pathnames are expected on the command line. Change the
behavior to match this description; namely, if '--stdin' is used but
not '--', then treat all command-line arguments as attribute names.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new usage patterns
git check-attr [-a | --all] [--] pathname...
git check-attr --stdin [-a | --all] < <list-of-paths>
which display all attributes associated with the specified file(s).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>