SHA-256 transition continues.
* bc/sha-256-part-1-of-4: (22 commits)
fast-import: add options for rewriting submodules
fast-import: add a generic function to iterate over marks
fast-import: make find_marks work on any mark set
fast-import: add helper function for inserting mark object entries
fast-import: permit reading multiple marks files
commit: use expected signature header for SHA-256
worktree: allow repository version 1
init-db: move writing repo version into a function
builtin/init-db: add environment variable for new repo hash
builtin/init-db: allow specifying hash algorithm on command line
setup: allow check_repository_format to read repository format
t/helper: make repository tests hash independent
t/helper: initialize repository if necessary
t/helper/test-dump-split-index: initialize git repository
t6300: make hash algorithm independent
t6300: abstract away SHA-1-specific constants
t: use hash-specific lookup tables to define test constants
repository: require a build flag to use SHA-256
hex: add functions to parse hex object IDs in any algorithm
hex: introduce parsing variants taking hash algorithms
...
We `cat` files, but don't inspect or grab the contents in any way.
Unlike in an earlier commit, there is no reason to suspect that these
files could be missing, so `cat`-ing them is just wasted effort.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The transition plan anticipates that we will allow signatures using
multiple algorithms in a single commit. In order to do so, we need to
use a different header per algorithm so that it will be obvious over
which data to compute the signature.
The transition plan specifies that we should use "gpgsig-sha256", so
wire up the commit code such that it can write and parse the current
algorithm, and it can remove the headers for any algorithm when creating
a new commit. Add tests to ensure that we write using the right header
and that git fsck doesn't reject these commits.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
The backslash character is not a valid part of a file name on Windows.
Hence it is dangerous to allow writing files that were unpacked from
tree objects, when the stored file name contains a backslash character:
it will be misinterpreted as directory separator.
This not only causes ambiguity when a tree contains a blob `a\b` and a
tree `a` that contains a blob `b`, but it also can be used as part of an
attack vector to side-step the careful protections against writing into
the `.git/` directory during a clone of a maliciously-crafted
repository.
Let's prevent that, addressing CVE-2019-1354.
Note: we guard against backslash characters in tree objects' file names
_only_ on Windows (because on other platforms, even on those where NTFS
volumes can be mounted, the backslash character is _not_ a directory
separator), and _only_ when `core.protectNTFS = true` (because users
might need to generate tree objects for other platforms, of course
without touching the worktree, e.g. using `git update-index
--cacheinfo`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).
Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).
We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.
We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:
1. We leak many small strings. add_decoration() has a last-one-wins
approach: it updates the decoration to the new string and returns
the old one. But we ignore the return value, leaking the old
string. This is quite common to trigger, since we look at reflogs:
the tip of any ref will be described both by looking at the actual
ref, as well as the latest reflog entry. So we'd always end up
leaking one of those strings.
2. The last-one-wins approach gives us lousy names. For instance, we
first look at all of the refs, and then all of the reflogs. So
rather than seeing "refs/heads/master", we're likely to overwrite
it with "HEAD@{12345678}". We're generally better off using the
first name we find.
And indeed, the test in t1450 expects this ugly HEAD@{} name. After
this patch, we've switched to using fsck_put_object_name()'s
first-one-wins semantics, and we output the more human-friendly
"refs/tags/julius" (and the test is updated accordingly).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests print a file before searching for a pattern using
test_i18ngrep. This is useful when debugging tests with --verbose when
the pattern is not found as expected.
Since 63b1a175ee (t: make 'test_i18ngrep' more informative on failure,
2018-02-08) test_i18ngrep already shows the contents of a file that
doesn't match the expected pattern, though.
So don't bother doing the same unconditionally up-front. The contents
are not interesting if the expected pattern is found, and showing it
twice if it doesn't match is of no use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace several hard-coded full and partial object IDs with variables or
computed values. Create junk data to stuff inside an invalid tree that
can be either 20 or 32 bytes long. Compute a binary all-zeros object ID
instead of hard-coding a 20-byte length.
Additionally, compute various object IDs by using test_oid and
$EMPTY_BLOB so that this test works with multiple hash algorithms.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code tightening against a "wrong" object appearing where an object
of a different type is expected, instead of blindly assuming that
the connection between objects are correctly made.
* tb/unexpected:
rev-list: detect broken root trees
rev-list: let traversal die when --missing is not in use
get_commit_tree(): return NULL for broken tree
list-objects.c: handle unexpected non-tree entries
list-objects.c: handle unexpected non-blob entries
t: introduce tests for unexpected object types
t: move 'hex2oct' into test-lib-functions.sh
The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.
This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.
This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck --connectivity-only" omits computation necessary to sift
the objects that are not reachable from any of the refs into
unreachable and dangling. This is now enabled when dangling
objects are requested (which is done by default, but can be
overridden with the "--no-dangling" option).
* jk/fsck-doc:
fsck: always compute USED flags for unreachable objects
doc/fsck: clarify --connectivity-only behavior
The --connectivity-only option avoids opening every object, and instead
just marks reachable objects with a flag and compares this to the set
of all objects. This strategy is discussed in more detail in 3e3f8bd608
(fsck: prepare dummy objects for --connectivity-check, 2017-01-17).
This means that we report _every_ unreachable object as dangling.
Whereas in a full fsck, we'd have actually opened and parsed each of
those unreachable objects, marking their child objects with the USED
flag, to mean "this was mentioned by another object". And thus we can
report only the tip of an unreachable segment of the object graph as
dangling.
You can see this difference with a trivial example:
tree=$(git hash-object -t tree -w /dev/null)
one=$(echo one | git commit-tree $tree)
two=$(echo two | git commit-tree -p $one $tree)
Running `git fsck` will report only $two as dangling, but with
--connectivity-only, both commits (and the tree) are reported. Likewise,
using --lost-found would write all three objects.
We can make --connectivity-only work like the normal case by taking a
separate pass over the unreachable objects, parsing them and marking
objects they refer to as USED. That still avoids parsing any blobs,
though we do pay the cost to access any unreachable commits and trees
(which may or may not be noticeable, depending on how many you have).
If neither --dangling nor --lost-found is in effect, then we can skip
this step entirely, just like we do now. That makes "--connectivity-only
--no-dangling" just as fast as the current "--connectivity-only". I.e.,
we do the correct thing always, but you can still tweak the options to
make it faster if you don't care about dangling objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To future-proof ourselves against a change in the hash, let's use the
more generic "hash mismatch" to refer to integrity problems. Note that
we do advertise this exact string in git-fsck(1). However, the message
itself is marked for translation, meaning we do not expect it to be
machine-readable.
While we're touching that documentation, let's also update it for
grammar and clarity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More _("i18n") markings.
* nd/i18n:
fsck: mark strings for translation
fsck: reduce word legos to help i18n
parse-options.c: mark more strings for translation
parse-options.c: turn some die() to BUG()
parse-options: replace opterror() with optname()
repack: mark more strings for translation
remote.c: mark messages for translation
remote.c: turn some error() or die() to BUG()
reflog: mark strings for translation
read-cache.c: add missing colon separators
read-cache.c: mark more strings for translation
read-cache.c: turn die("internal error") to BUG()
attr.c: mark more string for translation
archive.c: mark more strings for translation
alias.c: mark split_cmdline_strerror() strings for translation
git.c: mark more strings for translation
The code to traverse objects for reachability, used to decide what
objects are unreferenced and expendable, have been taught to also
consider per-worktree refs of other worktrees as starting points to
prevent data loss.
* nd/per-worktree-ref-iteration:
git-worktree.txt: correct linkgit command name
reflog expire: cover reflog from all worktrees
fsck: check HEAD and reflog from other worktrees
fsck: move fsck_head_link() to get_default_heads() to avoid some globals
revision.c: better error reporting on ref from different worktrees
revision.c: correct a parameter name
refs: new ref types to make per-worktree refs visible to all worktrees
Add a place for (not) sharing stuff between worktrees
refs.c: indent with tabs, not spaces
Two die() are updated to start with lowercase to be consistent with
the rest.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple other improvements on these strings as well:
- add missing colon (as separator)
- quote paths
- provide more information on error messages
- keep first word in lowercase
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit fixes an infinite loop when fscking large
truncated loose objects.
The check_stream_sha1() function takes an mmap'd loose
object buffer and streams 4k of output at a time, checking
its sha1. The loop quits when we've output enough bytes (we
know the size from the object header), or when zlib tells us
anything except Z_OK or Z_BUF_ERROR.
The latter is expected because zlib may run out of room in
our 4k buffer, and that is how it tells us to process the
output and loop again.
But Z_BUF_ERROR also covers another case: one in which zlib
cannot make forward progress because it needs more _input_.
This should never happen in this loop, because though we're
streaming the output, we have the entire deflated input
available in the mmap'd buffer. But since we don't check
this case, we'll just loop infinitely if we do see a
truncated object, thinking that zlib is asking for more
output space.
It's tempting to fix this by checking stream->avail_in as
part of the loop condition (and quitting if all of our bytes
have been consumed). But that assumes that once zlib has
consumed the input, there is nothing left to do. That's not
necessarily the case: it may have read our input into its
internal state, but still have bytes to output.
Instead, let's continue on Z_BUF_ERROR only when we see the
case we're expecting: the previous round filled our output
buffer completely. If it didn't (and we still saw
Z_BUF_ERROR), we know something is wrong and should break
out of the loop.
The bug comes from commit f6371f9210 (sha1_file: add
read_loose_object() function, 2017-01-13), which
reimplemented some of the existing loose object functions.
So it's worth checking if this bug was inherited from any of
those. The answers seems to be no. The two obvious
candidates are both OK:
1. unpack_sha1_rest(); this doesn't need to loop on
Z_BUF_ERROR at all, since it allocates the expected
output buffer in advance (which we can't do since we're
explicitly streaming here)
2. check_object_signature(); the streaming path relies on
the istream interface, which uses read_istream_loose()
for this case. That function uses a similar "is our
output buffer full" check with Z_BUF_ERROR (which is
where I stole it from for this patch!)
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit cce044df7f (fsck: detect trailing garbage in all
object types, 2017-01-13) added two tests of trailing
garbage in a loose object file: one with a commit and one
with a blob. The point of having two is that blobs would
follow a different code path that streamed the contents,
instead of loading it into a buffer as usual.
At the time, merely being a blob was enough to trigger the
streaming code path. But since 7ac4f3a007 (fsck: actually
fsck blob data, 2018-05-02), we now only stream blobs that
are actually large. So since then, the streaming code path
is not tested at all for this case.
We can restore the original intent of the test by tweaking
core.bigFileThreshold to make our small blob seem large.
There's no easy way to externally verify that we followed
the streaming code path, but I did check before/after using
a temporary debug statement.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck is a repo-wide operation and should check all references no
matter which worktree they are associated to.
Reported-by: Jeff King <peff@peff.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test fixes.
* sg/test-must-be-empty:
tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>'
tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null <out>'
tests: use 'test_must_be_empty' instead of 'test ! -s'
tests: use 'test_must_be_empty' instead of '! test -s'
Using 'test_must_be_empty' is shorter and more idiomatic than
>empty &&
test_cmp empty out
as it saves the creation of an empty file. Furthermore, sometimes the
expected empty file doesn't have such a descriptive name like 'empty',
and its creation is far away from the place where it's finally used
for comparison (e.g. in 't7600-merge.sh', where two expected empty
files are created in the 'setup' test, but are used only about 500
lines later).
These cases were found by instrumenting 'test_cmp' to error out the
test script when it's used to compare empty files, and then converted
manually.
Note that even after this patch there still remain a lot of cases
where we use 'test_cmp' to check empty files:
- Sometimes the expected output is not hard-coded in the test, but
'test_cmp' is used to ensure that two similar git commands produce
the same output, and that output happens to be empty, e.g. the
test 'submodule update --merge - ignores --merge for new
submodules' in 't7406-submodule-update.sh'.
- Repetitive common tasks, including preparing the expected results
and running 'test_cmp', are often extracted into a helper
function, and some of this helper's callsites expect no output.
- For the same reason as above, the whole 'test_expect_success'
block is within a helper function, e.g. in 't3070-wildmatch.sh'.
- Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update
(-p)' in 't9400-git-cvsserver-server.sh'.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch all uses of $_z40 to $ZERO_OID so that they work correctly with
larger hashes. This commit was created by using the following sed
command to modify all files in the t directory except t/test-lib.sh:
sed -i 's/\$_z40/$ZERO_OID/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lookup_blob() and lookup_tree() can return NULL if they find an object
of an unexpected type. Accessing the object member is undefined in that
case. Cast the result to a struct object pointer instead; we can do
that because object is the first member of all object types. This trick
is already used in other places in the code.
An error message is already shown by object_as_type(), which is called
by the lookup functions. The walk callback functions are expected to
handle NULL object pointers passed to them, but put_object_name() needs
a valid object, so avoid calling it without one.
Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1450-fsck.sh does not have a test that checks fsck's behavior when a
packfile is invalid. It does have a test for when an object in a
packfile is invalid, but in that test, the packfile itself is valid.
Add such a test.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GNU grep allows "\(A\|B\)" as alternation in BRE, but this is an
extension not understood by some other implementations of grep
(Michael Kebe reported an breakage on Solaris).
Rewrite the offending test to ERE and use egrep instead.
Noticed-by: Michael Kebe <michael.kebe@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous step added a path zzzzzzzz to the index, and then used
"sed" to replace this string to yyyyyyyy to create a test case where
the checksum at the end of the file does not match the contents.
Unfortunately, use of "sed" on a non-text file is not portable.
Instead, use a Perl script that seeks to the end and modifies the
last byte of the file (where we _know_ stores the trailing
checksum).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.
These effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck --connectivity-check" was not working at all.
* jk/fsck-connectivity-check-fix:
fsck: lazily load types under --connectivity-only
fsck: move typename() printing to its own function
t1450: use "mv -f" within loose object directory
fsck: check HAS_OBJ more consistently
fsck: do not fallback "git fsck <bogus>" to "git fsck"
fsck: tighten error-checks of "git fsck <head>"
fsck: prepare dummy objects for --connectivity-check
fsck: report trees as dangling
t1450: clean up sub-objects in duplicate-entry test
The loose objects are created with mode 0444. That doesn't
prevent them being overwritten by rename(), but some
versions of "mv" will be extra careful and prompt the user,
even without "-i".
Reportedly macOS does this, at least in the Travis builds.
The prompt reads from /dev/null, defaulting to "no", and the
object isn't moved. Then to make matters even more
interesting, it still returns "0" and the rest of the test
proceeds, but with a broken setup.
We can work around it by using "mv -f" to override the
prompt. This should work as it's already used in t5504 for
the same purpose.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.
Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:
1. It claims lookup_object() will report an error, which
is not true. It only does a hash lookup, and the user
has no clue that their argument was skipped.
2. When either the name or sha1 cannot be resolved, we
continue to exit with a successful error code, even
though we didn't check what the user asked us to.
This patch fixes both of these cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.
Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.
However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:
1. mark_object() will not queue objects for examination,
so recursively following links from commits to trees,
etc, did nothing. I.e., we were checking the
reachability of hardly anything at all.
2. When a set of heads is given on the command-line, we
use lookup_object() to see if they exist. But without
the initial pass, we assume nothing exists.
3. When loading reflog entries, we do a similar
lookup_object() check, and complain that the reflog is
broken if the object doesn't exist in our hash.
So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.
One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).
Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.
And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.
One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.
Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.
That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).
Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.
We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.
A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.
In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).
Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a loose tree or commit is read by fsck (or any git
program), unpack_sha1_rest() checks whether there is extra
cruft at the end of the object file, after the zlib data.
Blobs that are streamed, however, do not have this check.
For normal git operations, it's not a big deal. We know the
sha1 and size checked out, so we have the object bytes we
wanted. The trailing garbage doesn't affect what we're
trying to do.
But since the point of fsck is to find corruption or other
problems, it should be more thorough. This patch teaches its
loose-sha1 reader to detect extra bytes after the zlib
stream and complain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.
In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.
The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code paths in fsck for packed and loose objects are
quite different, and it is not immediately obvious that the
packed case behaves well. In particular:
1. The fsck_loose() function always returns "0" to tell the
iterator to keep checking more objects. Whereas
fsck_obj_buffer() (which handles packed objects)
returns -1. This is OK, because the callback machinery
for verify_pack() does not stop when it sees a non-zero
return.
2. The fsck_loose() function sets the ERROR_OBJECT bit
when fsck_obj() fails, whereas fsck_obj_buffer() sets it
only when it sees a corrupt object. This turns out not
to matter. We don't actually do anything with this bit
except exit the program with a non-zero code, and that
is handled already by the non-zero return from the
function.
So there are no bugs here, but it was certainly confusing to
me. And we do not test either of the properties in t1450
(neither that a non-corruption error will caused a non-zero
exit for a packed object, nor that we keep going after
seeing the first error). Let's test both of those
conditions, so that we'll notice if any of those assumptions
becomes invalid.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fail to open a corrupt loose object, we report an
error and mention the filename via sha1_file_name().
However, that function will always give us a path in the
local repository, whereas the corrupt object may have come
from an alternate. The result is a very misleading error
message.
Teach the open_sha1_file() and stat_sha1_file() helpers to
pass back the path they found, so that we can report it
correctly.
Note that the pointers we return go to static storage (e.g.,
from sha1_file_name()), which is slightly dangerous.
However, these helpers are static local helpers, and the
names are used for immediately generating error messages.
The simplicity is an acceptable tradeoff for the danger.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 90cf590f5 (fsck: optionally show more helpful info
for broken links, 2016-07-17) added a remove_loose_object()
helper, but we already had a remove_object() helper that did
the same thing. Let's combine these into one.
The implementations had a few subtle differences, so I've
tried to take the best of both:
- the original used "sed", but the newer version avoids
spawning an extra process
- the original processed "$*", which was nonsense, as it
assumed only a single sha1. Use "$1" to make that more
clear.
- the newer version ran an extra rev-parse, but it was not
necessary; it's sole caller already converted the
argument into a raw sha1
- the original used "rm -f", whereas the new one uses
"rm". The latter is better because it may notice a bug
or other unexpected failure in the test. (The original
does check that the object exists before we remove it,
which is good, but that's a subset of the possible
unexpected conditions).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue. Now fsck can tell the user which tree is
bad, too.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reporting broken links between commits/trees/blobs, it would be
quite helpful at times if the user would be told how the object is
supposed to be reachable.
With the new --name-objects option, git-fsck will try to do exactly
that: name the objects in a way that shows how they are reachable.
For example, when some reflog got corrupted and a blob is missing that
should not be, the user might want to remove the corresponding reflog
entry. This option helps them find that entry: `git fsck` will now
report something like this:
broken link from tree b5eb6ff... (refs/stash@{<date>}~37:)
to blob ec5cf80...
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though a Git commit object is designed to be capable of storing
any binary data as its payload, in practice people use it to describe
the changes in textual form, and tools like "git log" are designed to
treat the payload as text.
Detect and warn when we see any commit object with a NUL byte in
it.
Note that a NUL byte in the header part is already detected as a
grave error. This change is purely about the message part.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check the return value of verify_header() for commits already, so do
the same for tags as well.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
There were some classes of errors that "git fsck" diagnosed to its
standard error that did not cause it to exit with non-zero status.
* jc/fsck-dropped-errors:
fsck: exit with non-zero when problems are found
After finding some problems (e.g. a ref refs/heads/X points at an
object that is not a commit) and issuing an error message, the
program failed to signal the fact that it found an error by a
non-zero exit status.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option avoids unpacking each and all blob objects, and just
verifies the connectivity. In particular with large repositories, this
speeds up the operation, at the expense of missing corrupt blobs,
ignoring unreachable objects and other fsck issues, if any.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have support in `git receive-pack` to deal with some legacy
repositories which have non-fatal issues.
Let's make `git fsck` itself useful with such repositories, too, by
allowing users to ignore known issues, or at least demote those issues
to mere warnings.
Example: `git -c fsck.missingEmail=ignore fsck` would hide
problems with missing emails in author, committer and tagger lines.
In the same spirit that `git receive-pack`'s usage of the fsck machinery
differs from `git fsck`'s – some of the non-fatal warnings in `git fsck`
are fatal with `git receive-pack` when receive.fsckObjects = true, for
example – we strictly separate the fsck.<msg-id> from the
receive.fsck.<msg-id> settings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some repositories written by legacy code have objects with non-fatal
fsck issues. To allow the user to ignore those issues, let's print
out the ID (e.g. when encountering "missingEmail", the user might
want to call `git config --add receive.fsck.missingEmail=warn`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our is_hfs_dotgit function relies on the hackily-implemented
next_hfs_char to give us the next character that an HFS+
filename comparison would look at. It's hacky because it
doesn't implement the full case-folding table of HFS+; it
gives us just enough to see if the path matches ".git".
At the end of next_hfs_char, we use tolower() to convert our
32-bit code point to lowercase. Our tolower() implementation
only takes an 8-bit char, though; it throws away the upper
24 bits. This means we can't have any false negatives for
is_hfs_dotgit. We only care about matching 7-bit ASCII
characters in ".git", and we will correctly process 'G' or
'g'.
However, we _can_ have false positives. Because we throw
away the upper bits, code point \u{0147} (for example) will
look like 'G' and get downcased to 'g'. It's not known
whether a sequence of code points whose truncation ends up
as ".git" is meaningful in any language, but it does not
hurt to be more accurate here. We can just pass out the full
32-bit code point, and compare it manually to the upper and
lowercase characters we care about.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.
* js/fsck-tag-validation:
index-pack: terminate object buffers with NUL
fsck: properly bound "invalid tag name" error message
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.
Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).
Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.
Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).
Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check that fsck notices and complains about confusing
paths in trees. However, there are a few shortcomings:
1. We check only for these paths as file entries, not as
intermediate paths (so ".git" and not ".git/foo").
2. We check "." and ".." together, so it is possible that
we notice only one and not the other.
3. We repeat a lot of boilerplate.
Let's use some loops to be more thorough in our testing, and
still end up with shorter code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we detect an invalid tag-name header in a tag object,
like, "tag foo bar\n", we feed the pointer starting at "foo
bar" to a printf "%s" formatter. This shows the name, as we
want, but then it keeps printing the rest of the tag buffer,
rather than stopping at the end of the line.
Our tests did not notice because they look only for the
matching line, but the bug is that we print much more than
we wanted to. So we also adjust the test to be more exact.
Note that when fscking tags with "index-pack --strict", this
is even worse. index-pack does not add a trailing
NUL-terminator after the object, so we may actually read
past the buffer and print uninitialized memory. Running
t5302 with valgrind does notice the bug for that reason.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using "hash-object --literally", test one of the new breakages
js/fsck-tag-validation topic teaches "fsck" to catch is caught.
* jc/hash-object-fsck-tag:
t1450: make sure fsck detects a malformed tagger line
Teach "git fsck" to inspect the contents of annotated tag objects.
* js/fsck-tag-validation:
Make sure that index-pack --strict checks tag objects
Add regression tests for stricter tag fsck'ing
fsck: check tag objects' headers
Make sure fsck_commit_buffer() does not run out of the buffer
fsck_object(): allow passing object data separately from the object itself
Refactor type_from_string() to allow continuing after detecting an error
With "hash-object --literally", write a tag object that is not
supposed to pass one of the new checks added to "fsck", and make
sure that the new check catches the breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".
The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.
In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The intent of the new test case is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.
While it *would* have been nice to test the code path where fsck_object()
encounters an invalid tag object, this is not possible using git fsck: tag
objects are parsed already before fsck'ing (and the parser already fails
upon such objects).
Even worse: we would not even be able write out invalid tag objects
because git hash-object parses those objects, too, unless we resorted to
really ugly hacks such as using something like this in the unit tests
(essentially depending on Perl *and* Compress::Zlib):
hash_invalid_object () {
contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" &&
sha1=$(echo "$contents" | test-sha1) &&
suffix=${sha1#??} &&
mkdir -p .git/objects/${sha1%$suffix} &&
echo "$contents" |
perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \
> .git/objects/${sha1%$suffix}/$suffix &&
echo $sha1
}
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.
[jc: adjusted t1450 and added another test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.
Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.
Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.
For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).
For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having a ".git" entry inside a tree can cause confusing
results on checkout. At the top-level, you could not
checkout such a tree, as it would complain about overwriting
the real ".git" directory. In a subdirectory, you might
check it out, but performing operations in the subdirectory
would confusingly consider the in-tree ".git" directory as
the repository.
The regular git tools already make it hard to accidentally
add such an entry to a tree, and do not allow such entries
to enter the index at all. Teaching fsck about it provides
an additional safety check, and let's us avoid propagating
any such bogosity when transfer.fsckObjects is on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A tree with meta-paths like '.' or '..' does not work well
with git; the index will refuse to load it or check it out
to the filesystem (and even if we did not have that safety,
it would look like we were overwriting an untracked
directory). For the same reason, it is difficult to create
such a tree with regular git.
Let's warn about these dubious entries during fsck, just in
case somebody has created a bogus tree (and this also lets
us prevent them from propagating when transfer.fsckObjects
is set).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a tag T points at an object X that is of a type that is
different from what the tag records as, fsck should report it as an
error.
However, depending on the order X and T are checked individually,
the actual error message can be different. If X is checked first,
fsck remembers X's type and then when it checks T, it notices that T
records X as a wrong type (i.e. the complaint is about a broken tag
T). If T is checked first, on the other hand, fsck remembers that we
need to verify X is of the type tag records, and when it later
checks X, it notices that X is of a wrong type (i.e. the complaint
is about a broken object X).
The important thing is that fsck notices such an error and diagnoses
the issue on object X, but the test was expecting that we happen to
check objects in the order to make us detect issues with tag T, not
with object X. Remove this unwarranted assumption.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
Short of somebody happening to beat the 1 in 2^160 odds of
actually generating content that hashes to the null sha1, we
should never see this value in a tree entry. So let's have
fsck warn if it it seen.
As in the previous commit, we test both blob and submodule
entries to future-proof the test suite against the
implementation depending on connectivity to notice the
error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default output from "fsck" is often overwhelmed by informational
message on dangling objects, especially if you do not repack often, and a
real error can easily be buried.
Add "--no-dangling" option to omit them, and update the user manual to
demonstrate its use.
Based on a patch by Clemens Buchacher, but reverted the part to change
the default to --no-dangling, which is unsuitable for the first patch.
The usual three-step procedure to break the backward compatibility over
time needs to happen on top of this, if we were to go in that direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git rev-list passes rev_list_info, not rev_list objects. Without this
fix, rev-list enables or disables the --verify-objects option depending
on a read from an undefined memory location.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck allows a name with > character in it like "name> <email>". Also for
"name email>" fsck says "missing space before email".
More precisely, it seeks for a first '<', checks that ' ' preceeds it.
Then seeks to '<' or '>' and checks that it is the '>'. Missing space is
reported if either '<' is not found or it's not preceeded with ' '.
Change it to following. Seek to '<' or '>', check that it is '<' and is
preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now
"name> <email>" is rejected as "bad name". More strict name check is the
only change in what is accepted.
Report 'missing space' only if '<' is found and is not preceeded with a
space.
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck reports "missing space before <email>" for committer string equal
to "name email>" or to "". It'd be nicer to say "missing email" for
the second string and "name is bad" (has > in it) for the first one.
Add a failing test for these messages.
For "name> <email>" no error is reported. Looks like a bug, so add
such a failing test."
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Breaks in a test assertion's && chain can potentially hide
failures from earlier commands in the chain.
Commands intended to fail should be marked with !, test_must_fail, or
test_might_fail. The examples in this patch do not require that.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fsck test is generally careful to remove the corrupt objects
it inserts, but dangling objects are left behind due to some typos
and omissions. It is better to clean up more completely, to
simplify the addition of later tests. So:
- guard setup and cleanup with test_expect_success to catch
typos and errors;
- check both stdout and stderr when checking for empty fsck
output;
- use test_cmp empty file in place of test $(wc -l <file) = 0,
for better debugging output when running tests with -v;
- add a remove_object () helper and use it to replace broken
object removal code that forgot about the fanout in
.git/objects;
- disable gc.auto, to avoid tripping up object removal if the
number of objects ever reaches that threshold.
- use test_when_finished to ensure cleanup tasks are run and
succeed when tests fail;
- add a new final test that no breakage or dangling objects
was left behind.
While at it, add a brief description to test_description of the
history that is expected to persist between tests.
Part of a campaign to clean up subshell usage in tests.
Cc: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
daae1922 (fsck: check ident lines in commit objects, 2010-04-24)
taught fsck to expect commit objects to have the form
tree <object name>
<parents>
author <valid ident string>
committer <valid ident string>
log message
The check is overly strict: for example, it errors out with the
message “expected blank line” for perfectly valid commits with an
"encoding ISO-8859-1" line.
Later it might make sense to teach fsck about the rest of the header
and warn about unrecognized header lines, but for simplicity, let’s
accept arbitrary trailing lines for now.
Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check that email addresses do not contain <, >, or newline so they can
be quickly scanned without trouble. The copy() function in ident.c
already ensures that ordinary git commands will not write email
addresses without this property.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost exactly a year ago in 02a6552 (Test fsck a bit harder), I
introduced two testcases that were expecting failure.
However, the only bug was that the testcases wrote *blobs* because I
forgot to pass -t tag to hash-object. Fix this, and then adjust the
rest of the test to properly check the result.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-fsck, of all tools, has very few tests. This adds some more:
* a corrupted object;
* a branch pointing to a non-commit;
* a tag pointing to a nonexistent object;
* and a tag pointing to an object of a type other than what the tag
itself claims.
Only the first two are caught. At least the third probably should,
too, but currently slips through.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" used to validate only loose objects that are local and nothing
else by default. This is not just too little when a repository is
borrowing objects from other object stores, but also caused the
connectivity check to mistakenly declare loose objects borrowed from them
to be missing.
The rationale behind the default mode that validates only loose objects is
because these objects are still young and more unlikely to have been
pushed to other repositories yet. That holds for loose objects borrowed
from alternate object stores as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default we looked at all refs but not HEAD. The only thing that made
fsck not lose sight of commits that are only reachable from a detached
HEAD was the reflog for the HEAD.
This fixes it, with a new test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>