development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Junio C Hamano	f8cb64e3d4	Merge branch 'bc/sha-256-part-1-of-4' SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...	2020-03-26 17:11:20 -07:00
Martin Ågren	3c29e21eb0	t: drop debug `cat` calls We `cat` files, but don't inspect or grab the contents in any way. Unlike in an earlier commit, there is no reason to suspect that these files could be missing, so `cat`-ing them is just wasted effort. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 11:18:25 -08:00
brian m. carlson	42d4e1d112	commit: use expected signature header for SHA-256 The transition plan anticipates that we will allow signatures using multiple algorithms in a single commit. In order to do so, we need to use a different header per algorithm so that it will be obvious over which data to compute the signature. The transition plan specifies that we should use "gpgsig-sha256", so wire up the commit code such that it can write and parse the current algorithm, and it can remove the headers for any algorithm when creating a new commit. Add tests to ensure that we write using the right header and that git fsck doesn't reject these commits. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 09:33:30 -08:00
Junio C Hamano	7034cd094b	Sync with Git 2.24.1	2019-12-09 22:17:55 -08:00
Johannes Schindelin	67af91c47a	Sync with 2.23.1 * maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...	2019-12-06 16:31:39 +01:00
Johannes Schindelin	7fd9fd94fb	Sync with 2.22.2 * maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...	2019-12-06 16:31:30 +01:00
Johannes Schindelin	5421ddd8d0	Sync with 2.21.1 * maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...	2019-12-06 16:31:23 +01:00
Johannes Schindelin	fc346cb292	Sync with 2.20.2 * maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...	2019-12-06 16:31:12 +01:00
Johannes Schindelin	d851d94151	Sync with 2.19.3 * maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...	2019-12-06 16:30:49 +01:00
Johannes Schindelin	7c9fbda6e2	Sync with 2.18.2 * maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...	2019-12-06 16:30:38 +01:00
Johannes Schindelin	14af7ed5a9	Sync with 2.17.3 * maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...	2019-12-06 16:29:15 +01:00
Johannes Schindelin	e1d911dd4c	mingw: disallow backslash characters in tree objects' file names The backslash character is not a valid part of a file name on Windows. Hence it is dangerous to allow writing files that were unpacked from tree objects, when the stored file name contains a backslash character: it will be misinterpreted as directory separator. This not only causes ambiguity when a tree contains a blob `a\b` and a tree `a` that contains a blob `b`, but it also can be used as part of an attack vector to side-step the careful protections against writing into the `.git/` directory during a clone of a maliciously-crafted repository. Let's prevent that, addressing CVE-2019-1354. Note: we guard against backslash characters in tree objects' file names _only_ on Windows (because on other platforms, even on those where NTFS volumes can be mounted, the backslash character is _not_ a directory separator), and _only_ when `core.protectNTFS = true` (because users might need to generate tree objects for other platforms, of course without touching the worktree, e.g. using `git update-index --cacheinfo`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2019-12-04 13:20:05 +01:00
Jeff King	a59cfb3230	fsck: unify object-name code Commit `90cf590f53` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a system for decorating objects with names. The code is split across builtin/fsck.c (which gives the initial names) and fsck.c (which adds to the names as it traverses the object graph). This leads to some duplication, where both sites have near-identical describe_object() functions (the difference being that the one in builtin/fsck.c uses a circular array of buffers to allow multiple calls in a single printf). Let's provide a unified object_name API for fsck. That lets us drop the duplication, as well as making the interface boundaries more clear (which will let us refactor the implementation more in a future patch). We'll leave describe_object() in builtin/fsck.c as a thin wrapper around the new API, as it relies on a static global to make its many callers a bit shorter. We'll also convert the bare add_decoration() calls in builtin/fsck.c to put_object_name(). This fixes two minor bugs: 1. We leak many small strings. add_decoration() has a last-one-wins approach: it updates the decoration to the new string and returns the old one. But we ignore the return value, leaking the old string. This is quite common to trigger, since we look at reflogs: the tip of any ref will be described both by looking at the actual ref, as well as the latest reflog entry. So we'd always end up leaking one of those strings. 2. The last-one-wins approach gives us lousy names. For instance, we first look at all of the refs, and then all of the reflogs. So rather than seeing "refs/heads/master", we're likely to overwrite it with "HEAD@{12345678}". We're generally better off using the first name we find. And indeed, the test in t1450 expects this ugly HEAD@{} name. After this patch, we've switched to using fsck_put_object_name()'s first-one-wins semantics, and we output the more human-friendly "refs/tags/julius" (and the test is updated accordingly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
René Scharfe	f537485fa5	tests: remove "cat foo" before "test_i18ngrep bar foo" Some tests print a file before searching for a pattern using test_i18ngrep. This is useful when debugging tests with --verbose when the pattern is not found as expected. Since `63b1a175ee` (t: make 'test_i18ngrep' more informative on failure, 2018-02-08) test_i18ngrep already shows the contents of a file that doesn't match the expected pattern, though. So don't bother doing the same unconditionally up-front. The contents are not interesting if the expected pattern is found, and showing it twice if it doesn't match is of no use. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:17:44 +09:00
brian m. carlson	8de6b383c1	t1450: make hash size independent Replace several hard-coded full and partial object IDs with variables or computed values. Create junk data to stuff inside an invalid tree that can be either 20 or 32 bytes long. Compute a binary all-zeros object ID instead of hard-coding a 20-byte length. Additionally, compute various object IDs by using test_oid and $EMPTY_BLOB so that this test works with multiple hash algorithms. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 13:28:18 -07:00
Junio C Hamano	ea2dab1abb	Merge branch 'tb/unexpected' Code tightening against a "wrong" object appearing where an object of a different type is expected, instead of blindly assuming that the connection between objects are correctly made. * tb/unexpected: rev-list: detect broken root trees rev-list: let traversal die when --missing is not in use get_commit_tree(): return NULL for broken tree list-objects.c: handle unexpected non-tree entries list-objects.c: handle unexpected non-blob entries t: introduce tests for unexpected object types t: move 'hex2oct' into test-lib-functions.sh	2019-05-09 00:37:25 +09:00
Taylor Blau	5c07647d98	t: move 'hex2oct' into test-lib-functions.sh The helper 'hex2oct' is used to convert base-16 encoded data into a base-8 binary form, and is useful for preparing data for commands that accept input in a binary format, such as 'git hash-object', via 'printf'. This helper is defined identically in three separate places throughout 't'. Move the definition to test-lib-function.sh, so that it can be used in new test suites, and its definition is not redundant. This will likewise make our job easier in the subsequent commit, which also uses 'hex2oct'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-05 15:06:04 +09:00
Junio C Hamano	ea327760d3	Merge branch 'jk/fsck-doc' "git fsck --connectivity-only" omits computation necessary to sift the objects that are not reachable from any of the refs into unreachable and dangling. This is now enabled when dangling objects are requested (which is done by default, but can be overridden with the "--no-dangling" option). * jk/fsck-doc: fsck: always compute USED flags for unreachable objects doc/fsck: clarify --connectivity-only behavior	2019-03-20 15:16:06 +09:00
Jeff King	8d8c2a5aef	fsck: always compute USED flags for unreachable objects The --connectivity-only option avoids opening every object, and instead just marks reachable objects with a flag and compares this to the set of all objects. This strategy is discussed in more detail in `3e3f8bd608` (fsck: prepare dummy objects for --connectivity-check, 2017-01-17). This means that we report _every_ unreachable object as dangling. Whereas in a full fsck, we'd have actually opened and parsed each of those unreachable objects, marking their child objects with the USED flag, to mean "this was mentioned by another object". And thus we can report only the tip of an unreachable segment of the object graph as dangling. You can see this difference with a trivial example: tree=$(git hash-object -t tree -w /dev/null) one=$(echo one \| git commit-tree $tree) two=$(echo two \| git commit-tree -p $one $tree) Running `git fsck` will report only $two as dangling, but with --connectivity-only, both commits (and the tree) are reported. Likewise, using --lost-found would write all three objects. We can make --connectivity-only work like the normal case by taking a separate pass over the unreachable objects, parsing them and marking objects they refer to as USED. That still avoids parsing any blobs, though we do pay the cost to access any unreachable commits and trees (which may or may not be noticeable, depending on how many you have). If neither --dangling nor --lost-found is in effect, then we can skip this step entirely, just like we do now. That makes "--connectivity-only --no-dangling" just as fast as the current "--connectivity-only". I.e., we do the correct thing always, but you can still tweak the options to make it faster if you don't care about dangling objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-05 22:55:57 +09:00
Jeff King	01f8d5948a	prefer "hash mismatch" to "sha1 mismatch" To future-proof ourselves against a change in the hash, let's use the more generic "hash mismatch" to refer to integrity problems. Note that we do advertise this exact string in git-fsck(1). However, the message itself is marked for translation, meaning we do not expect it to be machine-readable. While we're touching that documentation, let's also update it for grammar and clarity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-08 09:41:06 -08:00
Junio C Hamano	3813a89fae	Merge branch 'nd/i18n' More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation	2019-01-04 13:33:31 -08:00
Junio C Hamano	e146cc97be	Merge branch 'nd/per-worktree-ref-iteration' The code to traverse objects for reachability, used to decide what objects are unreferenced and expendable, have been taught to also consider per-worktree refs of other worktrees as starting points to prevent data loss. * nd/per-worktree-ref-iteration: git-worktree.txt: correct linkgit command name reflog expire: cover reflog from all worktrees fsck: check HEAD and reflog from other worktrees fsck: move fsck_head_link() to get_default_heads() to avoid some globals revision.c: better error reporting on ref from different worktrees revision.c: correct a parameter name refs: new ref types to make per-worktree refs visible to all worktrees Add a place for (not) sharing stuff between worktrees refs.c: indent with tabs, not spaces	2018-11-13 22:37:26 +09:00
Nguyễn Thái Ngọc Duy	674ba34038	fsck: mark strings for translation Two die() are updated to start with lowercase to be consistent with the rest. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:47:10 +09:00
Nguyễn Thái Ngọc Duy	9d0a9e9089	read-cache.c: mark more strings for translation There are a couple other improvements on these strings as well: - add missing colon (as separator) - quote paths - provide more information on error messages - keep first word in lowercase Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:47:09 +09:00
Junio C Hamano	18ad13e5b2	Adjust for 2.19.x series * jk/detect-truncated-zlib-input cat-file: handle streaming failures consistently check_stream_sha1(): handle input underflow t1450: check large blob in trailing-garbage test	2018-10-31 13:12:12 +09:00
Jeff King	ccdc4819d5	check_stream_sha1(): handle input underflow This commit fixes an infinite loop when fscking large truncated loose objects. The check_stream_sha1() function takes an mmap'd loose object buffer and streams 4k of output at a time, checking its sha1. The loop quits when we've output enough bytes (we know the size from the object header), or when zlib tells us anything except Z_OK or Z_BUF_ERROR. The latter is expected because zlib may run out of room in our 4k buffer, and that is how it tells us to process the output and loop again. But Z_BUF_ERROR also covers another case: one in which zlib cannot make forward progress because it needs more _input_. This should never happen in this loop, because though we're streaming the output, we have the entire deflated input available in the mmap'd buffer. But since we don't check this case, we'll just loop infinitely if we do see a truncated object, thinking that zlib is asking for more output space. It's tempting to fix this by checking stream->avail_in as part of the loop condition (and quitting if all of our bytes have been consumed). But that assumes that once zlib has consumed the input, there is nothing left to do. That's not necessarily the case: it may have read our input into its internal state, but still have bytes to output. Instead, let's continue on Z_BUF_ERROR only when we see the case we're expecting: the previous round filled our output buffer completely. If it didn't (and we still saw Z_BUF_ERROR), we know something is wrong and should break out of the loop. The bug comes from commit `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13), which reimplemented some of the existing loose object functions. So it's worth checking if this bug was inherited from any of those. The answers seems to be no. The two obvious candidates are both OK: 1. unpack_sha1_rest(); this doesn't need to loop on Z_BUF_ERROR at all, since it allocates the expected output buffer in advance (which we can't do since we're explicitly streaming here) 2. check_object_signature(); the streaming path relies on the istream interface, which uses read_istream_loose() for this case. That function uses a similar "is our output buffer full" check with Z_BUF_ERROR (which is where I stole it from for this patch!) Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 13:05:26 +09:00
Jeff King	5632baf238	t1450: check large blob in trailing-garbage test Commit `cce044df7f` (fsck: detect trailing garbage in all object types, 2017-01-13) added two tests of trailing garbage in a loose object file: one with a commit and one with a blob. The point of having two is that blobs would follow a different code path that streamed the contents, instead of loading it into a buffer as usual. At the time, merely being a blob was enough to trigger the streaming code path. But since `7ac4f3a007` (fsck: actually fsck blob data, 2018-05-02), we now only stream blobs that are actually large. So since then, the streaming code path is not tested at all for this case. We can restore the original intent of the test by tweaking core.bigFileThreshold to make our small blob seem large. There's no easy way to externally verify that we followed the streaming code path, but I did check before/after using a temporary debug statement. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:53:44 +09:00
Nguyễn Thái Ngọc Duy	b29759d89a	fsck: check HEAD and reflog from other worktrees fsck is a repo-wide operation and should check all references no matter which worktree they are associated to. Reported-by: Jeff King <peff@peff.net> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-22 13:32:54 +09:00
Junio C Hamano	986c518107	Merge branch 'sg/test-must-be-empty' Test fixes. * sg/test-must-be-empty: tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>' tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null <out>' tests: use 'test_must_be_empty' instead of 'test ! -s' tests: use 'test_must_be_empty' instead of '! test -s'	2018-08-27 14:33:43 -07:00
SZEDER Gábor	1c5e94f459	tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>' Using 'test_must_be_empty' is shorter and more idiomatic than >empty && test_cmp empty out as it saves the creation of an empty file. Furthermore, sometimes the expected empty file doesn't have such a descriptive name like 'empty', and its creation is far away from the place where it's finally used for comparison (e.g. in 't7600-merge.sh', where two expected empty files are created in the 'setup' test, but are used only about 500 lines later). These cases were found by instrumenting 'test_cmp' to error out the test script when it's used to compare empty files, and then converted manually. Note that even after this patch there still remain a lot of cases where we use 'test_cmp' to check empty files: - Sometimes the expected output is not hard-coded in the test, but 'test_cmp' is used to ensure that two similar git commands produce the same output, and that output happens to be empty, e.g. the test 'submodule update --merge - ignores --merge for new submodules' in 't7406-submodule-update.sh'. - Repetitive common tasks, including preparing the expected results and running 'test_cmp', are often extracted into a helper function, and some of this helper's callsites expect no output. - For the same reason as above, the whole 'test_expect_success' block is within a helper function, e.g. in 't3070-wildmatch.sh'. - Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update (-p)' in 't9400-git-cvsserver-server.sh'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:48:36 -07:00
Nguyễn Thái Ngọc Duy	42246589b8	object.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-23 11:19:10 -07:00
brian m. carlson	8125a58b91	t: switch $_z40 to $ZERO_OID Switch all uses of $_z40 to $ZERO_OID so that they work correctly with larger hashes. This commit was created by using the following sed command to modify all files in the t directory except t/test-lib.sh: sed -i 's/\$_z40/$ZERO_OID/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-14 11:02:00 +09:00
René Scharfe	2720f6db5d	fsck: handle NULL return of lookup_blob() and lookup_tree() lookup_blob() and lookup_tree() can return NULL if they find an object of an unexpected type. Accessing the object member is undefined in that case. Cast the result to a struct object pointer instead; we can do that because object is the first member of all object types. This trick is already used in other places in the code. An error message is already shown by object_as_type(), which is called by the lookup functions. The walk callback functions are expected to handle NULL object pointers passed to them, but put_object_name() needs a valid object, so avoid calling it without one. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 11:04:34 +09:00
Jonathan Tan	a7c28a2161	tests: ensure fsck fails on corrupt packfiles t1450-fsck.sh does not have a test that checks fsck's behavior when a packfile is invalid. It does have a test for when an object in a packfile is invalid, but in that test, the packfile itself is valid. Add such a test. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-28 15:26:48 -07:00
Junio C Hamano	8f58a34cad	Merge branch 'js/fsck-name-object' Test fix. * js/fsck-name-object: t1450: use egrep for regexp "alternation"	2017-07-06 18:14:43 -07:00
Junio C Hamano	73fc2aadc7	t1450: use egrep for regexp "alternation" GNU grep allows "$A\\|B$" as alternation in BRE, but this is an extension not understood by some other implementations of grep (Michael Kebe reported an breakage on Solaris). Rewrite the offending test to ERE and use egrep instead. Noticed-by: Michael Kebe <michael.kebe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-28 10:17:50 -07:00
Jeff Hostetler	4d9bc37fbe	t1450: avoid use of "sed" on the index, which is a binary file The previous step added a path zzzzzzzz to the index, and then used "sed" to replace this string to yyyyyyyy to create a test case where the checksum at the end of the file does not match the contents. Unfortunately, use of "sed" on a non-text file is not portable. Instead, use a Perl script that seeks to the end and modifies the last byte of the file (where we _know_ stores the trailing checksum). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-27 09:41:19 +09:00
Jeff Hostetler	a33fc72fe9	read-cache: force_verify_index_checksum Teach git to skip verification of the SHA1-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set. Teach fsck to force this verification. The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command. These effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-15 00:58:36 -07:00
Junio C Hamano	4ba6197887	Merge branch 'jk/fsck-connectivity-check-fix' "git fsck --connectivity-check" was not working at all. * jk/fsck-connectivity-check-fix: fsck: lazily load types under --connectivity-only fsck: move typename() printing to its own function t1450: use "mv -f" within loose object directory fsck: check HAS_OBJ more consistently fsck: do not fallback "git fsck <bogus>" to "git fsck" fsck: tighten error-checks of "git fsck <head>" fsck: prepare dummy objects for --connectivity-check fsck: report trees as dangling t1450: clean up sub-objects in duplicate-entry test	2017-01-31 13:15:01 -08:00
Jeff King	c20d4d702f	t1450: use "mv -f" within loose object directory The loose objects are created with mode 0444. That doesn't prevent them being overwritten by rename(), but some versions of "mv" will be extra careful and prompt the user, even without "-i". Reportedly macOS does this, at least in the Travis builds. The prompt reads from /dev/null, defaulting to "no", and the object isn't moved. Then to make matters even more interesting, it still returns "0" and the rest of the test proceeds, but with a broken setup. We can work around it by using "mv -f" to override the prompt. This should work as it's already used in t5504 for the same purpose. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-25 12:32:32 -08:00
Jeff King	c3271a0e47	fsck: do not fallback "git fsck <bogus>" to "git fsck" Since fsck tries to continue as much as it can after seeing an error, we still do the reachability check even if some heads we were given on the command-line are bogus. But if _none_ of the heads is is valid, we fallback to checking all refs and the index, which is not what the user asked for at all. Instead of checking "heads", the number of successful heads we got, check "argc" (which we know only has non-options in it, because parse_options removed the others). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	c6c7b16d23	fsck: tighten error-checks of "git fsck <head>" Instead of checking reachability from the refs, you can ask fsck to check from a particular set of heads. However, the error checking here is quite lax. In particular: 1. It claims lookup_object() will report an error, which is not true. It only does a hash lookup, and the user has no clue that their argument was skipped. 2. When either the name or sha1 cannot be resolved, we continue to exit with a successful error code, even though we didn't check what the user asked us to. This patch fixes both of these cases. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	3e3f8bd608	fsck: prepare dummy objects for --connectivity-check Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit `02976bf85` (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:23:20 -08:00
Jeff King	b4584e4f66	fsck: report trees as dangling After checking connectivity, fsck looks through the list of any objects we've seen mentioned, and reports unreachable and un-"used" ones as dangling. However, it skips any object which is not marked as "parsed", as that is an object that we _don't_ have (but that somebody mentioned). Since `6e454b9a3` (clear parsed flag when we free tree buffers, 2013-06-05), that flag can't be relied on, and the correct method is to check the HAS_OBJ flag. The cleanup in that commit missed this callsite, though. As a result, we would generally fail to report dangling trees. We never noticed because there were no tests in this area (for trees or otherwise). Let's add some. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 12:49:41 -08:00
Jeff King	1ada11ee62	t1450: clean up sub-objects in duplicate-entry test This test creates a multi-level set of trees, but its cleanup routine only removes the top-level tree. After the test finishes, the inner tree and the blob it points to remain, making the inner tree dangling. A later test ("cleaned up") verifies that we've removed any cruft and "git fsck" output is clean. This passes only because of a bug in git-fsck which fails to notice dangling trees. In preparation for fixing the bug, let's teach this earlier test to clean up after itself correctly. We have to remove the inner tree (and therefore the blob, too, which becomes dangling after removing that tree). Since the setup code happens inside a subshell, we can't just set a variable for each object. However, we can stuff all of the sha1s into the $T output variable, which is not used for anything except cleanup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 12:49:41 -08:00
Jeff King	cce044df7f	fsck: detect trailing garbage in all object types When a loose tree or commit is read by fsck (or any git program), unpack_sha1_rest() checks whether there is extra cruft at the end of the object file, after the zlib data. Blobs that are streamed, however, do not have this check. For normal git operations, it's not a big deal. We know the sha1 and size checked out, so we have the object bytes we wanted. The trailing garbage doesn't affect what we're trying to do. But since the point of fsck is to find corruption or other problems, it should be more thorough. This patch teaches its loose-sha1 reader to detect extra bytes after the zlib stream and complain. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	c68b489e56	fsck: parse loose object paths directly When we iterate over the list of loose objects to check, we get the actual path of each object. But we then throw it away and pass just the sha1 to fsck_sha1(), which will do a fresh lookup. Usually it would find the same object, but it may not if an object exists both as a loose and a packed object. We may end up checking the packed object twice, and never look at the loose one. In practice this isn't too terrible, because if fsck doesn't complain, it means you have at least one good copy. But since the point of fsck is to look for corruption, we should be thorough. The new read_loose_object() interface can help us get the data from disk, and then we replace parse_object() with parse_object_buffer(). As a bonus, our error messages now mention the path to a corrupted object, which should make it easier to track down errors when they do happen. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	118e6cead4	t1450: test fsck of packed objects The code paths in fsck for packed and loose objects are quite different, and it is not immediately obvious that the packed case behaves well. In particular: 1. The fsck_loose() function always returns "0" to tell the iterator to keep checking more objects. Whereas fsck_obj_buffer() (which handles packed objects) returns -1. This is OK, because the callback machinery for verify_pack() does not stop when it sees a non-zero return. 2. The fsck_loose() function sets the ERROR_OBJECT bit when fsck_obj() fails, whereas fsck_obj_buffer() sets it only when it sees a corrupt object. This turns out not to matter. We don't actually do anything with this bit except exit the program with a non-zero code, and that is handled already by the non-zero return from the function. So there are no bugs here, but it was certainly confusing to me. And we do not test either of the properties in t1450 (neither that a non-corruption error will caused a non-zero exit for a packed object, nor that we keep going after seeing the first error). Let's test both of those conditions, so that we'll notice if any of those assumptions becomes invalid. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	771e7d578e	sha1_file: fix error message for alternate objects When we fail to open a corrupt loose object, we report an error and mention the filename via sha1_file_name(). However, that function will always give us a path in the local repository, whereas the corrupt object may have come from an alternate. The result is a very misleading error message. Teach the open_sha1_file() and stat_sha1_file() helpers to pass back the path they found, so that we can report it correctly. Note that the pointers we return go to static storage (e.g., from sha1_file_name()), which is slightly dangerous. However, these helpers are static local helpers, and the names are used for immediately generating error messages. The simplicity is an acceptable tradeoff for the danger. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	0b20f1a266	t1450: refactor loose-object removal Commit `90cf590f5` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a remove_loose_object() helper, but we already had a remove_object() helper that did the same thing. Let's combine these into one. The implementations had a few subtle differences, so I've tried to take the best of both: - the original used "sed", but the newer version avoids spawning an extra process - the original processed "$*", which was nonsense, as it assumed only a single sha1. Use "$1" to make that more clear. - the newer version ran an extra rev-parse, but it was not necessary; it's sole caller already converted the argument into a raw sha1 - the original used "rm -f", whereas the new one uses "rm". The latter is better because it may notice a bug or other unexpected failure in the test. (The original does check that the object exists before we remove it, which is good, but that's a subset of the possible unexpected conditions). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:02 -08:00
David Turner	8354fa3d4c	fsck: handle bad trees like other errors Instead of dying when fsck hits a malformed tree object, log the error like any other and continue. Now fsck can tell the user which tree is bad, too. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-27 14:09:10 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Junio C Hamano	6d2d780f63	fsck: detect and warn a commit with embedded NUL Even though a Git commit object is designed to be capable of storing any binary data as its payload, in practice people use it to describe the changes in textual form, and tools like "git log" are designed to treat the payload as text. Detect and warn when we see any commit object with a NUL byte in it. Note that a NUL byte in the header part is already detected as a grave error. This change is purely about the message part. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-10 10:02:06 -07:00
René Scharfe	8a272f291a	fsck: treat a NUL in a tag header as an error We check the return value of verify_header() for commits already, so do the same for tags as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
René Scharfe	80c7f5a011	t1450: add tests for NUL in headers of commits and tags Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
Junio C Hamano	30ce3b3bbc	Merge branch 'jc/fsck-dropped-errors' There were some classes of errors that "git fsck" diagnosed to its standard error that did not cause it to exit with non-zero status. * jc/fsck-dropped-errors: fsck: exit with non-zero when problems are found	2015-10-15 15:43:35 -07:00
Junio C Hamano	122f76f574	fsck: exit with non-zero when problems are found After finding some problems (e.g. a ref refs/heads/X points at an object that is not a commit) and issuing an error message, the program failed to signal the fact that it found an error by a non-zero exit status. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-23 14:29:28 -07:00
Johannes Schindelin	02976bf856	fsck: introduce `git fsck --connectivity-only` This option avoids unpacking each and all blob objects, and just verifies the connectivity. In particular with large repositories, this speeds up the operation, at the expense of missing corrupt blobs, ignoring unreachable objects and other fsck issues, if any. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	2becf00ff7	fsck: support demoting errors to warnings We already have support in `git receive-pack` to deal with some legacy repositories which have non-fatal issues. Let's make `git fsck` itself useful with such repositories, too, by allowing users to ignore known issues, or at least demote those issues to mere warnings. Example: `git -c fsck.missingEmail=ignore fsck` would hide problems with missing emails in author, committer and tagger lines. In the same spirit that `git receive-pack`'s usage of the fsck machinery differs from `git fsck`'s – some of the non-fatal warnings in `git fsck` are fatal with `git receive-pack` when receive.fsckObjects = true, for example – we strictly separate the fsck.<msg-id> from the receive.fsck.<msg-id> settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	71ab8fa840	fsck: report the ID of the error/warning Some repositories written by legacy code have objects with non-fatal fsck issues. To allow the user to ignore those issues, let's print out the ID (e.g. when encountering "missingEmail", the user might want to call `git config --add receive.fsck.missingEmail=warn`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Junio C Hamano	ee6e4c70f1	Merge branch 'maint' * maint: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:28:29 -08:00
Junio C Hamano	7ba46269a0	Merge branch 'maint-2.1' into maint * maint-2.1: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:28:10 -08:00
Junio C Hamano	3c84ac86fc	Merge branch 'maint-2.0' into maint-2.1 * maint-2.0: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:27:56 -08:00
Junio C Hamano	64a03e970a	Merge branch 'maint-1.8.5' into maint-1.9 * maint-1.8.5: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:27:13 -08:00
Jeff King	6aaf956b08	is_hfs_dotgit: loosen over-eager match of \u{..47} Our is_hfs_dotgit function relies on the hackily-implemented next_hfs_char to give us the next character that an HFS+ filename comparison would look at. It's hacky because it doesn't implement the full case-folding table of HFS+; it gives us just enough to see if the path matches ".git". At the end of next_hfs_char, we use tolower() to convert our 32-bit code point to lowercase. Our tolower() implementation only takes an 8-bit char, though; it throws away the upper 24 bits. This means we can't have any false negatives for is_hfs_dotgit. We only care about matching 7-bit ASCII characters in ".git", and we will correctly process 'G' or 'g'. However, we _can_ have false positives. Because we throw away the upper bits, code point \u{0147} (for example) will look like 'G' and get downcased to 'g'. It's not known whether a sequence of code points whose truncation ends up as ".git" is meaningful in any language, but it does not hurt to be more accurate here. We can just pass out the full 32-bit code point, and compare it manually to the upper and lowercase characters we care about. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-29 12:06:27 -08:00
Junio C Hamano	1cb4b3d380	Merge branch 'js/fsck-tag-validation' New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message	2014-12-22 12:27:41 -08:00
Junio C Hamano	77933f4449	Sync with v2.1.4 * maint-2.1: Git 2.1.4 Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:46:57 -08:00
Junio C Hamano	58f1d950e3	Sync with v2.0.5 * maint-2.0: Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:42:28 -08:00
Junio C Hamano	6898b79721	Sync with v1.8.5.6 * maint-1.8.5: Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:20:31 -08:00
Johannes Schindelin	d08c13b947	fsck: complain about NTFS ".git" aliases in trees Now that the index can block pathnames that can be mistaken to mean ".git" on NTFS and FAT32, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectNTFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about NTFS. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on NTFS themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git or git~1, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectNTFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	a18fcc9ff2	fsck: complain about HFS+ ".git" aliases in trees Now that the index can block pathnames that case-fold to ".git" on HFS+, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectHFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about HFS+. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on HFS+ themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git with invisible Unicode code-points mixed in, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectHFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	76e86fc6e3	fsck: notice .git case-insensitively We complain about ".git" in a tree because it cannot be loaded into the index or checked out. Since we now also reject ".GIT" case-insensitively, fsck should notice the same, so that errors do not propagate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	450870cba7	t1450: refactor ".", "..", and ".git" fsck tests We check that fsck notices and complains about confusing paths in trees. However, there are a few shortcomings: 1. We check only for these paths as file entries, not as intermediate paths (so ".git" and not ".git/foo"). 2. We check "." and ".." together, so it is possible that we notice only one and not the other. 3. We repeat a lot of boilerplate. Let's use some loops to be more thorough in our testing, and still end up with shorter code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	7add441984	fsck: properly bound "invalid tag name" error message When we detect an invalid tag-name header in a tag object, like, "tag foo bar\n", we feed the pointer starting at "foo bar" to a printf "%s" formatter. This shows the name, as we want, but then it keeps printing the rest of the tag buffer, rather than stopping at the end of the line. Our tests did not notice because they look only for the matching line, but the bug is that we print much more than we wanted to. So we also adjust the test to be more exact. Note that when fscking tags with "index-pack --strict", this is even worse. index-pack does not add a trailing NUL-terminator after the object, so we may actually read past the buffer and print uninitialized memory. Running t5302 with valgrind does notice the bug for that reason. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-09 11:54:25 -08:00
Junio C Hamano	f190737f22	Merge branch 'jc/hash-object-fsck-tag' Using "hash-object --literally", test one of the new breakages js/fsck-tag-validation topic teaches "fsck" to catch is caught. * jc/hash-object-fsck-tag: t1450: make sure fsck detects a malformed tagger line	2014-09-26 14:39:44 -07:00
Junio C Hamano	13f4f04692	Merge branch 'js/fsck-tag-validation' Teach "git fsck" to inspect the contents of annotated tag objects. * js/fsck-tag-validation: Make sure that index-pack --strict checks tag objects Add regression tests for stricter tag fsck'ing fsck: check tag objects' headers Make sure fsck_commit_buffer() does not run out of the buffer fsck_object(): allow passing object data separately from the object itself Refactor type_from_string() to allow continuing after detecting an error	2014-09-26 14:39:43 -07:00
Junio C Hamano	b659605da6	t1450: make sure fsck detects a malformed tagger line With "hash-object --literally", write a tag object that is not supposed to pass one of the new checks added to "fsck", and make sure that the new check catches the breakage. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-12 11:05:15 -07:00
Jeff King	30d1038d1b	fsck: return non-zero status on missing ref tips Fsck tries hard to detect missing objects, and will complain (and exit non-zero) about any inter-object links that are missing. However, it will not exit non-zero for any missing ref tips, meaning that a severely broken repository may still pass "git fsck && echo ok". The problem is that we use for_each_ref to iterate over the ref tips, which hides broken tips. It does at least print an error from the refs.c code, but fsck does not ever see the ref and cannot note the problem in its exit code. We can solve this by using for_each_rawref and noting the error ourselves. In addition to adding tests for this case, we add tests for all types of missing-object links (all of which worked, but which we were not testing). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-12 10:45:49 -07:00
Johannes Schindelin	90e3e5f057	Add regression tests for stricter tag fsck'ing The intent of the new test case is to catch general breakages in the fsck_tag() function, not so much to test it extensively, trying to strike the proper balance between thoroughness and speed. While it would have been nice to test the code path where fsck_object() encounters an invalid tag object, this is not possible using git fsck: tag objects are parsed already before fsck'ing (and the parser already fails upon such objects). Even worse: we would not even be able write out invalid tag objects because git hash-object parses those objects, too, unless we resorted to really ugly hacks such as using something like this in the unit tests (essentially depending on Perl and Compress::Zlib): hash_invalid_object () { contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" && sha1=$(echo "$contents" \| test-sha1) && suffix=${sha1#??} && mkdir -p .git/objects/${sha1%$suffix} && echo "$contents" \| perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \ > .git/objects/${sha1%$suffix}/$suffix && echo $sha1 } Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 14:19:09 -07:00
Jeff King	2e770fe47e	fsck: exit with non-zero status upon error from fsck_obj() Upon finding a corrupt loose object, we forgot to note the error to signal it with the exit status of the entire process. [jc: adjusted t1450 and added another test] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 09:40:53 -07:00
Jeff King	d4b8de0420	fsck: report integer overflow in author timestamps When we check commit objects, we complain if commit->date is ULONG_MAX, which is an indication that we saw integer overflow when parsing it. However, we do not do any check at all for author lines, which also contain a timestamp. Let's actually check the timestamps on each ident line with strtoul. This catches both author and committer lines, and we can get rid of the now-redundant commit->date check. Note that like the existing check, we compare only against ULONG_MAX. Now that we are calling strtoul at the site of the check, we could be slightly more careful and also check that errno is set to ERANGE. However, this will make further refactoring in future patches a little harder, and it doesn't really matter in practice. For 32-bit systems, one would have to create a commit at the exact wrong second in 2038. But by the time we get close to that, all systems will hopefully have moved to 64-bit (and if they haven't, they have a real problem one second later). For 64-bit systems, by the time we get close to ULONG_MAX, all systems will hopefully have been consumed in the fiery wrath of our expanding Sun. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	5c17f51270	fsck: warn about ".git" in trees Having a ".git" entry inside a tree can cause confusing results on checkout. At the top-level, you could not checkout such a tree, as it would complain about overwriting the real ".git" directory. In a subdirectory, you might check it out, but performing operations in the subdirectory would confusingly consider the in-tree ".git" directory as the repository. The regular git tools already make it hard to accidentally add such an entry to a tree, and do not allow such entries to enter the index at all. Teaching fsck about it provides an additional safety check, and let's us avoid propagating any such bogosity when transfer.fsckObjects is on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 13:52:54 -08:00
Jeff King	5d34a4359d	fsck: warn about '.' and '..' in trees A tree with meta-paths like '.' or '..' does not work well with git; the index will refuse to load it or check it out to the filesystem (and even if we did not have that safety, it would look like we were overwriting an untracked directory). For the same reason, it is difficult to create such a tree with regular git. Let's warn about these dubious entries during fsck, just in case somebody has created a bogus tree (and this also lets us prevent them from propagating when transfer.fsckObjects is set). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 10:41:08 -08:00
Junio C Hamano	26c21f8ec6	Merge branch 'jc/maint-t1450-fsck-order-fix' into maint * jc/maint-t1450-fsck-order-fix: t1450: the order the objects are checked is undefined	2012-10-17 10:28:19 -07:00
Junio C Hamano	9dad83be45	t1450: the order the objects are checked is undefined When a tag T points at an object X that is of a type that is different from what the tag records as, fsck should report it as an error. However, depending on the order X and T are checked individually, the actual error message can be different. If X is checked first, fsck remembers X's type and then when it checks T, it notices that T records X as a wrong type (i.e. the complaint is about a broken tag T). If T is checked first, on the other hand, fsck remembers that we need to verify X is of the type tag records, and when it later checks X, it notices that X is of a wrong type (i.e. the complaint is about a broken object X). The important thing is that fsck notices such an error and diagnoses the issue on object X, but the test was expecting that we happen to check objects in the order to make us detect issues with tag T, not with object X. Remove this unwarranted assumption. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-02 15:08:16 -07:00
Junio C Hamano	03adeeaad6	Merge branch 'jk/maint-null-in-trees' into maint-1.7.11 "git diff" had a confusion between taking data from a path in the working tree and taking data from an object that happens to have name 0{40} recorded in a tree. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value	2012-09-10 15:24:54 -07:00
Jeff King	c479d14a80	fsck: detect null sha1 in tree entries Short of somebody happening to beat the 1 in 2^160 odds of actually generating content that hashes to the null sha1, we should never see this value in a tree entry. So let's have fsck warn if it it seen. As in the previous commit, we test both blob and submodule entries to future-proof the test suite against the implementation depending on connectivity to notice the error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 15:14:08 -07:00
Junio C Hamano	c6a13b2c86	fsck: --no-dangling omits "dangling object" information The default output from "fsck" is often overwhelmed by informational message on dangling objects, especially if you do not repack often, and a real error can easily be buried. Add "--no-dangling" option to omit them, and update the user manual to demonstrate its use. Based on a patch by Clemens Buchacher, but reverted the part to change the default to --no-dangling, which is unsuitable for the first patch. The usual three-step procedure to break the backward compatibility over time needs to happen on top of this, if we were to go in that direction. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-28 14:55:39 -08:00
Clemens Buchacher	cb8da70547	git rev-list: fix invalid typecast git rev-list passes rev_list_info, not rev_list objects. Without this fix, rev-list enables or disables the --verify-objects option depending on a read from an undefined memory location. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-13 12:49:15 -08:00
Dmitry Ivankov	53f53cff24	fsck: improve committer/author check fsck allows a name with > character in it like "name> <email>". Also for "name email>" fsck says "missing space before email". More precisely, it seeks for a first '<', checks that ' ' preceeds it. Then seeks to '<' or '>' and checks that it is the '>'. Missing space is reported if either '<' is not found or it's not preceeded with ' '. Change it to following. Seek to '<' or '>', check that it is '<' and is preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now "name> <email>" is rejected as "bad name". More strict name check is the only change in what is accepted. Report 'missing space' only if '<' is found and is not preceeded with a space. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 12:21:07 -07:00
Dmitry Ivankov	e3c98120f5	fsck: add a few committer name tests fsck reports "missing space before <email>" for committer string equal to "name email>" or to "". It'd be nicer to say "missing email" for the second string and "name is bad" (has > in it) for the first one. Add a failing test for these messages. For "name> <email>" no error is reported. Looks like a bug, so add such a failing test." Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 12:21:05 -07:00
Jonathan Nieder	a48fcd8369	tests: add missing && Breaks in a test assertion's && chain can potentially hide failures from earlier commands in the chain. Commands intended to fail should be marked with !, test_must_fail, or test_might_fail. The examples in this patch do not require that. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-09 11:59:49 -08:00
Jonathan Nieder	dbedf8bf42	t1450 (fsck): remove dangling objects The fsck test is generally careful to remove the corrupt objects it inserts, but dangling objects are left behind due to some typos and omissions. It is better to clean up more completely, to simplify the addition of later tests. So: - guard setup and cleanup with test_expect_success to catch typos and errors; - check both stdout and stderr when checking for empty fsck output; - use test_cmp empty file in place of test $(wc -l <file) = 0, for better debugging output when running tests with -v; - add a remove_object () helper and use it to replace broken object removal code that forgot about the fanout in .git/objects; - disable gc.auto, to avoid tripping up object removal if the number of objects ever reaches that threshold. - use test_when_finished to ensure cleanup tasks are run and succeed when tests fail; - add a new final test that no breakage or dangling objects was left behind. While at it, add a brief description to test_description of the history that is expected to persist between tests. Part of a campaign to clean up subshell usage in tests. Cc: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-09 15:58:32 -07:00
Jonathan Nieder	0adc6a3d49	fsck: fix bogus commit header check `daae1922` (fsck: check ident lines in commit objects, 2010-04-24) taught fsck to expect commit objects to have the form tree <object name> <parents> author <valid ident string> committer <valid ident string> log message The check is overly strict: for example, it errors out with the message “expected blank line” for perfectly valid commits with an "encoding ISO-8859-1" line. Later it might make sense to teach fsck about the rest of the header and warn about unrecognized header lines, but for simplicity, let’s accept arbitrary trailing lines for now. Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-28 15:08:27 -07:00
Jonathan Nieder	daae19224a	fsck: check ident lines in commit objects Check that email addresses do not contain <, >, or newline so they can be quickly scanned without trouble. The copy() function in ident.c already ensures that ordinary git commands will not write email addresses without this property. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-01 12:15:06 -07:00
Thomas Rast	4551d03541	t1450: fix testcases that were wrongly expecting failure Almost exactly a year ago in `02a6552` (Test fsck a bit harder), I introduced two testcases that were expecting failure. However, the only bug was that the testcases wrote blobs because I forgot to pass -t tag to hash-object. Fix this, and then adjust the rest of the test to properly check the result. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-19 21:56:19 -08:00
Thomas Rast	02a6552c28	Test fsck a bit harder git-fsck, of all tools, has very few tests. This adds some more: * a corrupted object; * a branch pointing to a non-commit; * a tag pointing to a nonexistent object; * and a tag pointing to an object of a type other than what the tag itself claims. Only the first two are caught. At least the third probably should, too, but currently slips through. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-02-20 00:02:48 -08:00
Junio C Hamano	e15ef66943	fsck: check loose objects from alternate object stores by default "git fsck" used to validate only loose objects that are local and nothing else by default. This is not just too little when a repository is borrowing objects from other object stores, but also caused the connectivity check to mistakenly declare loose objects borrowed from them to be missing. The rationale behind the default mode that validates only loose objects is because these objects are still young and more unlikely to have been pushed to other repositories yet. That holds for loose objects borrowed from alternate object stores as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-30 19:23:22 -08:00
Junio C Hamano	469e2ebf63	fsck: HEAD is part of refs By default we looked at all refs but not HEAD. The only thing that made fsck not lose sight of commits that are only reachable from a detached HEAD was the reflog for the HEAD. This fixes it, with a new test. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-01-30 19:23:22 -08:00

1 2 3

149 commits