development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Junio C Hamano	655e494047	Merge branch 'jk/rev-list-verify-objects-fix' "git rev-list --verify-objects" ought to inspect the contents of objects and notice corrupted ones, but it didn't when the commit graph is in use, which has been corrected. * jk/rev-list-verify-objects-fix: rev-list: disable commit graph with --verify-objects lookup_commit_in_graph(): use prepare_commit_graph() to check for graph	2022-09-13 11:38:24 -07:00
Jeff King	0bc2557951	upload-pack: skip parse-object re-hashing of "want" objects Imagine we have a history with commit C pointing to a large blob B. If a client asks us for C, we can generally serve both objects to them without accessing the uncompressed contents of B. In upload-pack, we figure out which commits we have and what the client has, and feed those tips to pack-objects. In pack-objects, we traverse the commits and trees (or use bitmaps!) to find the set of objects needed, but we never open up B. When we serve it to the client, we can often pass the compressed bytes directly from the on-disk packfile over the wire. But if a client asks us directly for B, perhaps because they are doing an on-demand fetch to fill in the missing blob of a partial clone, we end up much slower. Upload-pack calls parse_object() on the oid we receive, which opens up the object and re-checks its hash (even though if it were a commit, we might skip this parse entirely in favor of the commit graph!). And then we feed the oid directly to pack-objects, which again calls parse_object() and opens the object. And then finally, when we write out the result, we may send bytes straight from disk, but only after having unnecessarily uncompressed and computed the sha1 of the object twice! This patch teaches both code paths to use the new SKIP_HASH_CHECK flag for parse_object(). You can see the speed-up in p5600, which does a blob:none clone followed by a checkout. The savings for git.git are modest: Test HEAD^ HEAD ---------------------------------------------------------------------- 5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9% But the savings scale with the number of bytes. So on a repository like linux.git with more files, we see more improvement (in both absolute and relative numbers): Test HEAD^ HEAD ---------------------------------------------------------------------------- 5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5% And here's an even more extreme case. This is the android gradle-plugin repository, whose tip checkout has ~3.7GB of files: Test HEAD^ HEAD -------------------------------------------------------------------------- 5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3% Keep in mind that these timings are of the whole checkout operation. So they count the client indexing the pack and actually writing out the files. If we want to see just the server's view, we can hack up the GIT_TRACE_PACKET output from those operations and replay it via upload-pack. For the gradle example, that gives me: Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s] Range (min … max): 50.608 s … 51.025 s 3 runs Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s] Range (min … max): 9.618 s … 9.842 s 3 runs Summary 'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran 5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input' So a server would see an 80% reduction in CPU serving the initial checkout of a partial clone for this repository. Or possibly even more depending on the packing; most of the time spent in the faster one were objects we had to open during the write phase. In both cases skipping the extra hashing on the server should be pretty safe. The client doesn't trust the server anyway, so it will re-hash all of the objects via index-pack. There is one thing to note, though: the change in get_reference() affects not just pack-objects, but rev-list, git-log, etc. We could use a flag to limit to index-pack here, but we may already skip hash checks in this instance. For commits, we'd skip anything we load via the commit-graph. And while before this commit we would check a blob fed directly to rev-list on the command-line, we'd skip checking that same blob if we found it by traversing a tree. The exception for both is if --verify-objects is used. In that case, we'll skip this optimization, and the new test makes sure we do this correctly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-07 12:20:02 -07:00
Jeff King	b27ccae34b	rev-list: disable commit graph with --verify-objects Since the point of --verify-objects is to actually load and checksum the bytes of each object, optimizing out reads using the commit graph runs contrary to our goal. The most targeted way to implement this would be for the revision traversal code to check revs->verify_objects and avoid using the commit graph. But it's difficult to be sure we've hit all of the correct spots. For instance, I started this patch by writing the first of the included test cases, where the corrupted commit is directly on rev-list's command line. And that is easy to fix by teaching get_reference() to check revs->verify_objects before calling lookup_commit_in_graph(). But that doesn't cover the second test case: when we traverse to a corrupted commit, we'd parse the parent in process_parents(). So we'd need to check there, too. And it keeps going. In handle_commit() we sometimes parses commits, too, though I couldn't figure out a way to trigger it that did not already parse via get_reference() or tag peeling. And try_to_simplify_commit() has its own parse call, and so on. So it seems like the safest thing is to just disable the commit graph for the whole process when we see the --verify-objects option. We can do that either in builtin/rev-list.c, where we use the option, or in revision.c, where we parse it. There are some subtleties: - putting it in rev-list.c is less surprising in some ways, because there we know we are just doing a single traversal. In a command which does multiple traversals in a single process, it's rather unexpected to globally disable the commit graph. - putting it in revision.c is less surprising in some ways, because the caller does not have to remember to disable the graph themselves. But this is already tricky! The verify_objects flag in rev_info doesn't do anything by itself. The caller has to provide an object callback which does the right thing. - for that reason, in practice nobody but rev-list uses this option in the first place. So the distinction is probably not important either way. Arguably it should just be an option of rev-list, and not the general revision machinery; right now you can run "git log --verify-objects", but it does not actually do anything useful. - checking for a parsed revs.verify_objects flag in rev-list.c is too late. By that time we've already passed the arguments to setup_revisions(), which will have parsed the commits using the graph. So this commit disables the graph as soon as we see the option in revision.c. That's a pretty broad hammer, but it does what we want, and in practice nobody but rev-list is using this flag anyway. The tests cover both the "tip" and "parent" cases. Obviously our hammer hits them both in this case, but it's good to check both in case somebody later tries the more focused approach. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-07 09:44:30 -07:00
Jeff King	53602a937d	fsck: actually detect bad file modes in trees We use the normal tree_desc code to iterate over trees in fsck, meaning we only see the canonicalized modes it returns. And hence we'd never see anything unexpected, since it will coerce literally any garbage into one of our normal and accepted modes. We can use the new RAW_MODES flag to see the real modes, and then use the existing code to actually analyze them. The existing code is written as allow-known-good, so there's not much point in testing a variety of breakages. The one tested here should be S_IFREG but with nonsense permissions. Do note that the error-reporting here isn't great. We don't mention the specific bad mode, but just that the tree has one or more broken modes. But when you go to look at it with "git ls-tree", we'll report the canonicalized mode! This isn't ideal, but given that this should come up rarely, and that any number of other tree corruptions might force you into looking at the binary bytes via "cat-file", it's not the end of the world. And it's something we can improve on top later if we choose. Reported-by: Xavier Morel <xavier.morel@masklinn.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-10 14:26:27 -07:00
Ævar Arnfjörð Bjarmason	4627c67fa6	object-file: fix a unpack_loose_header() regression in `3b6a8db3b0` Fix a regression in my `3b6a8db3b0` (object-file.c: use "enum" return type for unpack_loose_header(), 2021-10-01) revealed both by running the test suite with --valgrind, and with the amended "git fsck" test. In practice this regression in v2.34.0 caused us to claim that we couldn't parse the header, as opposed to not being able to unpack it. Before the change in the C code the test_cmp added here would emit: -error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 +error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 I.e. we'd proceed to call parse_loose_header() on the uninitialized "hdr" value, and it would have been very unlikely for that uninitialized memory to be a valid git object. The other callers of unpack_loose_header() were already checking the enum values exhaustively. See `3b6a8db3b0` and `5848fb11ac` (object-file.c: return ULHR_TOO_LONG on "header too long", 2021-10-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-12 15:42:26 -07:00
Johannes Sixt	5906910794	t1450-fsck: exec-bit is not needed to make loose object writable A test case wants to append stuff to a loose object file to ensure that this kind of corruption is detected. To make a read-only loose object file writable with chmod, it is not necessary to also make it executable. Replace the bitmask 755 with the instruction +w to request only the write bit and to also heed the umask. And get rid of a POSIXPERM prerequisite, which is unnecessary for the test. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-13 12:36:12 -08:00
Ævar Arnfjörð Bjarmason	96e41f58fe	fsck: report invalid object type-path combinations Improve the error that's emitted in cases where we find a loose object we parse, but which isn't at the location we expect it to be. Before this change we'd prefix the error with a not-a-OID derived from the path at which the object was found, due to an emergent behavior in how we'd end up with an "OID" in these codepaths. Now we'll instead say what object we hashed, and what path it was found at. Before this patch series e.g.: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ mv objects/e6/ objects/e7 Would emit ("[...]" used to abbreviate the OIDs): git fsck error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...]) error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...] Now we'll instead emit: error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...] Furthermore, we'll do the right thing when the object type and its location are bad. I.e. this case: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ mv objects/83 objects/84 As noted in an earlier commits we'd simply die early in those cases, until preceding commits fixed the hard die on invalid object type: $ git fsck fatal: invalid object type Now we'll instead emit sensible error messages: $ git fsck error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...] error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...] In both fsck.c and object-file.c we're using null_oid as a sentinel value for checking whether we got far enough to be certain that the issue was indeed this OID mismatch. We need to add the "object corrupt or missing" special-case to deal with cases where read_loose_object() will return an error before completing check_object_signature(), e.g. if we have an error in unpack_loose_rest() because we find garbage after the valid gzip content: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ git fsck error: garbage at end of loose object 'e69d[...]' error: unable to unpack contents of ./objects/e6/9d[...] error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...] There is currently some weird messaging in the edge case when the two are combined, i.e. because we're not explicitly passing along an error state about this specific scenario from check_stream_oid() via read_loose_object() we'll end up printing the null OID if an object is of an unknown type and it can't be unpacked by zlib, e.g.: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ /usr/bin/git fsck fatal: invalid object type $ ~/g/git/git fsck error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f' error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f [...] I think it's OK to leave that for future improvements, which would involve enum-ifying more error state as we've done with "enum unpack_loose_header_result" in preceding commits. In these increasingly more obscure cases the worst that can happen is that we'll get slightly nonsensical or inapplicable error messages. There's other such potential edge cases, all of which might produce some confusing messaging, but still be handled correctly as far as passing along errors goes. E.g. if check_object_signature() returns and oideq(real_oid, null_oid()) is true, which could happen if it returns -1 due to the read_istream() call having failed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason	31deb28f5e	fsck: don't hard die on invalid object types Change the error fsck emits on invalid object types, such as: $ git hash-object --stdin -w -t garbage --literally </dev/null <OID> From the very ungraceful error of: $ git fsck fatal: invalid object type $ To: $ git fsck error: <OID>: object is of unknown type 'garbage': <OID_PATH> [ other fsck output ] We'll still exit with non-zero, but now we'll finish the rest of the traversal. The tests that's being added here asserts that we'll still complain about other fsck issues (e.g. an unrelated dangling blob). To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag from read_loose_object() through to parse_loose_header(). Since the read_loose_object() function is only used in builtin/fsck.c we can simply change it to accept a "struct object_info" (which contains the OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13) for the introduction of read_loose_object(). Since we'll need a "struct strbuf" to hold the "type_name" let's pass it to the for_each_loose_file_in_objdir() callback to avoid allocating a new one for each loose object in the iteration. It also makes the memory management simpler than sticking it in fsck_loose() itself, as we'll only need to strbuf_reset() it, with no need to do a strbuf_release() before each "return". Before this commit we'd never check the "type" if read_loose_object() failed, but now we do. We therefore need to initialize it to OBJ_NONE to be able to tell the difference between e.g. its unpack_loose_header() having failed, and us getting past that and into parse_loose_header(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason	a5ed333121	fsck tests: test for garbage appended to a loose object There wasn't any output tests for this scenario, let's ensure that we don't regress on it in the changes that come after this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason	42cd635b21	fsck tests: test current hash/type mismatch behavior If fsck we move an object around between .git/objects/?? directories to simulate a hash mismatch "git fsck" will currently hard die() in object-file.c. This behavior will be fixed in subsequent commits, but let's test for it as-is for now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason	f7a0dba7a2	fsck tests: refactor one test to use a sub-repo Refactor one of the fsck tests to use a throwaway repository. It's a pervasive pattern in t1450-fsck.sh to spend a lot of effort on the teardown of a tests so we're not leaving corrupt content for the next test. We can instead use the pattern of creating a named sub-repository, then we don't have to worry about cleaning up after ourselves, nobody will care what state the broken "hash-mismatch" repository is after this test runs. See [1] for related discussion on various "modern" test patterns that can be used to avoid verbosity and increase reliability. 1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason	093fffdfbe	fsck tests: add test for fsck-ing an unknown type Fix a blindspot in the fsck tests by checking what we do when we encounter an unknown "garbage" type produced with hash-object's --literally option. This behavior needs to be improved, which'll be done in subsequent patches, but for now let's test for the current behavior. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Junio C Hamano	608cc4f273	Merge branch 'ab/detox-gettext-tests' Removal of GIT_TEST_GETTEXT_POISON continues. * ab/detox-gettext-tests: tests: remove most uses of test_i18ncmp tests: remove last uses of C_LOCALE_OUTPUT tests: remove most uses of C_LOCALE_OUTPUT tests: remove last uses of GIT_TEST_GETTEXT_POISON=false	2021-02-25 16:43:29 -08:00
Junio C Hamano	9e634a91c8	Merge branch 'js/fsck-name-objects-fix' Fix "git fsck --name-objects" which apparently has not been used by anybody who is motivated enough to report breakage. * js/fsck-name-objects-fix: fsck --name-objects: be more careful parsing generation numbers t1450: robustify `remove_object()`	2021-02-17 17:21:42 -08:00
Ævar Arnfjörð Bjarmason	1108cea7f8	tests: remove most uses of test_i18ncmp As a follow-up to `d162b25f95` (tests: remove support for GIT_TEST_GETTEXT_POISON, 2021-01-20) remove most uses of test_i18ncmp via a simple s/test_i18ncmp/test_cmp/g search-replacement. I'm leaving t6300-for-each-ref.sh out due to a conflict with in-flight changes between "master" and "seen", as well as the prerequisite itself due to other changes between "master" and "next/seen" which add new test_i18ncmp uses. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 23:48:27 -08:00
Johannes Schindelin	e89f89361c	fsck --name-objects: be more careful parsing generation numbers In `7b35efd734` (fsck_walk(): optionally name objects on the go, 2016-07-17), the `fsck` machinery learned to optionally name the objects, so that it is easier to see what part of the repository is in a bad shape, say, when objects are missing. To save on complexity, this machinery uses a parser to determine the name of a parent given a commit's name: any `~<n>` suffix is parsed and the parent's name is formed from the prefix together with `~<n+1>`. However, this parser has a bug: if it finds a suffix `<n>` that is _not_ `~<n>`, it will mistake the empty string for the prefix and `<n>` for the generation number. In other words, it will generate a name of the form `~<bogus-number>`. Let's fix this. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 12:38:05 -08:00
Johannes Schindelin	8c891eed3a	t1450: robustify `remove_object()` This function can be simplified by using the `test_oid_to_path()` helper, which incidentally also makes it more robust by not relying on the exact file system layout of the loose object files. While at it, do not define those functions in a test case, it buys us nothing. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-10 12:38:00 -08:00
Johannes Schindelin	06d531486e	t[01]: adjust the references to the default branch name "main" Carefully excluding t1309, which sees independent development elsewhere at the time of writing, we transition above-mentioned tests to the default branch name `main`. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -e 's/naster/nain/g' -- t[01].sh && git checkout HEAD -- t1309\*) Note that t5533 contains a variation of the name `master` (`naster`) that we rename here, too. This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Junio C Hamano	e0ad9574dd	Merge branch 'bc/sha-256-part-3' The final leg of SHA-256 transition. * bc/sha-256-part-3: (39 commits) t: remove test_oid_init in tests docs: add documentation for extensions.objectFormat ci: run tests with SHA-256 t: make SHA1 prerequisite depend on default hash t: allow testing different hash algorithms via environment t: add test_oid option to select hash algorithm repository: enable SHA-256 support by default setup: add support for reading extensions.objectformat bundle: add new version for use with SHA-256 builtin/verify-pack: implement an --object-format option http-fetch: set up git directory before parsing pack hashes t0410: mark test with SHA1 prerequisite t5308: make test work with SHA-256 t9700: make hash size independent t9500: ensure that algorithm info is preserved in config t9350: make hash size independent t9301: make hash size independent t9300: use $ZERO_OID instead of hard-coded object ID t9300: abstract away SHA-1-specific constants t8011: make hash size independent ...	2020-08-11 18:04:11 -07:00
Junio C Hamano	5b53175b7a	Merge branch 'ma/t1450-quotefix' Test fix. * ma/t1450-quotefix: t1450: fix quoting of NUL byte when corrupting pack	2020-08-10 10:23:59 -07:00
Martin Ågren	dc156bc31f	t1450: fix quoting of NUL byte when corrupting pack We use printf '\0' to generate a NUL byte which we then `dd` into the packfile to ensure that we modify the first byte of the first object, thereby (probabilistically) invalidating the checksum. Except the single quotes we're using are interpreted to match with the ones we enclose the whole test in. So we actually execute printf \0 and end up injecting the ASCII code for "0", 0x30, instead. The comment right above this `printf` invocation says that "at least one of [the type bits] is not zero, so setting the first byte to 0 is sufficient". Substituting "0x30" for "0" in that comment won't do: we'd need to reason about which bits go where and just what the packfile looks like that we're modifying in this test. Let's avoid all of that by actually executing printf "\0" to generate a NUL byte, as intended. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-01 17:46:42 -07:00
brian m. carlson	e023ff0691	t: remove test_oid_init in tests Now that we call test_oid_init in the setup for all test scripts, there's no point in calling it individually. Remove all of the places where we've done so to help keep tests tidy. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 09:16:49 -07:00
Junio C Hamano	7e75aeb290	Merge branch 'rs/fsck-duplicate-names-in-trees' The check in "git fsck" to ensure that the tree objects are sorted still had corner cases it missed unsorted entries. * rs/fsck-duplicate-names-in-trees: fsck: detect more in-tree d/f conflicts t1450: demonstrate undetected in-tree d/f conflict t1450: increase test coverage of in-tree d/f detection fsck: fix a typo in a comment	2020-06-08 18:06:29 -07:00
René Scharfe	fe747043dc	fsck: detect more in-tree d/f conflicts If the conflict candidate file name from the top of the stack is not a prefix of the current candiate directory then we can discard it as no matching directory can come up later. But we are not done checking the candidate directory -- the stack might still hold a matching file name, so stay in the loop and check the next candidate file name. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-21 11:03:13 -07:00
René Scharfe	3d71b1cf60	t1450: demonstrate undetected in-tree d/f conflict Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-21 11:03:13 -07:00
René Scharfe	fc12aa7bfd	t1450: increase test coverage of in-tree d/f detection Exercise the case of putting a conflict candidate file name back on the stack because a matching directory might yet come up later. Do that by factoring out the test code into a function to allow for more concise notation in the form of parameters indicating names of trees (with trailing slash) and blobs (without trailing slash) in no particular order (they are sorted by git mktree). Then add the new test case as a second function call. Fix a typo in the test title while at it ("dublicate"). Reported-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-21 11:03:10 -07:00
Junio C Hamano	0498840b35	Merge branch 'rs/fsck-duplicate-names-in-trees' "git fsck" ensures that the paths recorded in tree objects are sorted and without duplicates, but it failed to notice a case where a blob is followed by entries that sort before a tree with the same name. This has been corrected. * rs/fsck-duplicate-names-in-trees: fsck: report non-consecutive duplicate names in trees	2020-05-14 14:39:44 -07:00
René Scharfe	9068cfb20f	fsck: report non-consecutive duplicate names in trees Tree entries are sorted in path order, meaning that directory names get a slash ('/') appended implicitly. Git fsck checks if trees contains consecutive duplicates, but due to that ordering there can be non-consecutive duplicates as well if one of them is a directory and the other one isn't. Such a tree cannot be fully checked out. Find these duplicates by recording candidate file names on a stack and check candidate directory names against that stack to find matches. Suggested-by: Brandon Williams <bwilliamseng@gmail.com> Original-test-by: Brandon Williams <bwilliamseng@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Luke Diamand <luke@diamand.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 08:40:28 -07:00
Junio C Hamano	f8cb64e3d4	Merge branch 'bc/sha-256-part-1-of-4' SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...	2020-03-26 17:11:20 -07:00
Martin Ågren	3c29e21eb0	t: drop debug `cat` calls We `cat` files, but don't inspect or grab the contents in any way. Unlike in an earlier commit, there is no reason to suspect that these files could be missing, so `cat`-ing them is just wasted effort. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 11:18:25 -08:00
brian m. carlson	42d4e1d112	commit: use expected signature header for SHA-256 The transition plan anticipates that we will allow signatures using multiple algorithms in a single commit. In order to do so, we need to use a different header per algorithm so that it will be obvious over which data to compute the signature. The transition plan specifies that we should use "gpgsig-sha256", so wire up the commit code such that it can write and parse the current algorithm, and it can remove the headers for any algorithm when creating a new commit. Add tests to ensure that we write using the right header and that git fsck doesn't reject these commits. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 09:33:30 -08:00
Junio C Hamano	7034cd094b	Sync with Git 2.24.1	2019-12-09 22:17:55 -08:00
Johannes Schindelin	67af91c47a	Sync with 2.23.1 * maint-2.23: (44 commits) Git 2.23.1 Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters ...	2019-12-06 16:31:39 +01:00
Johannes Schindelin	7fd9fd94fb	Sync with 2.22.2 * maint-2.22: (43 commits) Git 2.22.2 Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors ...	2019-12-06 16:31:30 +01:00
Johannes Schindelin	5421ddd8d0	Sync with 2.21.1 * maint-2.21: (42 commits) Git 2.21.1 mingw: sh arguments need quoting in more circumstances mingw: fix quoting of empty arguments for `sh` mingw: use MSYS2 quoting even when spawning shell scripts mingw: detect when MSYS2's sh is to be spawned more robustly t7415: drop v2.20.x-specific work-around Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh ...	2019-12-06 16:31:23 +01:00
Johannes Schindelin	fc346cb292	Sync with 2.20.2 * maint-2.20: (36 commits) Git 2.20.2 t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories ...	2019-12-06 16:31:12 +01:00
Johannes Schindelin	d851d94151	Sync with 2.19.3 * maint-2.19: (34 commits) Git 2.19.3 Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams ...	2019-12-06 16:30:49 +01:00
Johannes Schindelin	7c9fbda6e2	Sync with 2.18.2 * maint-2.18: (33 commits) Git 2.18.2 Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up ...	2019-12-06 16:30:38 +01:00
Johannes Schindelin	14af7ed5a9	Sync with 2.17.3 * maint-2.17: (32 commits) Git 2.17.3 Git 2.16.6 test-drop-caches: use `has_dos_drive_prefix()` Git 2.15.4 Git 2.14.6 mingw: handle `subst`-ed "DOS drives" mingw: refuse to access paths with trailing spaces or periods mingw: refuse to access paths with illegal characters unpack-trees: let merged_entry() pass through do_add_entry()'s errors quote-stress-test: offer to test quoting arguments for MSYS2 sh t6130/t9350: prepare for stringent Win32 path validation quote-stress-test: allow skipping some trials quote-stress-test: accept arguments to test via the command-line tests: add a helper to stress test argument quoting mingw: fix quoting of arguments Disallow dubiously-nested submodule git directories protect_ntfs: turn on NTFS protection by default path: also guard `.gitmodules` against NTFS Alternate Data Streams is_ntfs_dotgit(): speed it up mingw: disallow backslash characters in tree objects' file names ...	2019-12-06 16:29:15 +01:00
Johannes Schindelin	e1d911dd4c	mingw: disallow backslash characters in tree objects' file names The backslash character is not a valid part of a file name on Windows. Hence it is dangerous to allow writing files that were unpacked from tree objects, when the stored file name contains a backslash character: it will be misinterpreted as directory separator. This not only causes ambiguity when a tree contains a blob `a\b` and a tree `a` that contains a blob `b`, but it also can be used as part of an attack vector to side-step the careful protections against writing into the `.git/` directory during a clone of a maliciously-crafted repository. Let's prevent that, addressing CVE-2019-1354. Note: we guard against backslash characters in tree objects' file names _only_ on Windows (because on other platforms, even on those where NTFS volumes can be mounted, the backslash character is _not_ a directory separator), and _only_ when `core.protectNTFS = true` (because users might need to generate tree objects for other platforms, of course without touching the worktree, e.g. using `git update-index --cacheinfo`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2019-12-04 13:20:05 +01:00
Jeff King	a59cfb3230	fsck: unify object-name code Commit `90cf590f53` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a system for decorating objects with names. The code is split across builtin/fsck.c (which gives the initial names) and fsck.c (which adds to the names as it traverses the object graph). This leads to some duplication, where both sites have near-identical describe_object() functions (the difference being that the one in builtin/fsck.c uses a circular array of buffers to allow multiple calls in a single printf). Let's provide a unified object_name API for fsck. That lets us drop the duplication, as well as making the interface boundaries more clear (which will let us refactor the implementation more in a future patch). We'll leave describe_object() in builtin/fsck.c as a thin wrapper around the new API, as it relies on a static global to make its many callers a bit shorter. We'll also convert the bare add_decoration() calls in builtin/fsck.c to put_object_name(). This fixes two minor bugs: 1. We leak many small strings. add_decoration() has a last-one-wins approach: it updates the decoration to the new string and returns the old one. But we ignore the return value, leaking the old string. This is quite common to trigger, since we look at reflogs: the tip of any ref will be described both by looking at the actual ref, as well as the latest reflog entry. So we'd always end up leaking one of those strings. 2. The last-one-wins approach gives us lousy names. For instance, we first look at all of the refs, and then all of the reflogs. So rather than seeing "refs/heads/master", we're likely to overwrite it with "HEAD@{12345678}". We're generally better off using the first name we find. And indeed, the test in t1450 expects this ugly HEAD@{} name. After this patch, we've switched to using fsck_put_object_name()'s first-one-wins semantics, and we output the more human-friendly "refs/tags/julius" (and the test is updated accordingly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
René Scharfe	f537485fa5	tests: remove "cat foo" before "test_i18ngrep bar foo" Some tests print a file before searching for a pattern using test_i18ngrep. This is useful when debugging tests with --verbose when the pattern is not found as expected. Since `63b1a175ee` (t: make 'test_i18ngrep' more informative on failure, 2018-02-08) test_i18ngrep already shows the contents of a file that doesn't match the expected pattern, though. So don't bother doing the same unconditionally up-front. The contents are not interesting if the expected pattern is found, and showing it twice if it doesn't match is of no use. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:17:44 +09:00
brian m. carlson	8de6b383c1	t1450: make hash size independent Replace several hard-coded full and partial object IDs with variables or computed values. Create junk data to stuff inside an invalid tree that can be either 20 or 32 bytes long. Compute a binary all-zeros object ID instead of hard-coding a 20-byte length. Additionally, compute various object IDs by using test_oid and $EMPTY_BLOB so that this test works with multiple hash algorithms. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 13:28:18 -07:00
Junio C Hamano	ea2dab1abb	Merge branch 'tb/unexpected' Code tightening against a "wrong" object appearing where an object of a different type is expected, instead of blindly assuming that the connection between objects are correctly made. * tb/unexpected: rev-list: detect broken root trees rev-list: let traversal die when --missing is not in use get_commit_tree(): return NULL for broken tree list-objects.c: handle unexpected non-tree entries list-objects.c: handle unexpected non-blob entries t: introduce tests for unexpected object types t: move 'hex2oct' into test-lib-functions.sh	2019-05-09 00:37:25 +09:00
Taylor Blau	5c07647d98	t: move 'hex2oct' into test-lib-functions.sh The helper 'hex2oct' is used to convert base-16 encoded data into a base-8 binary form, and is useful for preparing data for commands that accept input in a binary format, such as 'git hash-object', via 'printf'. This helper is defined identically in three separate places throughout 't'. Move the definition to test-lib-function.sh, so that it can be used in new test suites, and its definition is not redundant. This will likewise make our job easier in the subsequent commit, which also uses 'hex2oct'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-05 15:06:04 +09:00
Junio C Hamano	ea327760d3	Merge branch 'jk/fsck-doc' "git fsck --connectivity-only" omits computation necessary to sift the objects that are not reachable from any of the refs into unreachable and dangling. This is now enabled when dangling objects are requested (which is done by default, but can be overridden with the "--no-dangling" option). * jk/fsck-doc: fsck: always compute USED flags for unreachable objects doc/fsck: clarify --connectivity-only behavior	2019-03-20 15:16:06 +09:00
Jeff King	8d8c2a5aef	fsck: always compute USED flags for unreachable objects The --connectivity-only option avoids opening every object, and instead just marks reachable objects with a flag and compares this to the set of all objects. This strategy is discussed in more detail in `3e3f8bd608` (fsck: prepare dummy objects for --connectivity-check, 2017-01-17). This means that we report _every_ unreachable object as dangling. Whereas in a full fsck, we'd have actually opened and parsed each of those unreachable objects, marking their child objects with the USED flag, to mean "this was mentioned by another object". And thus we can report only the tip of an unreachable segment of the object graph as dangling. You can see this difference with a trivial example: tree=$(git hash-object -t tree -w /dev/null) one=$(echo one \| git commit-tree $tree) two=$(echo two \| git commit-tree -p $one $tree) Running `git fsck` will report only $two as dangling, but with --connectivity-only, both commits (and the tree) are reported. Likewise, using --lost-found would write all three objects. We can make --connectivity-only work like the normal case by taking a separate pass over the unreachable objects, parsing them and marking objects they refer to as USED. That still avoids parsing any blobs, though we do pay the cost to access any unreachable commits and trees (which may or may not be noticeable, depending on how many you have). If neither --dangling nor --lost-found is in effect, then we can skip this step entirely, just like we do now. That makes "--connectivity-only --no-dangling" just as fast as the current "--connectivity-only". I.e., we do the correct thing always, but you can still tweak the options to make it faster if you don't care about dangling objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-05 22:55:57 +09:00
Jeff King	01f8d5948a	prefer "hash mismatch" to "sha1 mismatch" To future-proof ourselves against a change in the hash, let's use the more generic "hash mismatch" to refer to integrity problems. Note that we do advertise this exact string in git-fsck(1). However, the message itself is marked for translation, meaning we do not expect it to be machine-readable. While we're touching that documentation, let's also update it for grammar and clarity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-08 09:41:06 -08:00
Junio C Hamano	3813a89fae	Merge branch 'nd/i18n' More _("i18n") markings. * nd/i18n: fsck: mark strings for translation fsck: reduce word legos to help i18n parse-options.c: mark more strings for translation parse-options.c: turn some die() to BUG() parse-options: replace opterror() with optname() repack: mark more strings for translation remote.c: mark messages for translation remote.c: turn some error() or die() to BUG() reflog: mark strings for translation read-cache.c: add missing colon separators read-cache.c: mark more strings for translation read-cache.c: turn die("internal error") to BUG() attr.c: mark more string for translation archive.c: mark more strings for translation alias.c: mark split_cmdline_strerror() strings for translation git.c: mark more strings for translation	2019-01-04 13:33:31 -08:00
Junio C Hamano	e146cc97be	Merge branch 'nd/per-worktree-ref-iteration' The code to traverse objects for reachability, used to decide what objects are unreferenced and expendable, have been taught to also consider per-worktree refs of other worktrees as starting points to prevent data loss. * nd/per-worktree-ref-iteration: git-worktree.txt: correct linkgit command name reflog expire: cover reflog from all worktrees fsck: check HEAD and reflog from other worktrees fsck: move fsck_head_link() to get_default_heads() to avoid some globals revision.c: better error reporting on ref from different worktrees revision.c: correct a parameter name refs: new ref types to make per-worktree refs visible to all worktrees Add a place for (not) sharing stuff between worktrees refs.c: indent with tabs, not spaces	2018-11-13 22:37:26 +09:00

1 2 3

127 commits