development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 04:01:21 +00:00

Author	SHA1	Message	Date
Junio C Hamano	b84c783882	fsck: s/++i > 1/i++/ Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-24 10:25:12 +09:00
Junio C Hamano	66ec2373fe	Merge branch 'ab/fsck-skiplist' Update fsck.skipList implementation and documentation. * ab/fsck-skiplist: fsck: support comments & empty lines in skipList fsck: use oidset instead of oid_array for skipList fsck: use strbuf_getline() to read skiplist file fsck: add a performance test for skipList fsck: add a performance test fsck: document that skipList input must be unabbreviated fsck: document and test commented & empty line skipList input fsck: document and test sorted skipList input fsck tests: add a test for no skipList input fsck tests: setup of bogus commit object	2018-10-10 12:37:16 +09:00
Junio C Hamano	1958ad504b	Sync with 2.18.1 * maint-2.18: Git 2.18.1 Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options	2018-09-27 11:50:45 -07:00
Junio C Hamano	44f87dac99	Sync with 2.17.2 * maint-2.17: Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options	2018-09-27 11:45:01 -07:00
Jeff King	1a7fd1fb29	fsck: detect submodule paths starting with dash As with urls, submodule paths with dashes are ignored by git, but may end up confusing older versions. Detecting them via fsck lets us prevent modern versions of git from being a vector to spread broken .gitmodules to older versions. Compared to blocking leading-dash urls, though, this detection may be less of a good idea: 1. While such paths provide confusing and broken results, they don't seem to actually work as option injections against anything except "cd". In particular, the submodule code seems to canonicalize to an absolute path before running "git clone" (so it passes /your/clone/-sub). 2. It's more likely that we may one day make such names actually work correctly. Even after we revert this fsck check, it will continue to be a hassle until hosting servers are all updated. On the other hand, it's not entirely clear that the behavior in older versions is safe. And if we do want to eventually allow this, we may end up doing so with a special syntax anyway (e.g., writing "./-sub" in the .gitmodules file, and teaching the submodule code to canonicalize it when comparing). So on balance, this is probably a good protection. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-27 11:41:31 -07:00
Jeff King	a124133e1e	fsck: detect submodule urls starting with dash Urls with leading dashes can cause mischief on older versions of Git. We should detect them so that they can be rejected by receive.fsckObjects, preventing modern versions of git from being a vector by which attacks can spread. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-27 11:41:26 -07:00
Ævar Arnfjörð Bjarmason	371a655074	fsck: support comments & empty lines in skipList It's annoying not to be able to put comments and empty lines in the skipList, when e.g. keeping a big central list of commits to skip in /etc/gitconfig, which was my motivation for `1362df0d41` ("fetch: implement fetch.fsck.*", 2018-07-27). Implement that, and document what version of Git this was changed in, since this on-disk format can be expected to be used by multiple versions of git. There is no notable performance impact from this change, using the test setup described a couple of commits back: Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2% 1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8% 1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4% 1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3% 1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1% 1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3% 1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6% 1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1% Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
René Scharfe	3b41fb0cb2	fsck: use oidset instead of oid_array for skipList Change the implementation of the skipList feature to use oidset instead of oid_array to store SHA-1s for later lookup. This list is parsed once on startup by fsck, fetch-pack or receive-pack depending on the *.skipList config in use. I.e. only once per invocation, but note that for "clone --recurse-submodules" each submodule will re-parse the list, in addition to the main project, and it will be re-parsed when checking .gitmodules blobs, see `fb16287719` ("fsck: check skiplist for object in fsck_blob()", 2018-06-27). Memory usage is a bit higher, but we don't need to keep track of the sort order anymore. Embed the oidset into struct fsck_options to make its ownership clear (no hidden sharing) and avoid unnecessary pointer indirection. The cumulative impact on performance of this & the preceding change, using the test setup described in the previous commit: Test HEAD~2 HEAD~ HEAD ---------------------------------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0% 1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7% 1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4% 1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3% 1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6% 1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0% 1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6% 1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6% Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
René Scharfe	fb8952077d	fsck: use strbuf_getline() to read skiplist file The buffer is unlikely to contain a NUL character, so printing its contents using %s in a die() format is unsafe (detected with ASan). Use an idiomatic strbuf_getline() loop instead, which ensures the buffer is always NUL-terminated, supports CRLF files as well, accepts files without a newline after the last line, supports any hash length automatically, and is shorter. This fixes a bug where emitting an error about an invalid line on say line 1 would continue printing subsequent lines, and usually continue into uninitialized memory. The performance impact of this, on a CentOS 7 box with RedHat GCC 4.8.5-28: $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.75(7.39+0.35) 7.68(7.29+0.39) -0.9% 1450.5: fsck with 1 skipped bad commits 7.70(7.30+0.40) 7.80(7.42+0.37) +1.3% 1450.7: fsck with 10 skipped bad commits 7.77(7.37+0.40) 7.87(7.47+0.40) +1.3% 1450.9: fsck with 100 skipped bad commits 7.82(7.41+0.40) 7.88(7.43+0.44) +0.8% 1450.11: fsck with 1000 skipped bad commits 7.88(7.49+0.39) 7.84(7.43+0.40) -0.5% 1450.13: fsck with 10000 skipped bad commits 8.02(7.63+0.39) 8.07(7.67+0.39) +0.6% 1450.15: fsck with 100000 skipped bad commits 8.01(7.60+0.41) 8.08(7.70+0.38) +0.9% 1450.17: fsck with 1000000 skipped bad commits 7.60(7.10+0.50) 7.37(7.18+0.19) -3.0% Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
Junio C Hamano	3a2a1dc170	Merge branch 'sb/object-store-lookup' lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...	2018-08-02 15:30:42 -07:00
Junio C Hamano	bd1a32d5c8	Merge branch 'jk/fsck-gitmodules-gently' Recent "security fix" to pay attention to contents of ".gitmodules" while accepting "git push" was a bit overly strict than necessary, which has been adjusted. * jk/fsck-gitmodules-gently: fsck: downgrade gitmodulesParse default to "info" fsck: split ".gitmodules too large" error from parse failure fsck: silence stderr when parsing .gitmodules config: add options parameter to git_config_from_mem config: add CONFIG_ERROR_SILENT handler config: turn die_on_error into caller-facing enum	2018-08-02 15:30:39 -07:00
Junio C Hamano	a9e7fe96cc	Merge branch 'rj/submodule-fsck-skip' "fsck.skipList" did not prevent a blob object listed there from being inspected for is contents (e.g. we recently started to inspect the contents of ".gitmodules" for certain malicious patterns), which has been corrected. * rj/submodule-fsck-skip: fsck: check skiplist for object in fsck_blob()	2018-07-24 14:50:42 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Jeff King	64eb14d310	fsck: downgrade gitmodulesParse default to "info" We added an fsck check in `ed8b10f631` (fsck: check .gitmodules content, 2018-05-02) as a defense against the vulnerability from `0383bbb901` (submodule-config: verify submodule names as paths, 2018-04-30). With the idea that up-to-date hosting sites could protect downstream unpatched clients that fetch from them. As part of that defense, we reject any ".gitmodules" entry that is not syntactically valid. The theory is that if we cannot even parse the file, we cannot accurately check it for vulnerabilities. And anybody with a broken .gitmodules file would eventually want to know anyway. But there are a few reasons this is a bad tradeoff in practice: - for this particular vulnerability, the client has to be able to parse the file. So you cannot sneak an attack through using a broken file, assuming the config parsers for the process running fsck and the eventual victim are functionally equivalent. - a broken .gitmodules file is not necessarily a problem. Our fsck check detects .gitmodules in _any_ tree, not just at the root. And the presence of a .gitmodules file does not necessarily mean it will be used; you'd have to also have gitlinks in the tree. The cgit repository, for example, has a file named .gitmodules from a pre-submodule attempt at sharing code, but does not actually have any gitlinks. - when the fsck check is used to reject a push, it's often hard to work around. The pusher may not have full control over the destination repository (e.g., if it's on a hosting server, they may need to contact the hosting site's support). And the broken .gitmodules may be too far back in history for rewriting to be feasible (again, this is an issue for cgit). So we're being unnecessarily restrictive without actually improving the security in a meaningful way. It would be more convenient to downgrade this check to "info", which means we'd still comment on it, but not reject a push. Site admins can already do this via config, but we should ship sensible defaults. There are a few counterpoints to consider in favor of keeping the check as an error: - the first point above assumes that the config parsers for the victim and the fsck process are equivalent. This is pretty true now, but as time goes on will become less so. Hosting sites are likely to upgrade their version of Git, whereas vulnerable clients will be stagnant (if they did upgrade, they'd cease to be vulnerable!). So in theory we may see drift over time between what two config parsers will accept. In practice, this is probably OK. The config format is pretty established at this point and shouldn't change a lot. And the farther we get from the announcement of the vulnerability, the less interesting this extra layer of protection becomes. I.e., it was _most_ valuable on day 0, when everybody's client was still vulnerable and hosting sites could protect people. But as time goes on and people upgrade, the population of vulnerable clients becomes smaller and smaller. - In theory this could protect us from other vulnerabilities in the future. E.g., .gitmodules are the only way for a malicious repository to feed data to the config parser, so this check could similarly protect clients from a future (to-be-found) bug there. But that's trading a hypothetical case for real-world pain today. If we do find such a bug, the hosting site would need to be updated to fix it, too. At which point we could figure out whether it's possible to detect _just_ the malicious case without hurting existing broken-but-not-evil cases. - Until recently, we hadn't made any restrictions on .gitmodules content. So now in tightening that we're hitting cases where certain things used to work, but don't anymore. There's some moderate pain now. But as time goes on, we'll see more (and more varied) cases that will make tightening harder in the future. So there's some argument for putting rules in place _now_, before users grow more cases that violate them. Again, this is trading pain now for hypothetical benefit in the future. And if we try hard in the future to keep our tightening to a minimum (i.e., rejecting true maliciousness without hurting broken-but-not-evil repos), then that reduces even the hypothetical benefit. Considering both sets of arguments, it makes sense to loosen this check for now. Note that we have to tweak the test in t7415 since fsck will no longer consider this a fatal error. But we still check that it reports the warning, and that we don't get the spurious error from the config code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-16 10:57:23 -07:00
Jeff King	0d68764d94	fsck: split ".gitmodules too large" error from parse failure Since `ed8b10f631` (fsck: check .gitmodules content, 2018-05-02), we'll report a gitmodulesParse error for two conditions: - a .gitmodules entry is not syntactically valid - a .gitmodules entry is larger than core.bigFileThreshold with the intent that we can detect malicious files and protect downstream clients. E.g., from the issue in `0383bbb901` (submodule-config: verify submodule names as paths, 2018-04-30). But these conditions are actually quite different with respect to that bug: - a syntactically invalid file cannot trigger the problem, as the victim would barf before hitting the problematic code - a too-big .gitmodules _can_ trigger the problem. Even though it is obviously silly to have a 500MB .gitmodules file, the submodule code will happily parse it if you have enough memory. So it may be reasonable to configure their severity separately. Let's add a new class for the "too large" case to allow that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-16 10:57:22 -07:00
Ramsay Jones	fb16287719	fsck: check skiplist for object in fsck_blob() Since commit `ed8b10f631` ("fsck: check .gitmodules content", 2018-05-02), fsck will issue an error message for '.gitmodules' content that cannot be parsed correctly. This is the case, even when the corresponding blob object has been included on the skiplist. For example, using the cgit repository, we see the following: $ git fsck Checking object directories: 100% (256/256), done. error: bad config line 5 in blob .gitmodules error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob Checking objects: 100% (6626/6626), done. $ $ git config fsck.skiplist '.git/skip' $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip $ $ git fsck Checking object directories: 100% (256/256), done. error: bad config line 5 in blob .gitmodules Checking objects: 100% (6626/6626), done. $ Note that the error message issued by the config parser is still present, despite adding the object-id of the blob to the skiplist. One solution would be to provide a means of suppressing the messages issued by the config parser. However, given that (logically) we are asking fsck to ignore this object, a simpler approach is to just not call the config parser if the object is to be skipped. Add a check to the 'fsck_blob()' processing function, to determine if the object is on the skiplist and, if so, exit the function early. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:49:44 -07:00
Jeff King	de6bd9e3ea	fsck: silence stderr when parsing .gitmodules If there's a parsing error we'll already report it via the usual fsck report() function (or not, if the user has asked to skip this object or warning type). The error message from the config parser just adds confusion. Let's suppress it. Note that we didn't test this case at all, so I've added coverage in t7415. We may end up toning down or removing this fsck check in the future. So take this test as checking what happens now with a focus on stderr, and not any ironclad guarantee that we must detect and report parse failures in the future. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:36:41 -07:00
Jeff King	4574f1aace	config: add options parameter to git_config_from_mem The underlying config parser knows how to handle a config_options struct, but git_config_from_mem() always passes NULL. Let's allow our callers to specify the options struct. We could add a "_with_options" variant, but since there are only a handful of callers, let's just update them to pass NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:36:06 -07:00
Stefan Beller	f86bcc7b2c	tree: add repository argument to lookup_tree Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	da14a7ff99	blob: add repository argument to lookup_blob Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	109cd76dd3	object: add repository argument to parse_object Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Junio C Hamano	b16b60f71b	Merge branch 'sb/object-store-grafts' into sb/object-store-lookup * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-06-29 10:43:28 -07:00
Junio C Hamano	ebaf0a56f3	Merge branch 'nd/complete-config-vars' Continuing with the idea to programatically enumerate various pieces of data required for command line completion, teach the codebase to report the list of configuration variables subcommands care about to help complete them. * nd/complete-config-vars: completion: complete general config vars in two steps log-tree: allow to customize 'grafted' color completion: support case-insensitive config vars completion: keep other config var completion in camelCase completion: drop the hard coded list of config vars am: move advice.amWorkDir parsing back to advice.c advice: keep config name in camelCase in advice_config[] fsck: produce camelCase config key names help: add --config to list all available config fsck: factor out msg_id_info[] lazy initialization code grep: keep all colors in an array Add and use generic name->id mapping code for color slot parsing	2018-06-25 13:22:38 -07:00
Junio C Hamano	fb6ac9e79a	Merge branch 'jk/submodule-fsck-loose-fixup' Finishing touches to a topic that already is in 'maint'. * jk/submodule-fsck-loose-fixup: fsck: avoid looking at NULL blob->object t7415: don't bother creating commit for symlink test	2018-06-13 12:50:44 -07:00
Jeff King	47cc91310a	fsck: avoid looking at NULL blob->object Commit `159e7b080b` (fsck: detect gitmodules files, 2018-05-02) taught fsck to look at the content of .gitmodules files. If the object turns out not to be a blob at all, we just complain and punt on checking the content. And since this was such an obvious and trivial code path, I didn't even bother to add a test. Except it _does_ do one non-trivial thing, which is call the report() function, which wants us to pass a pointer to a "struct object". Which we don't have (we have only a "struct object_id"). So we erroneously pass a NULL object to report(), which gets dereferenced and causes a segfault. It seems like we could refactor report() to just take the object_id itself. But we pass the object pointer along to a callback function, and indeed this ends up in builtin/fsck.c's objreport() which does want to look at other parts of the object (like the type). So instead, let's just use lookup_unknown_object() to get the real "struct object", and pass that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-11 10:56:06 -07:00
Junio C Hamano	42c8ce1c49	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...	2018-05-30 14:04:10 +09:00
Junio C Hamano	7913f53b56	Sync with Git 2.17.1 * maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...	2018-05-29 17:10:05 +09:00
Nguyễn Thái Ngọc Duy	a4a9cc19a2	fsck: produce camelCase config key names Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy	3ac68a93fd	help: add --config to list all available config Sometimes it helps to list all available config vars so the user can search for something they want. The config man page can also be used but it's harder to search if you want to focus on the variable name, for example. This is not the best way to collect the available config since it's not precise. Ideally we should have a centralized list of config in C code (pretty much like 'struct option'), but that's a lot more work. This will do for now. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy	a46baac61e	fsck: factor out msg_id_info[] lazy initialization code This array will be used by some other function than parse_msg_id() in the following commit. Factor out this prep code so it could be called from that one. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Junio C Hamano	c89b6e136e	Merge branch 'ds/lazy-load-trees' The code has been taught to use the duplicated information stored in the commit-graph file to learn the tree object name for a commit to avoid opening and parsing the commit object when it makes sense to do so. * ds/lazy-load-trees: coccinelle: avoid wrong transformation suggestions from commit.cocci commit-graph: lazy-load trees for commits treewide: replace maybe_tree with accessor methods commit: create get_commit_tree() method treewide: rename tree to maybe_tree	2018-05-23 14:38:13 +09:00
Jeff King	b7b1fca175	fsck: complain when .gitmodules is a symlink We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	ed8b10f631	fsck: check .gitmodules content This patch detects and blocks submodule names which do not match the policy set forth in submodule-config. These should already be caught by the submodule code itself, but putting the check here means that newer versions of Git can protect older ones from malicious entries (e.g., a server with receive.fsckObjects will block the objects, protecting clients which fetch from it). As a side effect, this means fsck will also complain about .gitmodules files that cannot be parsed (or were larger than core.bigFileThreshold). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	2738744426	fsck: handle promisor objects in .gitmodules check If we have a tree that points to a .gitmodules blob but don't have that blob, we can't check its contents. This produces an fsck error when we encounter it. But in the case of a promisor object, this absence is expected, and we must not complain. Note that this can technically circumvent our transfer.fsckObjects check. Imagine a client fetches a tree, but not the matching .gitmodules blob. An fsck of the incoming objects will show that we don't have enough information. Later, we do fetch the actual blob. But we have no idea that it's a .gitmodules file. The only ways to get around this would be to re-scan all of the existing trees whenever new ones enter (which is expensive), or to somehow persist the gitmodules_found set between fsck runs (which is complicated). In practice, it's probably OK to ignore the problem. Any repository which has all of the objects (including the one serving the promisor packs) can perform the checks. Since promisor packs are inherently about a hierarchical topology in which clients rely on upstream repositories, those upstream repositories can protect all of their downstream clients from broken objects. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	159e7b080b	fsck: detect gitmodules files In preparation for performing fsck checks on .gitmodules files, this commit plumbs in the actual detection of the files. Note that unlike most other fsck checks, this cannot be a property of a single object: we must know that the object is found at a ".gitmodules" path at the root tree of a commit. Since the fsck code only sees one object at a time, we have to mark the related objects to fit the puzzle together. When we see a commit we mark its tree as a root tree, and when we see a root tree with a .gitmodules file, we mark the corresponding blob to be checked. In an ideal world, we'd check the objects in topological order: commits followed by trees followed by blobs. In that case we can avoid ever loading an object twice, since all markings would be complete by the time we get to the marked objects. And indeed, if we are checking a single packfile, this is the order in which Git will generally write the objects. But we can't count on that: 1. git-fsck may show us the objects in arbitrary order (loose objects are fed in sha1 order, but we may also have multiple packs, and we process each pack fully in sequence). 2. The type ordering is just what git-pack-objects happens to write now. The pack format does not require a specific order, and it's possible that future versions of Git (or a custom version trying to fool official Git's fsck checks!) may order it differently. 3. We may not even be fscking all of the relevant objects at once. Consider pushing with transfer.fsckObjects, where one push adds a blob at path "foo", and then a second push adds the same blob at path ".gitmodules". The blob is not part of the second push at all, but we need to mark and check it. So in the general case, we need to make up to three passes over the objects: once to make sure we've seen all commits, then once to cover any trees we might have missed, and then a final pass to cover any .gitmodules blobs we found in the second pass. We can simplify things a bit by loosening the requirement that we find .gitmodules only at root trees. Technically a file like "subdir/.gitmodules" is not parsed by Git, but it's not unreasonable for us to declare that Git is aware of all ".gitmodules" files and make them eligible for checking. That lets us drop the root-tree requirement, which eliminates one pass entirely. And it makes our worst case much better: instead of potentially queueing every root tree to be re-examined, the worst case is that we queue each unique .gitmodules blob for a second look. This patch just adds the boilerplate to find .gitmodules files. The actual content checks will come in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	7ac4f3a007	fsck: actually fsck blob data Because fscking a blob has always been a noop, we didn't bother passing around the blob data. In preparation for content-level checks, let's fix up a few things: 1. The fsck_object() function just returns success for any blob. Let's a noop fsck_blob(), which we can fill in with actual logic later. 2. The fsck_loose() function in builtin/fsck.c just threw away blob content after loading it. Let's hold onto it until after we've called fsck_object(). The easiest way to do this is to just drop the parse_loose_object() helper entirely. Incidentally, this also fixes a memory leak: if we successfully loaded the object data but did not parse it, we would have left the function without freeing it. 3. When fsck_loose() loads the object data, it does so with a custom read_loose_object() helper. This function streams any blobs, regardless of size, under the assumption that we're only checking the sha1. Instead, let's actually load blobs smaller than big_file_threshold, as the normal object-reading code-paths would do. This lets us fsck small files, and a NULL return is an indication that the blob was so big that it needed to be streamed, and we can pass that information along to fsck_blob(). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	ed9c322062	fsck: simplify ".git" check There's no need for us to manually check for ".git"; it's a subset of the other filesystem-specific tests. Dropping it makes our code slightly shorter. More importantly, the existing code may make a reader wonder why ".GIT" is not covered here, and whether that is a bug (it isn't, as it's also covered in the filesystem-specific tests). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jonathan Nieder	1f93ecd1ab	commit: add repository argument to lookup_commit_graft Add a repository argument to allow callers of lookup_commit_graft to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-18 08:13:10 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
brian m. carlson	c54f5ca970	fsck: convert static functions to struct object_id Convert two static functions to use struct object_id and parse_oid_hex, instead of relying on harcoded 20 and 40-based constants. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:50 +09:00
Derrick Stolee	2e27bd7731	treewide: replace maybe_tree with accessor methods In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
Derrick Stolee	891435d55d	treewide: rename tree to maybe_tree Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
brian m. carlson	b4f5aca40e	sha1_file: convert read_sha1_file to struct object_id Convert read_sha1_file to take a pointer to struct object_id and rename it read_object_file. Do the same for read_sha1_file_extended. Convert one use in grep.c to use the new function without any other code change, since the pointer being passed is a void pointer that is already initialized with a pointer to struct object_id. Update the declaration and definitions of the modified functions, and apply the following semantic patch to convert the remaining callers: @@ expression E1, E2, E3; @@ - read_sha1_file(E1.hash, E2, E3) + read_object_file(&E1, E2, E3) @@ expression E1, E2, E3; @@ - read_sha1_file(E1->hash, E2, E3) + read_object_file(E1, E2, E3) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1.hash, E2, E3, E4) + read_object_file_extended(&E1, E2, E3, E4) @@ expression E1, E2, E3, E4; @@ - read_sha1_file_extended(E1->hash, E2, E3, E4) + read_object_file_extended(E1, E2, E3, E4) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:50 -07:00
Brandon Williams	debca9d2fe	object: rename function 'typename' to 'type_name' Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-14 13:10:05 -08:00
Junio C Hamano	6defdc9fe8	Merge branch 'rs/fsck-null-return-from-lookup' Improve behaviour of "git fsck" upon finding a missing object. * rs/fsck-null-return-from-lookup: fsck: handle NULL return of lookup_blob() and lookup_tree()	2017-10-11 14:52:23 +09:00
René Scharfe	2720f6db5d	fsck: handle NULL return of lookup_blob() and lookup_tree() lookup_blob() and lookup_tree() can return NULL if they find an object of an unexpected type. Accessing the object member is undefined in that case. Cast the result to a struct object pointer instead; we can do that because object is the first member of all object types. This trick is already used in other places in the code. An error message is already shown by object_as_type(), which is called by the lookup functions. The walk callback functions are expected to handle NULL object pointers passed to them, but put_object_name() needs a valid object, so avoid calling it without one. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-06 11:04:34 +09:00
Jeff King	1cf01a34ea	consistently use "fallthrough" comments in switches Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-22 12:49:57 +09:00
Stefan Beller	8b65a34c4a	commit: convert lookup_commit_graft to struct object_id With this patch, commit.h doesn't contain the string 'sha1' any more. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-13 12:02:40 -07:00
Junio C Hamano	6b526ced6f	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (53 commits) object: convert parse_object* to take struct object_id tree: convert parse_tree_indirect to struct object_id sequencer: convert do_recursive_merge to struct object_id diff-lib: convert do_diff_cache to struct object_id builtin/ls-tree: convert to struct object_id merge: convert checkout_fast_forward to struct object_id sequencer: convert fast_forward_to to struct object_id builtin/ls-files: convert overlay_tree_on_cache to object_id builtin/read-tree: convert to struct object_id sha1_name: convert internals of peel_onion to object_id upload-pack: convert remaining parse_object callers to object_id revision: convert remaining parse_object callers to object_id revision: rename add_pending_sha1 to add_pending_oid http-push: convert process_ls_object and descendants to object_id refs/files-backend: convert many internals to struct object_id refs: convert struct ref_update to use struct object_id ref-filter: convert some static functions to struct object_id Convert struct ref_array_item to struct object_id Convert the verify_pack callback to struct object_id Convert lookup_tag to struct object_id ...	2017-05-29 12:34:43 +09:00
brian m. carlson	c251c83df2	object: convert parse_object* to take struct object_id Make parse_object, parse_object_or_die, and parse_object_buffer take a pointer to struct object_id. Remove the temporary variables inserted earlier, since they are no longer necessary. Transform all of the callers using the following semantic patch: @@ expression E1; @@ - parse_object(E1.hash) + parse_object(&E1) @@ expression E1; @@ - parse_object(E1->hash) + parse_object(E1) @@ expression E1, E2; @@ - parse_object_or_die(E1.hash, E2) + parse_object_or_die(&E1, E2) @@ expression E1, E2; @@ - parse_object_or_die(E1->hash, E2) + parse_object_or_die(E1, E2) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1.hash, E2, E3, E4, E5) + parse_object_buffer(&E1, E2, E3, E4, E5) @@ expression E1, E2, E3, E4, E5; @@ - parse_object_buffer(E1->hash, E2, E3, E4, E5) + parse_object_buffer(E1, E2, E3, E4, E5) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:58 +09:00
brian m. carlson	740ee055c6	Convert lookup_tree to struct object_id Convert the lookup_tree function to take a pointer to struct object_id. The commit was created with manual changes to tree.c, tree.h, and object.c, plus the following semantic patch: @@ @@ - lookup_tree(EMPTY_TREE_SHA1_BIN) + lookup_tree(&empty_tree_oid) @@ expression E1; @@ - lookup_tree(E1.hash) + lookup_tree(&E1) @@ expression E1; @@ - lookup_tree(E1->hash) + lookup_tree(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
brian m. carlson	3aca1fc6c9	Convert lookup_blob to struct object_id Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 15:12:57 +09:00
Johannes Schindelin	1aeb7e756c	parse_timestamp(): specify explicitly where we parse timestamps Currently, Git's source code represents all timestamps as `unsigned long`. In preparation for using a more appropriate data type, let's introduce a symbol `parse_timestamp` (currently being defined to `strtoul`) where appropriate, so that we can later easily switch to, say, use `strtoull()` instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-23 20:19:15 -07:00
brian m. carlson	910650d2f8	Rename sha1_array to oid_array Since this structure handles an array of object IDs, rename it to struct oid_array. Also rename the accessor functions and the initialization constant. This commit was produced mechanically by providing non-Documentation files to the following Perl one-liners: perl -pi -E 's/struct sha1_array/struct oid_array/g' perl -pi -E 's/\bsha1_array_/oid_array_/g' perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:56 -07:00
brian m. carlson	5d3206d501	Convert sha1_array_lookup to take struct object_id Convert this function by changing the declaration and definition and applying the following semantic patch to update the callers: @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2.hash) + sha1_array_lookup(E1, &E2) @@ expression E1, E2; @@ - sha1_array_lookup(E1, E2->hash) + sha1_array_lookup(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	98a72ddc12	Make sha1_array_append take a struct object_id * Convert the callers to pass struct object_id by changing the function declaration and definition and applying the following semantic patch: @@ expression E1, E2; @@ - sha1_array_append(E1, E2.hash) + sha1_array_append(E1, &E2) @@ expression E1, E2; @@ - sha1_array_append(E1, E2->hash) + sha1_array_append(E1, E2) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-31 08:33:55 -07:00
brian m. carlson	ee3051bd23	sha1-array: convert internal storage for struct sha1_array to object_id Make the internal storage for struct sha1_array use an array of struct object_id internally. Update the users of this struct which inspect its internals. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-28 09:59:34 -07:00
brian m. carlson	365c27fbff	fsck: convert init_skiplist to struct object_id Convert a hardcoded constant buffer size to a use of GIT_MAX_HEXSZ, and use parse_oid_hex to reduce the dependency on the size of the hash. This function is a caller of sha1_array_append, which will be converted later. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-28 09:59:33 -07:00
Jeff King	a2b22854bd	fsck: lazily load types under --connectivity-only The recent fixes to "fsck --connectivity-only" load all of the objects with their correct types. This keeps the connectivity-only code path close to the regular one, but it also introduces some unnecessary inefficiency. While getting the type of an object is cheap compared to actually opening and parsing the object (as the non-connectivity-only case would do), it's still not free. For reachable non-blob objects, we end up having to parse them later anyway (to see what they point to), making our type lookup here redundant. For unreachable objects, we might never hit them at all in the reachability traversal, making the lookup completely wasted. And in some cases, we might have quite a few unreachable objects (e.g., when alternates are used for shared object storage between repositories, it's normal for there to be objects reachable from other repositories but not the one running fsck). The comment in mark_object_for_connectivity() claims two benefits to getting the type up front: 1. We need to know the types during fsck_walk(). (And not explicitly mentioned, but we also need them when printing the types of broken or dangling commits). We can address this by lazy-loading the types as necessary. Most objects never need this lazy-load at all, because they fall into one of these categories: a. Reachable from our tips, and are coerced into the correct type as we traverse (e.g., a parent link will call lookup_commit(), which converts OBJ_NONE to OBJ_COMMIT). b. Unreachable, but not at the tip of a chunk of unreachable history. We only mention the tips as "dangling", so an unreachable commit which links to hundreds of other objects needs only report the type of the tip commit. 2. It serves as a cross-check that the coercion in (1a) is correct (i.e., we'll complain about a parent link that points to a blob). But we get most of this for free already, because right after coercing, we'll parse any non-blob objects. So we'd notice then if we expected a commit and got a blob. The one exception is when we expect a blob, in which case we never actually read the object contents. So this is a slight weakening, but given that the whole point of --connectivity-only is to sacrifice some data integrity checks for speed, this seems like an acceptable tradeoff. Here are before and after timings for an extreme case with ~5M reachable objects and another ~12M unreachable (it's the torvalds/linux repository on GitHub, connected to shared storage for all of the other kernel forks): [before] $ time git fsck --no-dangling --connectivity-only real 3m4.323s user 1m25.121s sys 1m38.710s [after] $ time git fsck --no-dangling --connectivity-only real 0m51.497s user 0m49.575s sys 0m1.776s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-26 10:51:09 -08:00
David Turner	8354fa3d4c	fsck: handle bad trees like other errors Instead of dying when fsck hits a malformed tree object, log the error like any other and continue. Now fsck can tell the user which tree is bad, too. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-27 14:09:10 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Johannes Schindelin	1cd772cc41	fsck: give the error function a chance to see the fsck_options We will need this in the next commit, where fsck will be taught to optionally name the objects when reporting issues about them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Johannes Schindelin	7b35efd734	fsck_walk(): optionally name objects on the go If fsck_options->name_objects is initialized, and if it already has name(s) for the object(s) that are to be the starting point(s) for fsck_walk(), then that function will now add names for the objects that were walked. This will be highly useful for teaching git-fsck to identify root causes for broken links, which is the task for the next patch in this series. Note that this patch opts for decorating the objects with plain strings instead of full-blown structs (à la `struct rev_name` in the code of the `git name-rev` command), for several reasons: - the code is much simpler than if it had to work with structs that describe arbitrarily long names such as "master~14^2~5:builtin/am.c", - the string processing is actually quite light-weight compared to the rest of fsck's operation, - the caller of fsck_walk() is expected to provide names for the starting points, and using plain and simple strings is just the easiest way to do that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Junio C Hamano	6bfb7de89e	Merge branch 'jc/fsck-nul-in-commit' "git fsck" learned to catch NUL byte in a commit object as potential error and warn. * jc/fsck-nul-in-commit: fsck: detect and warn a commit with embedded NUL fsck_commit_buffer(): do not special case the last validation	2016-05-17 14:38:34 -07:00
Junio C Hamano	6d2d780f63	fsck: detect and warn a commit with embedded NUL Even though a Git commit object is designed to be capable of storing any binary data as its payload, in practice people use it to describe the changes in textual form, and tools like "git log" are designed to treat the payload as text. Detect and warn when we see any commit object with a NUL byte in it. Note that a NUL byte in the header part is already detected as a grave error. This change is purely about the message part. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-10 10:02:06 -07:00
brian m. carlson	ce6663a9da	tree-walk: convert tree_entry_extract() to use struct object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-25 14:26:28 -07:00
brian m. carlson	7d924c9139	struct name_entry: use struct object_id instead of unsigned char sha1[20] Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-25 14:23:42 -07:00
Junio C Hamano	5af297185e	fsck_commit_buffer(): do not special case the last validation The pattern taken by all the validations in this function is: if (notice a violation exists) { err = report(... VIOLATION_KIND ...); if (err) return err; } where report() returns zero if specified kind of violation is set to be ignored, and otherwise shows an error message and returns non-zero. The last validation in the function immediately before the function returns 0 to declare "all good" can cheat and directly return the return value from report(), and the current code does so, i.e. if (notice a violation exists) return report(... VIOLATION_KIND ...); return 0; But that is a selfish code that declares it is the ultimate and final form of the function, never to be enhanced later. To allow and invite future enhancements, make the last test follow the same pattern. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-14 11:15:48 -07:00
Jeff King	b32fa95fd8	convert trivial cases to ALLOC_ARRAY Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
Junio C Hamano	844a9ce472	Merge branch 'bc/object-id' More transition from "unsigned char[40]" to "struct object_id". This needed a few merge fixups, but is mostly disentangled from other topics. * bc/object-id: remote: convert functions to struct object_id Remove get_object_hash. Convert struct object to object_id Add several uses of get_object_hash. object: introduce get_object_hash macro. ref_newer: convert to use struct object_id push_refs_with_export: convert to struct object_id get_remote_heads: convert to struct object_id parse_fetch: convert to use struct object_id add_sought_entry_mem: convert to struct object_id Convert struct ref to use object_id. sha1_file: introduce has_object_file helper.	2015-12-10 12:36:13 -08:00
René Scharfe	8a272f291a	fsck: treat a NUL in a tag header as an error We check the return value of verify_header() for commits already, so do the same for tags as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
brian m. carlson	ed1c9977cb	Remove get_object_hash. Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	f2fd0760f6	Convert struct object to object_id struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	7999b2cf77	Add several uses of get_object_hash. Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
Junio C Hamano	b2f44feba5	Merge branch 'js/fsck-opt' Allow ignoring fsck errors on specific set of known-to-be-bad objects, and also tweaking warning level of various kinds of non critical breakages reported. * js/fsck-opt: fsck: support ignoring objects in `git fsck` via fsck.skiplist fsck: git receive-pack: support excluding objects from fsck'ing fsck: introduce `git fsck --connectivity-only` fsck: support demoting errors to warnings fsck: document the new receive.fsck.<msg-id> options fsck: allow upgrading fsck warnings to errors fsck: optionally ignore specific fsck issues completely fsck: disallow demoting grave fsck errors to warnings fsck: add a simple test for receive.fsck.<msg-id> fsck: make fsck_tag() warn-friendly fsck: handle multiple authors in commits specially fsck: make fsck_commit() warn-friendly fsck: make fsck_ident() warn-friendly fsck: report the ID of the error/warning fsck (receive-pack): allow demoting errors to warnings fsck: offer a function to demote fsck errors to warnings fsck: provide a function to parse fsck message IDs fsck: introduce identifiers for fsck messages fsck: introduce fsck options	2015-08-03 11:01:18 -07:00
Junio C Hamano	acf7189512	Merge branch 'jc/fsck-retire-require-eoh' A fix to a minor regression to "git fsck" in v2.2 era that started complaining about a body-less tag object when it lacks a separator empty line after its header to separate it with a non-existent body. * jc/fsck-retire-require-eoh: fsck: it is OK for a tag and a commit to lack the body	2015-07-13 14:00:24 -07:00
Junio C Hamano	84d18c0bcf	fsck: it is OK for a tag and a commit to lack the body When fsck validates a commit or a tag, it scans each line in the header of the object using helper functions such as "start_with()", etc. that work on a NUL terminated buffer, but before `a1e920a0` (index-pack: terminate object buffers with NUL, 2014-12-08), the validation functions were fed the object data in a piece of memory that is not necessarily terminated with a NUL. We added a helper function require_end_of_header() to be called at the beginning of these validation functions to insist that the object data contains an empty line before its end. The theory is that the validating functions will notice and stop when it hits an empty line as a normal end of header (or a required header line that is missing) without scanning past the end of potentially not NUL-terminated buffer. But the theory forgot that in the older days, Git itself happily created objects with only the header lines without a body. This caused Git 2.2 and later to issue an unnecessary warning in some existing repositories. With `a1e920a0`, we do not need to require an empty line (or the body) in these objects to safely parse and validate them. Drop the offending "must have an empty line" check from this helper function, while keeping the other check to make sure that there is no NUL in the header part of the object, and adjust the name of the helper to what it does accordingly. Noticed-by: Wolfgang Denk <wd@denx.de> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-28 14:47:28 -07:00
Johannes Schindelin	cd94c6f91e	fsck: git receive-pack: support excluding objects from fsck'ing The optional new config option `receive.fsck.skipList` specifies the path to a file listing the names, i.e. SHA-1s, one per line, of objects that are to be ignored by `git receive-pack` when `receive.fsckObjects = true`. This is extremely handy in case of legacy repositories where it would cause more pain to change incorrect objects than to live with them (e.g. a duplicate 'author' line in an early commit object). The intended use case is for server administrators to inspect objects that are reported by `git push` as being too problematic to enter the repository, and to add the objects' SHA-1 to a (preferably sorted) file when the objects are legitimate, i.e. when it is determined that those problematic objects should be allowed to enter the server. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	f27d05b170	fsck: allow upgrading fsck warnings to errors The 'invalid tag name' and 'missing tagger entry' warnings can now be upgraded to errors by specifying `invalidTagName` and `missingTaggerEntry` in the receive.fsck.<msg-id> config setting. Incidentally, the missing tagger warning is now really shown as a warning (as opposed to being reported with the "error:" prefix, as it used to be the case before this commit). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	efaba7cc77	fsck: optionally ignore specific fsck issues completely An fsck issue in a legacy repository might be so common that one would like not to bother the user with mentioning it at all. With this change, that is possible by setting the respective message type to "ignore". This change "abuses" the missingEmail=warn test to verify that "ignore" is also accepted and works correctly. And while at it, it makes sure that multiple options work, too (they are passed to unpack-objects or index-pack as a comma-separated list via the --strict=... command-line option). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	f50c440730	fsck: disallow demoting grave fsck errors to warnings Some kinds of errors are intrinsically unrecoverable (e.g. errors while uncompressing objects). It does not make sense to allow demoting them to mere warnings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	7d7d5b0568	fsck: make fsck_tag() warn-friendly When fsck_tag() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Just like fsck_commit(), there are certain problems that could hide other issues with the same tag object. For example, if the 'type' line is not encountered in the correct position, the 'tag' line – if there is any – would not be handled at all. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	c9ad147f83	fsck: handle multiple authors in commits specially This problem has been detected in the wild, and is the primary reason to introduce an option to demote certain fsck errors to warnings. Let's offer to ignore this particular problem specifically. Technically, we could handle such repositories by setting receive.fsck.<msg-id> to missingCommitter=warn, but that could hide missing tree objects in the same commit because we cannot continue verifying any commit object after encountering a missing committer line, while we can continue in the case of multiple author lines. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	b3584761eb	fsck: make fsck_commit() warn-friendly When fsck_commit() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Note that some problems are too problematic to simply ignore. For example, when the header lines are mixed up, we punt after encountering an incorrect line. Therefore, demoting certain warnings to errors can hide other problems. Example: demoting the missingauthor error to a warning would hide a problematic committer line. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	e6826e335a	fsck: make fsck_ident() warn-friendly When fsck_ident() identifies a problem with the ident, it should still advance the pointer to the next line so that fsck can continue in the case of a mere warning. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	71ab8fa840	fsck: report the ID of the error/warning Some repositories written by legacy code have objects with non-fatal fsck issues. To allow the user to ignore those issues, let's print out the ID (e.g. when encountering "missingEmail", the user might want to call `git config --add receive.fsck.missingEmail=warn`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Johannes Schindelin	5d477a334a	fsck (receive-pack): allow demoting errors to warnings For example, missing emails in commit and tag objects can be demoted to mere warnings with git config receive.fsck.missingemail=warn The value is actually a comma-separated list. In case that the same key is listed in multiple receive.fsck.<msg-id> lines in the config, the latter configuration wins (this can happen for example when both $HOME/.gitconfig and .git/config contain message type settings). As git receive-pack does not actually perform the checks, it hands off the setting to index-pack or unpack-objects in the form of an optional argument to the --strict option. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:34 -07:00
Johannes Schindelin	0282f4dced	fsck: offer a function to demote fsck errors to warnings There are legacy repositories out there whose older commits and tags have issues that prevent pushing them when 'receive.fsckObjects' is set. One real-life example is a commit object that has been hand-crafted to list two authors. Often, it is not possible to fix those issues without disrupting the work with said repositories, yet it is still desirable to perform checks by setting `receive.fsckObjects = true`. This commit is the first step to allow demoting specific fsck issues to mere warnings. The `fsck_set_msg_types()` function added by this commit parses a list of settings in the form: missingemail=warn,badname=warn,... Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by git fsck so far, but other call paths (e.g. git index-pack --strict) error out always no matter what type was specified. Therefore, we need to take extra care to set all message types to FSCK_ERROR by default in those cases. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:26:46 -07:00
Johannes Schindelin	f417eed8cd	fsck: provide a function to parse fsck message IDs These functions will be used in the next commits to allow the user to ask fsck to handle specific problems differently, e.g. demoting certain errors to warnings. The upcoming `fsck_set_msg_types()` function has to handle partial strings because we would like to be able to parse, say, 'missingemail=warn,missingtaggerentry=warn' command line parameters (which will be passed by receive-pack to index-pack and unpack-objects). To make the parsing robust, we generate strings from the enum keys, and using these keys, we match up strings without dashes case-insensitively to the corresponding enum values. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 12:53:25 -07:00
Johannes Schindelin	c99ba492f1	fsck: introduce identifiers for fsck messages Instead of specifying whether a message by the fsck machinery constitutes an error or a warning, let's specify an identifier relating to the concrete problem that was encountered. This is necessary for upcoming support to be able to demote certain errors to warnings. In the process, simplify the requirements on the calling code: instead of having to handle full-blown varargs in every callback, we now send a string buffer ready to be used by the callback. We could use a simple enum for the message IDs here, but we want to guarantee that the enum values are associated with the appropriate message types (i.e. error or warning?). Besides, we want to introduce a parser in the next commit that maps the string representation to the enum value, hence we use the slightly ugly preprocessor construct that is extensible for use with said parser. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:24:27 -07:00
Johannes Schindelin	22410549fc	fsck: introduce fsck options Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 10:23:32 -07:00
Junio C Hamano	1cb4b3d380	Merge branch 'js/fsck-tag-validation' New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message	2014-12-22 12:27:41 -08:00
Junio C Hamano	77933f4449	Sync with v2.1.4 * maint-2.1: Git 2.1.4 Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:46:57 -08:00
Junio C Hamano	58f1d950e3	Sync with v2.0.5 * maint-2.0: Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:42:28 -08:00
Junio C Hamano	5e519fb8b0	Sync with v1.9.5 * maint-1.9: Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:28:54 -08:00
Junio C Hamano	6898b79721	Sync with v1.8.5.6 * maint-1.8.5: Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:20:31 -08:00
Johannes Schindelin	d08c13b947	fsck: complain about NTFS ".git" aliases in trees Now that the index can block pathnames that can be mistaken to mean ".git" on NTFS and FAT32, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectNTFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about NTFS. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on NTFS themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git or git~1, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectNTFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	a18fcc9ff2	fsck: complain about HFS+ ".git" aliases in trees Now that the index can block pathnames that case-fold to ".git" on HFS+, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectHFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about HFS+. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on HFS+ themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git with invisible Unicode code-points mixed in, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectHFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	76e86fc6e3	fsck: notice .git case-insensitively We complain about ".git" in a tree because it cannot be loaded into the index or checked out. Since we now also reject ".GIT" case-insensitively, fsck should notice the same, so that errors do not propagate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	7add441984	fsck: properly bound "invalid tag name" error message When we detect an invalid tag-name header in a tag object, like, "tag foo bar\n", we feed the pointer starting at "foo bar" to a printf "%s" formatter. This shows the name, as we want, but then it keeps printing the rest of the tag buffer, rather than stopping at the end of the line. Our tests did not notice because they look only for the matching line, but the bug is that we print much more than we wanted to. So we also adjust the test to be more exact. Note that when fscking tags with "index-pack --strict", this is even worse. index-pack does not add a trailing NUL-terminator after the object, so we may actually read past the buffer and print uninitialized memory. Running t5302 with valgrind does notice the bug for that reason. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-09 11:54:25 -08:00
Johannes Schindelin	cec097be3a	fsck: check tag objects' headers We inspect commit objects pretty much in detail in git-fsck, but we just glanced over the tag objects. Let's be stricter. Since we do not want to limit 'tag' lines unduly, values that would fail the refname check only result in warnings, not errors. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 10:44:26 -07:00
Johannes Schindelin	4d0d89755e	Make sure fsck_commit_buffer() does not run out of the buffer So far, we assumed that the buffer is NUL terminated, but this is not a safe assumption, now that we opened the fsck_object() API to pass a buffer directly. So let's make sure that there is at least an empty line in the buffer. That way, our checks would fail if the empty line was encountered prematurely, and consequently we can get away with the current string comparisons even with non-NUL-terminated buffers are passed to fsck_object(). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 10:44:01 -07:00
Johannes Schindelin	90a398bbd7	fsck_object(): allow passing object data separately from the object itself When fsck'ing an incoming pack, we need to fsck objects that cannot be read via read_sha1_file() because they are not local yet (and might even be rejected if transfer.fsckobjects is set to 'true'). For commits, there is a hack in place: we basically cache commit objects' buffers anyway, but the same is not true, say, for tag objects. By refactoring fsck_object() to take the object buffer and size as optional arguments -- optional, because we still fall back to the previous method to look at the cached commit objects if the caller passes NULL -- we prepare the machinery for the upcoming handling of tag objects. The assumption that such buffers are inherently NUL terminated is now wrong, of course, hence we pass the size of the buffer so that we can add a sanity check later, to prevent running past the end of the buffer. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 13:54:21 -07:00
René Scharfe	9d02150cf4	fsck: simplify fsck_commit_buffer() by using commit_list_count() fsck_commit_buffer() checks that the number of items in the parents list of a commit matches the number of parent lines in its buffer or -- if a graft is used -- the number of parents in that graft. Simplify the code by using commit_list_count() instead of counting by hand. Also use different variables for the number of lines and the number of list items, making it easier to compare them. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-07-10 14:10:27 -07:00
Junio C Hamano	e91ae32a01	Merge branch 'jk/skip-prefix' * jk/skip-prefix: http-push: refactor parsing of remote object names imap-send: use skip_prefix instead of using magic numbers use skip_prefix to avoid repeated calculations git: avoid magic number with skip_prefix fetch-pack: refactor parsing in get_ack fast-import: refactor parsing of spaces stat_opt: check extra strlen call daemon: use skip_prefix to avoid magic numbers fast-import: use skip_prefix for parsing input use skip_prefix to avoid repeating strings use skip_prefix to avoid magic numbers transport-helper: avoid reading past end-of-string fast-import: fix read of uninitialized argv memory apply: use skip_prefix instead of raw addition refactor skip_prefix to return a boolean avoid using skip_prefix as a boolean daemon: mark some strings as const parse_diff_color_slot: drop ofs parameter	2014-07-09 11:33:28 -07:00
Jeff King	cf4fff579e	refactor skip_prefix to return a boolean The skip_prefix() function returns a pointer to the content past the prefix, or NULL if the prefix was not found. While this is nice and simple, in practice it makes it hard to use for two reasons: 1. When you want to conditionally skip or keep the string as-is, you have to introduce a temporary variable. For example: tmp = skip_prefix(buf, "foo"); if (tmp) buf = tmp; 2. It is verbose to check the outcome in a conditional, as you need extra parentheses to silence compiler warnings. For example: if ((cp = skip_prefix(buf, "foo")) /* do something with cp */ Both of these make it harder to use for long if-chains, and we tend to use starts_with() instead. However, the first line of "do something" is often to then skip forward in buf past the prefix, either using a magic constant or with an extra strlen(3) (which is generally computed at compile time, but means we are repeating ourselves). This patch refactors skip_prefix() to return a simple boolean, and to provide the pointer value as an out-parameter. If the prefix is not found, the out-parameter is untouched. This lets you write: if (skip_prefix(arg, "foo ", &arg)) do_foo(arg); else if (skip_prefix(arg, "bar ", &arg)) do_bar(arg); Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-20 10:44:43 -07:00
Jeff King	8597ea3afe	commit: record buffer length in cache Most callsites which use the commit buffer try to use the cached version attached to the commit, rather than re-reading from disk. Unfortunately, that interface provides only a pointer to the NUL-terminated buffer, with no indication of the original length. For the most part, this doesn't matter. People do not put NULs in their commit messages, and the log code is happy to treat it all as a NUL-terminated string. However, some code paths do care. For example, when checking signatures, we want to be very careful that we verify all the bytes to avoid malicious trickery. This patch just adds an optional "size" out-pointer to get_commit_buffer and friends. The existing callers all pass NULL (there did not seem to be any obvious sites where we could avoid an immediate strlen() call, though perhaps with some further refactoring we could). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-13 12:09:38 -07:00
Jeff King	bc6b8fc130	use get_commit_buffer everywhere Each of these sites assumes that commit->buffer is valid. Since they would segfault if this was not the case, they are likely to be correct in practice. However, we can future-proof them by using get_commit_buffer. And as a side effect, we abstract away the final bare uses of commit->buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-06-13 12:08:17 -07:00
Junio C Hamano	f7804e250d	Merge branch 'hs/simplify-bit-setting-in-fsck-tree' * hs/simplify-bit-setting-in-fsck-tree: fsck: use bitwise-or assignment operator to set flag	2014-03-31 16:30:44 -07:00
Junio C Hamano	40adf520a3	Merge branch 'ys/fsck-commit-parsing' * ys/fsck-commit-parsing: fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant fsck.c:fsck_ident(): ident points at a const string	2014-03-28 13:51:24 -07:00
Hiroyuki Sano	effd12ec87	fsck: use bitwise-or assignment operator to set flag fsck_tree() has two different ways to set a flag variable, either by using a if-statement that guards an assignment, or by using a bitwise-or assignment operator. Most are done with the former, and only one variable is assigned with the latter. Since all the conditions are short-and-sweet, we can afford to uniformly use the latter style, which makes the resulting code shorter and easier to read. Signed-off-by: Hiroyuki Sano <sh19910711@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-20 11:20:48 -07:00
Yuxuan Shui	2d820a61df	fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant fsck_commit() uses memcmp() to check if the buffer starts with a certain prefix, and skips the prefix if it does. This is exactly what skip_prefix() was designed for. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-19 15:34:56 -07:00
Yuxuan Shui	de42180f6a	fsck.c:fsck_ident(): ident points at a const string Since fsck_ident doesn't change the content of ident, the type of ident could be const char . This change is required to rewrite fsck_commit() to use skip_prefix(). Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-13 13:00:30 -07:00
Jeff King	7ca36d9398	date: check date overflow against time_t When we check whether a timestamp has overflowed, we check only against ULONG_MAX, meaning that strtoul has overflowed. However, we also feed these timestamps to system functions like gmtime, which expect a time_t. On many systems, time_t is actually smaller than "unsigned long" (e.g., because it is signed), and we would overflow when using these functions. We don't know the actual size or signedness of time_t, but we can easily check for truncation with a simple assignment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	d4b8de0420	fsck: report integer overflow in author timestamps When we check commit objects, we complain if commit->date is ULONG_MAX, which is an indication that we saw integer overflow when parsing it. However, we do not do any check at all for author lines, which also contain a timestamp. Let's actually check the timestamps on each ident line with strtoul. This catches both author and committer lines, and we can get rid of the now-redundant commit->date check. Note that like the existing check, we compare only against ULONG_MAX. Now that we are calling strtoul at the site of the check, we could be slightly more careful and also check that errno is set to ERANGE. However, this will make further refactoring in future patches a little harder, and it doesn't really matter in practice. For 32-bit systems, one would have to create a commit at the exact wrong second in 2038. But by the time we get close to that, all systems will hopefully have moved to 64-bit (and if they haven't, they have a real problem one second later). For 64-bit systems, by the time we get close to ULONG_MAX, all systems will hopefully have been consumed in the fiery wrath of our expanding Sun. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	5c17f51270	fsck: warn about ".git" in trees Having a ".git" entry inside a tree can cause confusing results on checkout. At the top-level, you could not checkout such a tree, as it would complain about overwriting the real ".git" directory. In a subdirectory, you might check it out, but performing operations in the subdirectory would confusingly consider the in-tree ".git" directory as the repository. The regular git tools already make it hard to accidentally add such an entry to a tree, and do not allow such entries to enter the index at all. Teaching fsck about it provides an additional safety check, and let's us avoid propagating any such bogosity when transfer.fsckObjects is on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 13:52:54 -08:00
Jeff King	5d34a4359d	fsck: warn about '.' and '..' in trees A tree with meta-paths like '.' or '..' does not work well with git; the index will refuse to load it or check it out to the filesystem (and even if we did not have that safety, it would look like we were overwriting an untracked directory). For the same reason, it is difficult to create such a tree with regular git. Let's warn about these dubious entries during fsck, just in case somebody has created a bogus tree (and this also lets us prevent them from propagating when transfer.fsckObjects is set). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 10:41:08 -08:00
Junio C Hamano	03adeeaad6	Merge branch 'jk/maint-null-in-trees' into maint-1.7.11 "git diff" had a confusion between taking data from a path in the working tree and taking data from an object that happens to have name 0{40} recorded in a tree. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value	2012-09-10 15:24:54 -07:00
Jeff King	c479d14a80	fsck: detect null sha1 in tree entries Short of somebody happening to beat the 1 in 2^160 odds of actually generating content that hashes to the null sha1, we should never see this value in a tree entry. So let's have fsck warn if it it seen. As in the previous commit, we test both blob and submodule entries to future-proof the test suite against the implementation depending on connectivity to notice the error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 15:14:08 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Dmitry Ivankov	53f53cff24	fsck: improve committer/author check fsck allows a name with > character in it like "name> <email>". Also for "name email>" fsck says "missing space before email". More precisely, it seeks for a first '<', checks that ' ' preceeds it. Then seeks to '<' or '>' and checks that it is the '>'. Missing space is reported if either '<' is not found or it's not preceeded with ' '. Change it to following. Seek to '<' or '>', check that it is '<' and is preceeded with ' '. Seek to '<' or '>' and check that it is '>'. So now "name> <email>" is rejected as "bad name". More strict name check is the only change in what is accepted. Report 'missing space' only if '<' is found and is not preceeded with a space. Signed-off-by: Dmitry Ivankov <divanorama@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 12:21:07 -07:00
Junio C Hamano	5b42477b59	Merge branch 'jm/maint-misc-fix' into maint * jm/maint-misc-fix: read_gitfile_gently: use ssize_t to hold read result remove tests of always-false condition rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'	2011-05-30 00:09:41 -07:00
Jim Meyering	5dd564895e	remove tests of always-false condition * fsck.c (fsck_error_function): Don't test obj->sha1 == 0. It can never be true, since that sha1 member is an array. * transport.c (set_upstreams): Likewise for ref->new_sha1. Signed-off-by: Jim Meyering <meyering@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 11:24:24 -07:00
Jeff King	ebeb60900f	strbuf: add strbuf_vaddf In a variable-args function, the code for writing into a strbuf is non-trivial. We ended up cutting and pasting it in several places because there was no vprintf-style function for strbufs (which in turn was held up by a lack of va_copy). Now that we have a fallback va_copy, we can add strbuf_vaddf, the strbuf equivalent of vsprintf. And we can clean up the cut and paste mess. Signed-off-by: Jeff King <peff@peff.net> Improved-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-26 01:06:50 -08:00
Jonathan Nieder	0adc6a3d49	fsck: fix bogus commit header check `daae1922` (fsck: check ident lines in commit objects, 2010-04-24) taught fsck to expect commit objects to have the form tree <object name> <parents> author <valid ident string> committer <valid ident string> log message The check is overly strict: for example, it errors out with the message “expected blank line” for perfectly valid commits with an "encoding ISO-8859-1" line. Later it might make sense to teach fsck about the rest of the header and warn about unrecognized header lines, but for simplicity, let’s accept arbitrary trailing lines for now. Reported-by: Tuncer Ayaz <tuncer.ayaz@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-28 15:08:27 -07:00
Jonathan Nieder	daae19224a	fsck: check ident lines in commit objects Check that email addresses do not contain <, >, or newline so they can be quickly scanned without trouble. The copy() function in ident.c already ensures that ordinary git commands will not write email addresses without this property. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-01 12:15:06 -07:00
Eric Wong	f290974390	Allow the Unix epoch to be a valid commit date It is common practice to use the Unix epoch as a fallback date when a suitable date is not available. This is true of git svn and possibly other importing tools that import non-git history into git. Instead of clobbering established strtoul() error reporting semantics with our own, preserve the strtoul() error value of ULONG_MAX for fsck.c to handle. Signed-off-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-07-06 09:36:16 -07:00
Benjamin Kramer	eb3a9dd327	Remove unused function scope local variables These variables were unused and can be removed safely: builtin-clone.c::cmd_clone(): use_local_hardlinks, use_separate_remote builtin-fetch-pack.c::find_common(): len builtin-remote.c::mv(): symref diff.c::show_stats():show_stats(): total diffcore-break.c::should_break(): base_size fast-import.c::validate_raw_date(): date, sign fsck.c::fsck_tree(): o_sha1, sha1 xdiff-interface.c::parse_num(): read_some Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-03-07 20:52:17 -08:00
Junio C Hamano	8bb4646dae	Merge branch 'maint' * maint: Fix non-literal format in printf-style calls git-submodule: Avoid printing a spurious message. git ls-remote: make usage string match manpage Makefile: help people who run 'make check' by mistake	2008-11-11 14:49:50 -08:00
Daniel Lowe	9db56f71b9	Fix non-literal format in printf-style calls These were found using gcc 4.3.2-1ubuntu11 with the warning: warning: format not a string literal and no format arguments Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-11-11 14:43:59 -08:00
Brandon Casey	f285a2d7ed	Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2008-10-12 12:36:19 -07:00
Junio C Hamano	79b1138e78	fsck.c: fix bogus "empty tree" check `ba002f3` (builtin-fsck: move common object checking code to fsck.c) did more than what it claimed to. Most notably, it wrongly made an empty tree object an error by pretending to only move code from fsck_tree() in builtin-fsck.c to fsck_tree() in fsck.c, but in fact adding a bogus check to barf on an empty tree. An empty tree object is _unusual_. Recent porcelains try reasonably hard not to let the user create a commit that contains such a tree. Perhaps warning about them in git-fsck may have some merit. HOWEVER. Being unusual and being errorneous are two quite different things. This is especially true now we seem to use the same fsck_$object() code in places other than git-fsck itself. For example, receive-pack should not reject unusual objects, even if it would be a good idea to tighten it to reject incorrect ones. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-03-05 10:32:01 -08:00
Martin Koegler	d6ffc8d784	add common fsck error printing function Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:35 -08:00
Martin Koegler	ba002f3b28	builtin-fsck: move common object checking code to fsck.c Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:35 -08:00
Martin Koegler	355885d531	add generic, type aware object chain walker The requirements are: * it may not crash on NULL pointers * a callback function is needed, as index-pack/unpack-objects need to do different things * the type information is needed to check the expected <-> real type and print better error messages Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-25 23:57:34 -08:00
Mark Wooding	b4dfefe00f	Make fsck and fsck-objects be builtins. The earlier change `df391b192` to rename fsck-objects to fsck broke fsck-objects. This should fix it again. Signed-off-by: Mark Wooding <mdw@distorted.org.uk> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-01-29 09:36:21 -08:00
Junio C Hamano	df391b192d	git-fsck-objects is now synonym to git-fsck Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-01-28 16:33:58 -08:00

1 2 3 4 5

237 commits