development/git - HydraGit

mirror of https://github.com/git/git synced 2024-07-20 17:44:37 +00:00

Author	SHA1	Message	Date
Jeff King	f59793763d	fsck: only require an oid for skiplist functions The skiplist is inherently an oidset, so we don't need a full object struct. Let's take just the oid to give our callers more flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	5afc4b1dc6	fsck: only provide oid/type in fsck_error callback None of the callbacks actually care about having a "struct object"; they're happy with just the oid and type information. So let's give ourselves more flexibility to avoid having a "struct object" by just passing the broken-down fields. Note that the callback already takes a "type" field for the fsck message type. We'll rename that to "msg_type" (and use "object_type" for the object type) to make the distinction explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	733902905d	fsck: use oids rather than objects for object_name API We don't actually care about having object structs; we only need to look up decorations by oid. Let's accept this more limited form, which will give our callers more flexibility. Note that the decoration API we rely on uses object structs itself (even though it only looks at their oids). We can solve this by switching to a kh_oid_map (we could also use the hashmap oidmap, but it's more awkward for the simple case of just storing a void pointer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	d40bbc109b	fsck_describe_object(): build on our get_object_name() primitive This isolates the implementation detail of using the decoration code to our put/get functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	a59cfb3230	fsck: unify object-name code Commit `90cf590f53` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a system for decorating objects with names. The code is split across builtin/fsck.c (which gives the initial names) and fsck.c (which adds to the names as it traverses the object graph). This leads to some duplication, where both sites have near-identical describe_object() functions (the difference being that the one in builtin/fsck.c uses a circular array of buffers to allow multiple calls in a single printf). Let's provide a unified object_name API for fsck. That lets us drop the duplication, as well as making the interface boundaries more clear (which will let us refactor the implementation more in a future patch). We'll leave describe_object() in builtin/fsck.c as a thin wrapper around the new API, as it relies on a static global to make its many callers a bit shorter. We'll also convert the bare add_decoration() calls in builtin/fsck.c to put_object_name(). This fixes two minor bugs: 1. We leak many small strings. add_decoration() has a last-one-wins approach: it updates the decoration to the new string and returns the old one. But we ignore the return value, leaking the old string. This is quite common to trigger, since we look at reflogs: the tip of any ref will be described both by looking at the actual ref, as well as the latest reflog entry. So we'd always end up leaking one of those strings. 2. The last-one-wins approach gives us lousy names. For instance, we first look at all of the refs, and then all of the reflogs. So rather than seeing "refs/heads/master", we're likely to overwrite it with "HEAD@{12345678}". We're generally better off using the first name we find. And indeed, the test in t1450 expects this ugly HEAD@{} name. After this patch, we've switched to using fsck_put_object_name()'s first-one-wins semantics, and we output the more human-friendly "refs/tags/julius" (and the test is updated accordingly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	23a173a761	fsck: require an actual buffer for non-blobs The fsck_object() function takes in a buffer, but also a "struct object". The rules for using these vary between types: - for a commit, we'll use the provided buffer; if it's NULL, we'll fall back to get_commit_buffer(), which loads from either an in-memory cache or from disk. If the latter fails, we'd die(), which is non-ideal for fsck. - for a tag, a NULL buffer will fall back to loading the object from disk (and failure would lead to an fsck error) - for a tree, we _never_ look at the provided buffer, and always use tree->buffer - for a blob, we usually don't look at the buffer at all, unless it has been marked as a .gitmodule file. In that case we check the buffer given to us, or assume a NULL buffer is a very large blob (and complain about it) This is much more complex than it needs to be. It turns out that nobody ever feeds a NULL buffer that isn't a blob: - git-fsck calls fsck_object() only from fsck_obj(). That in turn is called by one of: - fsck_obj_buffer(), which is a callback to verify_pack(), which unpacks everything except large blobs into a buffer (see pack-check.c, lines 131-141). - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer (builtin/fsck.c, lines 639-640) And in either case, we'll have just called parse_object_buffer() anyway, which would segfault on a NULL buffer for commits or tags (not for trees, but it would install a NULL tree->buffer which would later cause a segfault) - git-index-pack asserts that the buffer is non-NULL unless the object is a blob (see builtin/index-pack.c, line 832) - git-unpack-objects always writes a non-NULL buffer into its obj_buffer hash, which is then fed to fsck_object(). (There is actually a funny thing here where it does not store blob buffers at all, nor does it call fsck on them; it does check any needed blobs via fsck_finish() though). Let's make the rules simpler, which reduces the amount of code and gives us more flexibility in refactoring the fsck code. The new rules are: - only blobs are allowed to pass a NULL buffer - we always use the provided buffer, never pulling information from the object struct We don't have to adjust any callers, because they were already adhering to these. Note that we do drop a few fsck identifiers for missing tags, but that was all dead code (because nobody passed a NULL tag buffer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	2175a0c601	fsck: stop checking tag->tagged Way back in `92d4c85d24` (fsck-cache: fix SIGSEGV on bad tag object, 2005-05-03), we added an fsck check that the "tagged" field of a tag struct isn't NULL. But that was mainly protecting the printing code for "--tags", and that code wasn't moved along with the check as part of `ba002f3b28` (builtin-fsck: move common object checking code to fsck.c, 2008-02-25). It could also serve to detect type mismatch problems (where a tag points to object X as a commit, but really X is a blob), but it couldn't do so reliably (we'd call lookup_commit(X), but it will only notice the problem if we happen to have previously called lookup_blob(X) in the same process). And as of a commit earlier in this series, we'd consider that a parse error and complain about the object even before getting to this point anyway. So let's drop this "tag->tagged" check. It's not helping anything, and getting rid of it makes the function conceptually cleaner, as it really is just checking the buffer we feed it. In fact, we can get rid of our one-line wrapper and just unify fsck_tag() and fsck_tag_buffer(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	ec65231571	fsck: stop checking commit->parent counts In `4516338243` (builtin-fsck: reports missing parent commits, 2008-02-25), we added code to check that fsck found the same number of parents from parsing the commit itself as we see in the commit struct we got from parse_commit_buffer(). Back then the rationale was that the normal commit parser might skip some bad parents. But earlier in this series, we started treating that reliably as a parsing error, meaning that we'd complain about it before we even hit the code in fsck.c. Let's drop this code, which now makes fsck_commit_buffer() completely independent of any parsed values in the commit struct (that's conceptually cleaner, and also opens up more refactoring options). Note that we can also drop the MISSING_PARENT and MISSING_GRAFT fsck identifiers. This is no loss, as these would not trigger reliably anyway. We'd hit them only when lookup_commit() failed, which occurs only if we happen to have seen the object with another type already in the same process. In most cases, we'd actually run into the problem during the connectivity walk, not here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	1de6007d85	fsck: stop checking commit->tree value We check in fsck_commit_buffer() that commit->tree isn't NULL, which in turn generally comes from a previous parse by parse_commit(). But this isn't really accomplishing anything. The two things we might care about are: - was there a syntactically valid "tree <oid>" line in the object? But we've just done our own parse in fsck_commit_buffer() to check this. - does it point to a valid tree object? But checking the "tree" pointer here doesn't actually accomplish that; it just shows that lookup_tree() didn't return NULL, which only means that we haven't yet seen that oid as a non-tree in this process. A real connectivity check would exhaustively walk all graph links, and we do that already in a separate function. So this code isn't helping anything. And it makes the fsck code slightly more confusing and rigid (e.g., it requires that any commit structs have already been parsed). Let's drop it. As a bit of history, the presence of this code looks like a leftover from early fsck code (which did rely on parse_commit() to do most of the parsing). The check comes from `ff5ebe39b0` (Port fsck-cache to use parsing functions, 2005-04-18), but we later added an explicit walk in `355885d531` (add generic, type aware object chain walker, 2008-02-25). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Junio C Hamano	209f075593	Merge branch 'br/blame-ignore' "git blame" learned to "ignore" commits in the history, whose effects (as well as their presence) get ignored. * br/blame-ignore: t8014: remove unnecessary braces blame: drop some unused function parameters blame: add a test to cover blame_coalesce() blame: use the fingerprint heuristic to match ignored lines blame: add a fingerprint heuristic to match ignored lines blame: optionally track line fingerprints during fill_blame_origin() blame: add config options for the output of ignored or unblamable lines blame: add the ability to ignore commits and their changes blame: use a helper function in blame_chunk() Move oidset_parse_file() to oidset.c fsck: rename and touch up init_skiplist()	2019-07-19 11:30:20 -07:00
Jeff King	0ebbcf70e6	object: convert lookup_unknown_object() to use object_id There are no callers left of lookup_unknown_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 10:06:19 -07:00
Barret Rhoden	f93895f8fc	Move oidset_parse_file() to oidset.c Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-16 11:36:23 +09:00
Barret Rhoden	24eb33ebc5	fsck: rename and touch up init_skiplist() init_skiplist() took a file consisting of SHA-1s and comments and added the objects to an oidset. This functionality is useful for other commands and will be moved to oidset.c in a future commit. In preparation for that move, this commit renames it to oidset_parse_file() to reflect its more generic usage and cleans up a few of the names. Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-16 11:36:23 +09:00
Elijah Newren	5ec1e72823	Use 'unsigned short' for mode, like diff_filespec does struct diff_filespec defines mode to be an 'unsigned short'. Several other places in the API which we'd like to interact with using a diff_filespec used a plain unsigned (or unsigned int). This caused problems when taking addresses, so switch to unsigned short. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-08 16:02:07 +09:00
brian m. carlson	ea82b2a085	tree-walk: store object_id in a separate member When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-15 09:57:41 -08:00
Junio C Hamano	da3e0752cd	Merge branch 'jc/cocci-preincr' Code cleanup. * jc/cocci-preincr: fsck: s/++i > 1/i++/ cocci: simplify "if (++u > 1)" to "if (u++)"	2018-10-30 15:43:48 +09:00
Junio C Hamano	b84c783882	fsck: s/++i > 1/i++/ Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-24 10:25:12 +09:00
Junio C Hamano	66ec2373fe	Merge branch 'ab/fsck-skiplist' Update fsck.skipList implementation and documentation. * ab/fsck-skiplist: fsck: support comments & empty lines in skipList fsck: use oidset instead of oid_array for skipList fsck: use strbuf_getline() to read skiplist file fsck: add a performance test for skipList fsck: add a performance test fsck: document that skipList input must be unabbreviated fsck: document and test commented & empty line skipList input fsck: document and test sorted skipList input fsck tests: add a test for no skipList input fsck tests: setup of bogus commit object	2018-10-10 12:37:16 +09:00
Junio C Hamano	1958ad504b	Sync with 2.18.1 * maint-2.18: Git 2.18.1 Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options	2018-09-27 11:50:45 -07:00
Junio C Hamano	44f87dac99	Sync with 2.17.2 * maint-2.17: Git 2.17.2 fsck: detect submodule paths starting with dash fsck: detect submodule urls starting with dash Git 2.16.5 Git 2.15.3 Git 2.14.5 submodule-config: ban submodule paths that start with a dash submodule-config: ban submodule urls that start with dash submodule--helper: use "--" to signal end of clone options	2018-09-27 11:45:01 -07:00
Jeff King	1a7fd1fb29	fsck: detect submodule paths starting with dash As with urls, submodule paths with dashes are ignored by git, but may end up confusing older versions. Detecting them via fsck lets us prevent modern versions of git from being a vector to spread broken .gitmodules to older versions. Compared to blocking leading-dash urls, though, this detection may be less of a good idea: 1. While such paths provide confusing and broken results, they don't seem to actually work as option injections against anything except "cd". In particular, the submodule code seems to canonicalize to an absolute path before running "git clone" (so it passes /your/clone/-sub). 2. It's more likely that we may one day make such names actually work correctly. Even after we revert this fsck check, it will continue to be a hassle until hosting servers are all updated. On the other hand, it's not entirely clear that the behavior in older versions is safe. And if we do want to eventually allow this, we may end up doing so with a special syntax anyway (e.g., writing "./-sub" in the .gitmodules file, and teaching the submodule code to canonicalize it when comparing). So on balance, this is probably a good protection. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-27 11:41:31 -07:00
Jeff King	a124133e1e	fsck: detect submodule urls starting with dash Urls with leading dashes can cause mischief on older versions of Git. We should detect them so that they can be rejected by receive.fsckObjects, preventing modern versions of git from being a vector by which attacks can spread. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-27 11:41:26 -07:00
Ævar Arnfjörð Bjarmason	371a655074	fsck: support comments & empty lines in skipList It's annoying not to be able to put comments and empty lines in the skipList, when e.g. keeping a big central list of commits to skip in /etc/gitconfig, which was my motivation for `1362df0d41` ("fetch: implement fetch.fsck.*", 2018-07-27). Implement that, and document what version of Git this was changed in, since this on-disk format can be expected to be used by multiple versions of git. There is no notable performance impact from this change, using the test setup described a couple of commits back: Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.69(7.27+0.42) 7.86(7.48+0.37) +2.2% 1450.5: fsck with 1 skipped bad commits 7.69(7.30+0.38) 7.83(7.47+0.36) +1.8% 1450.7: fsck with 10 skipped bad commits 7.76(7.38+0.38) 7.79(7.38+0.41) +0.4% 1450.9: fsck with 100 skipped bad commits 7.76(7.38+0.38) 7.74(7.36+0.38) -0.3% 1450.11: fsck with 1000 skipped bad commits 7.71(7.30+0.41) 7.72(7.34+0.38) +0.1% 1450.13: fsck with 10000 skipped bad commits 7.74(7.34+0.40) 7.72(7.34+0.38) -0.3% 1450.15: fsck with 100000 skipped bad commits 7.75(7.40+0.35) 7.70(7.29+0.40) -0.6% 1450.17: fsck with 1000000 skipped bad commits 7.12(6.86+0.26) 7.13(6.87+0.26) +0.1% Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
René Scharfe	3b41fb0cb2	fsck: use oidset instead of oid_array for skipList Change the implementation of the skipList feature to use oidset instead of oid_array to store SHA-1s for later lookup. This list is parsed once on startup by fsck, fetch-pack or receive-pack depending on the *.skipList config in use. I.e. only once per invocation, but note that for "clone --recurse-submodules" each submodule will re-parse the list, in addition to the main project, and it will be re-parsed when checking .gitmodules blobs, see `fb16287719` ("fsck: check skiplist for object in fsck_blob()", 2018-06-27). Memory usage is a bit higher, but we don't need to keep track of the sort order anymore. Embed the oidset into struct fsck_options to make its ownership clear (no hidden sharing) and avoid unnecessary pointer indirection. The cumulative impact on performance of this & the preceding change, using the test setup described in the previous commit: Test HEAD~2 HEAD~ HEAD ---------------------------------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.70(7.31+0.38) 7.72(7.33+0.38) +0.3% 7.70(7.30+0.40) +0.0% 1450.5: fsck with 1 skipped bad commits 7.84(7.47+0.37) 7.69(7.32+0.36) -1.9% 7.71(7.29+0.41) -1.7% 1450.7: fsck with 10 skipped bad commits 7.81(7.40+0.40) 7.94(7.57+0.36) +1.7% 7.92(7.55+0.37) +1.4% 1450.9: fsck with 100 skipped bad commits 7.81(7.42+0.38) 7.95(7.53+0.41) +1.8% 7.83(7.42+0.41) +0.3% 1450.11: fsck with 1000 skipped bad commits 7.99(7.62+0.36) 7.90(7.50+0.40) -1.1% 7.86(7.49+0.37) -1.6% 1450.13: fsck with 10000 skipped bad commits 7.98(7.57+0.40) 7.94(7.53+0.40) -0.5% 7.90(7.45+0.44) -1.0% 1450.15: fsck with 100000 skipped bad commits 7.97(7.57+0.39) 8.03(7.67+0.36) +0.8% 7.84(7.43+0.41) -1.6% 1450.17: fsck with 1000000 skipped bad commits 7.72(7.22+0.50) 7.28(7.07+0.20) -5.7% 7.13(6.87+0.25) -7.6% Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
René Scharfe	fb8952077d	fsck: use strbuf_getline() to read skiplist file The buffer is unlikely to contain a NUL character, so printing its contents using %s in a die() format is unsafe (detected with ASan). Use an idiomatic strbuf_getline() loop instead, which ensures the buffer is always NUL-terminated, supports CRLF files as well, accepts files without a newline after the last line, supports any hash length automatically, and is shorter. This fixes a bug where emitting an error about an invalid line on say line 1 would continue printing subsequent lines, and usually continue into uninitialized memory. The performance impact of this, on a CentOS 7 box with RedHat GCC 4.8.5-28: $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh Test HEAD~ HEAD ---------------------------------------------------------------------------------------- 1450.3: fsck with 0 skipped bad commits 7.75(7.39+0.35) 7.68(7.29+0.39) -0.9% 1450.5: fsck with 1 skipped bad commits 7.70(7.30+0.40) 7.80(7.42+0.37) +1.3% 1450.7: fsck with 10 skipped bad commits 7.77(7.37+0.40) 7.87(7.47+0.40) +1.3% 1450.9: fsck with 100 skipped bad commits 7.82(7.41+0.40) 7.88(7.43+0.44) +0.8% 1450.11: fsck with 1000 skipped bad commits 7.88(7.49+0.39) 7.84(7.43+0.40) -0.5% 1450.13: fsck with 10000 skipped bad commits 8.02(7.63+0.39) 8.07(7.67+0.39) +0.6% 1450.15: fsck with 100000 skipped bad commits 8.01(7.60+0.41) 8.08(7.70+0.38) +0.9% 1450.17: fsck with 1000000 skipped bad commits 7.60(7.10+0.50) 7.37(7.18+0.19) -3.0% Helped-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:17:46 -07:00
Junio C Hamano	3a2a1dc170	Merge branch 'sb/object-store-lookup' lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...	2018-08-02 15:30:42 -07:00
Junio C Hamano	bd1a32d5c8	Merge branch 'jk/fsck-gitmodules-gently' Recent "security fix" to pay attention to contents of ".gitmodules" while accepting "git push" was a bit overly strict than necessary, which has been adjusted. * jk/fsck-gitmodules-gently: fsck: downgrade gitmodulesParse default to "info" fsck: split ".gitmodules too large" error from parse failure fsck: silence stderr when parsing .gitmodules config: add options parameter to git_config_from_mem config: add CONFIG_ERROR_SILENT handler config: turn die_on_error into caller-facing enum	2018-08-02 15:30:39 -07:00
Junio C Hamano	a9e7fe96cc	Merge branch 'rj/submodule-fsck-skip' "fsck.skipList" did not prevent a blob object listed there from being inspected for is contents (e.g. we recently started to inspect the contents of ".gitmodules" for certain malicious patterns), which has been corrected. * rj/submodule-fsck-skip: fsck: check skiplist for object in fsck_blob()	2018-07-24 14:50:42 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Jeff King	64eb14d310	fsck: downgrade gitmodulesParse default to "info" We added an fsck check in `ed8b10f631` (fsck: check .gitmodules content, 2018-05-02) as a defense against the vulnerability from `0383bbb901` (submodule-config: verify submodule names as paths, 2018-04-30). With the idea that up-to-date hosting sites could protect downstream unpatched clients that fetch from them. As part of that defense, we reject any ".gitmodules" entry that is not syntactically valid. The theory is that if we cannot even parse the file, we cannot accurately check it for vulnerabilities. And anybody with a broken .gitmodules file would eventually want to know anyway. But there are a few reasons this is a bad tradeoff in practice: - for this particular vulnerability, the client has to be able to parse the file. So you cannot sneak an attack through using a broken file, assuming the config parsers for the process running fsck and the eventual victim are functionally equivalent. - a broken .gitmodules file is not necessarily a problem. Our fsck check detects .gitmodules in _any_ tree, not just at the root. And the presence of a .gitmodules file does not necessarily mean it will be used; you'd have to also have gitlinks in the tree. The cgit repository, for example, has a file named .gitmodules from a pre-submodule attempt at sharing code, but does not actually have any gitlinks. - when the fsck check is used to reject a push, it's often hard to work around. The pusher may not have full control over the destination repository (e.g., if it's on a hosting server, they may need to contact the hosting site's support). And the broken .gitmodules may be too far back in history for rewriting to be feasible (again, this is an issue for cgit). So we're being unnecessarily restrictive without actually improving the security in a meaningful way. It would be more convenient to downgrade this check to "info", which means we'd still comment on it, but not reject a push. Site admins can already do this via config, but we should ship sensible defaults. There are a few counterpoints to consider in favor of keeping the check as an error: - the first point above assumes that the config parsers for the victim and the fsck process are equivalent. This is pretty true now, but as time goes on will become less so. Hosting sites are likely to upgrade their version of Git, whereas vulnerable clients will be stagnant (if they did upgrade, they'd cease to be vulnerable!). So in theory we may see drift over time between what two config parsers will accept. In practice, this is probably OK. The config format is pretty established at this point and shouldn't change a lot. And the farther we get from the announcement of the vulnerability, the less interesting this extra layer of protection becomes. I.e., it was _most_ valuable on day 0, when everybody's client was still vulnerable and hosting sites could protect people. But as time goes on and people upgrade, the population of vulnerable clients becomes smaller and smaller. - In theory this could protect us from other vulnerabilities in the future. E.g., .gitmodules are the only way for a malicious repository to feed data to the config parser, so this check could similarly protect clients from a future (to-be-found) bug there. But that's trading a hypothetical case for real-world pain today. If we do find such a bug, the hosting site would need to be updated to fix it, too. At which point we could figure out whether it's possible to detect _just_ the malicious case without hurting existing broken-but-not-evil cases. - Until recently, we hadn't made any restrictions on .gitmodules content. So now in tightening that we're hitting cases where certain things used to work, but don't anymore. There's some moderate pain now. But as time goes on, we'll see more (and more varied) cases that will make tightening harder in the future. So there's some argument for putting rules in place _now_, before users grow more cases that violate them. Again, this is trading pain now for hypothetical benefit in the future. And if we try hard in the future to keep our tightening to a minimum (i.e., rejecting true maliciousness without hurting broken-but-not-evil repos), then that reduces even the hypothetical benefit. Considering both sets of arguments, it makes sense to loosen this check for now. Note that we have to tweak the test in t7415 since fsck will no longer consider this a fatal error. But we still check that it reports the warning, and that we don't get the spurious error from the config code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-16 10:57:23 -07:00
Jeff King	0d68764d94	fsck: split ".gitmodules too large" error from parse failure Since `ed8b10f631` (fsck: check .gitmodules content, 2018-05-02), we'll report a gitmodulesParse error for two conditions: - a .gitmodules entry is not syntactically valid - a .gitmodules entry is larger than core.bigFileThreshold with the intent that we can detect malicious files and protect downstream clients. E.g., from the issue in `0383bbb901` (submodule-config: verify submodule names as paths, 2018-04-30). But these conditions are actually quite different with respect to that bug: - a syntactically invalid file cannot trigger the problem, as the victim would barf before hitting the problematic code - a too-big .gitmodules _can_ trigger the problem. Even though it is obviously silly to have a 500MB .gitmodules file, the submodule code will happily parse it if you have enough memory. So it may be reasonable to configure their severity separately. Let's add a new class for the "too large" case to allow that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-16 10:57:22 -07:00
Ramsay Jones	fb16287719	fsck: check skiplist for object in fsck_blob() Since commit `ed8b10f631` ("fsck: check .gitmodules content", 2018-05-02), fsck will issue an error message for '.gitmodules' content that cannot be parsed correctly. This is the case, even when the corresponding blob object has been included on the skiplist. For example, using the cgit repository, we see the following: $ git fsck Checking object directories: 100% (256/256), done. error: bad config line 5 in blob .gitmodules error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob Checking objects: 100% (6626/6626), done. $ $ git config fsck.skiplist '.git/skip' $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip $ $ git fsck Checking object directories: 100% (256/256), done. error: bad config line 5 in blob .gitmodules Checking objects: 100% (6626/6626), done. $ Note that the error message issued by the config parser is still present, despite adding the object-id of the blob to the skiplist. One solution would be to provide a means of suppressing the messages issued by the config parser. However, given that (logically) we are asking fsck to ignore this object, a simpler approach is to just not call the config parser if the object is to be skipped. Add a check to the 'fsck_blob()' processing function, to determine if the object is on the skiplist and, if so, exit the function early. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:49:44 -07:00
Jeff King	de6bd9e3ea	fsck: silence stderr when parsing .gitmodules If there's a parsing error we'll already report it via the usual fsck report() function (or not, if the user has asked to skip this object or warning type). The error message from the config parser just adds confusion. Let's suppress it. Note that we didn't test this case at all, so I've added coverage in t7415. We may end up toning down or removing this fsck check in the future. So take this test as checking what happens now with a focus on stderr, and not any ironclad guarantee that we must detect and report parse failures in the future. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:36:41 -07:00
Jeff King	4574f1aace	config: add options parameter to git_config_from_mem The underlying config parser knows how to handle a config_options struct, but git_config_from_mem() always passes NULL. Let's allow our callers to specify the options struct. We could add a "_with_options" variant, but since there are only a handful of callers, let's just update them to pass NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-03 09:36:06 -07:00
Stefan Beller	f86bcc7b2c	tree: add repository argument to lookup_tree Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	da14a7ff99	blob: add repository argument to lookup_blob Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	109cd76dd3	object: add repository argument to parse_object Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Junio C Hamano	b16b60f71b	Merge branch 'sb/object-store-grafts' into sb/object-store-lookup * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-06-29 10:43:28 -07:00
Junio C Hamano	ebaf0a56f3	Merge branch 'nd/complete-config-vars' Continuing with the idea to programatically enumerate various pieces of data required for command line completion, teach the codebase to report the list of configuration variables subcommands care about to help complete them. * nd/complete-config-vars: completion: complete general config vars in two steps log-tree: allow to customize 'grafted' color completion: support case-insensitive config vars completion: keep other config var completion in camelCase completion: drop the hard coded list of config vars am: move advice.amWorkDir parsing back to advice.c advice: keep config name in camelCase in advice_config[] fsck: produce camelCase config key names help: add --config to list all available config fsck: factor out msg_id_info[] lazy initialization code grep: keep all colors in an array Add and use generic name->id mapping code for color slot parsing	2018-06-25 13:22:38 -07:00
Junio C Hamano	fb6ac9e79a	Merge branch 'jk/submodule-fsck-loose-fixup' Finishing touches to a topic that already is in 'maint'. * jk/submodule-fsck-loose-fixup: fsck: avoid looking at NULL blob->object t7415: don't bother creating commit for symlink test	2018-06-13 12:50:44 -07:00
Jeff King	47cc91310a	fsck: avoid looking at NULL blob->object Commit `159e7b080b` (fsck: detect gitmodules files, 2018-05-02) taught fsck to look at the content of .gitmodules files. If the object turns out not to be a blob at all, we just complain and punt on checking the content. And since this was such an obvious and trivial code path, I didn't even bother to add a test. Except it _does_ do one non-trivial thing, which is call the report() function, which wants us to pass a pointer to a "struct object". Which we don't have (we have only a "struct object_id"). So we erroneously pass a NULL object to report(), which gets dereferenced and causes a segfault. It seems like we could refactor report() to just take the object_id itself. But we pass the object pointer along to a callback function, and indeed this ends up in builtin/fsck.c's objreport() which does want to look at other parts of the object (like the type). So instead, let's just use lookup_unknown_object() to get the real "struct object", and pass that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-11 10:56:06 -07:00
Junio C Hamano	42c8ce1c49	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...	2018-05-30 14:04:10 +09:00
Junio C Hamano	7913f53b56	Sync with Git 2.17.1 * maint: (25 commits) Git 2.17.1 Git 2.16.4 Git 2.15.2 Git 2.14.4 Git 2.13.7 fsck: complain when .gitmodules is a symlink index-pack: check .gitmodules files with --strict unpack-objects: call fsck_finish() after fscking objects fsck: call fsck_finish() after fscking objects fsck: check .gitmodules content fsck: handle promisor objects in .gitmodules check fsck: detect gitmodules files fsck: actually fsck blob data fsck: simplify ".git" check index-pack: make fsck error message more specific verify_path: disallow symlinks in .gitmodules update-index: stat updated files earlier verify_dotfile: mention case-insensitivity in comment verify_path: drop clever fallthrough skip_prefix: add case-insensitive variant ...	2018-05-29 17:10:05 +09:00
Nguyễn Thái Ngọc Duy	a4a9cc19a2	fsck: produce camelCase config key names Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy	3ac68a93fd	help: add --config to list all available config Sometimes it helps to list all available config vars so the user can search for something they want. The config man page can also be used but it's harder to search if you want to focus on the variable name, for example. This is not the best way to collect the available config since it's not precise. Ideally we should have a centralized list of config in C code (pretty much like 'struct option'), but that's a lot more work. This will do for now. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy	a46baac61e	fsck: factor out msg_id_info[] lazy initialization code This array will be used by some other function than parse_msg_id() in the following commit. Factor out this prep code so it could be called from that one. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-29 14:51:28 +09:00
Junio C Hamano	c89b6e136e	Merge branch 'ds/lazy-load-trees' The code has been taught to use the duplicated information stored in the commit-graph file to learn the tree object name for a commit to avoid opening and parsing the commit object when it makes sense to do so. * ds/lazy-load-trees: coccinelle: avoid wrong transformation suggestions from commit.cocci commit-graph: lazy-load trees for commits treewide: replace maybe_tree with accessor methods commit: create get_commit_tree() method treewide: rename tree to maybe_tree	2018-05-23 14:38:13 +09:00
Jeff King	b7b1fca175	fsck: complain when .gitmodules is a symlink We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	ed8b10f631	fsck: check .gitmodules content This patch detects and blocks submodule names which do not match the policy set forth in submodule-config. These should already be caught by the submodule code itself, but putting the check here means that newer versions of Git can protect older ones from malicious entries (e.g., a server with receive.fsckObjects will block the objects, protecting clients which fetch from it). As a side effect, this means fsck will also complain about .gitmodules files that cannot be parsed (or were larger than core.bigFileThreshold). Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00
Jeff King	2738744426	fsck: handle promisor objects in .gitmodules check If we have a tree that points to a .gitmodules blob but don't have that blob, we can't check its contents. This produces an fsck error when we encounter it. But in the case of a promisor object, this absence is expected, and we must not complain. Note that this can technically circumvent our transfer.fsckObjects check. Imagine a client fetches a tree, but not the matching .gitmodules blob. An fsck of the incoming objects will show that we don't have enough information. Later, we do fetch the actual blob. But we have no idea that it's a .gitmodules file. The only ways to get around this would be to re-scan all of the existing trees whenever new ones enter (which is expensive), or to somehow persist the gitmodules_found set between fsck runs (which is complicated). In practice, it's probably OK to ignore the problem. Any repository which has all of the objects (including the one serving the promisor packs) can perform the checks. Since promisor packs are inherently about a hierarchical topology in which clients rely on upstream repositories, those upstream repositories can protect all of their downstream clients from broken objects. Signed-off-by: Jeff King <peff@peff.net>	2018-05-21 23:55:12 -04:00

1 2 3 4

153 commits