development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Jonathan Tan	a7c28a2161	tests: ensure fsck fails on corrupt packfiles t1450-fsck.sh does not have a test that checks fsck's behavior when a packfile is invalid. It does have a test for when an object in a packfile is invalid, but in that test, the packfile itself is valid. Add such a test. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-28 15:26:48 -07:00
Junio C Hamano	8f58a34cad	Merge branch 'js/fsck-name-object' Test fix. * js/fsck-name-object: t1450: use egrep for regexp "alternation"	2017-07-06 18:14:43 -07:00
Junio C Hamano	73fc2aadc7	t1450: use egrep for regexp "alternation" GNU grep allows "$A\\|B$" as alternation in BRE, but this is an extension not understood by some other implementations of grep (Michael Kebe reported an breakage on Solaris). Rewrite the offending test to ERE and use egrep instead. Noticed-by: Michael Kebe <michael.kebe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-28 10:17:50 -07:00
Jeff Hostetler	4d9bc37fbe	t1450: avoid use of "sed" on the index, which is a binary file The previous step added a path zzzzzzzz to the index, and then used "sed" to replace this string to yyyyyyyy to create a test case where the checksum at the end of the file does not match the contents. Unfortunately, use of "sed" on a non-text file is not portable. Instead, use a Perl script that seeks to the end and modifies the last byte of the file (where we _know_ stores the trailing checksum). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-27 09:41:19 +09:00
Jeff Hostetler	a33fc72fe9	read-cache: force_verify_index_checksum Teach git to skip verification of the SHA1-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set. Teach fsck to force this verification. The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command. These effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-15 00:58:36 -07:00
Junio C Hamano	4ba6197887	Merge branch 'jk/fsck-connectivity-check-fix' "git fsck --connectivity-check" was not working at all. * jk/fsck-connectivity-check-fix: fsck: lazily load types under --connectivity-only fsck: move typename() printing to its own function t1450: use "mv -f" within loose object directory fsck: check HAS_OBJ more consistently fsck: do not fallback "git fsck <bogus>" to "git fsck" fsck: tighten error-checks of "git fsck <head>" fsck: prepare dummy objects for --connectivity-check fsck: report trees as dangling t1450: clean up sub-objects in duplicate-entry test	2017-01-31 13:15:01 -08:00
Jeff King	c20d4d702f	t1450: use "mv -f" within loose object directory The loose objects are created with mode 0444. That doesn't prevent them being overwritten by rename(), but some versions of "mv" will be extra careful and prompt the user, even without "-i". Reportedly macOS does this, at least in the Travis builds. The prompt reads from /dev/null, defaulting to "no", and the object isn't moved. Then to make matters even more interesting, it still returns "0" and the rest of the test proceeds, but with a broken setup. We can work around it by using "mv -f" to override the prompt. This should work as it's already used in t5504 for the same purpose. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-25 12:32:32 -08:00
Jeff King	c3271a0e47	fsck: do not fallback "git fsck <bogus>" to "git fsck" Since fsck tries to continue as much as it can after seeing an error, we still do the reachability check even if some heads we were given on the command-line are bogus. But if _none_ of the heads is is valid, we fallback to checking all refs and the index, which is not what the user asked for at all. Instead of checking "heads", the number of successful heads we got, check "argc" (which we know only has non-options in it, because parse_options removed the others). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	c6c7b16d23	fsck: tighten error-checks of "git fsck <head>" Instead of checking reachability from the refs, you can ask fsck to check from a particular set of heads. However, the error checking here is quite lax. In particular: 1. It claims lookup_object() will report an error, which is not true. It only does a hash lookup, and the user has no clue that their argument was skipped. 2. When either the name or sha1 cannot be resolved, we continue to exit with a successful error code, even though we didn't check what the user asked us to. This patch fixes both of these cases. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:24:33 -08:00
Jeff King	3e3f8bd608	fsck: prepare dummy objects for --connectivity-check Normally fsck makes a pass over all objects to check their integrity, and then follows up with a reachability check to make sure we have all of the referenced objects (and to know which ones are dangling). The latter checks for the HAS_OBJ flag in obj->flags to see if we found the object in the first pass. Commit `02976bf85` (fsck: introduce `git fsck --connectivity-only`, 2015-06-22) taught fsck to skip the initial pass, and to fallback to has_sha1_file() instead of the HAS_OBJ check. However, it converted only one HAS_OBJ check to use has_sha1_file(). But there are many other places in builtin/fsck.c that assume that the flag is set (or that lookup_object() will return an object at all). This leads to several bugs with --connectivity-only: 1. mark_object() will not queue objects for examination, so recursively following links from commits to trees, etc, did nothing. I.e., we were checking the reachability of hardly anything at all. 2. When a set of heads is given on the command-line, we use lookup_object() to see if they exist. But without the initial pass, we assume nothing exists. 3. When loading reflog entries, we do a similar lookup_object() check, and complain that the reflog is broken if the object doesn't exist in our hash. So in short, --connectivity-only is broken pretty badly, and will claim that your repository is fine when it's not. Presumably nobody noticed for a few reasons. One is that the embedded test does not actually test the recursive nature of the reachability check. All of the missing objects are still in the index, and we directly check items from the index. This patch modifies the test to delete the index, which shows off breakage (1). Another is that --connectivity-only just skips the initial pass for loose objects. So on a real repository, the packed objects were still checked correctly. But on the flipside, it means that "git fsck --connectivity-only" still checks the sha1 of all of the packed objects, nullifying its original purpose of being a faster git-fsck. And of course the final problem is that the bug only shows up when there _is_ corruption, which is rare. So anybody running "git fsck --connectivity-only" proactively would assume it was being thorough, when it was not. One possibility for fixing this is to find all of the spots that rely on HAS_OBJ and tweak them for the connectivity-only case. But besides the risk that we might miss a spot (and I found three already, corresponding to the three bugs above), there are other parts of fsck that _can't_ work without a full list of objects. E.g., the list of dangling objects. Instead, let's make the connectivity-only case look more like the normal case. Rather than skip the initial pass completely, we'll do an abbreviated one that sets up the HAS_OBJ flag for each object, without actually loading the object data. That's simple and fast, and we don't have to care about the connectivity_only flag in the rest of the code at all. While we're at it, let's make sure we treat loose and packed objects the same (i.e., setting up dummy objects for both and skipping the actual sha1 check). That makes the connectivity-only check actually fast on a real repo (40 seconds versus 180 seconds on my copy of linux.git). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 14:23:20 -08:00
Jeff King	b4584e4f66	fsck: report trees as dangling After checking connectivity, fsck looks through the list of any objects we've seen mentioned, and reports unreachable and un-"used" ones as dangling. However, it skips any object which is not marked as "parsed", as that is an object that we _don't_ have (but that somebody mentioned). Since `6e454b9a3` (clear parsed flag when we free tree buffers, 2013-06-05), that flag can't be relied on, and the correct method is to check the HAS_OBJ flag. The cleanup in that commit missed this callsite, though. As a result, we would generally fail to report dangling trees. We never noticed because there were no tests in this area (for trees or otherwise). Let's add some. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 12:49:41 -08:00
Jeff King	1ada11ee62	t1450: clean up sub-objects in duplicate-entry test This test creates a multi-level set of trees, but its cleanup routine only removes the top-level tree. After the test finishes, the inner tree and the blob it points to remain, making the inner tree dangling. A later test ("cleaned up") verifies that we've removed any cruft and "git fsck" output is clean. This passes only because of a bug in git-fsck which fails to notice dangling trees. In preparation for fixing the bug, let's teach this earlier test to clean up after itself correctly. We have to remove the inner tree (and therefore the blob, too, which becomes dangling after removing that tree). Since the setup code happens inside a subshell, we can't just set a variable for each object. However, we can stuff all of the sha1s into the $T output variable, which is not used for anything except cleanup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-17 12:49:41 -08:00
Jeff King	cce044df7f	fsck: detect trailing garbage in all object types When a loose tree or commit is read by fsck (or any git program), unpack_sha1_rest() checks whether there is extra cruft at the end of the object file, after the zlib data. Blobs that are streamed, however, do not have this check. For normal git operations, it's not a big deal. We know the sha1 and size checked out, so we have the object bytes we wanted. The trailing garbage doesn't affect what we're trying to do. But since the point of fsck is to find corruption or other problems, it should be more thorough. This patch teaches its loose-sha1 reader to detect extra bytes after the zlib stream and complain. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	c68b489e56	fsck: parse loose object paths directly When we iterate over the list of loose objects to check, we get the actual path of each object. But we then throw it away and pass just the sha1 to fsck_sha1(), which will do a fresh lookup. Usually it would find the same object, but it may not if an object exists both as a loose and a packed object. We may end up checking the packed object twice, and never look at the loose one. In practice this isn't too terrible, because if fsck doesn't complain, it means you have at least one good copy. But since the point of fsck is to look for corruption, we should be thorough. The new read_loose_object() interface can help us get the data from disk, and then we replace parse_object() with parse_object_buffer(). As a bonus, our error messages now mention the path to a corrupted object, which should make it easier to track down errors when they do happen. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	118e6cead4	t1450: test fsck of packed objects The code paths in fsck for packed and loose objects are quite different, and it is not immediately obvious that the packed case behaves well. In particular: 1. The fsck_loose() function always returns "0" to tell the iterator to keep checking more objects. Whereas fsck_obj_buffer() (which handles packed objects) returns -1. This is OK, because the callback machinery for verify_pack() does not stop when it sees a non-zero return. 2. The fsck_loose() function sets the ERROR_OBJECT bit when fsck_obj() fails, whereas fsck_obj_buffer() sets it only when it sees a corrupt object. This turns out not to matter. We don't actually do anything with this bit except exit the program with a non-zero code, and that is handled already by the non-zero return from the function. So there are no bugs here, but it was certainly confusing to me. And we do not test either of the properties in t1450 (neither that a non-corruption error will caused a non-zero exit for a packed object, nor that we keep going after seeing the first error). Let's test both of those conditions, so that we'll notice if any of those assumptions becomes invalid. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	771e7d578e	sha1_file: fix error message for alternate objects When we fail to open a corrupt loose object, we report an error and mention the filename via sha1_file_name(). However, that function will always give us a path in the local repository, whereas the corrupt object may have come from an alternate. The result is a very misleading error message. Teach the open_sha1_file() and stat_sha1_file() helpers to pass back the path they found, so that we can report it correctly. Note that the pointers we return go to static storage (e.g., from sha1_file_name()), which is slightly dangerous. However, these helpers are static local helpers, and the names are used for immediately generating error messages. The simplicity is an acceptable tradeoff for the danger. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:03 -08:00
Jeff King	0b20f1a266	t1450: refactor loose-object removal Commit `90cf590f5` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a remove_loose_object() helper, but we already had a remove_object() helper that did the same thing. Let's combine these into one. The implementations had a few subtle differences, so I've tried to take the best of both: - the original used "sed", but the newer version avoids spawning an extra process - the original processed "$*", which was nonsense, as it assumed only a single sha1. Use "$1" to make that more clear. - the newer version ran an extra rev-parse, but it was not necessary; it's sole caller already converted the argument into a raw sha1 - the original used "rm -f", whereas the new one uses "rm". The latter is better because it may notice a bug or other unexpected failure in the test. (The original does check that the object exists before we remove it, which is good, but that's a subset of the possible unexpected conditions). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-15 15:59:02 -08:00
David Turner	8354fa3d4c	fsck: handle bad trees like other errors Instead of dying when fsck hits a malformed tree object, log the error like any other and continue. Now fsck can tell the user which tree is bad, too. Signed-off-by: David Turner <dturner@twosigma.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-27 14:09:10 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Junio C Hamano	6d2d780f63	fsck: detect and warn a commit with embedded NUL Even though a Git commit object is designed to be capable of storing any binary data as its payload, in practice people use it to describe the changes in textual form, and tools like "git log" are designed to treat the payload as text. Detect and warn when we see any commit object with a NUL byte in it. Note that a NUL byte in the header part is already detected as a grave error. This change is purely about the message part. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-10 10:02:06 -07:00
René Scharfe	8a272f291a	fsck: treat a NUL in a tag header as an error We check the return value of verify_header() for commits already, so do the same for tags as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
René Scharfe	80c7f5a011	t1450: add tests for NUL in headers of commits and tags Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:07 -05:00
Junio C Hamano	30ce3b3bbc	Merge branch 'jc/fsck-dropped-errors' There were some classes of errors that "git fsck" diagnosed to its standard error that did not cause it to exit with non-zero status. * jc/fsck-dropped-errors: fsck: exit with non-zero when problems are found	2015-10-15 15:43:35 -07:00
Junio C Hamano	122f76f574	fsck: exit with non-zero when problems are found After finding some problems (e.g. a ref refs/heads/X points at an object that is not a commit) and issuing an error message, the program failed to signal the fact that it found an error by a non-zero exit status. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-23 14:29:28 -07:00
Johannes Schindelin	02976bf856	fsck: introduce `git fsck --connectivity-only` This option avoids unpacking each and all blob objects, and just verifies the connectivity. In particular with large repositories, this speeds up the operation, at the expense of missing corrupt blobs, ignoring unreachable objects and other fsck issues, if any. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:37 -07:00
Johannes Schindelin	2becf00ff7	fsck: support demoting errors to warnings We already have support in `git receive-pack` to deal with some legacy repositories which have non-fatal issues. Let's make `git fsck` itself useful with such repositories, too, by allowing users to ignore known issues, or at least demote those issues to mere warnings. Example: `git -c fsck.missingEmail=ignore fsck` would hide problems with missing emails in author, committer and tagger lines. In the same spirit that `git receive-pack`'s usage of the fsck machinery differs from `git fsck`'s – some of the non-fatal warnings in `git fsck` are fatal with `git receive-pack` when receive.fsckObjects = true, for example – we strictly separate the fsck.<msg-id> from the receive.fsck.<msg-id> settings. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:36 -07:00
Johannes Schindelin	71ab8fa840	fsck: report the ID of the error/warning Some repositories written by legacy code have objects with non-fatal fsck issues. To allow the user to ignore those issues, let's print out the ID (e.g. when encountering "missingEmail", the user might want to call `git config --add receive.fsck.missingEmail=warn`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-23 14:27:35 -07:00
Junio C Hamano	ee6e4c70f1	Merge branch 'maint' * maint: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:28:29 -08:00
Junio C Hamano	7ba46269a0	Merge branch 'maint-2.1' into maint * maint-2.1: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:28:10 -08:00
Junio C Hamano	3c84ac86fc	Merge branch 'maint-2.0' into maint-2.1 * maint-2.0: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:27:56 -08:00
Junio C Hamano	64a03e970a	Merge branch 'maint-1.8.5' into maint-1.9 * maint-1.8.5: is_hfs_dotgit: loosen over-eager match of \u{..47}	2015-01-07 13:27:13 -08:00
Jeff King	6aaf956b08	is_hfs_dotgit: loosen over-eager match of \u{..47} Our is_hfs_dotgit function relies on the hackily-implemented next_hfs_char to give us the next character that an HFS+ filename comparison would look at. It's hacky because it doesn't implement the full case-folding table of HFS+; it gives us just enough to see if the path matches ".git". At the end of next_hfs_char, we use tolower() to convert our 32-bit code point to lowercase. Our tolower() implementation only takes an 8-bit char, though; it throws away the upper 24 bits. This means we can't have any false negatives for is_hfs_dotgit. We only care about matching 7-bit ASCII characters in ".git", and we will correctly process 'G' or 'g'. However, we _can_ have false positives. Because we throw away the upper bits, code point \u{0147} (for example) will look like 'G' and get downcased to 'g'. It's not known whether a sequence of code points whose truncation ends up as ".git" is meaningful in any language, but it does not hurt to be more accurate here. We can just pass out the full 32-bit code point, and compare it manually to the upper and lowercase characters we care about. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-29 12:06:27 -08:00
Junio C Hamano	1cb4b3d380	Merge branch 'js/fsck-tag-validation' New tag object format validation added in 2.2 showed garbage after a tagname it reported in its error message. * js/fsck-tag-validation: index-pack: terminate object buffers with NUL fsck: properly bound "invalid tag name" error message	2014-12-22 12:27:41 -08:00
Junio C Hamano	77933f4449	Sync with v2.1.4 * maint-2.1: Git 2.1.4 Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:46:57 -08:00
Junio C Hamano	58f1d950e3	Sync with v2.0.5 * maint-2.0: Git 2.0.5 Git 1.9.5 Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:42:28 -08:00
Junio C Hamano	6898b79721	Sync with v1.8.5.6 * maint-1.8.5: Git 1.8.5.6 fsck: complain about NTFS ".git" aliases in trees read-cache: optionally disallow NTFS .git variants path: add is_ntfs_dotgit() helper fsck: complain about HFS+ ".git" aliases in trees read-cache: optionally disallow HFS+ .git variants utf8: add is_hfs_dotgit() helper fsck: notice .git case-insensitively t1450: refactor ".", "..", and ".git" fsck tests verify_dotfile(): reject .git case-insensitively read-tree: add tests for confusing paths like ".." and ".git" unpack-trees: propagate errors adding entries to the index	2014-12-17 11:20:31 -08:00
Johannes Schindelin	d08c13b947	fsck: complain about NTFS ".git" aliases in trees Now that the index can block pathnames that can be mistaken to mean ".git" on NTFS and FAT32, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectNTFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about NTFS. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on NTFS themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git or git~1, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectNTFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	a18fcc9ff2	fsck: complain about HFS+ ".git" aliases in trees Now that the index can block pathnames that case-fold to ".git" on HFS+, it would be helpful for fsck to notice such problematic paths. This lets servers which use receive.fsckObjects block them before the damage spreads. Note that the fsck check is always on, even for systems without core.protectHFS set. This is technically more restrictive than we need to be, as a set of users on ext4 could happily use these odd filenames without caring about HFS+. However, on balance, it's helpful for all servers to block these (because the paths can be used for mischief, and servers which bother to fsck would want to stop the spread whether they are on HFS+ themselves or not), and hardly anybody will be affected (because the blocked names are variants of .git with invisible Unicode code-points mixed in, meaning mischief is almost certainly what the tree author had in mind). Ideally these would be controlled by a separate "fsck.protectHFS" flag. However, it would be much nicer to be able to enable/disable _any_ fsck flag individually, and any scheme we choose should match such a system. Given the likelihood of anybody using such a path in practice, it is not unreasonable to wait until such a system materializes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:45 -08:00
Jeff King	76e86fc6e3	fsck: notice .git case-insensitively We complain about ".git" in a tree because it cannot be loaded into the index or checked out. Since we now also reject ".GIT" case-insensitively, fsck should notice the same, so that errors do not propagate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	450870cba7	t1450: refactor ".", "..", and ".git" fsck tests We check that fsck notices and complains about confusing paths in trees. However, there are a few shortcomings: 1. We check only for these paths as file entries, not as intermediate paths (so ".git" and not ".git/foo"). 2. We check "." and ".." together, so it is possible that we notice only one and not the other. 3. We repeat a lot of boilerplate. Let's use some loops to be more thorough in our testing, and still end up with shorter code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-17 11:04:39 -08:00
Jeff King	7add441984	fsck: properly bound "invalid tag name" error message When we detect an invalid tag-name header in a tag object, like, "tag foo bar\n", we feed the pointer starting at "foo bar" to a printf "%s" formatter. This shows the name, as we want, but then it keeps printing the rest of the tag buffer, rather than stopping at the end of the line. Our tests did not notice because they look only for the matching line, but the bug is that we print much more than we wanted to. So we also adjust the test to be more exact. Note that when fscking tags with "index-pack --strict", this is even worse. index-pack does not add a trailing NUL-terminator after the object, so we may actually read past the buffer and print uninitialized memory. Running t5302 with valgrind does notice the bug for that reason. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-12-09 11:54:25 -08:00
Junio C Hamano	f190737f22	Merge branch 'jc/hash-object-fsck-tag' Using "hash-object --literally", test one of the new breakages js/fsck-tag-validation topic teaches "fsck" to catch is caught. * jc/hash-object-fsck-tag: t1450: make sure fsck detects a malformed tagger line	2014-09-26 14:39:44 -07:00
Junio C Hamano	13f4f04692	Merge branch 'js/fsck-tag-validation' Teach "git fsck" to inspect the contents of annotated tag objects. * js/fsck-tag-validation: Make sure that index-pack --strict checks tag objects Add regression tests for stricter tag fsck'ing fsck: check tag objects' headers Make sure fsck_commit_buffer() does not run out of the buffer fsck_object(): allow passing object data separately from the object itself Refactor type_from_string() to allow continuing after detecting an error	2014-09-26 14:39:43 -07:00
Junio C Hamano	b659605da6	t1450: make sure fsck detects a malformed tagger line With "hash-object --literally", write a tag object that is not supposed to pass one of the new checks added to "fsck", and make sure that the new check catches the breakage. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-12 11:05:15 -07:00
Jeff King	30d1038d1b	fsck: return non-zero status on missing ref tips Fsck tries hard to detect missing objects, and will complain (and exit non-zero) about any inter-object links that are missing. However, it will not exit non-zero for any missing ref tips, meaning that a severely broken repository may still pass "git fsck && echo ok". The problem is that we use for_each_ref to iterate over the ref tips, which hides broken tips. It does at least print an error from the refs.c code, but fsck does not ever see the ref and cannot note the problem in its exit code. We can solve this by using for_each_rawref and noting the error ourselves. In addition to adding tests for this case, we add tests for all types of missing-object links (all of which worked, but which we were not testing). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-12 10:45:49 -07:00
Johannes Schindelin	90e3e5f057	Add regression tests for stricter tag fsck'ing The intent of the new test case is to catch general breakages in the fsck_tag() function, not so much to test it extensively, trying to strike the proper balance between thoroughness and speed. While it would have been nice to test the code path where fsck_object() encounters an invalid tag object, this is not possible using git fsck: tag objects are parsed already before fsck'ing (and the parser already fails upon such objects). Even worse: we would not even be able write out invalid tag objects because git hash-object parses those objects, too, unless we resorted to really ugly hacks such as using something like this in the unit tests (essentially depending on Perl and Compress::Zlib): hash_invalid_object () { contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" && sha1=$(echo "$contents" \| test-sha1) && suffix=${sha1#??} && mkdir -p .git/objects/${sha1%$suffix} && echo "$contents" \| perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \ > .git/objects/${sha1%$suffix}/$suffix && echo $sha1 } Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-11 14:19:09 -07:00
Jeff King	2e770fe47e	fsck: exit with non-zero status upon error from fsck_obj() Upon finding a corrupt loose object, we forgot to note the error to signal it with the exit status of the entire process. [jc: adjusted t1450 and added another test] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-09-10 09:40:53 -07:00
Jeff King	d4b8de0420	fsck: report integer overflow in author timestamps When we check commit objects, we complain if commit->date is ULONG_MAX, which is an indication that we saw integer overflow when parsing it. However, we do not do any check at all for author lines, which also contain a timestamp. Let's actually check the timestamps on each ident line with strtoul. This catches both author and committer lines, and we can get rid of the now-redundant commit->date check. Note that like the existing check, we compare only against ULONG_MAX. Now that we are calling strtoul at the site of the check, we could be slightly more careful and also check that errno is set to ERANGE. However, this will make further refactoring in future patches a little harder, and it doesn't really matter in practice. For 32-bit systems, one would have to create a commit at the exact wrong second in 2038. But by the time we get close to that, all systems will hopefully have moved to 64-bit (and if they haven't, they have a real problem one second later). For 64-bit systems, by the time we get close to ULONG_MAX, all systems will hopefully have been consumed in the fiery wrath of our expanding Sun. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 10:12:58 -08:00
Jeff King	5c17f51270	fsck: warn about ".git" in trees Having a ".git" entry inside a tree can cause confusing results on checkout. At the top-level, you could not checkout such a tree, as it would complain about overwriting the real ".git" directory. In a subdirectory, you might check it out, but performing operations in the subdirectory would confusingly consider the in-tree ".git" directory as the repository. The regular git tools already make it hard to accidentally add such an entry to a tree, and do not allow such entries to enter the index at all. Teaching fsck about it provides an additional safety check, and let's us avoid propagating any such bogosity when transfer.fsckObjects is on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 13:52:54 -08:00
Jeff King	5d34a4359d	fsck: warn about '.' and '..' in trees A tree with meta-paths like '.' or '..' does not work well with git; the index will refuse to load it or check it out to the filesystem (and even if we did not have that safety, it would look like we were overwriting an untracked directory). For the same reason, it is difficult to create such a tree with regular git. Let's warn about these dubious entries during fsck, just in case somebody has created a bogus tree (and this also lets us prevent them from propagating when transfer.fsckObjects is set). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-11-28 10:41:08 -08:00

1 2

66 commits