Allow ignoring fsck errors on specific set of known-to-be-bad
objects, and also tweaking warning level of various kinds of non
critical breakages reported.
* js/fsck-opt:
fsck: support ignoring objects in `git fsck` via fsck.skiplist
fsck: git receive-pack: support excluding objects from fsck'ing
fsck: introduce `git fsck --connectivity-only`
fsck: support demoting errors to warnings
fsck: document the new receive.fsck.<msg-id> options
fsck: allow upgrading fsck warnings to errors
fsck: optionally ignore specific fsck issues completely
fsck: disallow demoting grave fsck errors to warnings
fsck: add a simple test for receive.fsck.<msg-id>
fsck: make fsck_tag() warn-friendly
fsck: handle multiple authors in commits specially
fsck: make fsck_commit() warn-friendly
fsck: make fsck_ident() warn-friendly
fsck: report the ID of the error/warning
fsck (receive-pack): allow demoting errors to warnings
fsck: offer a function to demote fsck errors to warnings
fsck: provide a function to parse fsck message IDs
fsck: introduce identifiers for fsck messages
fsck: introduce fsck options
"git fsck" used to ignore missing or invalid objects recorded in reflog.
* mh/fsck-reflog-entries:
fsck: report errors if reflog entries point at invalid objects
fsck_handle_reflog_sha1(): new function
Identical to support in `git receive-pack for the config option
`receive.fsck.skiplist`, we now support ignoring given objects in
`git fsck` via `fsck.skiplist` altogether.
This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option avoids unpacking each and all blob objects, and just
verifies the connectivity. In particular with large repositories, this
speeds up the operation, at the expense of missing corrupt blobs,
ignoring unreachable objects and other fsck issues, if any.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have support in `git receive-pack` to deal with some legacy
repositories which have non-fatal issues.
Let's make `git fsck` itself useful with such repositories, too, by
allowing users to ignore known issues, or at least demote those issues
to mere warnings.
Example: `git -c fsck.missingEmail=ignore fsck` would hide
problems with missing emails in author, committer and tagger lines.
In the same spirit that `git receive-pack`'s usage of the fsck machinery
differs from `git fsck`'s – some of the non-fatal warnings in `git fsck`
are fatal with `git receive-pack` when receive.fsckObjects = true, for
example – we strictly separate the fsck.<msg-id> from the
receive.fsck.<msg-id> settings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.
In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.
We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, if a reflog entry's old or new SHA-1 was not resolvable to
an object, that SHA-1 was silently ignored. Instead, report such cases
as errors.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New function, extracted from fsck_handle_reflog_ent(). The extra
is_null_sha1() test for the new reference is currently unnecessary, as
reflogs are deleted when the reference itself is deleted. But it
doesn't hurt, either.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".
To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().
This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.
* nd/multiple-work-trees: (41 commits)
prune --worktrees: fix expire vs worktree existence condition
t1501: fix test with split index
t2026: fix broken &&-chain
t2026 needs procondition SANITY
git-checkout.txt: a note about multiple checkout support for submodules
checkout: add --ignore-other-wortrees
checkout: pass whole struct to parse_branchname_arg instead of individual flags
git-common-dir: make "modules/" per-working-directory directory
checkout: do not fail if target is an empty directory
t2025: add a test to make sure grafts is working from a linked checkout
checkout: don't require a work tree when checking out into a new one
git_path(): keep "info/sparse-checkout" per work-tree
count-objects: report unused files in $GIT_DIR/worktrees/...
gc: support prune --worktrees
gc: factor out gc.pruneexpire parsing code
gc: style change -- no SP before closing parenthesis
checkout: clean up half-prepared directories in --to mode
checkout: reject if the branch is already checked out elsewhere
prune: strategies for linked checkouts
checkout: support checking out into a new working directory
...
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.
After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.
Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.
This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading). Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.
While at it, swap two of the arguments in the function to put output
arguments at the end. As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.
Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git fsck" to inspect the contents of annotated tag objects.
* js/fsck-tag-validation:
Make sure that index-pack --strict checks tag objects
Add regression tests for stricter tag fsck'ing
fsck: check tag objects' headers
Make sure fsck_commit_buffer() does not run out of the buffer
fsck_object(): allow passing object data separately from the object itself
Refactor type_from_string() to allow continuing after detecting an error
"git fsck" failed to report that it found corrupt objects via its
exit status in some cases.
* jk/fsck-exit-code-fix:
fsck: return non-zero status on missing ref tips
fsck: exit with non-zero status upon error from fsck_obj()
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".
The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.
In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').
For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.
By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.
The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.
[jc: adjusted t1450 and added another test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both refs.c and fsck.c have their own private copies of the is_branch function.
Delete the is_branch function from fsck.c and make the version in refs.c
public.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/replace-refs-variable-rename:
Document some functions defined in object.c
Add docstrings for lookup_replace_object() and do_lookup_replace_object()
rename read_replace_refs to check_replace_refs
The semantics of this flag was changed in commit
e1111cef23 inline lookup_replace_object() calls
but wasn't renamed at the time to minimize code churn. Rename it now,
and add a comment explaining its use.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.
It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.
In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:
1. The fsck code relies on the parsed flag to know that we
were able to parse the object at one point. We can
switch this to using a flag in the "flags" field.
2. The index-pack code sets the buffer to NULL but does
not free it (it is freed by a caller). We should still
unset the parsed flag here, but we cannot use our
helper, as we do not want to free the buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The source of this nonsense was
04d3975937 fsck: reduce stack footprint
, which wedged a pointer to parent into the object_array_entry's name
field. The parent pointer was passed to traverse_one_object(), even
though that function *didn't use it*.
The useless code has been deleted over time. Commit
a1cdc25172 fsck: drop unused parameter from traverse_one_object()
removed the parent pointer from traverse_one_object()'s
signature. Commit
c0aa335c95 Remove unused variables
removed the code that read the parent pointer back out of the name
field.
This commit takes the last step: don't write the parent pointer into
the name field in the first place.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default output from "fsck" is often overwhelmed by informational
message on dangling objects, especially if you do not repack often, and a
real error can easily be buried.
Add "--no-dangling" option to omit them, and update the user manual to
demonstrate its use.
Based on a patch by Clemens Buchacher, but reverted the part to change
the default to --no-dangling, which is unsuitable for the first patch.
The usual three-step procedure to break the backward compatibility over
time needs to happen on top of this, if we were to go in that direction.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref() may return a pointer to a shared buffer and can be
overwritten by the next resolve_ref() calls. Callers need to
pay attention, not to keep the pointer when the next call happens.
Rename with "_unsafe" suffix to warn developers (or reviewers) before
introducing new call sites.
This patch is generated using the following command
git grep -l 'resolve_ref(' -- '*.[ch]'|xargs sed -i 's/resolve_ref(/resolve_ref_unsafe(/g'
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsck is usually a long process and it would be nice if it prints
progress from time to time.
Progress meter is not printed when --verbose is given because
--verbose prints a lot, there's no need for "alive" indicator.
Progress meter may provide "% complete" information but it would
be lost anyway in the flood of text.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During verify_pack() all objects are read for SHA-1 check. Then
fsck_sha1() is called on every object, which read the object again
(fsck_sha1 -> parse_object -> read_sha1_file).
Avoid reading an object twice, do fsck_sha1 while we have an object
uncompressed data in verify_pack.
On git.git, with this patch I got:
$ /usr/bin/time ./git fsck >/dev/null
98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k
0inputs+0outputs (0major+194186minor)pagefaults 0swaps
Without it:
$ /usr/bin/time ./git fsck >/dev/null
231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k
0inputs+0outputs (0major+461629minor)pagefaults 0swaps
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asking fwrite() to write one item of size bytes results in fwrite()
reporting "I wrote zero item", when size is zero. Instead, we could
ask it to write "size" items of 1 byte and expect it to report that
"I wrote size items" when it succeeds, with any value of size,
including zero.
Noticed and reported by BJ Hargrave.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a random garbage file whose name happens to be 38-byte
long in a .git/objects/??/ directory, the loop terminated prematurely
without marking all the other files that it hasn't checked in the
readdir() loop.
Treat such a file just like any other garbage file, and do not break out
of the readdir() loop.
While at it, replace repeated sprintf() calls to a single one outside the
loop.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows better help text to be defined than "be verbose". Also make use
of the macro in places that already had a different description. No
object code changes intended.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of
[torvalds@nehalem git]$ em buil<tab>
Display all 180 possibilities? (y or n)
[torvalds@nehalem git]$ em builtin-sh
builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c
builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o
[torvalds@nehalem git]$ em builtin-shor<tab>
builtin-shortlog.c builtin-shortlog.o
[torvalds@nehalem git]$ em builtin-shortlog.c
you get
[torvalds@nehalem git]$ em buil<tab> [type]
builtin/ builtin.h
[torvalds@nehalem git]$ em builtin [auto-completes to]
[torvalds@nehalem git]$ em builtin/sh<tab> [type]
shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o
[torvalds@nehalem git]$ em builtin/sho [auto-completes to]
[torvalds@nehalem git]$ em builtin/shor<tab> [type]
shortlog.c shortlog.o
[torvalds@nehalem git]$ em builtin/shortlog.c
which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.
NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead. I think bash has some cut-off
around 100 choices or something.
So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion. But you can
simulate that by using 'ls' instead, or something similar.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>