Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.
* eb/hash-transition: (30 commits)
t1016-compatObjectFormat: add tests to verify the conversion between objects
t1006: test oid compatibility with cat-file
t1006: rename sha1 to oid
test-lib: compute the compatibility hash so tests may use it
builtin/ls-tree: let the oid determine the output algorithm
object-file: handle compat objects in check_object_signature
tree-walk: init_tree_desc take an oid to get the hash algorithm
builtin/cat-file: let the oid determine the output algorithm
rev-parse: add an --output-object-format parameter
repository: implement extensions.compatObjectFormat
object-file: update object_info_extended to reencode objects
object-file-convert: convert commits that embed signed tags
object-file-convert: convert commit objects when writing
object-file-convert: don't leak when converting tag objects
object-file-convert: convert tag objects when writing
object-file-convert: add a function to convert trees between algorithms
object: factor out parse_mode out of fast-import and tree-walk into in object.h
cache: add a function to read an OID of a specific algorithm
tag: sign both hashes
commit: export add_header_signature to support handling signatures on tags
...
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.
Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.
Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.
Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the previous commit dropped the hard-coded limit in
traverse_trees(), we don't need this macro there anymore (the code can
handle any number of trees in parallel).
We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to
5290d45134 (tree-walk.c: break circular dependency with unpack-trees,
2020-02-01). So we can just directly define that as "8" now; we know
traverse_trees() can handle whatever we throw at it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree_entry_interesting() function takes base_offset, allowing
its callers to potentially pass a non-zero number to skip the early
part of the path string.
The feature is never exercised and we do not even know what bugs are
lurking there, as all callers pass 0 to the parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These defines and enum are all oid-related and as such seem to make
more sense being included in hash.h. Further, moving them there
allows us to remove some includes of cache.h in other files.
The change to line-log.h might look unrelated, but line-log.h includes
diffcore.h, which previously included cache.h, which included the
kitchen sink. Since this patch makes diffcore.h no longer include
cache.h, the compiler complains about the 'struct string_list *'
function parameter. Add a forward declaration for struct string_list to
address this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using init_tree_desc() and tree_entry() to iterate over a tree, we
always canonicalize the modes coming out of the tree. This is a good
thing to prevent bugs or oddities in normal code paths, but it's
counter-productive for tools like fsck that want to see the exact
contents.
We can address this by adding an option to avoid the extra
canonicalization. A few notes on the implementation:
- I've attached the new option to the tree_desc struct itself. The
actual code change is in decode_tree_entry(), which is in turn
called by the public update_tree_entry(), tree_entry(), and
init_tree_desc() functions, plus their "gently" counterparts.
By letting it ride along in the struct, we can avoid changing the
signature of those functions, which are called many times. Plus it's
conceptually simpler: you really want a particular iteration of a
tree to be "raw" or not, rather than individual calls.
- We still have to set the new option somewhere. The struct is
initialized by init_tree_desc(). I added the new flags field only to
the "gently" version. That avoids disturbing the much more numerous
non-gentle callers, and it makes sense that anybody being careful
about looking at raw modes would also be careful about bogus trees
(i.e., the caller will be something like fsck in the first place).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unpack-trees API depends on the tree-walk API. But we've recently
introduced a dependency in tree-walk.c on MAX_UNPACK_TREES, which
doesn't otherwise care about unpack-trees at all.
Let's break that dependency by reversing the constants: we'll introduce
a new MAX_TRAVERSE_TREES which belongs to the tree-walk API. And then we
can define MAX_UNPACK_TREES in terms of that (since unpack-trees cannot
possibly work with more trees than it can traverse at once via
tree-walk).
The value for both will remain at 8. This is somewhat arbitrary and
probably more than is necessary, per ca885a4fe6 (read-tree() and
unpack_trees(): use consistent limit, 2008-03-13), but there's not
really any pressing need to reduce it.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the documentation from Documentation/technical/api-tree-walking.txt
to tree-walk.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.
Documentation/technical/api-tree-walking.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codepaths to walk tree objects have been audited for integer
overflows and hardened.
* jk/tree-walk-overflow:
tree-walk: harden make_traverse_path() length computations
tree-walk: add a strbuf wrapper for make_traverse_path()
tree-walk: accept a raw length for traverse_path_len()
tree-walk: use size_t consistently
tree-walk: drop oid from traverse_info
setup_traverse_info(): stop copying oid
The make_traverse_path() function isn't very careful about checking its
output buffer boundaries. In fact, it doesn't even _know_ the size of
the buffer it's writing to, and just assumes that the caller used
traverse_path_len() correctly. And even then we assume that our
traverse_info.pathlen components are all correct, and just blindly write
into the buffer.
Let's improve this situation a bit:
- have the caller pass in their allocated buffer length, which we'll
check against our own computations
- check for integer underflow as we do our backwards-insertion of
pathnames into the buffer
- check that we do not run out items in our list to traverse before
we've filled the expected number of bytes
None of these should be triggerable in practice (especially since our
switch to size_t everywhere in a previous commit), but it doesn't hurt
to check our assumptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All but one of the callers of make_traverse_path() allocate a new heap
buffer to store the path. Let's give them an easy way to write to a
strbuf, which saves them from computing the length themselves (which is
especially tricky when they want to add to the path). It will also make
it easier for us to change the make_traverse_path() interface in a
future patch to improve its bounds-checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We take a "struct name_entry", but only care about the length of the
path name. Let's just take that length directly, making it easier to use
the function from callers that sometimes do not have a name_entry at
all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We store and manipulate the cumulative traverse_info.pathlen as an
"int", which can overflow when we are fed ridiculously long pathnames
(e.g., ones at the edge of 2GB or 4GB, even if the individual tree entry
names are smaller than that). The results can be confusing, though
after some prodding I was not able to use this integer overflow to cause
an under-allocated buffer.
Let's consistently use size_t to generate and store these, and make
sure our addition doesn't overflow.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the previous commit shows, the presence of an oid in each level of
the traverse_info is confusing and ultimately not necessary. Let's drop
it to make it clear that it will not always be set (as well as convince
us that it's unused, and let the compiler catch any merges with other
branches that do add new uses).
Since the oid is part of name_entry, we'll actually stop embedding a
name_entry entirely, and instead just separately hold the pathname, its
length, and the mode.
This makes the resulting code slightly more verbose as we have to pass
those elements around individually. But it also makes it more clear what
each code path is going to use (and in most of the paths, we really only
care about the pathname itself).
A few of these conversions are noisier than they need to be, as they
also take the opportunity to rename "len" to "namelen" for clarity
(especially where we also have "pathlen" or "ce_len" alongside).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, clean up the_repo usage in builtin/merge-tree.c a tiny
bit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mechanically and systematically drop "extern" from function
declarlation.
* dl/no-extern-in-func-decl:
*.[ch]: manually align parameter lists
*.[ch]: remove extern from function declarations using sed
*.[ch]: remove extern from function declarations using spatch
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.
This was the Coccinelle patch used:
@@
type T;
identifier f;
@@
- extern
T f(...);
and it was run with:
$ git ls-files \*.{c,h} |
grep -v ^compat/ |
xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place
Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch" reported a dangling symbolic link by
mistake, when it wanted to report that a given name is ambiguous.
* dt/cat-file-batch-ambiguous:
t1512: test ambiguous cat-file --batch and --batch-output
Do not print 'dangling' for cat-file in case of ambiguity
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.
* bc/tree-walk-oid:
cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
tree-walk: store object_id in a separate member
match-trees: use hashcpy to splice trees
match-trees: compute buffer offset correctly when splicing
tree-walk: copy object ID before use
The return values -1 and -2 from get_oid could mean two different
things, depending on whether they were from an enum returned by
get_tree_entry_follow_symlinks, or from a different code path. This
caused 'dangling' to be printed from a git cat-file in the case of an
ambiguous (-2) result.
Unify the results of get_oid* and get_tree_entry_follow_symlinks to be
one common type, with unambiguous values.
Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to support :(attr) when matching pathspec on a tree,
tree_entry_interesting() needs to take an index (because
git_check_attr() needs it). This is the preparation step for it. This
also makes it clearer what index we fall back to when looking up
attributes during an unpack-trees operation: the source index.
This also fixes revs->pruning.repo initialization that should have
been done in 2abf350385 (revision.c: remove implicit dependency on
the_index - 2018-09-21). Without it, skip_uninteresting() will
dereference a NULL pointer through this call chain
get_revision(revs)
get_revision_internal
get_revision_1
try_to_simplify_commit
rev_compare_tree
diff_tree_oid(..., &revs->pruning)
ll_diff_tree_oid
diff_tree_paths
ll_diff_tree
skip_uninteresting
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I looped over the toplevel header files, creating a temporary two-line C
program for each consisting of
#include "git-compat-util.h"
#include $HEADER
This patch is the result of manually fixing errors in compiling those
tiny programs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the only caller of this function already uses struct object_id,
update get_tree_entry_follow_symlinks to use it in parameters and
internally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert get_tree_entry and find_tree_entry to take pointers to struct
object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well. As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue. Now fsck can tell the user which tree is
bad, too.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In traverse_trees, we generate the complete traverse path for a
traverse_info. Later, in do_compare_entry, we used to go do a bunch
of work to compare the traverse_info to a cache_entry's name without
computing that path. But since we already have that path, we don't
need to do all that work. Instead, we can just put the generated
path into the traverse_info, and do the comparison more directly.
We copy the path because prune_traversal might mutate `base`. This
doesn't happen in any codepaths where do_compare_entry is called,
but it's better to be safe.
This makes git checkout much faster -- about 25% on Twitter's
monorepo. Deeper directory trees are likely to benefit more than
shallower ones.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new function, get_tree_entry_follow_symlinks, to tree-walk.[ch].
The function is not yet used. It will be used to implement git
cat-file --batch --follow-symlinks.
The function locates an object by path, following symlinks in the
repository. If the symlinks lead outside the repository, the function
reports this to the caller.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This continues 4651ece8 (Switch over tree descriptors to contain a
pre-parsed entry) and moves the only rest computational part
mode = canon_mode(mode)
from tree_entry_extract() to tree entry decode phase - to
decode_tree_entry().
The reason to do it, is that canon_mode() is at least 2 conditional
jumps for regular files, and that could be noticeable should canon_mode()
be invoked several times.
That does not matter for current Git codebase, where typical tree
traversal is
while (t->size) {
sha1 = tree_entry_extract(t, &path, &mode);
...
update_tree_entry(t);
}
i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such
cases, it does not matter performance-wise, where that mode
canonicalization is done - either once in tree_entry_extract(), or once
in decode_tree_entry() called by update_tree_entry() - it is
approximately the same.
But for future code, which could need to work with several tree_desc's
in parallel, it could be handy to operate on tree_desc descriptors, and
do "extracts" only when needed, or at all, access only relevant part of
it through structure fields directly.
And for such situations, having canon_mode() be done once in decode
phase is better - we won't need to pay the performance price of 2 extra
conditional jumps on every t->mode access.
So let's move mode canonicalization to decode_tree_entry(). That was the
final bit. Now after tree entry is decoded, it is fully ready and could
be accessed either directly via field, or through tree_entry_extract()
which this time got really "totally trivial".
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If o->merge is set, the struct traverse_info member conflicts is shifted
left in unpack_callback, then passed through traverse_trees_recursive
to unpack_nondirectories, where it is shifted right before use. Stop
the shifting and just pass the conflict bit mask as is. Rename the
member to df_conflicts to prove that it isn't used anywhere else.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a basic code hygiene to avoid magic constants that are unnamed.
Besides, this helps extending the value later on for "interesting, but
cannot decide if the entry truely matches yet" (ie. prefix matches)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.
Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The traverse_trees() machinery is primarily meant for merging two (or
more) trees, and because a merge is a full tree operation, it doesn't
support any pruning with pathspec.
Since d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19), however, we use unpack_trees() to traverse_trees() callchain
to perform "diff-index", which could waste a lot of work traversing trees
outside the user-supplied pathspec, only to discard at the blob comparison
level in diff-lib.c::oneway_diff() which is way too late.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
never_interesting optimization is disabled if there is any wildcard
pathspec, even if it only matches exactly on trees.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In traversing trees, a full path is splitted into two parts: base
directory and entry. They are however quite often concatenated
whenever a full path is needed. Current code allocates a new buffer,
do two memcpy(), use it, then release.
Instead this patch turns "base" to a writable, extendable buffer. When
a concatenation is needed, the callee only needs to append "entry" to
base, use it, then truncate the entry out again. "base" must remain
unchanged before and after entering a function.
This avoids quite a bit of malloc() and memcpy().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was a code comment that referred to the "above two functions" but
over time the functions immediately preceding the comment have changed.
Just mention the relevant functions by name.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an error is encountered, it calls add_rejected_file() which either
- directly displays the error message and stops if in plumbing mode
(i.e. if show_all_errors is not initialized at 1)
- or stores it so that it will be displayed at the end with display_error_msgs(),
Storing the files by error type permits to have a list of files for
which there is the same error instead of having a serie of almost
identical errors.
As each bind_overlap error combines a file and an old file, a list cannot be
done, therefore, theses errors are not stored but directly displayed.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the traverse_trees() entry comparator routine use the more
relaxed form of name comparison that considers files and directories
with the same name identical.
We pass in a separate mask for just the directory entries, so that the
callback routine can decide (if it wants to) to only handle one or the
other type, but generally most (all?) users are expected to really want
to see the case of a name 'foo' showing up in one tree as a file and in
another as a directory at the same time.
In particular, moving 'unpack_trees()' over to use this tree traversal
mechanism requires this.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows the callback to return an error value, but it can also
specify which of the tree entries that it actually used up by returning
a positive mask value.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the calling convention a bit less obvious, but a lot more
flexible. Instead of allocating and extending a new 'base' string, we
just link the top-most name into a linked list of the 'info' structure
when traversing a subdirectory, and we can generate the basename by
following the list.
Perhaps even more importantly, the linked list of info structures also
gives us a place to naturally save off other information than just the
directory name.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>