The read_tree() function reads trees recursively (via its read_tree_at()
helper). This can cause it to run out of stack space on very deep trees.
Let's teach it about the new core.maxTreeDepth option.
The easiest way to demonstrate this is via "ls-tree -r", which the test
covers. Note that I needed a tree depth of ~30k to trigger a segfault on
my Linux system, not the 4100 used by our "big" test in t6700. However,
that test still tells us what we want: that the default 4096 limit is
enough to prevent segfaults on all platforms. We could bump it, but that
increases the cost of the test setup for little gain.
As an interesting side-note: when I originally wrote this patch about 4
years ago, I needed a depth of ~50k to segfault. But porting it forward,
the number is much lower. Seemingly little things like cf0983213c (hash:
add an algo member to struct object_id, 2021-04-26) take it from 32,722
to 29,080.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of base_name_compare(), df_name_compare(), or name_compare()
depended upon a cache_entry or index_state in any way. By moving these
functions to tree.h, half a dozen other files can stop depending upon
cache.h (though that change will be made in a later commit).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cmp_cache_name_compare() was comparing cache_entry structs, it
was associated with the cache rather than with trees. Move the
function. As a side effect, we can make cache_name_stage_compare()
static as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the signature of read_tree_recursive() to omit the "base",
"baselen" and "stage" arguments. No callers of it use these parameters
for anything anymore.
The last function to call read_tree_recursive() with a non-"" path was
read_tree_recursive() itself, but that was changed in
ffd31f661d (Reimplement read_tree_recursive() using
tree_entry_interesting(), 2011-03-25).
The last user of the "stage" parameter went away in the last commit,
and even that use was mere boilerplate.
So let's remove those and rename the read_tree_recursive() function to
just read_tree(). We had another read_tree() function that I've
refactored away in preceding commits, since all in-tree users read
trees recursively with a callback we can change the name to signify
that this is the norm.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the static read_tree_1() function to read_tree_at(). This
function works just like read_tree_recursive(), except you provide
your own strbuf.
This step doesn't make much sense now, but in follow-up commits I'll
remove the base/baselen/stage arguments to read_tree_recursive(). At
that point an anticipated in-tree user[1] for the old
read_tree_recursive() couldn't provide a path to start the
traversal.
Let's give them a function to do so with an API that makes more sense
for them, by taking a strbuf we should be able to avoid more casting
and/or reallocations in the future.
1. https://lore.kernel.org/git/xmqqft106sok.fsf@gitster.g
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the read_tree() API was added around the same time as
read_tree_recursive() in 94537c78a8 (Move "read_tree()" to
"tree.c"[...], 2005-04-22) and b12ec373b8 ([PATCH] Teach read-tree
about commit objects, 2005-04-20) things have gradually migrated over
to the read_tree_recursive() version.
Now builtin/ls-files.c is the last user of this code, let's move all
the relevant code there. This allows for subsequent simplification of
it, and an eventual move to read_tree_recursive().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions call tree_entry_interesting() which will soon require
a 'struct index_state *' to be passed in. Instead of just changing the
function signature to take an index, update to take a repo instead
because these functions do need object database access.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the callback functions for read_tree_recursive to take a pointer
to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert parse_tree_indirect to take a pointer to struct object_id.
Update all the callers. This transformation was achieved using the
following semantic patch and manual updates to the declaration and
definition. Update builtin/checkout.c manually as well, since it uses a
ternary expression not handled by the semantic patch.
@@
expression E1;
@@
- parse_tree_indirect(E1.hash)
+ parse_tree_indirect(&E1)
@@
expression E1;
@@
- parse_tree_indirect(E1->hash)
+ parse_tree_indirect(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the lookup_tree function to take a pointer to struct object_id.
The commit was created with manual changes to tree.c, tree.h, and
object.c, plus the following semantic patch:
@@
@@
- lookup_tree(EMPTY_TREE_SHA1_BIN)
+ lookup_tree(&empty_tree_oid)
@@
expression E1;
@@
- lookup_tree(E1.hash)
+ lookup_tree(&E1)
@@
expression E1;
@@
- lookup_tree(E1->hash)
+ lookup_tree(E1)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.
* jk/squelch-missing-link-warning-for-unreachable:
suppress errors on missing UNINTERESTING links
silence broken link warnings with revs->ignore_missing_links
add quieter versions of parse_{tree,commit}
When we call parse_commit, it will complain to stderr if the
object does not exist or cannot be read. This means that we
may produce useless error messages if this situation is
expected (e.g., because the object is marked UNINTERESTING,
or because revs->ignore_missing_links is set).
We can fix this by adding a new "parse_X_gently" form that
takes a flag to suppress the messages. The existing
"parse_X" form is already gentle in the sense that it
returns an error rather than dying, and we could in theory
just add a "quiet" flag to it (with existing callers passing
"0"). But doing it this way means we do not have to disturb
existing callers.
Note also that the new flag is "quiet_on_missing", and not
just "quiet". We could add a flag to suppress _all_ errors,
but besides being a more invasive change (we would have to
pass the flag down to sub-functions, too), there is a good
reason not to: we would never want to use it. Missing a
linked object is expected in some circumstances, but it is
never expected to have a malformed commit, or to get a tree
when we wanted a commit. We should always complain about
these corruptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows the callback to use 'base' as a temporary buffer to
quickly assemble full path "without" extra allocation. The callback
has to restore it afterwards of course.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.
It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.
In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:
1. The fsck code relies on the parsed flag to know that we
were able to parse the object at one point. We can
switch this to using a flag in the "flags" field.
2. The index-pack code sets the buffer to NULL but does
not free it (it is freed by a caller). We should still
unset the parsed flag here, but we cannot use our
helper, as we do not want to free the buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch changes behavior of the two functions. Previously it does
prefix matching only. Now it can also do wildcard matching.
All callers are updated. Some gain wildcard matching (archive,
checkout), others reset pathspec_item.has_wildcard to retain old
behavior (ls-files, ls-tree as they are plumbing).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a pointer parameter to read_tree_recursive(), which is passed to the
callback function. This allows callers of read_tree_recursive() to
share data with the callback without resorting to global variables. All
current callers pass NULL.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old tree_entry_list is dead, long live the unified single tree
parser.
Yes, we now still have a compatibility function to create a bogus
tree_entry_list in builtin-read-tree.c, but that is now entirely local
to that very messy piece of code.
I'd love to clean read-tree.c up too, but I'm too scared right now, so
the best I can do is to just contain the damage, and try to make sure
that no new users of the tree_entry_list sprout up by not having it as
an exported interface any more.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
That was a hack, only needed because 'git fsck-objects' didn't look at
the raw tree format. Now that fsck traverses the tree itself, we can
drop it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.
The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.
Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is preparatory work for further cleanups, where we try to make
tree_entry look more like the more efficient tree-walk descriptor.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows us to avoid allocating information for names etc, because
we can just use the information from the tree buffer directly.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes read_tree_recursive and read_tree take a struct tree
instead of a buffer. It also move the declaration of read_tree into
tree.h (where struct tree is defined), and updates ls-tree and
diff-index (the only places that presently use read_tree*()) to use
the new versions.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git-ls-tree should be rewritten to use a pathspec the same way everybody
else does. Right now it's the odd man out: if you do
git-ls-tree HEAD divers/char drivers/
it will show the same files _twice_, which is not how pathspecs in general
work.
How about this patch? It breaks some of the git-ls-tree tests, but it
makes git-ls-tree work a lot more like other git pathspec commands, and it
removes more than 150 lines by re-using the recursive tree traversal (but
the "-d" flag is gone for good, so I'm not pushing this too hard).
Linus
Add function to look up an object which is entirely unknown, so that
it can be put in a list. Various other functions related to lists of
objects.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
In particular, warn about things like zero-padding of the mode bits,
which is a big no-no, since it makes otherwise identical trees have
different representations (and thus different SHA1 numbers).
Also make the warnings more regular.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make 'sha1' parameters const where possible
Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a complete rewrite of ls-tree to make it behave more
like what "/bin/ls -a" does in the current working directory.
Namely, the changes are:
- Unlike the old ls-tree behaviour that used paths arguments to
restrict output (not that it worked as intended---as pointed
out in the mailing list discussion, it was quite incoherent),
this rewrite uses paths arguments to specify what to show.
- Without arguments, it implicitly uses the root level as its
sole argument ("/bin/ls -a" behaves as if "." is given
without argument).
- Without -r (recursive) flag, it shows the named blob (either
file or symlink), or the named tree and its immediate
children.
- With -r flag, it shows the named path, and recursively
descends into it if it is a tree.
- With -d flag, it shows the named path and does not show its
children even if the path is a tree, nor descends into it
recursively.
This is still request-for-comments patch. There is no mailing
list consensus that this proposed new behaviour is a good one.
The patch to t/t3100-ls-tree-restrict.sh illustrates
user-visible behaviour changes. Namely:
* "git-ls-tree $tree path1 path0" lists path1 first and then
path0. It used to use paths as an output restrictor and
showed output in cache entry order (i.e. path0 first and then
path1) regardless of the order of paths arguments.
* "git-ls-tree $tree path2" lists path2 and its immediate
children but having explicit paths argument does not imply
recursive behaviour anymore, hence paths/baz is shown but not
paths/baz/b.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It turns out that parse_object() is loading and decompressing given
object to free it just before calling the specific object parsing
function which does mmap and decompress the same object again. This
patch introduces the ability to parse specific objects directly from a
memory buffer.
Without this patch, running git-fsck-cache on the kernel repositorytake:
real 0m13.006s
user 0m11.421s
sys 0m1.218s
With this patch applied:
real 0m8.060s
user 0m7.071s
sys 0m0.710s
The performance increase is significant, and this is kind of a
prerequisite for sane delta object support with fsck.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The tree object parsing used to get the executable bit wrong,
and didn't know about symlinks. Also, fsck really wants the
full mode value so that it can verify the other bits for sanity,
so save it all in struct tree_entry.
We check the ordering of the entries, and we verify that none
of the entries has a slash in it (this allows us to remove the
hacky "has_full_path" member from the tree structure, since we
now just test it by walking the tree entries instead).
This adds the contents of trees to struct tree.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the structs and function declarations for parsing git objects.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>