1c56fc2084 (name-rev: pre-size buffer in get_parent_name(), 2020-02-04)
got a big performance boost in an unusual repository by calculating the
name length in advance. This is a bit awkward, as it references the
name components twice.
Use a memory pool to store the strings for the struct rev_name member
tip_name. Using mem_pool_strfmt() allows efficient allocation without
explicit size calculation. This simplifies the formatting part of the
code without giving up performance:
Benchmark 1: ./git_2.44.0 -C ../chromium/src name-rev --all
Time (mean ± σ): 1.231 s ± 0.013 s [User: 1.082 s, System: 0.136 s]
Range (min … max): 1.214 s … 1.252 s 10 runs
Benchmark 2: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.220 s ± 0.020 s [User: 1.083 s, System: 0.130 s]
Range (min … max): 1.197 s … 1.254 s 10 runs
Don't bother discarding the memory pool just before exiting. The effort
for that would be very low, but actually measurable in the above
example, with no benefit to users. At least UNLEAK it to calm down leak
checkers. This addresses the leaks that 45a14f578e (Revert "name-rev:
release unused name strings", 2022-04-22) brought back.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
adfc1857bd (describe: fix --contains when a tag is given as input,
2013-07-18) added the option --peel-tag, defining it using a positional
struct option initializer and a comment indicating that it's intended to
be a hidden OPT_BOOL. 4741edd549 (Remove deprecated OPTION_BOOLEAN for
parsing arguments, 2013-08-03) added the macro OPT_HIDDEN_BOOL, which
allows to express this more succinctly. Use it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The include of wildmatch.h in git-compat-util.h was added in cebcab189a
(Makefile: add USE_WILDMATCH to use wildmatch as fnmatch, 2013-01-01) as
a way to be able to compile-time force any calls to fnmatch() to instead
invoke wildmatch(). The defines and inline function were removed in
70a8fc999d (stop using fnmatch (either native or compat), 2014-02-15),
and this include in git-compat-util.h has been unnecessary ever since.
Remove the include from git-compat-util.h, but add it to the .c files
that had omitted the direct #include they needed.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 34ae3b70 (name-rev: deprecate --stdin in favor of --annotate-stdin),
we renamed --stdin to --annotate-stdin for the sake of a clearer name
for the option, and added text that indicates --stdin is deprecated. The
next step is to hide --stdin completely.
Make the option hidden. Also, update documentation to remove all
mentions of --stdin.
Signed-off-by: "John Cai" <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ab/remove-implicit-use-of-the-repository:
libs: use "struct repository *" argument, not "the_repository"
post-cocci: adjust comments for recent repo_* migration
cocci: apply the "revision.h" part of "the_repository.pending"
cocci: apply the "rerere.h" part of "the_repository.pending"
cocci: apply the "refs.h" part of "the_repository.pending"
cocci: apply the "promisor-remote.h" part of "the_repository.pending"
cocci: apply the "packfile.h" part of "the_repository.pending"
cocci: apply the "pretty.h" part of "the_repository.pending"
cocci: apply the "object-store.h" part of "the_repository.pending"
cocci: apply the "diff.h" part of "the_repository.pending"
cocci: apply the "commit.h" part of "the_repository.pending"
cocci: apply the "commit-reach.h" part of "the_repository.pending"
cocci: apply the "cache.h" part of "the_repository.pending"
cocci: add missing "the_repository" macros to "pending"
cocci: sort "the_repository" rules by header
cocci: fix incorrect & verbose "the_repository" rules
cocci: remove dead rule from "the_repository.pending.cocci"
Apply the part of "the_repository.pending.cocci" pertaining to
"commit.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the part of "the_repository.pending.cocci" pertaining to
"cache.h".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places. It does mean that we also need to add
includes of alloc.h in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leak fixes.
* ab/various-leak-fixes:
push: free_refs() the "local_refs" in set_refspecs()
push: refactor refspec_append_mapped() for subsequent leak-fix
receive-pack: release the linked "struct command *" list
grep API: plug memory leaks by freeing "header_list"
grep.c: refactor free_grep_patterns()
builtin/merge.c: free "&buf" on "Your local changes..." error
builtin/merge.c: use fixed strings, not "strbuf", fix leak
show-branch: free() allocated "head" before return
commit-graph: fix a parse_options_concat() leak
http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
worktree: fix a trivial leak in prune_worktrees()
repack: fix leaks on error with "goto cleanup"
name-rev: don't xstrdup() an already dup'd string
various: add missing clear_pathspec(), fix leaks
clone: use free() instead of UNLEAK()
commit-graph: use free_commit_graph() instead of UNLEAK()
bundle.c: don't leak the "args" in the "struct child_process"
tests: mark tests as passing with SANITIZE=leak
Commit 7550424804 ("name-rev: include taggerdate in considering the best
name", 2016-04-22) introduced the idea of using taggerdate in the
criteria for selecting the best name. At the time, a certain commit in
linux.git -- namely, aed06b9cfcab -- was being named by name-rev as
v4.6-rc1~9^2~792
which, while correct, was very suboptimal. Some investigation found
that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
alternate answers such as
v3.13-rc7~9^2~14^2~42
or
v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
v3.13-rc1~65^2^2~42
which is much nicer. That workaround was then implemented in name-rev.
Unfortunately, the taggerdate heuristic is causing bugs. I was pointed
to a case in a private repository where name-rev reports a name of the
form
v2022.10.02~86
when users expected to see one of the form
v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.) As
you can probably guess, v2022.10.01 was created after v2022.10.02 (by a
few hours), even though it pointed to an older commit. While the
condition is unusual even in the repository in question, it is not the
only problematic set of tags in that repository. The taggerdate logic
is causing problems.
Further, it turns out that this taggerdate heuristic isn't even helping
anymore. Due to the fix to naming logic in 3656f84278 ("name-rev:
prefer shorter names over following merges", 2021-12-04), we get
improved names without the taggerdate heuristic. For the original
commit of interest in linux.git, a modern git without the taggerdate
heuristic still provides the same optimal answer of interest, namely:
v3.13-rc1~65^2^2~42
So, the taggerdate is no longer providing benefit, and it is causing
problems. Simply get rid of it.
However, note that "taggerdate" as a variable is used to store things
besides a taggerdate these days. Ever since commit ef1e74065c
("name-rev: favor describing with tags and use committer date to
tiebreak", 2017-03-29), this has been used to store committer dates and
there it is used as a fallback tiebreaker (as opposed to a primary
criteria overriding effective distance calculations). We do not want to
remove that fallback tiebreaker, so not all instances of "taggerdate"
are removed in this change.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "add_to_tip_table()" is called with a non-zero
"shorten_unambiguous" we always return an xstrdup()'d string, which
we'd then xstrdup() again, leaking memory. See [1] and [2] for how
this leak came about.
We could xstrdup() only if "shorten_unambiguous" wasn't true, but
let's instead inline this code, so that information on whether we need
to xstrdup() is contained within add_to_tip_table().
1. 98c5c4ad01 (name-rev: allow to specify a subpath for --refs
option, 2013-06-18)
2. b23e0b9353 (name-rev: allow converting the exact object name at
the tip of a ref, 2013-07-07)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an option rather than command. Make the message convey this
similar to the other messages in the file.
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regression fix for 2.36 where "git name-rev" started to sometimes
reference strings after they are freed.
* rs/name-rev-fix-free-after-use:
Revert "name-rev: release unused name strings"
This reverts commit 2d53975488.
3656f84278 (name-rev: prefer shorter names over following merges,
2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
name strings, 2020-02-04) that a better name for a child is a better
name for all of its ancestors as well, because it added a penalty for
generation > 0. This leads to strings being free(3)'d that are still
needed.
079f970971 (name-rev: sort tip names before applying, 2020-02-05)
already reduced the number of free(3) calls for the use case that
motivated the original patch (name-rev --all in the Chromium repository)
from ca. 44000 to 5, and 3656f84278 eliminated even those few. So this
revert won't affect name-rev's performance on that particular repo.
Reported-by: Thomas Hurst <tom@hur.st>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a commit in a sequence of linear history has a non-monotonically
increasing commit timestamp, git name-rev might not properly name the
commit.
This occurs because name-rev uses a heuristic of the commit date to
avoid searching down tags which lead to commits that are older than the
named commit. This is intended to avoid work on larger repositories.
This heuristic impacts git name-rev, and by extension git describe
--contains which is built on top of name-rev.
Further more, if --all or --annotate-stdin is used, the heuristic is not
enabled because the full history has to be analyzed anyways. This
results in some confusion if a user sees that --annotate-stdin works but
a normal name-rev does not.
If the repository has a commit graph, we can use the generation numbers
instead of using the commit dates. This is essentially the same check
except that generation numbers make it exact, where the commit date
heuristic could be incorrect due to clock errors.
Since we're extending the notion of cutoff to more than one variable,
create a series of functions for setting and checking the cutoff. This
avoids duplication and moves access of the global cutoff and
generation_cutoff to as few functions as possible.
Add several test cases including a test that covers the new commitGraph
behavior, as well as tests for --all and --annotate-stdin with and
without commitGraphs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
34ae3b70 (name-rev: deprecate --stdin in favor of --annotate-stdin,
2022-01-05) added --annotate-stdin to replace --stdin as a clearer
flag name. Since --stdin is to be deprecated, we should replace
--stdin in the output from "git name-rev -h".
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using a buffer limited to 2048 is unnecessarily limiting. Switch to
using a string buffer to read in stdin for annotation.
Signed-off-by: "John Cai" <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a --annotate-stdin that is functionally equivalent of --stdin.
--stdin does not behave as --stdin in other subcommands, such as
pack-objects whereby it takes one argument per line. Since --stdin can
be a confusing and misleading name, rename it to --annotate-stdin.
This change adds a warning to --stdin warning that it will be removed in
the future.
Signed-off-by: "John Cai" <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or
later parent of a merge should be 65535 times more expensive than a
first-parent traversal, as per ac076c29ae (name-rev: Fix non-shortest
description, 2007-08-27). The point of this weight is to prefer names
like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit. Note
that the first follows 1472 parent traversals compared to a mere 125 for
the second. Weighting all traversals equally would clearly prefer the
second name since it has fewer parent traversals, but humans aren't
going to be traversing commits and they tend to have an easier time
digesting names with fewer segments. The fact that the former only has
two segments (~1471, ^2) makes it much simpler than the latter which has
six segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find
symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually
prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and it gives
the tie breaker to shortest number of total parent traversals. Fewer
segments is more important for human consumption than number of hops, so
we'd rather see the latter which has one fewer segment.
Include the generation in is_better_name() and use a new
effective_distance() calculation so that we prefer fewer segments in
the printed name over fewer total parent traversals performed to get the
answer.
== Side-note on tie-breakers ==
When there are the same number of segments for two different names, we
actually use the name of an ancestor commit as a tie-breaker as well.
For example, for the commit cbdca289fb in the git.git repository, we
prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is
because:
* cbdca289fb is the parent of 25e65b6dd5, which implies the name for
cbdca289fb should be the first parent of the preferred name for
25e65b6dd5
* 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or
v2.33.0-rc0~57^2~4, but the former is preferred over the latter due
to fewer segments
* combine the two previous facts, and the name we get for cbdca289fb
is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5".
Technically, we get this for free out of the implementation since we
only keep track of one name for each commit as we walk history (and
re-add parents to the queue if we find a better name for those parents),
but the first bullet point above ensures users get results that feel
more consistent.
== Alternative Ideas and Meanings Discussed ==
One suggestion that came up during review was that shortest
string-length might be easiest for users to consume. However, such a
scheme would be rather computationally expensive (we'd have to track all
names for each commit as we traversed the graph) and would additionally
come with the possibly perplexing result that on a linear segment of
history we could rapidly swap back and forth on names:
MYTAG~3^2 would be preferred over MYTAG~9998
MYTAG~3^2~1 would NOT be preferred over MYTAG~9999
MYTAG~3^2~2 might be preferred over MYTAG~10000
Another item that came up was possible auxiliary semantic meanings for
name-rev results either before or after this patch. The basic answer
was that the previous implementation had no known useful auxiliary
semantics, but that for many repositories (most in my experience), the
new scheme does. In particular, the new name-rev output can often be
used to answer the question, "How or when did this commit get merged?"
Since that usefulness depends on how merges happen within the repository
and thus isn't universally applicable, details are omitted here but you
can see them at [1].
[1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/
Finally, it was noted that the algorithm could be improved by just
explicitly tracking the number of segments and using both it and
distance in the comparison, instead of giving a magic number that tries
to blend the two (and which therefore might give suboptimal results in
repositories with really huge numbers of commits that periodically merge
older code). However, "[this patch] seems to give us a much better
results than the current code, so let's take it and leave further
futzing outside the scope."
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are looking up an oid in an array, we obviously don't need to
write to the array. Let's mark it as const in the function interfaces,
as well as in the local variables we use to derference the void pointer
(note a few cases use pointers-to-pointers, so we mark everything
const).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of our callers are actually looking up an object_id, not a bare
hash. Likewise, the arrays they are looking in are actual arrays of
object_id (not just raw bytes of hashes, as we might find in a pack
.idx; those are handled by bsearch_hash()).
Using an object_id gives us more type safety, and makes the callers
slightly shorter. It also gets rid of the word "sha1" from several
access functions, though we could obviously also rename those with
s/sha1/hash/.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change all remnants of "sha1" in hash-lookup.c and .h and rename them to
reflect that we're not just able to handle SHA-1 these days.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename this function to reflect that we're not just able to handle SHA-1
these days. There are a few instances of "sha1" left in sha1-lookup.[ch]
after this, but those will be addressed in the next commit.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are still a handful mentions of SHA-1 when we meant the
(hexadecimal) object names in end-user facing messages. Rewrite
them.
I was hoping that this can mostly be s/SHA-1/object name/, but
a few messages needed rephrasing to keep the result readable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_ref() is called for each ref and checks if its a better name for
the referenced commit. If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well. This in done in
the the order for_each_ref() imposes on us.
That might not be optimal. If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later. Setting
better names first can avoid that.
is_better_name() prefers tags, short distances and old references. The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits. Sorting the refs by them should yield better
performance than the essentially random order we currently use.
And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.
So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms]
Range (min … max): 845.9 ms … 859.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms]
Range (min … max): 726.0 ms … 755.2 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.
Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.
Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.
If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.
We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:
0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps
... and with this patch:
1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps
It also gets faster; hyperfine before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
Range (min … max): 1.522 s … 1.542 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
Range (min … max): 1.327 s … 1.346 s 10 runs
For the Linux repo it doesn't pay off; memory usage only gets down from:
0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps
... to:
0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps
The runtime actually increases slightly from:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
Range (min … max): 824.1 ms … 838.9 ms 10 runs
... to:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
Range (min … max): 843.4 ms … 854.3 ms 10 runs
Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the latter.
Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name(). This avoids allocations for names that are
rejected by that function. Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 437.8 ms ± 4.0 ms [User: 422.5 ms, System: 15.2 ms]
Range (min … max): 432.8 ms … 446.3 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 408.5 ms ± 1.4 ms [User: 387.2 ms, System: 21.2 ms]
Range (min … max): 407.1 ms … 411.7 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 5.822 s ± 0.013 s [User: 5.304 s, System: 0.516 s]
Range (min … max): 5.803 s … 5.837 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.527 s ± 0.003 s [User: 1.015 s, System: 0.511 s]
Range (min … max): 1.524 s … 1.535 s 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce nesting by moving code to come up with a name for the parent into
its own function.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly. Avoid that allocation and
pointer indirection by storing the full struct in the commit slab. Use
the tip_name member pointer to determine if the returned struct is
initialized.
Performance in the Linux repository measured with hyperfine before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 953.5 ms ± 6.3 ms [User: 901.2 ms, System: 52.1 ms]
Range (min … max): 945.2 ms … 968.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.0 ms ± 3.1 ms [User: 807.4 ms, System: 43.6 ms]
Range (min … max): 846.7 ms … 857.0 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it. Move the duplication to name_rev() and release
the copy in the latter case.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.
Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Redo "git name-rev" to avoid recursive calls.
* sg/name-rev-wo-recursion:
name-rev: cleanup name_ref()
name-rev: eliminate recursion in name_rev()
name-rev: use 'name->tip_name' instead of 'tip_name'
name-rev: drop name_rev()'s 'generation' and 'distance' parameters
name-rev: restructure creating/updating 'struct rev_name' instances
name-rev: restructure parsing commits and applying date cutoff
name-rev: pull out deref handling from the recursion
name-rev: extract creating/updating a 'struct name_rev' into a helper
t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
name-rev: avoid unnecessary cast in name_ref()
name-rev: use strbuf_strip_suffix() in get_rev_name()
t6120-describe: modernize the 'check_describe' helper
t6120-describe: correct test repo history graph in comment
Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.
Since the previous patch name_rev() is not recursive anymore, so let's
move all those conditions back into name_rev().
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space. E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessasry depth sometime between v5.3-rc4 and
-rc5).
Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].
The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure. Now
the recursion is gone, so flip it to expect success. Also gone are
the dmesg entries logging the segfault of that segfaulting 'git
name-rev' process on every execution of the test suite.
Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git. This shouldn't matter in
practice, because the output has always been unordered anyway.
This patch is best viewed with '--ignore-all-space'.
[1] Early versions of this patch used a 'commit_list', resulting in
~15% performance penalty for 'git name-rev --all' in 'linux.git',
presumably because of the memory allocation and release for each
insertion and removal. Using a LIFO 'prio_queue' has basically no
effect on performance.
[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
'v0.1^2~5', meaning that usually following the first parent of a
merge results in the best name for its ancestors. So when later
we follow the remaining parent(s) of a merge, and reach an already
named commit, then we usually find that we can't give that commit
a better name, and thus we don't have to visit any of its
ancestors again.
OTOH, if we were to follow the Nth parent of the merge first, then
the name of all its ancestors would include a corresponding '^N'.
Those are not the best names for those commits, so when later we
reach an already named commit following the first parent of that
merge, then we would have to update the name of that commit and
the names of all of its ancestors as well. Consequently, we would
have to visit many commits several times, resulting in a
significant slowdown.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the previous patches in this series we can get the value of
'name_rev()'s 'tip_name' parameter from the 'struct rev_name'
associated with the commit as well.
So let's use 'name->tip_name' instead, which makes the patch
eliminating the recursion of name_rev() a bit easier to follow.
Note that at this point we could drop the 'tip_name' parameter as
well, but that parameter will be necessary later, after the recursion
is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'stuct
rev_name' associated with the commit as well.
Let's simplify the function's signature and remove these two
unnecessary parameters.
Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the beginning of the recursive name_rev() function it creates a new
'struct rev_name' instance for each previously unvisited commit or, if
this visit results in better name for an already visited commit, then
updates the 'struct rev_name' instance attached to the commit, or
returns early.
Restructure this so it's caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.
This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.
This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the beginning of the recursive name_rev() function it parses the
commit it got as parameter, and returns early if the commit is older
than a cutoff limit.
Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.
This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>