git/builtin/for-each-ref.c

129 lines
4.1 KiB
C
Raw Normal View History

#include "builtin.h"
#include "config.h"
#include "gettext.h"
#include "refs.h"
#include "object.h"
#include "parse-options.h"
#include "ref-filter.h"
#include "strbuf.h"
#include "strvec.h"
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
#include "commit-reach.h"
static char const * const for_each_ref_usage[] = {
N_("git for-each-ref [<options>] [<pattern>]"),
N_("git for-each-ref [--points-at <object>]"),
N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
ref-filter: add --no-contains option to tag/branch/for-each-ref Change the tag, branch & for-each-ref commands to have a --no-contains option in addition to their longstanding --contains options. This allows for finding the last-good rollout tag given a known-bad <commit>. Given a hypothetically bad commit cf5c7253e0, the git version to revert to can be found with this hacky two-liner: (git tag -l 'v[0-9]*'; git tag -l --contains cf5c7253e0 'v[0-9]*') | sort | uniq -c | grep -E '^ *1 ' | awk '{print $2}' | tail -n 10 With this new --no-contains option the same can be achieved with: git tag -l --no-contains cf5c7253e0 'v[0-9]*' | sort | tail -n 10 As the filtering machinery is shared between the tag, branch & for-each-ref commands, implement this for those commands too. A practical use for this with "branch" is e.g. finding branches which were branched off between v2.8.0 and v2.10.0: git branch --contains v2.8.0 --no-contains v2.10.0 The "describe" command also has a --contains option, but its semantics are unrelated to what tag/branch/for-each-ref use --contains for. A --no-contains option for "describe" wouldn't make any sense, other than being exactly equivalent to not supplying --contains at all, which would be confusing at best. Add a --without option to "tag" as an alias for --no-contains, for consistency with --with and --contains. The --with option is undocumented, and possibly the only user of it is Junio (<xmqqefy71iej.fsf@gitster.mtv.corp.google.com>). But it's trivial to support, so let's do that. The additions to the the test suite are inverse copies of the corresponding --contains tests. With this change --no-contains for tag, branch & for-each-ref is just as well tested as the existing --contains option. In addition to those tests, add a test for "tag" which asserts that --no-contains won't find tree/blob tags, which is slightly unintuitive, but consistent with how --contains works & is documented. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24 18:40:57 +00:00
N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
NULL
};
int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
{
int i;
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
struct ref_sorting *sorting;
struct string_list sorting_options = STRING_LIST_INIT_DUP;
int maxcount = 0, icase = 0, omit_empty = 0;
struct ref_array array;
struct ref_filter filter = REF_FILTER_INIT;
struct ref_format format = REF_FORMAT_INIT;
struct strbuf output = STRBUF_INIT;
struct strbuf err = STRBUF_INIT;
int from_stdin = 0;
struct strvec vec = STRVEC_INIT;
struct option opts[] = {
OPT_BIT('s', "shell", &format.quote_style,
N_("quote placeholders suitably for shells"), QUOTE_SHELL),
OPT_BIT('p', "perl", &format.quote_style,
N_("quote placeholders suitably for perl"), QUOTE_PERL),
OPT_BIT(0 , "python", &format.quote_style,
N_("quote placeholders suitably for python"), QUOTE_PYTHON),
OPT_BIT(0 , "tcl", &format.quote_style,
N_("quote placeholders suitably for Tcl"), QUOTE_TCL),
OPT_BOOL(0, "omit-empty", &omit_empty,
N_("do not output a newline after empty formatted refs")),
OPT_GROUP(""),
OPT_INTEGER( 0 , "count", &maxcount, N_("show only <n> matched refs")),
OPT_STRING( 0 , "format", &format.format, N_("format"), N_("format to use for the output")),
OPT__COLOR(&format.use_color, N_("respect format colors")),
builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 21:12:19 +00:00
OPT_REF_FILTER_EXCLUDE(&filter),
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
OPT_REF_SORT(&sorting_options),
OPT_CALLBACK(0, "points-at", &filter.points_at,
N_("object"), N_("print only refs which points at the given object"),
parse_opt_object_name),
OPT_MERGED(&filter, N_("print only refs that are merged")),
OPT_NO_MERGED(&filter, N_("print only refs that are not merged")),
OPT_CONTAINS(&filter.with_commit, N_("print only refs which contain the commit")),
ref-filter: add --no-contains option to tag/branch/for-each-ref Change the tag, branch & for-each-ref commands to have a --no-contains option in addition to their longstanding --contains options. This allows for finding the last-good rollout tag given a known-bad <commit>. Given a hypothetically bad commit cf5c7253e0, the git version to revert to can be found with this hacky two-liner: (git tag -l 'v[0-9]*'; git tag -l --contains cf5c7253e0 'v[0-9]*') | sort | uniq -c | grep -E '^ *1 ' | awk '{print $2}' | tail -n 10 With this new --no-contains option the same can be achieved with: git tag -l --no-contains cf5c7253e0 'v[0-9]*' | sort | tail -n 10 As the filtering machinery is shared between the tag, branch & for-each-ref commands, implement this for those commands too. A practical use for this with "branch" is e.g. finding branches which were branched off between v2.8.0 and v2.10.0: git branch --contains v2.8.0 --no-contains v2.10.0 The "describe" command also has a --contains option, but its semantics are unrelated to what tag/branch/for-each-ref use --contains for. A --no-contains option for "describe" wouldn't make any sense, other than being exactly equivalent to not supplying --contains at all, which would be confusing at best. Add a --without option to "tag" as an alias for --no-contains, for consistency with --with and --contains. The --with option is undocumented, and possibly the only user of it is Junio (<xmqqefy71iej.fsf@gitster.mtv.corp.google.com>). But it's trivial to support, so let's do that. The additions to the the test suite are inverse copies of the corresponding --contains tests. With this change --no-contains for tag, branch & for-each-ref is just as well tested as the existing --contains option. In addition to those tests, add a test for "tag" which asserts that --no-contains won't find tree/blob tags, which is slightly unintuitive, but consistent with how --contains works & is documented. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24 18:40:57 +00:00
OPT_NO_CONTAINS(&filter.no_commit, N_("print only refs which don't contain the commit")),
OPT_BOOL(0, "ignore-case", &icase, N_("sorting and filtering are case insensitive")),
OPT_BOOL(0, "stdin", &from_stdin, N_("read reference patterns from stdin")),
OPT_END(),
};
memset(&array, 0, sizeof(array));
format.format = "%(objectname) %(objecttype)\t%(refname)";
git_config(git_default_config, NULL);
parse_options(argc, argv, prefix, opts, for_each_ref_usage, 0);
if (maxcount < 0) {
error("invalid --count argument: `%d'", maxcount);
usage_with_options(for_each_ref_usage, opts);
}
if (HAS_MULTI_BITS(format.quote_style)) {
error("more than one quoting style?");
usage_with_options(for_each_ref_usage, opts);
}
if (verify_ref_format(&format))
usage_with_options(for_each_ref_usage, opts);
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
sorting = ref_sorting_options(&sorting_options);
ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
filter.ignore_case = icase;
if (from_stdin) {
struct strbuf line = STRBUF_INIT;
if (argv[0])
die(_("unknown arguments supplied with --stdin"));
while (strbuf_getline(&line, stdin) != EOF)
strvec_push(&vec, line.buf);
strbuf_release(&line);
/* vec.v is NULL-terminated, just like 'argv'. */
filter.name_patterns = vec.v;
} else {
filter.name_patterns = argv;
}
filter.match_as_path = 1;
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN Of the ref-filter callers, for-each-ref and git-branch both set the INCLUDE_BROKEN flag (but git-tag does not, which is a weird inconsistency). But now that GIT_REF_PARANOIA is on by default, that produces almost the same outcome for all three. The one exception is that GIT_REF_PARANOIA will omit dangling symrefs. That's a better behavior for these tools, as they would never include such a symref in the main output anyway (they can't, as it doesn't point to an object). Instead they issue a warning to stderr. But that warning is somewhat useless; a dangling symref is a perfectly reasonable thing to have in your repository, and is not a sign of corruption. It's much friendlier to just quietly ignore it. And in terms of robustness, the warning gains us little. It does not impact the exit code of either tool. So while the warning _might_ clue in a user that they have an unexpected broken symref, it would not help any kind of scripted use. This patch converts for-each-ref and git-branch to stop using the INCLUDE_BROKEN flag. That gives them more reasonable behavior, and harmonizes them with git-tag. We have to change one test to adapt to the situation. t1430 tries to trigger all of the REF_ISBROKEN behaviors from the underlying ref code. It uses for-each-ref to do so (because there isn't any other mechanism). That will no longer issue a warning about the symref which points to an invalid name, as it's considered dangling (and we can instead be sure that it's _not_ mentioned on stderr). Note that we do still complain about the illegally named "broken..symref"; its problem is not that it's dangling, but the name of the symref itself is illegal. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 18:48:05 +00:00
filter_refs(&array, &filter, FILTER_REFS_ALL);
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
filter_ahead_behind(the_repository, &format, &array);
ref_array_sort(sorting, &array);
if (!maxcount || array.nr < maxcount)
maxcount = array.nr;
for (i = 0; i < maxcount; i++) {
strbuf_reset(&err);
strbuf_reset(&output);
if (format_ref_array_item(array.items[i], &format, &output, &err))
die("%s", err.buf);
fwrite(output.buf, 1, output.len, stdout);
if (output.len || !omit_empty)
putchar('\n');
}
strbuf_release(&err);
strbuf_release(&output);
ref_array_clear(&array);
ref_filter_clear(&filter);
ref_sorting_release(sorting);
strvec_clear(&vec);
return 0;
}