git/ref-filter.h

210 lines
6.3 KiB
C
Raw Normal View History

#ifndef REF_FILTER_H
#define REF_FILTER_H
#include "gettext.h"
#include "oid-array.h"
#include "commit.h"
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
#include "string-list.h"
builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 21:12:19 +00:00
#include "strvec.h"
#include "commit-reach.h"
/* Quoting styles */
#define QUOTE_NONE 0
#define QUOTE_SHELL 1
#define QUOTE_PERL 2
#define QUOTE_PYTHON 4
#define QUOTE_TCL 8
#define FILTER_REFS_TAGS 0x0002
#define FILTER_REFS_BRANCHES 0x0004
#define FILTER_REFS_REMOTES 0x0008
#define FILTER_REFS_OTHERS 0x0010
#define FILTER_REFS_REGULAR (FILTER_REFS_TAGS | FILTER_REFS_BRANCHES | \
FILTER_REFS_REMOTES | FILTER_REFS_OTHERS)
#define FILTER_REFS_DETACHED_HEAD 0x0020
#define FILTER_REFS_PSEUDOREFS 0x0040
#define FILTER_REFS_ROOT_REFS (FILTER_REFS_DETACHED_HEAD | FILTER_REFS_PSEUDOREFS)
#define FILTER_REFS_KIND_MASK (FILTER_REFS_REGULAR | FILTER_REFS_DETACHED_HEAD | \
FILTER_REFS_PSEUDOREFS)
struct atom_value;
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
struct ref_sorting;
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
struct ahead_behind_count;
struct option;
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
enum ref_sorting_order {
REF_SORTING_REVERSE = 1<<0,
REF_SORTING_ICASE = 1<<1,
REF_SORTING_VERSION = 1<<2,
REF_SORTING_DETACHED_HEAD_FIRST = 1<<3,
};
struct ref_array_item {
struct object_id objectname;
const char *rest;
int flag;
unsigned int kind;
const char *symref;
struct commit *commit;
struct atom_value *value;
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
struct ahead_behind_count **counts;
char refname[FLEX_ARRAY];
};
struct ref_array {
int nr, alloc;
struct ref_array_item **items;
struct rev_info *revs;
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
struct ahead_behind_count *counts;
size_t counts_nr;
};
struct ref_filter {
const char **name_patterns;
builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 21:12:19 +00:00
struct strvec exclude;
struct oid_array points_at;
struct commit_list *with_commit;
ref-filter: add --no-contains option to tag/branch/for-each-ref Change the tag, branch & for-each-ref commands to have a --no-contains option in addition to their longstanding --contains options. This allows for finding the last-good rollout tag given a known-bad <commit>. Given a hypothetically bad commit cf5c7253e0, the git version to revert to can be found with this hacky two-liner: (git tag -l 'v[0-9]*'; git tag -l --contains cf5c7253e0 'v[0-9]*') | sort | uniq -c | grep -E '^ *1 ' | awk '{print $2}' | tail -n 10 With this new --no-contains option the same can be achieved with: git tag -l --no-contains cf5c7253e0 'v[0-9]*' | sort | tail -n 10 As the filtering machinery is shared between the tag, branch & for-each-ref commands, implement this for those commands too. A practical use for this with "branch" is e.g. finding branches which were branched off between v2.8.0 and v2.10.0: git branch --contains v2.8.0 --no-contains v2.10.0 The "describe" command also has a --contains option, but its semantics are unrelated to what tag/branch/for-each-ref use --contains for. A --no-contains option for "describe" wouldn't make any sense, other than being exactly equivalent to not supplying --contains at all, which would be confusing at best. Add a --without option to "tag" as an alias for --no-contains, for consistency with --with and --contains. The --with option is undocumented, and possibly the only user of it is Junio (<xmqqefy71iej.fsf@gitster.mtv.corp.google.com>). But it's trivial to support, so let's do that. The additions to the the test suite are inverse copies of the corresponding --contains tests. With this change --no-contains for tag, branch & for-each-ref is just as well tested as the existing --contains option. In addition to those tests, add a test for "tag" which asserts that --no-contains won't find tree/blob tags, which is slightly unintuitive, but consistent with how --contains works & is documented. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24 18:40:57 +00:00
struct commit_list *no_commit;
struct commit_list *reachable_from;
struct commit_list *unreachable_from;
unsigned int with_commit_tag_algo : 1,
match_as_path : 1,
ignore_case : 1,
detached : 1;
unsigned int kind,
lines;
int abbrev,
verbose;
struct {
struct contains_cache contains_cache;
struct contains_cache no_contains_cache;
} internal;
};
struct ref_format {
/*
* Set these to define the format; make sure you call
* verify_ref_format() afterwards to finalize.
*/
const char *format;
const char *rest;
int quote_style;
int use_color;
/* Internal state to ref-filter */
int need_color_reset_at_eol;
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
/* List of bases for ahead-behind counts. */
struct string_list bases;
struct {
int max_count;
int omit_empty;
} array_opts;
};
#define REF_FILTER_INIT { \
.points_at = OID_ARRAY_INIT, \
builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 21:12:19 +00:00
.exclude = STRVEC_INIT, \
}
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
#define REF_FORMAT_INIT { \
.use_color = -1, \
.bases = STRING_LIST_INIT_DUP, \
}
/* Macros for checking --merged and --no-merged options */
#define _OPT_MERGED_NO_MERGED(option, filter, h) \
{ OPTION_CALLBACK, 0, option, (filter), N_("commit"), (h), \
PARSE_OPT_LASTARG_DEFAULT | PARSE_OPT_NONEG, \
parse_opt_merge_filter, (intptr_t) "HEAD" \
}
#define OPT_MERGED(f, h) _OPT_MERGED_NO_MERGED("merged", f, h)
#define OPT_NO_MERGED(f, h) _OPT_MERGED_NO_MERGED("no-merged", f, h)
#define OPT_REF_SORT(var) \
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
OPT_STRING_LIST(0, "sort", (var), \
N_("key"), N_("field name to sort on"))
builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 21:12:19 +00:00
#define OPT_REF_FILTER_EXCLUDE(var) \
OPT_STRVEC(0, "exclude", &(var)->exclude, \
N_("pattern"), N_("exclude refs which match pattern"))
/*
* API for filtering a set of refs. Based on the type of refs the user
* has requested, we iterate through those refs and apply filters
* as per the given ref_filter structure and finally store the
* filtered refs in the ref_array structure.
*/
int filter_refs(struct ref_array *array, struct ref_filter *filter, unsigned int type);
/*
* Filter refs using the given ref_filter and type, sort the contents
* according to the given ref_sorting, format the filtered refs with the
* given ref_format, and print them to stdout.
*/
void filter_and_format_refs(struct ref_filter *filter, unsigned int type,
struct ref_sorting *sorting,
struct ref_format *format);
/* Clear all memory allocated to ref_array */
void ref_array_clear(struct ref_array *array);
/* Used to verify if the given format is correct and to parse out the used atoms */
int verify_ref_format(struct ref_format *format);
/* Sort the given ref_array as per the ref_sorting provided */
void ref_array_sort(struct ref_sorting *sort, struct ref_array *array);
/* Set REF_SORTING_* sort_flags for all elements of a sorting list */
void ref_sorting_set_sort_flags_all(struct ref_sorting *sorting, unsigned int mask, int on);
/* Based on the given format and quote_style, fill the strbuf */
int format_ref_array_item(struct ref_array_item *info,
struct ref_format *format,
struct strbuf *final_buf,
struct strbuf *error_buf);
/* Release a "struct ref_sorting" */
void ref_sorting_release(struct ref_sorting *);
for-each-ref: delay parsing of --sort=<atom> options The for-each-ref family of commands invoke parsers immediately when it sees each --sort=<atom> option, and die before even seeing the other options on the command line when the <atom> is unrecognised. Instead, accumulate them in a string list, and have them parsed into a ref_sorting structure after the command line parsing is done. As a consequence, "git branch --sort=bogus -h" used to fail to give the brief help, which arguably may have been a feature, now does so, which is more consistent with how other options work. The patch is smaller than the actual extent of the "damage" to the codebase, thanks to the fact that the original code consistently used OPT_REF_SORT() macro to handle command line options. We only needed to replace the variable used for the list, and implementation of the callback function used in the macro. The old rule was for the users of the API to: - Declare ref_sorting and ref_sorting_tail variables; - OPT_REF_SORT() macro will instantiate ref_sorting instance (which may barf and die) and append it to the tail; - Append to the tail each ref_sorting read from the configuration by parsing in the config callback (which may barf and die); - See if ref_sorting is null and use ref_sorting_default() instead. Now the rule is not all that different but is simpler: - Declare ref_sorting_options string list. - OPT_REF_SORT() macro will append it to the string list; - Append to the string list the sort key read from the configuration; - call ref_sorting_options() to turn the string list to ref_sorting structure (which also deals with the default value). As side effects, this change also cleans up a few issues: - 95be717c (parse_opt_ref_sorting: always use with NONEG flag, 2019-03-20) muses that "git for-each-ref --no-sort" should simply clear the sort keys accumulated so far; it now does. - The implementation detail of "struct ref_sorting" and the helper function parse_ref_sorting() can now be private to the ref-filter API implementation. - If you set branch.sort to a bogus value, the any "git branch" invocation, not only the listing mode, would abort with the original code; now it doesn't Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 19:23:53 +00:00
/* Convert list of sort options into ref_sorting */
struct ref_sorting *ref_sorting_options(struct string_list *);
/* Function to parse --merged and --no-merged options */
int parse_opt_merge_filter(const struct option *opt, const char *arg, int unset);
/* Get the current HEAD's description */
char *get_head_description(void);
/* Set up translated strings in the output. */
void setup_ref_filter_porcelain_msg(void);
/*
* Print up to maxcount ref_array elements to stdout using the given
* ref_format.
*/
void print_formatted_ref_array(struct ref_array *array, struct ref_format *format);
/*
* Print a single ref, outside of any ref-filter. Note that the
* name must be a fully qualified refname.
*/
void pretty_print_ref(const char *name, const struct object_id *oid,
struct ref_format *format);
/*
* Push a single ref onto the array; this can be used to construct your own
* ref_array without using filter_refs().
*/
struct ref_array_item *ref_array_push(struct ref_array *array,
const char *refname,
const struct object_id *oid);
for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:54 +00:00
/*
* If the provided format includes ahead-behind atoms, then compute the
* ahead-behind values for the array of filtered references. Must be
* called after filter_refs() but before outputting the formatted refs.
*
* If this is not called, then any ahead-behind atoms will be blank.
*/
void filter_ahead_behind(struct repository *r,
struct ref_format *format,
struct ref_array *array);
void ref_filter_init(struct ref_filter *filter);
void ref_filter_clear(struct ref_filter *filter);
#endif /* REF_FILTER_H */