git/t/helper
Taylor Blau 59c35fac54 refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
When iterating through the `packed-refs` file in order to answer a query
like:

    $ git for-each-ref --exclude=refs/__hidden__

it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyway.

In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:

  - For each excluded pattern, find the first record that matches it,
    and the first record that *doesn't* match it (i.e. the location
    you'd next want to consider when excluding that pattern).

  - Sort the set of excluded regions from the previous step in ascending
    order of the first location within the `packed-refs` file that
    matches.

  - Clean up the results from the previous step: discard empty regions,
    and combine adjacent regions. The set of regions which remains is
    referred to as the "jump list", and never contains any references
    which should be included in the result set.
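
The three steps above can be sketched roughly as follows. This is an
illustration only, not the code in `refs/packed-backend.c`: it works on
a sorted in-memory array of refnames instead of byte offsets into the
`packed-refs` file, it treats each pattern as a plain string prefix, and
the names `struct region`, `find_boundary()` and `build_jump_list()` are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: half-open range [start, end) of indices into a sorted ref array. */
struct region {
	size_t start;
	size_t end;
};

/*
 * Binary search over the sorted array. With past == 0, return the index
 * of the first ref that does not sort before `prefix`; with past != 0,
 * return the index of the first ref that sorts after every ref starting
 * with `prefix`.
 */
static size_t find_boundary(const char **refs, size_t nr,
			    const char *prefix, int past)
{
	size_t lo = 0, hi = nr, len = strlen(prefix);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strncmp(refs[mid], prefix, len);

		if (cmp < 0 || (past && cmp == 0))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static int region_cmp(const void *a, const void *b)
{
	const struct region *ra = a, *rb = b;

	if (ra->start != rb->start)
		return ra->start < rb->start ? -1 : 1;
	return 0;
}

/*
 * Fill `out` (which must have room for nr_patterns entries) and return
 * the number of regions that remain after cleanup.
 */
static size_t build_jump_list(const char **refs, size_t nr_refs,
			      const char **patterns, size_t nr_patterns,
			      struct region *out)
{
	size_t i, nr = 0, last = 0;

	assert(out != NULL || nr_patterns == 0);

	/* Step 1: one candidate region per pattern; drop empty ones early. */
	for (i = 0; i < nr_patterns; i++) {
		struct region r;

		r.start = find_boundary(refs, nr_refs, patterns[i], 0);
		r.end = find_boundary(refs, nr_refs, patterns[i], 1);
		if (r.start != r.end)
			out[nr++] = r;
	}
	if (!nr)
		return 0;

	/* Step 2: sort by the first matching location. */
	qsort(out, nr, sizeof(*out), region_cmp);

	/* Step 3: combine overlapping and adjacent regions in place. */
	for (i = 1; i < nr; i++) {
		if (out[i].start <= out[last].end) {
			if (out[i].end > out[last].end)
				out[last].end = out[i].end;
		} else {
			out[++last] = out[i];
		}
	}
	return last + 1;
}
```

In this sketch, excluding "refs/pull" and "refs/tags" from a list where
those two hierarchies sit next to each other yields a single combined
region, and a pattern that matches nothing contributes no region at all.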

Then when iterating through the `packed-refs` file, if `iter->pos` is
ever contained in one of the regions from the previous steps, advance
`iter->pos` past the end of that region, and continue enumeration.

Note that we only perform this optimization when none of the excluded
pattern(s) have special meta-characters in them. For a pattern like
"refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
everything underneath them) are not connected. A future implementation
that handles this case may split the character class (pretending as if
two patterns were excluded: "refs/fooa", and "refs/fooc").
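
A check along these lines could decide whether the optimization
applies. This is a sketch under assumptions: the helper names are
hypothetical, and the set of meta-characters is assumed to be `*`, `?`,
`[` and backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the pattern contain a glob meta-character? */
static int has_glob_meta(const char *pattern)
{
	assert(pattern != NULL);
	/* Assumed meta-character set: '*', '?', '[' and '\\'. */
	return strpbrk(pattern, "*?[\\") != NULL;
}

/*
 * Hypothetical helper: return 1 only when every excluded pattern is a
 * literal string, i.e. when each pattern maps to one connected region.
 */
static int can_use_jump_list(const char **patterns, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (has_glob_meta(patterns[i]))
			return 0;
	return 1;
}
```

With these helpers, "refs/__hidden__" is accepted, while "refs/foo[ac]"
causes the caller to fall back to filtering after the fact.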

There are a few other gotchas worth considering. First, note that the
jump list is sorted, so once we jump past a region, we can avoid
considering it (or any regions preceding it) again. The member
`jump_pos` tracks the first region that we may still need to jump
over.
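
To illustrate how a cursor into the sorted jump list keeps the lookup
cheap during enumeration, here is a minimal sketch. It is not the
actual `next_record()` change: positions are array indices rather than
pointers into the `packed-refs` file, and `struct ref_iter` and
`next_unskipped()` are hypothetical names. It assumes the regions are
sorted and never overlap or touch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: half-open range [start, end) of record indices to skip. */
struct region {
	size_t start;
	size_t end;
};

/* Hypothetical iterator state. */
struct ref_iter {
	size_t pos;                 /* next record to consider */
	size_t nr_refs;             /* total number of records */
	const struct region *jump;  /* sorted, non-overlapping regions */
	size_t jump_nr;
	size_t jump_pos;            /* first region not yet jumped past */
};

/*
 * Return the index of the next record outside every excluded region,
 * or nr_refs once the iteration is exhausted.
 */
static size_t next_unskipped(struct ref_iter *it)
{
	assert(it->pos <= it->nr_refs);

	/* Regions that end at or before pos can never matter again. */
	while (it->jump_pos < it->jump_nr &&
	       it->jump[it->jump_pos].end <= it->pos)
		it->jump_pos++;

	/* If pos lies inside the current region, jump past its end. */
	if (it->jump_pos < it->jump_nr &&
	    it->jump[it->jump_pos].start <= it->pos) {
		it->pos = it->jump[it->jump_pos].end;
		it->jump_pos++;
	}

	if (it->pos >= it->nr_refs)
		return it->nr_refs;
	return it->pos++;
}
```

Because the cursor only moves forward, each region is examined a
bounded number of times over the whole enumeration, instead of the
full list being searched for every record.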

Second, note that the jump list is best-effort, since we do not handle
loose references, and because of the meta-character issue above. The
jump list may not skip past all references which won't appear in the
results, but will never skip over a reference which does appear in the
result set.

In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:

    $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
        git update-ref --stdin
    $ git pack-refs --all

With that setup, it is significantly faster to have `for-each-ref` jump
over the excluded references, as opposed to filtering them out after the
fact:

    $ hyperfine \
      'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
      'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
    Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     798.1 ms ±   3.3 ms    [User: 687.6 ms, System: 146.4 ms]
      Range (min … max):   794.5 ms … 805.5 ms    10 runs

    Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):      98.9 ms ±   1.4 ms    [User: 93.1 ms, System: 5.7 ms]
      Range (min … max):    97.0 ms … 104.0 ms    29 runs

    Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
      Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 0.7 ms, System: 3.8 ms]
      Range (min … max):     4.1 ms …   5.8 ms    524 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
       21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
      176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'

(Comparing stock git and this patch isn't quite fair, since an earlier
commit in this series adds a naive implementation of the `--exclude`
option. `git.prev` is built from the previous commit and includes this
naive implementation.)

Using the jump list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 14:48:55 -07:00
..
.gitignore
test-advise.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-bitmap.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
test-bloom.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-bundle-uri.c treewide: remove unnecessary cache.h inclusion from a few headers 2023-03-21 10:56:50 -07:00
test-cache-tree.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-chmtime.c t/helper: allow chmtime to print verbosely without modifying mtime 2023-04-14 10:27:52 -07:00
test-config.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
test-crontab.c treewide: remove unnecessary cache.h includes in source files 2023-02-23 17:25:28 -08:00
test-csprng.c wrapper: add a helper to generate numbers from a CSPRNG 2022-01-17 14:17:48 -08:00
test-ctype.c Merge branch 'rs/test-ctype-eof' 2023-05-10 10:23:27 -07:00
test-date.c Merge branch 'en/header-split-cache-h' 2023-04-25 13:56:20 -07:00
test-delta.c treewide: remove unnecessary includes of cache.h 2023-03-21 10:56:53 -07:00
test-dir-iterator.c dir-iterator: drop unused DIR_ITERATOR_FOLLOW_SYMLINKS 2023-02-16 16:21:56 -08:00
test-drop-caches.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-dump-cache-tree.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-dump-fsmonitor.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-dump-split-index.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
test-dump-untracked-cache.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-env-helper.c env-helper: move this built-in to "test-tool env-helper" 2023-01-14 18:07:11 -08:00
test-example-decorate.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-fake-ssh.c Merge branch 'ab/various-leak-fixes' 2022-12-14 15:55:46 +09:00
test-fast-rebase.c object-name.h: move declarations for object-name.c functions from cache.h 2023-04-11 08:52:09 -07:00
test-fsmonitor-client.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-genrandom.c
test-genzeros.c test-genzeros: avoid raw write(2) 2023-02-16 08:30:38 -08:00
test-getcwd.c t0001: fix broken not-quite getcwd(3) test in bed67874e2 2021-07-30 10:18:27 -07:00
test-hash-speed.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
test-hash.c treewide: remove unnecessary cache.h inclusion from several sources 2023-03-21 10:56:51 -07:00
test-hashmap.c t/helper/test-hashmap.c: avoid using strtok() 2023-04-24 16:01:28 -07:00
test-hexdump.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-index-version.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-json-writer.c t/helper/test-json-writer.c: avoid using strtok() 2023-04-24 16:01:28 -07:00
test-lazy-init-name-hash.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-match-trees.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-mergesort.c treewide: remove cache.h inclusion due to previous changes 2023-04-24 12:47:33 -07:00
test-mktemp.c
test-oid-array.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-oidmap.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-oidtree.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-online-cpus.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-pack-mtimes.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-parse-options.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-parse-pathspec-file.c treewide: remove unnecessary inclusion of gettext.h 2023-03-21 10:56:51 -07:00
test-partial-clone.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-path-utils.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-pcre2-config.c treewide: remove unnecessary cache.h includes in source files 2023-02-23 17:25:28 -08:00
test-pkt-line.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
test-prio-queue.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-proc-receive.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-progress.c treewide: remove unnecessary inclusion of gettext.h 2023-03-21 10:56:51 -07:00
test-reach.c ref-filter.h: provide REF_FILTER_INIT 2023-07-10 14:48:55 -07:00
test-read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-read-graph.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
test-read-midx.c treewide: remove cache.h inclusion due to previous changes 2023-04-24 12:47:33 -07:00
test-ref-store.c refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) 2023-07-10 14:48:55 -07:00
test-reftable.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
test-regex.c test-tool regex: call regfree(), fix memory leaks 2022-07-01 13:38:50 -07:00
test-repository.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-revision-walking.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
test-rot13-filter.c t0021: implementation the rot13-filter.pl script in C 2022-08-14 22:57:12 -07:00
test-run-command.c treewide: remove unnecessary inclusion of gettext.h 2023-03-21 10:56:51 -07:00
test-scrap-cache-tree.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-serve-v2.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-sha1.c Makefile & test-tool: replace "DC_SHA1" variable with a "define" 2022-11-07 22:11:51 -05:00
test-sha1.sh
test-sha256.c
test-sigchain.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-simple-ipc.c treewide: remove unnecessary cache.h includes in source files 2023-02-23 17:25:28 -08:00
test-strcmp-offset.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00
test-string-list.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-submodule-config.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-submodule-nested-repo-config.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-submodule.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-subprocess.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-tool-utils.h submodule--helper: move "is-active" to a test-tool 2022-09-02 09:16:23 -07:00
test-tool.c env-helper: move this built-in to "test-tool env-helper" 2023-01-14 18:07:11 -08:00
test-tool.h env-helper: move this built-in to "test-tool env-helper" 2023-01-14 18:07:11 -08:00
test-trace2.c Merge branch 'en/header-split-cache-h-part-2' 2023-05-09 16:45:46 -07:00
test-urlmatch-normalization.c test-tool urlmatch-normalization: fix a memory leak 2022-07-01 13:38:49 -07:00
test-userdiff.c treewide: remove cache.h inclusion due to setup.h changes 2023-03-21 10:56:54 -07:00
test-wildmatch.c treewide: remove unnecessary cache.h includes in source files 2023-02-23 17:25:28 -08:00
test-windows-named-pipe.c
test-write-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
test-xml-encode.c t/helper: mark unused argv/argc arguments 2023-03-28 14:11:24 -07:00