Commit graph

960 commits

Author SHA1 Message Date
Junio C Hamano 3eb6679959 Merge branch 'ps/environ-wo-the-repository'
Code clean-up.

* ps/environ-wo-the-repository: (21 commits)
  environment: stop storing "core.notesRef" globally
  environment: stop storing "core.warnAmbiguousRefs" globally
  environment: stop storing "core.preferSymlinkRefs" globally
  environment: stop storing "core.logAllRefUpdates" globally
  refs: stop modifying global `log_all_ref_updates` variable
  branch: stop modifying `log_all_ref_updates` variable
  repo-settings: track defaults close to `struct repo_settings`
  repo-settings: split out declarations into a standalone header
  environment: guard state depending on a repository
  environment: reorder header to split out `the_repository`-free section
  environment: move `set_git_dir()` and related into setup layer
  environment: make `get_git_namespace()` self-contained
  environment: move object database functions into object layer
  config: make dependency on repo in `read_early_config()` explicit
  config: document `read_early_config()` and `read_very_early_config()`
  environment: make `get_git_work_tree()` accept a repository
  environment: make `get_graft_file()` accept a repository
  environment: make `get_index_file()` accept a repository
  environment: make `get_object_directory()` accept a repository
  environment: make `get_git_common_dir()` accept a repository
  ...
2024-09-23 10:35:05 -07:00
Junio C Hamano c1f41bbe1a Merge branch 'cp/unit-test-reftable-stack'
Another reftable test migrated to the unit-test framework.

* cp/unit-test-reftable-stack:
  t-reftable-stack: add test for stack iterators
  t-reftable-stack: add test for non-default compaction factor
  t-reftable-stack: use reftable_ref_record_equal() to compare ref records
  t-reftable-stack: use Git's tempfile API instead of mkstemp()
  t: harmonize t-reftable-stack.c with coding guidelines
  t: move reftable/stack_test.c to the unit testing framework
2024-09-16 14:22:53 -07:00
Junio C Hamano b64f249726 Merge branch 'jk/messages-with-excess-lf-fix'
One-line messages to "die" and other helper functions will get LF
added by these helper functions, but many existing messages had an
unnecessary LF at the end; these have been corrected.

* jk/messages-with-excess-lf-fix:
  drop trailing newline from warning/error/die messages
2024-09-12 11:47:23 -07:00
Junio C Hamano 04595eb407 Merge branch 'gt/unit-test-oid-array'
Another unit-test.

* gt/unit-test-oid-array:
  t: port helper/test-oid-array.c to unit-tests/t-oid-array.c
2024-09-12 11:47:23 -07:00
Patrick Steinhardt 673af418d0 environment: guard state depending on a repository
In "environment.h" we have quite a lot of functions and variables that
either explicitly or implicitly depend on `the_repository`.

The implicit set of stateful declarations includes for example variables
which get populated when parsing a repository's Git configuration. This
set of variables is broken by design, as their state often depends on
the last repository config that has been parsed. So they may or may not
represent the state of `the_repository`.

Fixing that is quite a big undertaking, and later patches in this series
will demonstrate a solution for a first small set of those variables. So
for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that
callers are aware of the implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12 10:15:42 -07:00
Patrick Steinhardt b92266b79c config: make dependency on repo in read_early_config() explicit
The `read_early_config()` function can be used to read configuration
where a repository has not yet been set up. As such, it is optional
whether or not `the_repository` has already been initialized. If it was
initialized we use its commondir and gitdir. If not, the function will
try to detect the Git directories by itself and, if found, also parse
their config files.

This means that we implicitly rely on `the_repository`. Make this
dependency explicit by passing a `struct repository`. This allows us to
again drop the `USE_THE_REPOSITORY_VARIABLE` define in "config.c".
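
A minimal sketch of the idea of passing the repository explicitly, not the
actual Git code: `struct demo_repo` and `demo_early_config_dir()` are
hypothetical names, and the fallback argument stands in for whatever
directory discovery would find.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two directories mentioned above. */
struct demo_repo {
	const char *gitdir;
	const char *commondir;
};

/*
 * Pick the directory whose config would be read. The repository is a
 * parameter instead of a global, so the dependency is visible in the
 * signature. An initialized repository wins; otherwise fall back to
 * what discovery found, which may be NULL if there is no Git directory.
 */
static const char *demo_early_config_dir(const struct demo_repo *repo,
					 const char *discovered)
{
	if (repo && repo->gitdir) {
		/* Assumption of this sketch: an initialized repo has both. */
		assert(repo->commondir);
		return repo->commondir;
	}
	return discovered;
}
```

A caller that has no repository yet would pass NULL (or an uninitialized
structure) and rely on the discovered directory.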

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12 10:15:40 -07:00
Chandra Pratap 15e29ea1c6 t: move reftable/stack_test.c to the unit testing framework
reftable/stack_test.c exercises the functions defined in
reftable/stack.{c, h}. Migrate reftable/stack_test.c to the
unit testing framework. Migration involves refactoring the tests
to use the unit testing framework instead of reftable's test
framework and renaming the tests to be in line with unit-tests'
standards.

Since some of the tests use set_test_hash() defined by
reftable/test_framework.{c, h} but these files are not
'#included' in the test file, copy this function into the
ported test file.

With the migration of stack test to the unit-tests framework,
"test-tool reftable" becomes a no-op. Hence, get rid of everything
that uses "test-tool reftable" alongside everything that is used
to implement it.

While at it, alphabetically sort the cmds[] list in
helper/test-tool.c by moving the entry for "dump-reftable".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-08 13:24:03 -07:00
Jeff King 1a60f2066a drop trailing newline from warning/error/die messages
Our error reporting routines append a trailing newline, and the strings
we pass to them should not include one (otherwise we get an extra blank
line after the message).

These cases were all found by looking at the results of:

  git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c'

Note that we _do_ sometimes include a newline in the middle of such
messages, to create multiline output (hence our grep matching "," or ")"
after we see the newline, so we know we're at the end of the string).

It's possible that one or more of these cases could intentionally be
including a blank line at the end, but having looked at them all
manually, I think these are all just mistakes.
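
The effect can be illustrated with a small sketch. `demo_error()` below is a
hypothetical stand-in for a reporting helper, not Git's `error()`; it is only
assumed to append one newline itself, as described above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical reporting helper: writes "error: <message>" into `out`
 * and appends exactly one newline itself. Assumes `out` is large
 * enough for the whole message.
 */
static int demo_error(char *out, size_t outlen, const char *fmt, ...)
{
	va_list ap;
	int n;

	n = snprintf(out, outlen, "error: ");
	assert(n >= 0 && (size_t)n < outlen);
	va_start(ap, fmt);
	n += vsnprintf(out + n, outlen - n, fmt, ap);
	va_end(ap);
	assert((size_t)n < outlen);
	n += snprintf(out + n, outlen - n, "\n");
	assert((size_t)n < outlen);
	return n;
}

/* Count the newline characters at the very end of a string. */
static int trailing_newlines(const char *s)
{
	size_t len = strlen(s);
	int count = 0;

	while (len > 0 && s[len - 1] == '\n') {
		count++;
		len--;
	}
	return count;
}
```

With this helper, a format string ending in "\n" produces two trailing
newlines (a blank line after the message), while one without it produces
exactly one.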

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05 09:07:12 -07:00
Junio C Hamano 17636cdf3b Merge branch 'ps/reftable-concurrent-compaction'
The code path for compacting reftable files saw some bugfixes
against concurrent operation.

* ps/reftable-concurrent-compaction:
  reftable/stack: fix segfault when reload with reused readers fails
  reftable/stack: reorder swapping in the reloaded stack contents
  reftable/reader: keep readers alive during iteration
  reftable/reader: introduce refcounting
  reftable/stack: fix broken refnames in `write_n_ref_tables()`
  reftable/reader: inline `reader_close()`
  reftable/reader: inline `init_reader()`
  reftable/reader: rename `reftable_new_reader()`
  reftable/stack: inline `stack_compact_range_stats()`
  reftable/blocksource: drop malloc block source
2024-09-03 09:15:03 -07:00
Junio C Hamano 8c1c63d525 Merge branch 'ps/leakfixes-part-5'
Even more leak fixes.

* ps/leakfixes-part-5:
  transport: fix leaking negotiation tips
  transport: fix leaking arguments when fetching from bundle
  builtin/fetch: fix leaking transaction with `--atomic`
  remote: fix leaking peer ref when expanding refmap
  remote: fix leaks when matching refspecs
  remote: fix leaking config strings
  builtin/fetch-pack: fix leaking refs
  sideband: fix leaks when configuring sideband colors
  builtin/send-pack: fix leaking refspecs
  transport: fix leaking OID arrays in git:// transport data
  t/helper: fix leaking multi-pack-indices in "read-midx"
  builtin/repack: fix leaks when computing packs to repack
  midx-write: fix leaking hashfile on error cases
  builtin/archive: fix leaking `OPT_FILENAME()` value
  builtin/upload-archive: fix leaking args passed to `write_archive()`
  builtin/merge-tree: fix leaking `-X` strategy options
  pretty: fix leaking key/value separator buffer
  pretty: fix memory leaks when parsing pretty formats
  convert: fix leaks when resetting attributes
  mailinfo: fix leaking header data
2024-09-03 09:15:00 -07:00
Ghanshyam Thakkar a680635e05 t: port helper/test-oid-array.c to unit-tests/t-oid-array.c
helper/test-oid-array.c along with t0064-oid-array.sh test the
oid-array.h API, which provides storage and processing
efficiency over large lists of object identifiers.

Migrate them to the unit testing framework for better runtime
performance and efficiency. As we don't initialize a repository
in these tests, the hash algo that functions like oid_array_lookup()
use is not initialized, therefore call repo_set_hash_algo() to
initialize it. And init_hash_algo():lib-oid.c can aid in this
process, so make it public.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-01 20:43:38 -07:00
Junio C Hamano 839b808325 Merge branch 'cp/unit-test-reftable-block'
Another test for reftable library ported to the unit test framework.

* cp/unit-test-reftable-block:
  t-reftable-block: mark unused argv/argc
  t-reftable-block: add tests for index blocks
  t-reftable-block: add tests for obj blocks
  t-reftable-block: add tests for log blocks
  t-reftable-block: remove unnecessary variable 'j'
  t-reftable-block: use xstrfmt() instead of xstrdup()
  t-reftable-block: use block_iter_reset() instead of block_iter_close()
  t-reftable-block: use reftable_record_key() instead of strbuf_addstr()
  t-reftable-block: use reftable_record_equal() instead of check_str()
  t-reftable-block: release used block reader
  t: harmonize t-reftable-block.c with coding guidelines
  t: move reftable/block_test.c to the unit testing framework
2024-08-29 11:08:16 -07:00
Junio C Hamano d4d677704d Merge branch 'ps/reftable-drop-generic'
The code in the reftable library has been cleaned up by discarding
unused "generic" interface.

* ps/reftable-drop-generic:
  reftable: mark unused parameters in empty iterator functions
  reftable/generic: drop interface
  t/helper: refactor to not use `struct reftable_table`
  t/helper: use `hash_to_hex_algop()` to print hashes
  t/helper: inline printing of reftable records
  t/helper: inline `reftable_table_print()`
  t/helper: inline `reftable_stack_print_directory()`
  t/helper: inline `reftable_reader_print_file()`
  t/helper: inline `reftable_dump_main()`
  reftable/dump: drop unused `compact_stack()`
  reftable/generic: move generic iterator code into iterator interface
  reftable/iter: drop double-checking logic
  reftable/stack: open-code reading refs
  reftable/merged: stop using generic tables in the merged table
  reftable/merged: rename `reftable_new_merged_table()`
  reftable/merged: expose functions to initialize iterators
2024-08-29 11:08:16 -07:00
Junio C Hamano a9bc27fb18 Merge branch 'gt/unit-test-urlmatch-normalization'
Another rewrite of a test.

* gt/unit-test-urlmatch-normalization:
  t: migrate t0110-urlmatch-normalization to the new framework
2024-08-28 10:31:27 -07:00
Junio C Hamano 3222718ad7 Merge branch 'ds/for-each-ref-is-base'
'git for-each-ref' learned a new "--format" atom,
"%(is-base:<commit>)", to find the branch that the history leading
to a given commit is likely based on.

* ds/for-each-ref-is-base:
  p1500: add is-base performance tests
  for-each-ref: add 'is-base' token
  commit: add gentle reference lookup method
  commit-reach: add get_branch_base_for_tip
2024-08-26 11:32:24 -07:00
Junio C Hamano 2b30d66c43 Merge branch 'jk/mark-unused-parameters'
Mark unused parameters as UNUSED to squelch -Wunused warnings.

* jk/mark-unused-parameters:
  t-hashmap: stop calling setup() for t_intern() test
  scalar: mark unused parameters in dummy function
  daemon: mark unused parameters in non-posix fallbacks
  setup: mark unused parameter in config callback
  test-mergesort: mark unused parameters in trivial callback
  t-hashmap: mark unused parameters in callback function
  reftable: mark unused parameters in virtual functions
  reftable: drop obsolete test function declarations
  reftable: ignore unused argc/argv in test functions
  unit-tests: ignore unused argc/argv
  t/helper: mark more unused argv/argc arguments
  oss-fuzz: mark unused argv/argc argument
  refs: mark unused parameters in do_for_each_reflog_helper()
  refs: mark unused parameters in ref_store fsck callbacks
  update-ref: mark more unused parameters in parser callbacks
  imap-send: mark unused parameter in ssl_socket_connect() fallback
2024-08-26 11:32:23 -07:00
Junio C Hamano 668843e6d8 Merge branch 'cp/unit-test-reftable-readwrite'
* cp/unit-test-reftable-readwrite:
  t-reftable-readwrite: add test for known error
  t-reftable-readwrite: use 'for' in place of infinite 'while' loops
  t-reftable-readwrite: use free_names() instead of a for loop
  t: move reftable/readwrite_test.c to the unit testing framework
2024-08-23 09:02:35 -07:00
Junio C Hamano 5e56a39e6a Merge branch 'ps/config-wo-the-repository'
Use of API functions that implicitly depend on the_repository
object in the config subsystem has been rewritten to pass a
repository object through the callchain.

* ps/config-wo-the-repository:
  config: hide functions using `the_repository` by default
  global: prepare for hiding away repo-less config functions
  config: don't depend on `the_repository` with branch conditions
  config: don't have setters depend on `the_repository`
  config: pass repo to functions that rename or copy sections
  config: pass repo to `git_die_config()`
  config: pass repo to `git_config_get_expiry_in_days()`
  config: pass repo to `git_config_get_expiry()`
  config: pass repo to `git_config_get_max_percent_split_change()`
  config: pass repo to `git_config_get_split_index()`
  config: pass repo to `git_config_get_index_threads()`
  config: expose `repo_config_clear()`
  config: introduce missing setters that take repo as parameter
  path: hide functions using `the_repository` by default
  path: stop relying on `the_repository` in `worktree_git_path()`
  path: stop relying on `the_repository` when reporting garbage
  hooks: remove implicit dependency on `the_repository`
  editor: do not rely on `the_repository` for interactive edits
  path: expose `do_git_common_path()` as `repo_common_pathv()`
  path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-23 09:02:34 -07:00
Patrick Steinhardt d857469d85 reftable/reader: introduce refcounting
It was recently reported that concurrent reads and writes may cause the
reftable backend to segfault. The root cause of this is that we do not
properly keep track of reftable readers across reloads.

Suppose that you have a reftable iterator and then decide to reload the
stack while iterating through the iterator. If the stack has been
rewritten since we created the iterator, we end up discarding a subset
of readers that may still be in use by the iterator. As a consequence,
we then try to reference deallocated memory, which of course segfaults.

One way to trigger this is in t5616, where some background maintenance
jobs have been leaking from one test into another. This leads to stack
traces like the following one:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

While it is somewhat awkward that the maintenance processes survive
tests in the first place, it is entirely expected that reftables should
work fine with concurrent writers. Seemingly they don't.

The only underlying resource that we need to care about in this context
is the reftable reader, which is responsible for reading a single table
from disk. These readers get discarded immediately (unless reused) when
calling `reftable_stack_reload()`, which is wrong. We can only close
them once we know that there are no iterators using them anymore.

Prepare for a fix by converting the reftable readers to be refcounted.
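
As a rough illustration of what refcounting buys here, consider the
following sketch. The `demo_reader` type and its functions are hypothetical
and far simpler than the real reftable reader; the `closed_flag` pointer
exists only so that the release can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reader; the real structure carries far more state. */
struct demo_reader {
	int refcount;
	int *closed_flag; /* set to 1 once the reader is really released */
};

/* Create a reader owned by one holder (for example, the stack). */
static struct demo_reader *demo_reader_new(int *closed_flag)
{
	struct demo_reader *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->refcount = 1;
	r->closed_flag = closed_flag;
	return r;
}

/* Another holder (for example, an iterator) starts using the reader. */
static void demo_reader_incref(struct demo_reader *r)
{
	assert(r->refcount > 0);
	r->refcount++;
}

/* A holder is done; only the last one actually releases the reader. */
static void demo_reader_decref(struct demo_reader *r)
{
	if (!r)
		return;
	assert(r->refcount > 0);
	if (--r->refcount)
		return;
	if (r->closed_flag)
		*r->closed_flag = 1;
	free(r);
}
```

In this model a reload merely drops the stack's reference, so a reader that
an iterator still holds stays alive until the iterator drops its own
reference as well.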

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:47 -07:00
Patrick Steinhardt a0218203cd reftable/reader: rename reftable_new_reader()
Rename the `reftable_new_reader()` function to `reftable_reader_new()`
to match our coding guidelines.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-23 08:04:46 -07:00
Patrick Steinhardt fb24460e1d t/helper: fix leaking multi-pack-indices in "read-midx"
Several of the subcommands of `test-helper read-midx` do not close the
MIDX that they have opened, leading to memory leaks. Fix those.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 09:18:05 -07:00
Patrick Steinhardt 89191232b8 t/helper: refactor to not use struct reftable_table
The `struct reftable_table` interface in our "reftable" test helper gets
used such that we can easily print either a single table, or a merged
stack. This generic interface is about to go away.

Prepare the code for this change by using merged tables instead. When
printing the stack we've already got one. When using a single table, we
can create a merged table from it to adapt.

This removes the last user of the generic interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:48 -07:00
Patrick Steinhardt 1f39dd2ae5 t/helper: use hash_to_hex_algop() to print hashes
The "reftable" test helper uses a hand-crafted version to convert from a
raw hash to its hex variant. This was done because this code used to be
part of the reftable library, where we do not use most functions from
the Git core.

Now that the code is integrated into the "dump-reftable" helper though,
that limitation went away. Let's thus use `hash_to_hex_algop()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:48 -07:00
Patrick Steinhardt 42c424d69d t/helper: inline printing of reftable records
Move printing of reftable records into the "dump-reftable" helper. This
follows the same reasoning as the preceding commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:47 -07:00
Patrick Steinhardt 64a5b7a8ca t/helper: inline reftable_table_print()
Move `reftable_table_print()` into the "dump-reftable" helper. This
follows the same reasoning as the preceding commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:47 -07:00
Patrick Steinhardt ca74ef6ffb t/helper: inline reftable_stack_print_directory()
Move `reftable_stack_print_directory()` into the "dump-reftable" helper.
This follows the same reasoning as the preceding commit.

Note that this requires us to remove the tests for this functionality in
`reftable/stack_test.c`. The test does not really add much anyway,
because all it verifies is that we do not crash or run into an error,
and it specifically doesn't check the outputted data. Also, as the code
is now part of the test helper, it doesn't make much sense to have a
unit test for it in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:47 -07:00
Patrick Steinhardt 22f519a9a0 t/helper: inline reftable_reader_print_file()
Move `reftable_reader_print_file()` into the "dump-reftable" helper.
This follows the same reasoning as the preceding commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:47 -07:00
Patrick Steinhardt 2b06b28fd6 t/helper: inline reftable_dump_main()
The printing functionality part of `reftable/dump.c` is really only used
by our "dump-reftable" test helper. It is certainly not generic logic
that is useful to anybody outside of Git, and the format it generates is
quite specific. Still, parts of it are used in our test suite and the
output may be useful to take a peek into reftable stacks, tables and
blocks. So while it does not make sense to expose this as part of the
reftable library, it does make sense to keep it around.

Inline the `reftable_dump_main()` function into the "dump-reftable" test
helper. This clarifies that its format is subject to change and not part
of our public interface. Furthermore, this allows us to iterate on the
implementation in subsequent patches.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22 07:59:47 -07:00
Chandra Pratap 546cc0d64e t: move reftable/block_test.c to the unit testing framework
reftable/block_test.c exercises the functions defined in
reftable/block.{c, h}. Migrate reftable/block_test.c to the unit
testing framework. Migration involves refactoring the tests
to use the unit testing framework instead of reftable's test
framework and renaming the tests to follow the unit-tests'
naming conventions.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-21 09:41:40 -07:00
Ghanshyam Thakkar 05026637f3 t: migrate t0110-urlmatch-normalization to the new framework
helper/test-urlmatch-normalization along with
t0110-urlmatch-normalization test the `url_normalize()` function from
'urlmatch.h'. Migrate them to the unit testing framework for better
performance. And also add different test_msg()s for better debugging.

In the migration, the last two checks from `t_url_general_escape()`
were slightly changed compared to the shell script. This involves
changing

'\'' -> '
'\!' -> !

in the urls of those checks. This is because in C strings, we don't
need to escape "'" and "!". Other than these two, all the urls were
pasted verbatim from the shell script.

Another change is the removal of a MINGW prerequisite from one of the
tests. It was there because[1] on Windows the command line is a
Unicode string, so it is not possible to pass arbitrary bytes to a
program. But in unit tests we don't have this limitation.

And since we can construct strings with arbitrary bytes in C, let's
also remove the test files which contain URLs with arbitrary bytes in
the 't/t0110' directory and instead embed those URLs in the unit test
code itself.

[1]: https://lore.kernel.org/git/53CAC8EF.6020707@gmail.com/

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-20 10:08:28 -07:00
Junio C Hamano b9497848df Merge branch 'tb/incremental-midx-part-1'
Incremental updates of multi-pack index files.

* tb/incremental-midx-part-1:
  midx: implement support for writing incremental MIDX chains
  t/t5313-pack-bounds-checks.sh: prepare for sub-directories
  t: retire 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  midx: implement verification support for incremental MIDXs
  midx: support reading incremental MIDX chains
  midx: teach `midx_fanout_add_midx_fanout()` about incremental MIDXs
  midx: teach `midx_preferred_pack()` about incremental MIDXs
  midx: teach `midx_contains_pack()` about incremental MIDXs
  midx: remove unused `midx_locate_pack()`
  midx: teach `fill_midx_entry()` about incremental MIDXs
  midx: teach `nth_midxed_offset()` about incremental MIDXs
  midx: teach `bsearch_midx()` about incremental MIDXs
  midx: introduce `bsearch_one_midx()`
  midx: teach `nth_bitmapped_pack()` about incremental MIDXs
  midx: teach `nth_midxed_object_oid()` about incremental MIDXs
  midx: teach `prepare_midx_pack()` about incremental MIDXs
  midx: teach `nth_midxed_pack_int_id()` about incremental MIDXs
  midx: add new fields for incremental MIDX chains
  Documentation: describe incremental MIDX format
2024-08-19 11:07:37 -07:00
Junio C Hamano 4dbca805e0 Merge branch 'rs/unit-tests-test-run'
Unit-test framework has learned a simple control structure to allow
embedding test statements in-line instead of having to create a new
function to contain them.

* rs/unit-tests-test-run:
  t-strvec: use if_test
  t-reftable-basics: use if_test
  t-ctype: use if_test
  unit-tests: add if_test
  unit-tests: show location of checks outside of tests
  t0080: use here-doc test body
2024-08-19 11:07:36 -07:00
Jeff King f288a57789 test-mergesort: mark unused parameters in trivial callback
The mode_copy() function does nothing, but since it's used as a function
pointer within "struct mode", it has to conform to the interface. Mark
it to quiet -Wunused-parameter.
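
A minimal sketch of the pattern, assuming GCC or Clang attribute syntax.
`DEMO_UNUSED`, `struct demo_mode`, and both callbacks are hypothetical names
for illustration; they are not the definitions used in test-mergesort.

```c
#include <assert.h>

/* Hypothetical annotation macro; expands to nothing elsewhere. */
#if defined(__GNUC__)
#define DEMO_UNUSED __attribute__((unused))
#else
#define DEMO_UNUSED
#endif

/* Every callback stored here must share this signature. */
struct demo_mode {
	const char *name;
	void (*copy)(int *dst, const int *src, int n);
};

/*
 * Does nothing, but has to match the interface; the annotations keep
 * -Wunused-parameter quiet without changing the signature.
 */
static void demo_noop_copy(int *dst DEMO_UNUSED,
			   const int *src DEMO_UNUSED,
			   int n DEMO_UNUSED)
{
}

/* A callback that does use all of its parameters, for contrast. */
static void demo_real_copy(int *dst, const int *src, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++)
		dst[i] = src[i];
}
```

Both functions can be stored in the same function pointer field, which is
why the empty one cannot simply drop its parameters.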

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-17 09:46:23 -07:00
Jeff King 7046c85cce t/helper: mark more unused argv/argc arguments
This is a continuation of 126e3b3d2a (t/helper: mark unused argv/argc
arguments, 2023-03-28) to cover a few new cases:

 - test-example-tap was added since that commit

 - test-hashmap used to accept the "ignorecase" argument on the command
   line. But since most of its logic was moved to a unit-test in
   3469a23659 (t: port helper/test-hashmap.c to unit-tests/t-hashmap.c,
   2024-08-03), it now ignores its argv entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-17 09:46:11 -07:00
Junio C Hamano e7f86cb69d Merge branch 'jc/refs-symref-referent'
The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per symbolic ref.

* jc/refs-symref-referent:
  ref-filter: populate symref from iterator
  refs: add referent to each_ref_fn
  refs: keep track of unresolved reference value in iterators
2024-08-15 13:22:15 -07:00
Junio C Hamano a3d71f2076 Merge branch 'gt/unit-test-hashmap'
An existing test of hashmap API has been rewritten with the
unit-test framework.

* gt/unit-test-hashmap:
  t: port helper/test-hashmap.c to unit-tests/t-hashmap.c
2024-08-15 13:22:12 -07:00
Junio C Hamano 7b11e20bff Merge branch 'cp/unit-test-reftable-tree'
A test in reftable library has been rewritten using the unit test
framework.

* cp/unit-test-reftable-tree:
  t-reftable-tree: improve the test for infix_walk()
  t-reftable-tree: add test for non-existent key
  t-reftable-tree: split test_tree() into two sub-test functions
  t: move reftable/tree_test.c to the unit testing framework
  reftable: remove unnecessary curly braces in reftable/tree.c
2024-08-14 14:54:56 -07:00
Junio C Hamano 61fd5de05f Merge branch 'kl/test-fixes'
A flaky test and incorrect calls to strtoX() functions have been
fixed.

* kl/test-fixes:
  t6421: fix test to work when repo dir contains d0
  set errno=0 before strtoX calls
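
For context on the strtoX() item: the standard C functions report overflow
only through `errno`, so a stale value left by an earlier call can be
mistaken for a failure. The wrapper below is a hypothetical sketch of the
usual pattern, not code taken from the topic itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a whole string as a base-10 long. Returns 0 and stores the
 * value on success, -1 on overflow, empty input, or trailing garbage.
 */
static int demo_parse_long(const char *s, long *out)
{
	char *end;
	long v;

	assert(s && out);
	errno = 0; /* a stale ERANGE from earlier must not leak in */
	v = strtol(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -1;
	*out = v;
	return 0;
}
```

Without the `errno = 0` line, a valid input such as "42" could be rejected
whenever some earlier, unrelated call had left `errno` set.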
2024-08-14 14:54:55 -07:00
Junio C Hamano d65332f241 Merge branch 'cp/unit-test-reftable-pq'
The tests for "pq" part of reftable library got rewritten to use
the unit test framework.

* cp/unit-test-reftable-pq:
  t-reftable-pq: add tests for merged_iter_pqueue_top()
  t-reftable-pq: add test for index based comparison
  t-reftable-pq: make merged_iter_pqueue_check() callable by reference
  t-reftable-pq: make merged_iter_pqueue_check() static
  t: move reftable/pq_test.c to the unit testing framework
  reftable: change the type of array indices to 'size_t' in reftable/pq.c
  reftable: remove unnecessary curly braces in reftable/pq.c
2024-08-14 14:54:48 -07:00
Junio C Hamano 4385f8a52d Merge branch 'ps/leakfixes-part-3'
More leakfixes.

* ps/leakfixes-part-3: (24 commits)
  commit-reach: fix trivial memory leak when computing reachability
  convert: fix leaking config strings
  entry: fix leaking pathnames during delayed checkout
  object-name: fix leaking commit list items
  t/test-repository: fix leaking repository
  builtin/credential-cache: fix trivial leaks
  builtin/worktree: fix leaking derived branch names
  builtin/shortlog: fix various trivial memory leaks
  builtin/rerere: fix various trivial memory leaks
  builtin/credential-store: fix leaking credential
  builtin/show-branch: fix several memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/stash: fix various trivial memory leaks
  builtin/remote: fix various trivial memory leaks
  builtin/remote: fix leaking strings in `branch_list`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/submodule--helper: fix leaking clone depth parameter
  builtin/name-rev: fix various trivial memory leaks
  builtin/describe: fix trivial memory leak when describing blob
  ...
2024-08-14 14:54:47 -07:00
Derrick Stolee e32eaf73b0 commit-reach: add get_branch_base_for_tip
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.

In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing).  Even in this case, the previous commit commonly appears in the
first-parent history of the branch.

Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.

With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:

  Among a list of candidate base branches, select the candidate that
  minimizes the number of commits in the first-parent history of the source
  that are not in the first-parent history of the candidate.
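
As a rough illustration of the heuristic above, the following sketch
evaluates it naively on a toy history. It is not the commit-graph walk
added by this commit: commits are modeled as array indices, and the names
`first_parent_distance()` and `naive_best_base()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PARENT -1

/*
 * Toy model (an assumption for illustration): commit `c` is an index and
 * first_parent[c] is its first parent, or NO_PARENT for a root commit.
 */
int in_first_parent_history(const int *first_parent, int tip, int commit)
{
	for (int c = tip; c != NO_PARENT; c = first_parent[c])
		if (c == commit)
			return 1;
	return 0;
}

/* Count commits in the first-parent history of `source` but not of `base`. */
int first_parent_distance(const int *first_parent, int source, int base)
{
	int count = 0;

	for (int c = source; c != NO_PARENT; c = first_parent[c])
		if (!in_first_parent_history(first_parent, base, c))
			count++;
	return count;
}

/*
 * Return the index of the candidate minimizing the distance above, or -1
 * if no candidate's first-parent history intersects that of the source.
 * Ties go to the lowest index.
 */
int naive_best_base(const int *first_parent, int source,
		    const int *bases, size_t nr)
{
	int source_len = 0, best = -1, best_distance = 0;

	assert(first_parent && (nr == 0 || bases));
	for (int c = source; c != NO_PARENT; c = first_parent[c])
		source_len++;

	for (size_t i = 0; i < nr; i++) {
		int d = first_parent_distance(first_parent, source, bases[i]);

		if (d == source_len)
			continue; /* histories never intersect */
		if (best < 0 || d < best_distance) {
			best = (int)i;
			best_distance = d;
		}
	}
	return best;
}
```

This quadratic version only pins down what is being minimized; it compares
first-parent lists directly, much like the table-based approaches described
next.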

Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.

Given current command-line interface options, this optimization criterion is
not easy to detect directly. Even using the command

  git rev-list --count --first-parent <base>..<source>

does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.

In extremely large repositories, merging into a single trunk may not be
feasible.  This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build.  These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa.  In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.

If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history, which appear as
parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.

For this reason, the first-parent condition should be symmetrical across the
base and source branches.

Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.

To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it.  When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.

* There are up to two walks, since we require all commits to have a computed
  generation number in order to avoid incorrect results. This is similar to
  the need for computed generation numbers in ahead_behind() as implemented
  in fd67d149bd (commit-reach: implement ahead_behind() logic, 2023-03-20).

In order to track the "best" base, use a new commit slab that stores an
integer.  This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
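
The integer encoding described above can be sketched as follows. This is
an illustration only, not the code in commit-reach.c: the function
`merge_slab_value()` and the enum names are hypothetical, and a plain
`int` stands in for a commit slab entry.

```c
#include <assert.h>

enum {
	SLAB_UNVISITED = 0, /* default value after slab initialization */
	SLAB_SOURCE = -1    /* reachable from the source commit */
};

/*
 * Combine the value already stored for a commit with a value arriving
 * from one of its children. Base `i` is encoded as `i + 1`, so every
 * base value is positive and the source value is negative.
 * Sets *collision when a base history meets the source history.
 */
int merge_slab_value(int current, int incoming, int *collision)
{
	assert(incoming != SLAB_UNVISITED && collision);
	*collision = 0;

	if (current == SLAB_UNVISITED)
		return incoming;
	if ((current < 0) != (incoming < 0)) {
		/* Signs differ: source and base histories collide here. */
		*collision = 1;
		return SLAB_SOURCE;
	}
	if (current < 0)
		return SLAB_SOURCE;
	/* Two bases reach this commit: keep the smaller index. */
	return incoming < current ? incoming : current;
}
```

Reserving zero for "unvisited" is what forces the `i + 1` offset, since
the base at index 0 would otherwise be indistinguishable from a commit
that no walk has reached yet.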

The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as their first parent, leading to multiple
updates of best_index.  The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.

Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-14 10:10:05 -07:00
Chandra Pratap 5b539a5361 t: move reftable/readwrite_test.c to the unit testing framework
reftable/readwrite_test.c exercises the functions defined in
reftable/reader.{c,h} and reftable/writer.{c,h}. Migrate
reftable/readwrite_test.c to the unit testing framework. Migration
involves refactoring the tests to use the unit testing framework
instead of reftable's test framework and renaming the tests to
align with unit-tests' naming conventions.

Since some tests in reftable/readwrite_test.c use the functions
set_test_hash(), noop_flush() and strbuf_add_void() defined in
reftable/test_framework.{c,h} but these files are not #included
in the ported unit test, copy these functions in the new test file.

While at it, ensure structs are 0-initialized with '= { 0 }'
instead of '= { NULL }'.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:08:02 -07:00
Patrick Steinhardt 219de841d9 global: prepare for hiding away repo-less config functions
We're about to hide config functions that implicitly depend on
`the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This
will uncover a bunch of dependents that transitively relied on the
global variable, but didn't define the macro yet.

Adapt them such that we define the macro to prepare for this change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:01:05 -07:00
John Cai e8207717f1 refs: add referent to each_ref_fn
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have access to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09 08:47:34 -07:00
Taylor Blau b80236d0e3 midx: support reading incremental MIDX chains
Now that the MIDX machinery's internals have been taught to understand
incremental MIDXs over the previous handful of commits, the MIDX
machinery itself can begin reading incremental MIDXs.

(Note that while the on-disk format for incremental MIDXs has been
defined, the writing end has not been implemented. This will take place
in the commit after next.)

The core of this change involves following the order specified in the
MIDX chain in reverse and opening up MIDXs in the chain one-by-one,
adding them to the previous layer's `->base_midx` pointer at each step.

In order to implement this, the `load_multi_pack_index()` function is
taught to call a new `load_multi_pack_index_chain()` function if loading
a non-incremental MIDX failed via `load_multi_pack_index_one()`.

When loading a MIDX chain, `load_midx_chain_fd_st()` reads each line in
the file one-by-one and dispatches calls to
`load_multi_pack_index_one()` to read each layer of the MIDX chain. When
a layer was successfully read, it is added to the MIDX chain by calling
`add_midx_to_chain()` which validates the contents of the `BASE` chunk,
performs some bounds checks on the number of combined packs and objects,
and attaches the new MIDX by assigning its `base_midx` pointer to the
existing part of the chain.
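
The linking step can be pictured with a toy model like the one below. It
is a sketch under stated assumptions, not Git's implementation: `struct
toy_midx` and `toy_add_to_chain()` are hypothetical stand-ins, the `BASE`
chunk validation is omitted, and only the object-count bounds check is
modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one layer of an incremental MIDX chain. */
struct toy_midx {
	const char *checksum;         /* identifies this layer */
	uint32_t num_objects;         /* objects in this layer only */
	uint32_t num_objects_in_base; /* objects in all layers below */
	struct toy_midx *base_midx;   /* next-older layer, or NULL */
};

/*
 * Attach `midx` on top of the existing `chain` (which may be NULL for
 * the first layer). Returns the new tip of the chain, or NULL if the
 * combined object count would not fit in 32 bits.
 */
struct toy_midx *toy_add_to_chain(struct toy_midx *midx,
				  struct toy_midx *chain)
{
	uint32_t below = 0;

	assert(midx);
	if (chain)
		below = chain->num_objects + chain->num_objects_in_base;
	if (midx->num_objects > UINT32_MAX - below)
		return NULL; /* too many objects across the chain */

	midx->num_objects_in_base = below;
	midx->base_midx = chain;
	return midx;
}
```

Calling this once per line of the chain file, oldest layer first, leaves
the caller holding the newest layer, from which every older layer is
reachable through `base_midx`.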

As a supplement to this, introduce a new mode in the test-read-midx
test-tool which allows us to read the information for a specific MIDX in
the chain by specifying its trailing checksum via the command-line
arguments like so:

    $ test-tool read-midx .git/objects [checksum]

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06 12:01:38 -07:00
Ghanshyam Thakkar 3469a23659 t: port helper/test-hashmap.c to unit-tests/t-hashmap.c
helper/test-hashmap.c along with t0011-hashmap.sh test the hashmap.h
library. Migrate them to the unit testing framework for better
debugging, runtime performance and concise code.

Along with the migration, make 'add' tests from the shell script order
agnostic in unit tests, since they iterate over entries with the same
keys and we do not guarantee the order. This was already done for the
'iterate' tests[1].

The helper/test-hashmap.c is still not removed because it contains a
performance test meant to be run by the user directly (not used in
t/perf). And it makes sense for such a utility to be a helper.

[1]: e1e7a77141 (t: sort output of hashmap iteration, 2019-07-30)

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Josh Steadmon <steadmon@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-06 09:25:54 -07:00
Kyle Lippincott b928d57ca9 set errno=0 before strtoX calls
To detect conversion failure after calls to functions like `strtod`, one
can check `errno == ERANGE`. These functions are not guaranteed to set
`errno` to `0` on successful conversion, however. Manual manipulation of
`errno` can likely be avoided by checking that the output pointer
differs from the input pointer, but that's not how other locations, such
as parse.c:139, handle this issue; they set errno to 0 prior to
executing the function.

For every place I could find a strtoX function with an ERANGE check
following it, set `errno = 0;` prior to executing the conversion
function.
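
A minimal sketch of the pattern, assuming a hypothetical helper named
`parse_long()` (it is not a function from the Git codebase):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a base-10 long from `s`. Returns 0 on success and stores the
 * value in `*out`; returns -1 on overflow, empty input, or trailing
 * garbage.
 */
int parse_long(const char *s, long *out)
{
	char *end;
	long val;

	assert(s && out);

	/*
	 * strtol() leaves errno untouched on success, so a stale ERANGE
	 * from an earlier call would otherwise look like a failure here.
	 */
	errno = 0;
	val = strtol(s, &end, 10);
	if (errno == ERANGE)
		return -1;
	if (end == s || *end != '\0')
		return -1;

	*out = val;
	return 0;
}
```

Without the `errno = 0;` line, a caller that had previously hit an
out-of-range conversion could see valid input such as "42" rejected.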

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-05 10:59:20 -07:00
Chandra Pratap ec9c0704fc t: move reftable/tree_test.c to the unit testing framework
reftable/tree_test.c exercises the functions defined in
reftable/tree.{c, h}. Migrate reftable/tree_test.c to the unit
testing framework. Migration involves refactoring the tests to use
the unit testing framework instead of reftable's test framework and
renaming the tests to align with unit-tests' standards.

Also add a comment to help understand the test routine.

Note that this commit mostly moves the test from reftable/ to
t/unit-tests/ and most of the refactoring is performed by the
trailing commits.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-04 09:50:26 -07:00
Chandra Pratap a08ea27cd0 t: move reftable/pq_test.c to the unit testing framework
reftable/pq_test.c exercises a priority queue defined by
reftable/pq.{c, h}. Migrate reftable/pq_test.c to the unit testing
framework. Migration involves refactoring the tests to use the unit
testing framework instead of reftable's test framework, and
renaming the tests to align with unit-tests' standards.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01 09:07:29 -07:00
Patrick Steinhardt 11f841c1cc t/test-repository: fix leaking repository
The test-repository test helper zeroes out `the_repository` such that it
can be sure that our codebase only ends up using the supplied repository
that we initialize in the respective helper functions. This does cause
memory leaks though as the data that `the_repository` has been holding
onto is not referenced anymore.

Fix this by calling `repo_clear()` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01 08:47:37 -07:00