development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-27 11:39:22 +00:00

Author	SHA1	Message	Date
Junio C Hamano	dc8ce995a2	Merge branch 'ps/worktree-refdb-initialization' Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs	2024-01-26 08:54:46 -08:00
Junio C Hamano	32c6fc3e30	Merge branch 'ps/refstorage-extension' Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq	2024-01-16 10:11:57 -08:00
Junio C Hamano	492ee03f60	Merge branch 'en/header-cleanup' Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files	2024-01-08 14:05:15 -08:00
Patrick Steinhardt	2eb1d0c452	refs/files: skip creation of "refs/{heads,tags}" for worktrees The files ref backend will create both "refs/heads" and "refs/tags" in the Git directory. While this logic makes sense for normal repositories, it does not for worktrees because those refs are "common" refs that would always be contained in the main repository's ref database. Introduce a new flag telling the backend that it is expected to create a per-worktree ref database and skip creation of these dirs in the files backend when the flag is set. No other backends (currently) need worktree-specific logic, so this is the only required change to start creating per-worktree ref databases via `refs_init_db()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	c358d165f2	setup: move creation of "refs/" into the files backend When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	2e573d61ff	refs: prepare `refs_init_db()` for initializing worktree refs The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	0fcc285c5e	refs: refactor logic to look up storage backends In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:47 -08:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Patrick Steinhardt	29a186917b	refs: remove `delete_refs` callback from backends Now that `refs_delete_refs` is implemented in a generic way via the ref transaction interfaces there are no callers left that invoke the `delete_refs` callback anymore. Remove it from all of our backends. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-17 10:12:12 +09:00
Patrick Steinhardt	d6f8e72982	refs: deduplicate code to delete references Both the files and the packed-refs reference backends now use the same generic transactions-based code to delete references. Let's pull these implementations up into `refs_delete_refs()` to deduplicate the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-17 10:12:12 +09:00
Patrick Steinhardt	e85e5dd78a	refs/files: use transactions to delete references In the `files_delete_refs()` callback function of the files backend we implement deletion of references. This is done in two steps: 1. We lock the packed-refs file and delete all references from it in a single transaction. 2. We delete all loose references via separate calls to `refs_delete_ref()`. These steps essentially duplicate the logic around locking and deletion order that we already have in the transactional interfaces, where we do know to lock and evict references from the packed-refs file. Despite the fact that we duplicate the logic, it's also less efficient than if we used a single generic transaction: - The transactional interface knows to skip locking of the packed refs in case they don't contain any of the refs which are about to be deleted. - We end up creating N+1 separate reference transactions, one for the packed-refs file and N for the individual loose references. Refactor the code to instead delete references via a single transaction. As we don't assert the expected old object ID this is equivalent to the previous behaviour, and we already do the same in the packed-refs backend. Despite the fact that the result is simpler to reason about, this change also results in improved performance. The following benchmarks have been executed in linux.git: ``` $ hyperfine -n '{rev}, packed={packed} refcount={refcount}' \ -L packed true,false -L refcount 1,1000 -L rev master,pks-ref-store-generic-delete-refs \ --setup 'git -C /home/pks/Development/git switch --detach {rev} && make -C /home/pks/Development/git -j17' \ --prepare 'printf "create refs/heads/new-branch-%d HEAD\n" $(seq {refcount}) \| git -C /home/pks/Reproduction/linux.git update-ref --stdin && if test {packed} = true; then git pack-refs --all; fi' \ --warmup=10 \ '/home/pks/Development/git/bin-wrappers/git -C /home/pks/Reproduction/linux.git branch -d new-branch-{1..{refcount}}' Benchmark 1: master packed=true refcount=1 Time (mean ± σ): 7.8 ms ± 1.6 ms [User: 3.4 ms, System: 4.4 ms] Range (min … max): 5.5 ms … 11.0 ms 120 runs Benchmark 2: master packed=false refcount=1 Time (mean ± σ): 7.0 ms ± 1.1 ms [User: 3.2 ms, System: 3.8 ms] Range (min … max): 5.7 ms … 9.8 ms 180 runs Benchmark 3: master packed=true refcount=1000 Time (mean ± σ): 283.8 ms ± 5.2 ms [User: 45.7 ms, System: 231.5 ms] Range (min … max): 276.7 ms … 291.6 ms 10 runs Benchmark 4: master packed=false refcount=1000 Time (mean ± σ): 284.4 ms ± 5.3 ms [User: 44.2 ms, System: 233.6 ms] Range (min … max): 277.1 ms … 293.3 ms 10 runs Benchmark 5: generic-delete-refs packed=true refcount=1 Time (mean ± σ): 6.2 ms ± 1.8 ms [User: 2.3 ms, System: 3.9 ms] Range (min … max): 4.1 ms … 12.2 ms 142 runs Benchmark 6: generic-delete-refs packed=false refcount=1 Time (mean ± σ): 7.1 ms ± 1.4 ms [User: 2.8 ms, System: 4.3 ms] Range (min … max): 4.2 ms … 10.8 ms 157 runs Benchmark 7: generic-delete-refs packed=true refcount=1000 Time (mean ± σ): 198.9 ms ± 1.7 ms [User: 29.5 ms, System: 165.7 ms] Range (min … max): 196.1 ms … 201.4 ms 10 runs Benchmark 8: generic-delete-refs packed=false refcount=1000 Time (mean ± σ): 199.7 ms ± 7.8 ms [User: 32.2 ms, System: 163.2 ms] Range (min … max): 193.8 ms … 220.7 ms 10 runs Summary generic-delete-refs packed=true refcount=1 ran 1.14 ± 0.37 times faster than master packed=false refcount=1 1.15 ± 0.40 times faster than generic-delete-refs packed=false refcount=1 1.26 ± 0.44 times faster than master packed=true refcount=1 32.24 ± 9.17 times faster than generic-delete-refs packed=true refcount=1000 32.36 ± 9.29 times faster than generic-delete-refs packed=false refcount=1000 46.00 ± 13.10 times faster than master packed=true refcount=1000 46.10 ± 13.13 times faster than master packed=false refcount=1000 ``` Especially in the case where we have many references we can see a clear performance speedup of nearly 30%. This is in contrast to the stated objecive in `a27dcf89b6` (refs: make delete_refs() virtual, 2016-09-04), where the virtual `delete_refs()` function was introduced with the intent to speed things up rather than making things slower. So it seems like we have outlived the need for a virtual function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-17 10:12:12 +09:00
Victoria Dye	2cdb796101	files-backend.c: avoid stat in 'loose_fill_ref_dir' Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a file to determine whether it is a directory or not, use 'get_dtype'. Currently, the loop uses 'stat' to determine whether each dirent is a directory itself or not in order to construct the appropriate ref cache entry. If 'stat' fails (returning a negative value), the dirent is silently skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry is a directory. On platforms that include an entry's d_type in in the 'dirent' struct, this extra 'stat' check is redundant. We can use the 'get_dtype' method to extract this information on platforms that support it (i.e. where NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that don't. Because 'stat' is an expensive call, this confers a modest-but-noticeable performance improvement when iterating over large numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k ref repo). Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set to 1 to replicate the existing handling of symlink dirents. This unfortunately requires calling 'stat' on the associated entry regardless of platform, but symlinks in the loose ref store are highly unlikely since they'd need to be created manually by a user. Note that this patch also changes the condition for skipping creation of a ref entry from "when 'stat' fails" to "when the d_type is anything other than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because the platform doesn't support d_type in dirents or some other reason) or DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be skipped. However, it will also be skipped if it is any other valid d_type (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not handle these properly anyway, so we can safely constrain accepted types to directories and regular files. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:14 -07:00
Junio C Hamano	39fe402d67	Merge branch 'tb/refs-exclusion-and-packed-refs' Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`	2023-07-21 13:47:26 -07:00
Taylor Blau	b269ac53c0	refs: plumb `exclude_patterns` argument throughout The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Elijah Newren	c339932bd8	repository: remove unnecessary include of path.h This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	bc5c5ec044	cache.h: remove this no-longer-used header Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Junio C Hamano	cbc882ea38	Merge branch 'jc/pack-ref-exclude-include' "git pack-refs" learns "--include" and "--exclude" to tweak the ref hierarchy to be packed using pattern matching. * jc/pack-ref-exclude-include: pack-refs: teach pack-refs --include option pack-refs: teach --exclude option to exclude refs from being packed docs: clarify git-pack-refs --all will pack all refs	2023-06-13 12:29:45 -07:00
John Cai	4fe42f326e	pack-refs: teach pack-refs --include option Allow users to be more selective over which refs to pack by adding an --include option to git-pack-refs. The existing options allow some measure of selectivity. By default git-pack-refs packs all tags. --all can be used to include all refs, and the previous commit added the ability to exclude certain refs with --exclude. While these knobs give the user some selection over which refs to pack, it could be useful to give more control. For instance, a repository may have a set of branches that are rarely updated and would benefit from being packed. --include would allow the user to easily include a set of branches to be packed while leaving everything else unpacked. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-12 14:54:14 -07:00
John Cai	826ae79fca	pack-refs: teach --exclude option to exclude refs from being packed At GitLab, we have a system that creates ephemeral internal refs that don't live long before getting deleted. Having an option to exclude certain refs from a packed-refs file allows these internal references to be deleted much more efficiently. Add an --exclude option to the pack-refs builtin, and use the ref exclusions API to exclude certain refs from being packed into the final packed-refs file Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-12 14:54:14 -07:00
Elijah Newren	d1cbe1e6d8	hash-ll.h: split out of hash.h to remove dependency on repository.h hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-24 12:47:32 -07:00
Elijah Newren	d5fff46f40	copy.h: move declarations for copy.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-24 12:47:31 -07:00
Elijah Newren	87bed17907	object-file.h: move declarations for object-file.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:10 -07:00
Elijah Newren	d48be35ca6	write-or-die.h: move declarations for write-or-die.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	e38da487cc	setup.h: move declarations for setup.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	d5ebb50dcb	wrapper.h: move declarations for wrapper.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Elijah Newren	b5fa608180	ident.h: move ident-related declarations out of cache.h These functions were all defined in a separate ident.c already, so create ident.h and move the declarations into that file. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Han-Wen Nienhuys	71e5473493	refs: unify parse_worktree_ref() and ref_type() The logic to handle worktree refs (worktrees/NAME/REF and main-worktree/REF) existed in two places: * ref_type() in refs.c * parse_worktree_ref() in worktree.c Collapse this logic together in one function parse_worktree_ref(): this avoids having to cross-check the result of parse_worktree_ref() and ref_type(). Introduce enum ref_worktree_type, which is slightly different from enum ref_type. The latter is a misleading name (one would think that 'ref_type' would have the symref option). Instead, enum ref_worktree_type only makes explicit how a refname relates to a worktree. From this point of view, HEAD and refs/bisect/abc are the same: they specify the current worktree implicitly. The files-backend must avoid packing refs/bisect/* and friends into packed-refs, so expose is_per_worktree_ref() separately. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-19 11:11:11 -07:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in `2174b8c75d` (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Jeff King	7718827a2d	refs: mark unused virtual method parameters The refs code uses various polymorphic types (e.g., loose vs packed ref_stores, abstracted iterators). Not every virtual function or callback needs all of its parameters. Let's mark the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:55 -07:00
Jeff King	63e14ee2d6	refs: mark unused each_ref_fn parameters Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:54 -07:00
Junio C Hamano	c6da34a610	Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'" This reverts commit `991b4d47f0`, reversing changes made to `bcd020f88e`.	2022-04-13 15:51:33 -07:00
Junio C Hamano	3d8046a820	Merge branch 'ab/refs-various-fixes' Code clean-up. * ab/refs-various-fixes: refs debug: add a wrapper for "read_symbolic_ref" packed-backend: remove stub BUG(...) functions misc *.c: use designated initializers for struct assignments refs: use designated initializers for "struct ref_iterator_vtable" refs: use designated initializers for "struct ref_storage_be"	2022-03-29 12:22:02 -07:00
Junio C Hamano	6e1a8952e9	Merge branch 'ps/fsync-refs' Updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so. * ps/fsync-refs: core.fsync: new option to harden references	2022-03-25 16:38:25 -07:00
Ævar Arnfjörð Bjarmason	e2f8acb6a0	refs: use designated initializers for "struct ref_iterator_vtable" Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-17 10:36:11 -07:00
Ævar Arnfjörð Bjarmason	32bff617c6	refs: use designated initializers for "struct ref_storage_be" Change the definition of the three refs backends we currently carry to use designated initializers. The "= NULL" assignments being retained here are redundant, and could be removed, but let's keep them for clarity. All of these backends define almost all fields, so we're not saving much in terms of line count by omitting these, but e.g. for "refs_be_debug" it's immediately apparent that we're omitting "init" when comparing its assignment to the others. This is a follow-up to similar work merged in `bd4232fac3` (Merge branch 'ab/struct-init', 2021-07-16), `a4b9fb6a5c` (Merge branch 'ab/designated-initializers-more', 2021-10-18) and `a30321b9ea` (Merge branch 'ab/designated-initializers' into ab/designated-initializers-more, 2021-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-17 10:36:04 -07:00
Patrick Steinhardt	bc22d845c4	core.fsync: new option to harden references When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the reference is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-15 13:30:58 -07:00
Patrick Steinhardt	0a7b38707d	refs/files-backend: optimize reading of symbolic refs When reading references via `files_read_raw_ref()` we always consult both the loose reference, and if that wasn't found, we also consult the packed-refs file. While this makes sense to read a generic reference, it is wasteful in the case where we only care about symbolic references: the packed-refs backend does not support them, and thus it cannot ever return one for us. Special-case reading of symbolic references for the files backend such that we always skip asking the packed-refs backend. We use `refs_read_symbolic_ref()` extensively to determine whether we need to skip updating local symbolic references during a fetch, which is why the change results in a significant speedup when doing fetches in repositories with huge numbers of references. The following benchmark executes a mirror-fetch in a repository with about 2 million references via `git fetch --prune --no-write-fetch-head +refs/:refs/`: Benchmark 1: HEAD~ Time (mean ± σ): 68.372 s ± 2.344 s [User: 65.629 s, System: 8.786 s] Range (min … max): 65.745 s … 70.246 s 3 runs Benchmark 2: HEAD Time (mean ± σ): 60.259 s ± 0.343 s [User: 61.019 s, System: 7.245 s] Range (min … max): 60.003 s … 60.649 s 3 runs Summary 'HEAD' ran 1.13 ± 0.04 times faster than 'HEAD~' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 10:13:46 -08:00
Patrick Steinhardt	cd475b3b03	refs: add ability for backends to special-case reading of symbolic refs Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 10:13:46 -08:00
Junio C Hamano	991b4d47f0	Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs' Because a deletion of ref would need to remove it from both the loose ref store and the packed ref store, a delete-ref operation that logically removes one ref may end up invoking ref-transaction hook twice, which has been corrected. * ps/avoid-unnecessary-hook-invocation-with-packed-refs: refs: skip hooks when deleting uncovered packed refs refs: do not execute reference-transaction hook on packing refs refs: demonstrate excessive execution of the reference-transaction hook refs: allow skipping the reference-transaction hook refs: allow passing flags when beginning transactions refs: extract packed_refs_delete_refs() to allow control of transaction	2022-02-18 13:53:27 -08:00
Ævar Arnfjörð Bjarmason	ce14de03db	refs API: remove "failure_errno" from refs_resolve_ref_unsafe() Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent `96f6623ada` (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in `31e3912369` (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-26 15:58:41 -08:00
Patrick Steinhardt	2ed1b64ebd	refs: skip hooks when deleting uncovered packed refs When deleting refs from the loose-files refs backend, then we need to be careful to also delete the same ref from the packed refs backend, if it exists. If we don't, then deleting the loose ref would "uncover" the packed ref. We thus always have to queue up deletions of refs for both the loose and the packed refs backend. This is done in two separate transactions, where the end result is that the reference-transaction hook is executed twice for the deleted refs. This behaviour is quite misleading: it's exposing implementation details of how the files backend works to the user, in contrast to the logical updates that we'd really want to expose via the hook. Worse yet, whether the hook gets executed once or twice depends on how well-packed the repository is: if the ref only exists as a loose ref, then we execute it once, otherwise if it is also packed then we execute it twice. Fix this behaviour and don't execute the reference-transaction hook at all when refs in the packed-refs backend if it's driven by the files backend. This works as expected even in case the refs to be deleted only exist in the packed-refs backend because the loose-backend always queues refs in its own transaction even if they don't exist such that they can be locked for concurrent creation. And it also does the right thing in case neither of the backends has the ref because that would cause the transaction to fail completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-17 11:01:45 -08:00
Patrick Steinhardt	ffad994138	refs: do not execute reference-transaction hook on packing refs The reference-transaction hook is supposed to track logical changes to references, but it currently also gets executed when packing refs in a repository. This is unexpected and ultimately not all that useful: packing refs is not supposed to result in any user-visible change to the refs' state, and it ultimately is an implementation detail of how refs stores work. Fix this excessive execution of the hook when packing refs. Reported-by: Waleed Khan <me@waleedkhan.name> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-17 11:01:45 -08:00
Patrick Steinhardt	fbe73f61cb	refs: allow passing flags when beginning transactions We do not currently have any flags when creating reference transactions, but we'll add one to disable execution of the reference transaction hook in some cases. Allow passing flags to `ref_store_transaction_begin()` to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-17 11:01:44 -08:00
Patrick Steinhardt	69840cc0f7	refs: extract packed_refs_delete_refs() to allow control of transaction When deleting loose refs, then we also have to delete the refs in the packed backend. This is done by calling `refs_delete_refs()`, which then uses the packed-backend's logic to delete refs. This doesn't allow us to exercise any control over the reference transaction which is being created in the packed backend, which is required in a subsequent commit. Extract a new function `packed_refs_delete_refs()`, which hosts most of the logic to delete refs except for creating the transaction itself. Like this, we can easily create the transaction in the files backend and thus exert more control over it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-17 11:01:44 -08:00
Junio C Hamano	31e3912369	Merge branch 'ab/refs-errno-cleanup' A brown-paper-bag fix on top of a topic that was merged during this cycle. * ab/refs-errno-cleanup: refs API: use "failure_errno", not "errno"	2022-01-14 15:25:15 -08:00
Ævar Arnfjörð Bjarmason	cac15b3fb4	refs API: use "failure_errno", not "errno" Fix a logic error in refs_resolve_ref_unsafe() introduced in a recent series of mine to abstract the refs API away from errno. See `96f6623ada` (Merge branch 'ab/refs-errno-cleanup', 2021-11-29)for that series. In that series introduction of "failure_errno" to refs_resolve_ref_unsafe came in `ef18119dec` (refs API: add a version of refs_resolve_ref_unsafe() with "errno", 2021-10-16). There we'd set "errno = 0" immediately before refs_read_raw_ref(), and then set "failure_errno" to "errno" if errno was non-zero afterwards. Then in the next commit `8b72fea7e9` (refs API: make refs_read_raw_ref() not set errno, 2021-10-16) we started expecting "refs_read_raw_ref()" to set "failure_errno". It would do that if refs_read_raw_ref() failed, but it wouldn't be the same errno. So we might set the "errno" here to any arbitrary bad value, and end up e.g. returning NULL when we meant to return the refname from refs_resolve_ref_unsafe(), or the other way around. Instrumenting this code will reveal cases where refs_read_raw_ref() will fail, and "errno" and "failure_errno" will be set to different values. In practice I haven't found a case where this scary bug changed anything in practice. The reason for that is that we'll not care about the actual value of "errno" here per-se, but only whether: 1. We have an errno 2. If it's one of ENOENT, EISDIR or ENOTDIR. See the adjacent code added in `a1c1d8170d` (refs_resolve_ref_unsafe: handle d/f conflicts for writes, 2017-10-06) I.e. if we clobber "failure_errno" with "errno", but it happened to be one of those three, and we'll clobber it with another one of the three we were OK. Perhaps there are cases where the difference ended up mattering, but I haven't found them. Instrumenting the test suite to fail if "errno" and "failure_errno" are different shows a lot of failures, checking if they're different and one is but not the other is outside that list of three "errno" values yields no failures. But let's fix the obvious bug. We should just stop paying attention to "errno" in refs_resolve_ref_unsafe(). In addition let's change the partial resetting of "errno" in files_read_raw_ref() to happen just before the "return", to ensure that any such bug will be more easily spotted in the future. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-13 10:53:54 -08:00
Junio C Hamano	626f2cabe6	Merge branch 'ab/reflog-prep' Code refactoring in the reflog part of refs API. * ab/reflog-prep: reflog + refs-backend: move "verbose" out of the backend refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUN reflog: reduce scope of "struct rev_info" reflog expire: don't use lookup_commit_reference_gently() reflog expire: refactor & use "tip_commit" only for UE_NORMAL reflog expire: use "switch" over enum values reflog: change one->many worktree->refnames to use a string_list reflog expire: narrow scope of "cb" in cmd_reflog_expire() reflog delete: narrow scope of "cmd" passed to count_reflog_ent()	2022-01-10 11:52:52 -08:00

1 2 3 4 5 ...

431 commits