development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-05 16:19:28 +00:00

Author	SHA1	Message	Date
Junio C Hamano	863c596e68	Merge branch 'jc/sparse-checkout-set-default-fix' "git sparse-checkout set" added default patterns even when the patterns are being fed from the standard input, which has been corrected. * jc/sparse-checkout-set-default-fix: sparse-checkout: use default patterns for 'set' only !stdin	2024-01-08 14:05:16 -08:00
Junio C Hamano	492ee03f60	Merge branch 'en/header-cleanup' Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files	2024-01-08 14:05:15 -08:00
Junio C Hamano	9decd56cc9	Merge branch 'ml/doc-merge-updates' Doc update. * ml/doc-merge-updates: Documentation/git-merge.txt: use backticks for command wrapping Documentation/git-merge.txt: fix reference to synopsis	2024-01-08 14:05:15 -08:00
Junio C Hamano	6bf317df4b	Merge branch 'jc/archive-list-with-extra-args' "git archive --list extra garbage" silently ignored excess command line parameters, which has been corrected. * jc/archive-list-with-extra-args: archive: "--list" does not take further options	2024-01-08 14:05:14 -08:00
Tamino Bauknecht	39487a1510	fetch: add new config option fetch.all Introduce a boolean configuration option fetch.all which allows to fetch all available remotes by default. The config option can be overridden by explicitly specifying a remote or by using --no-all. The behavior for --all is unchanged and calling git-fetch with --all and a remote will still result in an error. Additionally, describe the configuration variable in the config documentation and implement new tests to cover the expected behavior. Also add --no-all to the command-line documentation of git-fetch. Signed-off-by: Tamino Bauknecht <dev@tb6.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:36:23 -08:00
Patrick Steinhardt	8f4c00de95	builtin/worktree: create refdb via ref backend When creating a worktree we create the worktree's ref database manually by first writing a "HEAD" file so that the directory is recognized as a Git repository by other commands, and then running git-update-ref(1) or git-symbolic-ref(1) to write the actual value. But while this is fine for the files backend, this logic simply assumes too much about how the ref backend works and will leave behind an invalid ref database once any other ref backend lands. Refactor the code to instead use `refs_init_db()` to initialize the ref database so that git-worktree(1) itself does not need to know about how to initialize it. This will allow future ref backends to customize how the per-worktree ref database is set up. Furthermore, as we now already have a worktree ref store around, we can also avoid spawning external commands to write the HEAD reference and instead use the refs API to do so. Note that we do not have an equivalent to passing the `--quiet` flag to git-symbolic-ref(1) as we did before. This flag does not have an effect anyway though, as git-symbolic-ref(1) only honors it when reading a symref, but never when writing one. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	b8a846b2e0	worktree: expose interface to look up worktree by name Our worktree interfaces do not provide a way to look up a worktree by its name. Expose `get_linked_worktree()` to allow for this usecase. As callers are responsible for freeing this worktree, introduce a new function `free_worktree()` that does so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	84f0ea956f	builtin/worktree: move setup of commondir file earlier Shuffle around how we create supporting worktree files so that we first ensure that the worktree has all link files ("gitdir", "commondir") before we try to initialize the ref database by writing "HEAD". This will be required by a subsequent commit where we start to initialize the ref database via `refs_init_db()`, which will require an initialized `struct worktree *`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	2eb1d0c452	refs/files: skip creation of "refs/{heads,tags}" for worktrees The files ref backend will create both "refs/heads" and "refs/tags" in the Git directory. While this logic makes sense for normal repositories, it does not for worktrees because those refs are "common" refs that would always be contained in the main repository's ref database. Introduce a new flag telling the backend that it is expected to create a per-worktree ref database and skip creation of these dirs in the files backend when the flag is set. No other backends (currently) need worktree-specific logic, so this is the only required change to start creating per-worktree ref databases via `refs_init_db()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	c358d165f2	setup: move creation of "refs/" into the files backend When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	2e573d61ff	refs: prepare `refs_init_db()` for initializing worktree refs The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Junio C Hamano	5bf20d6c77	Merge branch 'ps/refstorage-extension' into ps/worktree-refdb-initialization * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq builtin/clone: create the refdb with the correct object format builtin/clone: skip reading HEAD when retrieving remote builtin/clone: set up sparse checkout later builtin/clone: fix bundle URIs with mismatching object formats remote-curl: rediscover repository when fetching refs setup: allow skipping creation of the refdb setup: extract function to create the refdb	2024-01-08 12:58:54 -08:00
Patrick Steinhardt	cd69c635a1	ci: add job performing static analysis on GitLab CI Our GitHub Workflows definitions have a static analysis job that runs the following tasks: - Coccinelle to check for suggested refactorings. - `make hdr-check` to check for missing includes or forward declarations in our header files. - `make check-pot` to check our translations for issues. - `./ci/check-directional-formatting.bash` to check whether our sources contain any Unicode directional formatting code points. Add an equivalent job to our GitLab CI definitions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 11:23:03 -08:00
Patrick Steinhardt	fc134b41ce	git-prompt: stop manually parsing HEAD with unknown ref formats We're manually parsing the HEAD reference in git-prompt to figure out whether it is a symbolic or direct reference. This makes it intimately tied to the on-disk format we use to store references and will stop working once we gain additional reference backends in the Git project. Ideally, we would refactor the code to exclusively use plumbing tools to read refs such that we do not have to care about the on-disk format at all. Unfortunately though, spawning processes can be quite expensive on some systems like Windows. As the Git prompt logic may be executed quite frequently we try very hard to spawn as few processes as possible. This refactoring is thus out of question for now. Instead, condition the logic on the repository's ref format: if the repo uses the the "files" backend we can continue to use the old logic and read the respective files from disk directly. If it's anything else, then we use git-symbolic-ref(1) to read the value of HEAD. This change makes the Git prompt compatible with the upcoming "reftable" format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 11:21:45 -08:00
Junio C Hamano	4081d45d7f	Merge branch 'ps/refstorage-extension' into ps/prompt-parse-HEAD-futureproof * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq	2024-01-08 11:21:18 -08:00
Rubén Justo	5aea3955bc	branch: clarify <oldbranch> term Since `52d59cc645` (branch: add a --copy (-c) option to go with --move (-m), 2017-06-18) <oldbranch> is used in more operations than just -m. Let's also clarify what we do if <oldbranch> is omitted. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 10:06:05 -08:00
Illia Bobyr	25aec06326	rebase: clarify --reschedule-failed-exec default Documentation should mention the default behavior. It is better to explain the persistent nature of the --reschedule-failed-exec flag from the user standpoint, rather than from the implementation standpoint. Signed-off-by: Illia Bobyr <illia.bobyr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-05 09:41:25 -08:00
Jeff King	993d38a066	index-pack: spawn threads atomically The t5309 script triggers a racy false positive with SANITIZE=leak on a multi-core system. Running with "--stress --run=6" usually fails within 10 seconds or so for me, complaining with something like: + git index-pack --fix-thin --stdin fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?) ================================================================= ==3904583==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180 #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150 #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long, unsigned long, unsigned long, unsigned long) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598 #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51 #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440 #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444 #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s). Aborted What happens is this: 0. We construct a bogus pack with a duplicate object in it and trigger index-pack. 1. We spawn a bunch of worker threads to resolve deltas (on my system it is 16 threads). 2. One of the threads sees the duplicate object and bails by calling exit(), taking down all of the threads. This is expected and is the point of the test. 3. At the time exit() is called, we may still be spawning threads from the main process via pthread_create(). LSan hooks thread creation to update its book-keeping; it has to know where each thread's stack is (so it can find entry points for reachable memory). So it calls pthread_getattr_np() to get information about the new thread. That may allocate memory that must be freed with a matching call to pthread_attr_destroy(). Probably LSan does that immediately, but if you're unlucky enough, the exit() will happen while it's between those two calls, and the allocated pthread_attr_t appears as a leak. This isn't a real leak. It's not even in our code, but rather in the LSan instrumentation code. So we could just ignore it. But the false positive can cause people to waste time tracking it down. It's possibly something that LSan could protect against (e.g., cover the getattr/destroy pair with a mutex, and then in the final post-exit() check for leaks try to take the same mutex). But I don't know enough about LSan to say if that's a reasonable approach or not (or if my analysis is even completely correct). In the meantime, it's pretty easy to avoid the race by making creation of the worker threads "atomic". That is, we'll spawn all of them before letting any of them start to work. That's easy to do because we already have a work_lock() mutex for handing out that work. If the main process takes it, then all of the threads will immediately block until we've finished spawning and released it. This shouldn't make any practical difference for non-LSan runs. The thread spawning is quick, and could happen before any worker thread gets scheduled anyway. Probably other spots that use threads are subject to the same issues. But since we have to manually insert locking (and since this really is kind of a hack), let's not bother with them unless somebody experiences a similar racy false-positive in practice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-05 08:40:56 -08:00
Jeff King	d70f554cdf	commit-graph: retain commit slab when closing NULL commit_graph This fixes a regression introduced in `ac6d45d11f` (commit-graph: move slab-clearing to close_commit_graph(), 2023-10-03), in which running: git -c fetch.writeCommitGraph=true fetch --recurse-submodules multiple times in a freshly cloned repository causes a segfault. What happens in the second (and subsequent) runs is this: 1. We make a "struct commit" for any ref tips which we're storing (even if we already have them, they still go into FETCH_HEAD). Because the first run will have created a commit graph, we'll find those commits in the graph. The commit struct is therefore created with a NULL "maybe_tree" entry, because we can load its oid from the graph later. But to do that we need to remember that we got the commit from the graph, which is recorded in a global commit_graph_data_slab object. 2. Because we're using --recurse-submodules, we'll try to fetch each of the possible submodules. That implies creating a separate "struct repository" in-process for each submodule, which will require a later call to repo_clear(). The call to repo_clear() calls raw_object_store_clear(), which in turn calls close_object_store(), which in turn calls close_commit_graph(). And the latter frees the commit graph data slab. 3. Later, when trying to write out a new commit graph, we'll ask for their tree oid via get_commit_tree_oid(), which will see that the object is parsed but with a NULL maybe_tree field. We'd then usually pull it from the graph file, but because the slab was cleared, we don't realize that we can do so! We end up returning NULL and segfaulting. (It seems questionable that we'd write a graph entry for such a commit anyway, since we know we already have one. I didn't double-check, but that may simply be another side effect of having cleared the slab). The bug is in step (2) above. We should not be clearing the slab when cleaning up the submodule repository structs. Prior to `ac6d45d11f`, we did not do so because it was done inside a helper function that returned early when it saw NULL. So the behavior change from that commit is that we'll now _always_ clear the slab via repo_clear(), even if the repository being closed did not have a commit graph (and thus would have a NULL commit_graph struct). The most immediate fix is to add in a NULL check in close_commit_graph(), making it a true noop when passed in an object_store with a NULL commit_graph (it's OK to just return early, since the rest of its code is already a noop when passed NULL). That restores the pre-ac6d45d11f behavior. And that's what this patch does, along with a test that exercises it (we already have a test that uses submodules along with fetch.writeCommitGraph, but the bug only triggers when there is a subsequent fetch and when that fetch uses --recurse-submodules). So that fixes the regression in the least-risky way possible. I do think there's some fragility here that we might want to follow up on. We have a global commit_graph_data_slab that contains graph positions, and our global commit structs depend on the that slab remaining valid. But close_commit_graph() is just about closing _one_ object store's graph. So it's dangerous to call that function and clear the slab without also throwing away any "struct commit" we might have parsed that depends on it. Which at first glance seems like a bug we could already trigger. In the situation described here, there is no commit graph in the submodule repository, so our commit graph is NULL (in fact, in our test script there is no submodule repo at all, so we immediately return from repo_init() and call repo_clear() only to free up memory). But what would happen if there was one? Wouldn't we see a non-NULL commit_graph entry, and then clear the global slab anyway? The answer is "no", but for very bizarre reasons. Remember that repo_clear() calls raw_object_store_clear(), which then calls close_object_store() and thus close_commit_graph(). But before it does so, raw_object_store_clear() does something else: it frees the commit graph and sets it to NULL! So by this code path we'll _never_ see a non-NULL commit_graph struct, and thus never clear the slab. So it happens to work out. But it still seems questionable to me that we would clear a global slab (which might still be in use) when closing the commit graph. This clearing comes from `957ba814bf` (commit-graph: when closing the graph, also release the slab, 2021-09-08), and was fixing a case where we really did need it to be closed (and in that case we presumably call close_object_store() more directly). So I suspect there may still be a bug waiting to happen there, as any object loaded before the call to close_object_store() may be stranded with a bogus maybe_tree entry (and thus looking at it after the call might cause an error). But I'm not sure how to trigger it, nor what the fix should look like (you probably would need to "unparse" any objects pulled from the graph). And so this patch punts on that for now in favor of fixing the recent regression in the most direct way, which should not have any other fallouts. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-05 08:35:26 -08:00
Chandra Pratap	556e68032f	write-or-die: make GIT_FLUSH a Boolean environment variable Among Git's environment variables, the ones marked as "Boolean" accept values in a way similar to Boolean configuration variables, i.e. values like 'yes', 'on', 'true' and positive numbers are taken as "on" and values like 'no', 'off', 'false' are taken as "off". GIT_FLUSH can be used to force Git to use non-buffered I/O when writing to stdout. It can only accept two values, '1' which causes Git to flush more often and '0' which makes all output buffered. Make GIT_FLUSH accept more values besides '0' and '1' by turning it into a Boolean environment variable, modifying the required logic. Update the related documentation. Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-04 10:32:21 -08:00
Sam Delmerico	ee9895b0ff	push: region_leave trace for negotiate_using_fetch There were two region_enter events for negotiate_using_fetch instead of one enter and one leave. This commit replaces the second region_enter event with a region_leave. Signed-off-by: Sam Delmerico <delmerico@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 15:32:44 -08:00
Maarten van der Schrieck	9cd30af991	Documentation: fix statement about rebase.instructionFormat Since commit `62db5247` (rebase -i: generate the script via rebase--helper, 2017-07-14), the short hash is given in rebase-todo. Specifying rebase.instructionFormat does not alter this behavior, contrary to what the documentation implies. Signed-off-by: Maarten van der Schrieck <maarten@thingsconnected.nl> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 11:21:15 -08:00
Patrick Steinhardt	19b9496c1f	reftable/merged: transfer ownership of records when iterating When iterating over records with the merged iterator we put the records into a priority queue before yielding them to the caller. This means that we need to allocate the contents of these records before we can pass them over to the caller. The handover to the caller is quite inefficient though because we first deallocate the record passed in by the caller and then copy over the new record, which requires us to reallocate memory. Refactor the code to instead transfer ownership of the new record to the caller. So instead of reallocating all contents, we now release the old record and then copy contents of the new record into place. The following benchmark of `git show-ref --quiet` in a repository with around 350k refs shows a clear improvement. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated This shows that we now have roundabout a single allocation per record that we're yielding from the iterator. Ideally, we'd also get rid of this allocation so that the number of allocations doesn't scale with the number of refs anymore. This would require some larger surgery though because the memory is owned by the priority queue before transferring it over to the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:21 -08:00
Patrick Steinhardt	5473aca376	reftable/merged: really reuse buffers to compute record keys In `829231dc20` (reftable/merged: reuse buffer to compute record keys, 2023-12-11), we have refactored the merged iterator to reuse a pair of long-living strbufs by relying on the fact that `reftable_record_key()` tries to reuse already allocated strbufs by calling `strbuf_reset()`, which should give us significantly fewer reallocations compared to the old code that used on-stack strbufs that are allocated for each and every iteration. Unfortunately, we called `strbuf_release()` on these long-living strbufs that we meant to reuse on each iteration, defeating the optimization. Fix this performance issue by not releasing those buffers on iteration anymore, where we instead rely on `merged_iter_close()` to release the buffers for us. Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:21 -08:00
Patrick Steinhardt	b31e3cc620	reftable/record: store "val2" hashes as static arrays Similar to the preceding commit, convert ref records of type "val2" to store their object IDs in static arrays instead of allocating them for every single record. We're using the same benchmark as in the preceding commit, with `git show-ref --quiet` in a repository with ~350k refs. This time around though the effects aren't this huge. Before: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated After: HEAP SUMMARY: in use at exit: 21,163 bytes in 193 blocks total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated This is because "val2"-type records are typically only stored for peeled tags, and the number of annotated tags in the benchmark repository is rather low. Still, it can be seen that this change leads to a reduction of allocations overall, even if only a small one. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:21 -08:00
Patrick Steinhardt	7af607c58d	reftable/record: store "val1" hashes as static arrays When reading ref records of type "val1", we store its object ID in an allocated array. This results in an additional allocation for every single ref record we read, which is rather inefficient especially when iterating over refs. Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ` bytes. While this means that `struct ref_record` is bigger now, we typically do not store all refs in an array anyway and instead only handle a limited number of records at the same point in time. Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before: HEAP SUMMARY: in use at exit: 21,098 bytes in 192 blocks total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated After: HEAP SUMMARY: in use at exit: 21,098 bytes in 192 blocks total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:20 -08:00
Patrick Steinhardt	88f59d9e31	reftable/record: constify some parts of the interface We're about to convert reftable records to stop storing their object IDs as allocated hashes. Prepare for this refactoring by constifying some parts of the interface that will be impacted by this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:20 -08:00
Patrick Steinhardt	ddac965965	reftable/writer: fix index corruption when writing multiple indices Each reftable may contain multiple types of blocks for refs, objects and reflog records, where each of these may have an index that makes it more efficient to find the records. It was observed that the index for log records can become corrupted under certain circumstances, where the first entry of the index points into the object index instead of to the log records. As it turns out, this corruption can occur whenever we write a log index as well as at least one additional index. Writing records and their index is basically a two-step process: 1. We write all blocks for the corresponding record. Each block that gets written is added to a list of blocks to index. 2. Once all blocks were written we finish the section. If at least two blocks have been added to the list of blocks to index then we will now write the index for those blocks and flush it, as well. When we have a very large number of blocks then we may decide to write a multi-level index, which is why we also keep track of the list of the index blocks in the same way as we previously kept track of the blocks to index. Now when we have finished writing all index blocks we clear the index and flush the last block to disk. This is done in the wrong order though because flushing the block to disk will re-add it to the list of blocks to be indexed. The result is that the next section we are about to write will have an entry in the list of blocks to index that points to the last block of the preceding section's index, which will corrupt the log index. Fix this corruption by clearing the index after having written the last block. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:20 -08:00
Patrick Steinhardt	75d790608f	reftable/stack: do not auto-compact twice in `reftable_stack_add()` In `5c086453ff` (reftable/stack: perform auto-compaction with transactional interface, 2023-12-11), we fixed a bug where the transactional interface to add changes to a reftable stack did not perform auto-compaction by calling `reftable_stack_auto_compact()` in `reftable_stack_addition_commit()`. While correct, this change may now cause us to perform auto-compaction twice in the non-transactional interface `reftable_stack_add()`: - It performs auto-compaction by itself. - It now transitively performs auto-compaction via the transactional interface. Remove the first instance so that we only end up doing auto-compaction once. Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:20 -08:00
Patrick Steinhardt	d26c21483d	reftable/stack: do not overwrite errors when compacting In order to compact multiple stacks we iterate through the merged ref and log records. When there is any error either when reading the records from the old merged table or when writing the records to the new table then we break out of the respective loops. When breaking out of the loop for the ref records though the error code will be overwritten, which may cause us to inadvertently skip over bad ref records. In the worst case, this can lead to a compacted stack that is missing records. Fix the code by using `goto done` instead so that any potential error codes are properly returned to the caller. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:54:20 -08:00
René Scharfe	54d8a2531b	t1006: prefer shell loop to awk for packed object sizes To compute the expected on-disk size of packed objects, we sort the output of show-index by pack offset and then compute the difference between adjacent entries using awk. This works but has a few readability problems: 1. Reading the index in pack order means don't find out the size of an oid's entry until we see the _next_ entry. So we have to save it to print later. We can instead iterate in reverse order, so we compute each oid's size as we see it. 2. Since the awk invocation is inside a text_expect block, we can't easily use single-quotes to hold the script. So we use double-quotes, but then have to escape the dollar signs in the awk script. We can swap this out for a shell loop instead (which is made much easier by the first change). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-03 09:26:53 -08:00
Junio C Hamano	a26002b628	The fifth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 13:51:30 -08:00
Junio C Hamano	dbf668a1b7	Merge branch 'ps/pseudo-refs' Assorted changes around pseudoref handling. * ps/pseudo-refs: bisect: consistently write BISECT_EXPECTED_REV via the refdb refs: complete list of special refs refs: propagate errno when reading special refs fails wt-status: read HEAD and ORIG_HEAD via the refdb	2024-01-02 13:51:30 -08:00
Junio C Hamano	601b1571e8	Merge branch 'jc/orphan-unborn' Doc updates to clarify what an "unborn branch" means. * jc/orphan-unborn: orphan/unborn: fix use of 'orphan' in end-user facing messages orphan/unborn: add to the glossary and use them consistently	2024-01-02 13:51:30 -08:00
Junio C Hamano	cce4778520	Merge branch 'rj/status-bisect-while-rebase' "git status" is taught to show both the branch being bisected and being rebased when both are in effect at the same time. * rj/status-bisect-while-rebase: status: fix branch shown when not only bisecting	2024-01-02 13:51:29 -08:00
Junio C Hamano	59a29e1274	Merge branch 'la/trailer-cleanups' Code clean-up. * la/trailer-cleanups: trailer: use offsets for trailer_start/trailer_end trailer: find the end of the log message commit: ignore_non_trailer computes number of bytes to ignore	2024-01-02 13:51:29 -08:00
Junio C Hamano	43ec879169	Merge branch 'jc/retire-cas-opt-name-constant' Code clean-up. * jc/retire-cas-opt-name-constant: remote.h: retire CAS_OPT_NAME	2024-01-02 13:51:29 -08:00
Junio C Hamano	9cc710098b	Merge branch 'rs/rebase-use-strvec-pushf' Code clean-up. * rs/rebase-use-strvec-pushf: rebase: use strvec_pushf() for format-patch revisions	2024-01-02 13:51:29 -08:00
Junio C Hamano	72e6a61c40	Merge branch 'sh/completion-with-reftable' Command line completion script (in contrib/) learned to work better with the reftable backend. * sh/completion-with-reftable: completion: support pseudoref existence checks for reftables completion: refactor existence checks for pseudorefs	2024-01-02 13:51:28 -08:00
Patrick Steinhardt	1b2234079b	t9500: write "extensions.refstorage" into config In t9500 we're writing a custom configuration that sets up gitweb. This requires us to manually ensure that the repository format is configured as required, including both the repository format version and extensions. With the introduction of the "extensions.refStorage" extension we need to update the test to also write this new one. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:49 -08:00
Patrick Steinhardt	5ed860f51b	builtin/clone: introduce `--ref-format=` value flag Introduce a new `--ref-format` value flag for git-clone(1) that allows the user to specify the ref format that is to be used for a newly initialized repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	48fa45f5fb	builtin/init: introduce `--ref-format=` value flag Introduce a new `--ref-format` value flag for git-init(1) that allows the user to specify the ref format that is to be used for a newly initialized repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	3c4a5318af	builtin/rev-parse: introduce `--show-ref-format` flag Introduce a new `--show-ref-format` to git-rev-parse(1) that causes it to print the ref format used by a repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	58aaf59133	t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar Introduce a new GIT_TEST_DEFAULT_REF_FORMAT environment variable that lets developers run the test suite with a different default ref format without impacting the ref format used by non-test Git invocations. This is modeled after GIT_TEST_DEFAULT_OBJECT_FORMAT, which does the same thing for the repository's object format. Adapt the setup of the `REFFILES` test prerequisite to be conditionally set based on the default ref format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	aa19619a98	setup: introduce GIT_DEFAULT_REF_FORMAT envvar Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets users control the default ref format used by both git-init(1) and git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which does the same thing for the repository's object format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	d7497a42b0	setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:48 -08:00
Patrick Steinhardt	58be32fff9	setup: set repository's formats on init The proper hash algorithm and ref storage format that will be used for a newly initialized repository will be figured out in `init_db()` via `validate_hash_algorithm()` and `validate_ref_storage_format()`. Until now though, we never set up the hash algorithm or ref storage format of `the_repository` accordingly. There are only two callsites of `init_db()`, one in git-init(1) and one in git-clone(1). The former function doesn't care for the formats to be set up properly because it never access the repository after calling the function in the first place. For git-clone(1) it's a different story though, as we call `init_db()` before listing remote refs. While we do indeed have the wrong hash function in `the_repository` when `init_db()` sets up a non-default object format for the repository, it never mattered because we adjust the hash after learning about the remote's hash function via the listed refs. So the current state is correct for the hash algo, but it's not for the ref storage format because git-clone(1) wouldn't know to set it up properly. But instead of adjusting only the `ref_storage_format`, set both the hash algo and the ref storage format so that `the_repository` is in the correct state when `init_db()` exits. This is fine as we will adjust the hash later on anyway and makes it easier to reason about the end state of `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:47 -08:00
Patrick Steinhardt	173761e21b	setup: start tracking ref storage format In order to discern which ref storage format a repository is supposed to use we need to start setting up and/or discovering the format. This needs to happen in two separate code paths. - The first path is when we create a repository via `init_db()`. When we are re-initializing a preexisting repository we need to retain the previously used ref storage format -- if the user asked for a different format then this indicates an error and we error out. Otherwise we either initialize the repository with the format asked for by the user or the default format, which currently is the "files" backend. - The second path is when discovering repositories, where we need to read the config of that repository. There is not yet any way to configure something other than the "files" backend, so we can just blindly set the ref storage format to this backend. Wire up this logic so that we have the ref storage format always readily available when needed. As there is only a single backend and because it is not configurable we cannot yet verify that this tracking works as expected via tests, but tests will be added in subsequent commits. To countermand this ommission now though, raise a BUG() in case the ref storage format is not set up properly in `ref_store_init()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:47 -08:00
Patrick Steinhardt	0fcc285c5e	refs: refactor logic to look up storage backends In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:47 -08:00
Patrick Steinhardt	465a22b338	worktree: skip reading HEAD when repairing worktrees When calling `git init --separate-git-dir=<new-path>` on a preexisting repository, we move the Git directory of that repository to the new path specified by the user. If there are worktrees present in the repository, we need to repair the worktrees so that their gitlinks point to the new location of the repository. This repair logic will load repositories via `get_worktrees()`, which will enumerate up and initialize all worktrees. Part of initialization is logic that we resolve their respective worktree HEADs, even though that information may not actually be needed in the end by all callers. Although not a problem presently with the file-based reference backend, it will become a problem with the upcoming reftable backend. In the context of git-init(1) we do not have a fully-initialized repository set up via `setup_git_directory()` or friends. Consequently, we do not know about the repository format when `repair_worktrees()` is called, and properly setting up all parts of the repositroy in `init_db()` before we try to repair worktrees is not an easy task. With the introduction of the reftable backend, we would ultimately try to look up the worktree HEADs before we have figured out the reference format, which does not work. We do not require the worktree HEADs at all to repair worktrees. So let's fix this issue by skipping over the step that reads them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-02 09:24:47 -08:00

... 3 4 5 6 7 ...

72117 commits