development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-25 02:35:46 +00:00

Author	SHA1	Message	Date
Ævar Arnfjörð Bjarmason	3611f7467f	for-each-repo: with bad config, don't conflate <path> and <cmd> Fix a logic error in `4950b2a2b5` (for-each-repo: run subcommands on configured repos, 2020-09-11). Due to assuming that elements returned from the repo_config_get_value_multi() call wouldn't be "NULL" we'd conflate the <path> and <command> part of the argument list when running commands. As noted in the preceding commit the fix is to move to a safer "_string_multi()" version of the _multi() API. This change is separated from the rest because those all segfaulted. In this change we ended up with different behavior. When using the "--config=<config>" form we take each element of the list as a path to a repository. E.g. with a configuration like: [repo] list = /some/repo We would, with this command: git for-each-repo --config=repo.list status builtin Run a "git status" in /some/repo, as: git -C /some/repo status builtin I.e. ask "status" to report on the "builtin" directory. But since a configuration such as this would result in a "struct string_list *" with one element, whose "string" member is "NULL": [repo] list We would, when constructing our command-line in "builtin/for-each-repo.c"... strvec_pushl(&child.args, "-C", path, NULL); for (i = 0; i < argc; i++) strvec_push(&child.args, argv[i]); ...have that "path" be "NULL", and as strvec_pushl() stops when it sees NULL we'd end with the first "argv" element as the argument to the "-C" option, e.g.: git -C status builtin I.e. we'd run the command "builtin" in the "status" directory. In another context this might be an interesting security vulnerability, but I think that this amounts to a nothingburger on that front. A hypothetical attacker would need to be able to write config for the victim to run, if they're able to do that there's more interesting attack vectors. See the "safe.directory" facility added in `8d1a744820` (setup.c: create `safe.bareRepository`, 2022-07-14). 
An even more unlikely possibility would be an attacker able to generate the config used for "for-each-repo --config=<key>", but nothing else (e.g. an automated system producing that list). Even in that case the attack vector is limited to the user running commands whose name matches a directory that's interesting to the attacker (e.g. a "log" directory in a repository). The second argument (if any) of the command is likely to make git die without doing anything interesting (e.g. "-p" to "log", there being no "-p" built-in command to run). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
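The argument conflation described in the row above can be sketched in isolation. This is a minimal illustration, not git's code: `mini_strvec`, `mini_push()`, `mini_pushl()` and `build_args()` are hypothetical stand-ins for "struct strvec", strvec_push() and strvec_pushl(), and the fixed capacity is an assumption made to keep the sketch short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical, fixed-capacity stand-in for git's "struct strvec". */
#define MINI_ARGV_MAX 16
struct mini_strvec {
	const char *v[MINI_ARGV_MAX];
	size_t nr;
};

static void mini_push(struct mini_strvec *vec, const char *s)
{
	assert(vec->nr + 1 < MINI_ARGV_MAX); /* keep room for the terminator */
	vec->v[vec->nr++] = s;
	vec->v[vec->nr] = NULL;
}

/* Push arguments until the first NULL, which acts as the terminator. */
static void mini_pushl(struct mini_strvec *vec, ...)
{
	va_list ap;
	const char *arg;

	va_start(ap, vec);
	while ((arg = va_arg(ap, const char *)))
		mini_push(vec, arg);
	va_end(ap);
}

/* Build "-C <path> <argv...>" in the order the commit message describes. */
void build_args(struct mini_strvec *out, const char *path,
		int argc, const char **argv)
{
	int i;

	out->nr = 0;
	out->v[0] = NULL;
	/* If "path" is NULL it is mistaken for the terminator. */
	mini_pushl(out, "-C", path, (const char *)NULL);
	for (i = 0; i < argc; i++)
		mini_push(out, argv[i]);
}
```

Calling build_args() with a NULL path and the arguments "status" and "builtin" yields three elements, "-C", "status" and "builtin", so "status" ends up as the argument to "-C".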
Ævar Arnfjörð Bjarmason	9e2d884d0f	config API: add "string" version of _value_multi(), fix segfaults Fix numerous and mostly long-standing segfaults in consumers of the _config_value_multi() API. As discussed in the preceding commit an empty key in the config syntax yields a "NULL" string, which these users would give to strcmp() (or similar), resulting in segfaults. As this change shows, most users of the _config_value_multi() API didn't really want such an unsafe and low-level API, let's give them something with the safety of git_config_get_string() instead. This fix is similar to what the _string() functions and others acquired in [1] and [2]. Namely introducing and using a safer "_get_string_multi()" variant of the low-level "_value_multi()" function. This fixes segfaults in code introduced in: - `d811c8e17c` (versionsort: support reorder prerelease suffixes, 2015-02-26) - `c026557a37` (versioncmp: generalize version sort suffix reordering, 2016-12-08) - `a086f921a7` (submodule: decouple url and submodule interest, 2017-03-17) - `a6be5e6764` (log: add log.excludeDecoration config option, 2020-04-16) - `92156291ca` (log: add default decoration filter, 2022-08-05) - `50a044f1e4` (gc: replace config subprocesses with API calls, 2022-09-27) There are now two users of the low-level API: - One in "builtin/for-each-repo.c", which we'll convert in a subsequent commit. - The "t/helper/test-config.c" code added in [3]. As seen in the preceding commit we need to give the "t/helper/test-config.c" caller these "NULL" entries. We could also alter the underlying git_configset_get_value_multi() function to be "string safe", but doing so would leave no room for other variants of "*_get_value_multi()" that coerce to other types. Such coercion can't be built on the string version, since as we've established "NULL" is a true value in the boolean context, but if we coerced it to "" for use in a list of strings it'll be subsequently coerced to "false" as a boolean. 
The callback pattern being used here will make it easy to introduce e.g. a "multi" variant which coerces its values to "bool", "int", "path" etc. 1. `40ea4ed903` (Add config_error_nonbool() helper function, 2008-02-11) 2. `6c47d0e8f3` (config.c: guard config parser from value=NULL, 2008-02-11). 3. `4c715ebb96` (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
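A "string safe" multi-value accessor of the kind described in the row above might look roughly like the following. This is only a sketch under stated assumptions: `value_list` and `get_string_multi()` are hypothetical names, and the return convention (0 on success, -1 on a value-less entry) is chosen for illustration and does not mirror git's real config API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the list of values found for one config key. */
struct value_list {
	const char **items; /* an item is NULL for a value-less key */
	size_t nr;
};

/*
 * Hypothetical "string safe" accessor: succeed only if every value is
 * an actual string. Returns 0 and sets *dest on success, or -1 (leaving
 * *dest untouched) if any entry is NULL.
 */
int get_string_multi(const struct value_list *values,
		     const struct value_list **dest)
{
	size_t i;

	assert(values && dest);
	for (i = 0; i < values->nr; i++)
		if (!values->items[i])
			return -1; /* the caller would report "missing value" */
	*dest = values;
	return 0;
}
```

A caller that goes through such a wrapper never sees a NULL "string" member, so passing the entries to strcmp() is safe.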
Ævar Arnfjörð Bjarmason	1c7e239bd0	config API users: test for _get_value_multi() segfaults As we'll discuss in the subsequent commit these tests all show _get_value_multi() API users unable to handle there being a value-less key in the config, which is represented with a "NULL" for that entry in the "string" member of the returned "struct string_list", causing a segfault. These added tests exhaustively test for that issue, as we'll see in a subsequent commit we'll need to change all of the API users of *_get_value_multi(). These cases were discovered by triggering each one individually, and then adding these tests. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	f7b2ff9516	for-each-repo: error on bad --config As noted in `6c62f01552` (for-each-repo: do nothing on empty config, 2021-01-08) this command wants to ignore a non-existing config key, but let's not conflate that with bad config. Before this, all these added tests would pass with an exit code of 0. We could preserve the comment added in `6c62f01552`, but now that we're directly using the documented repo_config_get_value_multi() value it's just narrating something that should be obvious from the API use, so let's drop it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	a428619309	config API: have _multi() return an "int" and take a "dest" Have the "git_configset_get_value_multi()" function and its siblings return an "int" and populate a "dest" parameter like every other "git_configset_get_*()" in the API. As we'll take advantage of in subsequent commits, this fixes a blind spot in the API where it wasn't possible to tell whether a list was empty from whether a config key existed. For now we don't make use of those new return values, but faithfully convert existing API users. Most of this is straightforward, commentary on cases that stand out: - To ensure that we'll properly use the return values of this function in the future we're using the "RESULT_MUST_BE_USED" macro introduced in [1]. As git_die_config() now has to handle this return value let's have it BUG() if it can't find the config entry. As tested for in a preceding commit we can rely on getting the config list in git_die_config(). - The loops after getting the "list" value in "builtin/gc.c" could also make use of "unsorted_string_list_has_string()" instead of using that loop, but let's leave that for now. - In "versioncmp.c" we now use the return value of the functions, instead of checking if the lists are still non-NULL. 1. `1e8697b5c4` (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
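The "return an int and populate a dest" convention from the row above can be sketched as follows. Everything here is a simplified assumption: `key_values`, `mini_configset` and `mini_get_value_multi()` are hypothetical names, the lookup is a linear scan, and "malformed key" is reduced to "contains no dot". The -1 case follows the b83efcecaf message further down this log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a parsed config: one entry per key. */
struct key_values {
	const char *key;
	const char **values;
	size_t nr;
};

struct mini_configset {
	const struct key_values *entries;
	size_t nr;
};

/*
 * Hypothetical lookup: returns 0 if the key exists (and sets *dest),
 * 1 if it does not exist, and -1 if the key is malformed. On any
 * non-zero return *dest is left untouched.
 */
int mini_get_value_multi(const struct mini_configset *set, const char *key,
			 const struct key_values **dest)
{
	size_t i;

	assert(set && key && dest);
	if (!strchr(key, '.'))
		return -1; /* simplification of real key validation */
	for (i = 0; i < set->nr; i++) {
		if (!strcmp(set->entries[i].key, key)) {
			*dest = &set->entries[i];
			return 0;
		}
	}
	return 1;
}
```

With a separate return value, "the key is not there" and "the key is there" no longer have to be inferred from whether a returned list pointer is NULL.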
Ævar Arnfjörð Bjarmason	f6f348a6d5	versioncmp.c: refactor config reading next commit Refactor the reading of the versionSort.suffix and versionSort.prereleaseSuffix configuration variables to stay within the bounds of our CodingGuidelines when it comes to line length, and to avoid repeating ourselves. Renaming "deprecated_prereleases" to "oldl" doesn't help us to avoid line wrapping now, but it will in a subsequent commit. Let's also split out the names of the config variables into variables of our own, and refactor the nested if/else to avoid indenting it, and the existing bracing style issue. This all helps with the subsequent commit, where we'll need to start checking different git_config_get_value_multi() return values. See `c026557a37` (versioncmp: generalize version sort suffix reordering, 2016-12-08) for the original implementation of most of this. Moving the "initialized = 1" assignment allows us to move some of this to the variable declarations in the subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	b83efcecaf	config API: add and use a "git_config_get()" family of functions We already have the basic "git_config_get_value()" function and its "repo_" and "configset" siblings to get a given "key" and assign the last key found to a provided "value". But some callers don't care about that value, but just want to use the return value of the "get_value()" function to check whether the key exists (or another non-zero return value). The immediate motivation for this is that a subsequent commit will need to change all callers of the "_get_value_multi()" family of functions. In two cases here we (ab)used it to check whether we had any values for the given key, but didn't care about the return value. The rest of the callers here used various other config API functions to do the same, all of which resolved to the same underlying functions to provide the answer. Some of these were using either git_config_get_string() or git_config_get_string_tmp(), see `fe4c750fb1` (submodule--helper: fix a configure_added_submodule() leak, 2022-09-01) for a recent example. We can now use a helper function that doesn't require a throwaway variable. We could have changed git_configset_get_value_multi() (and then git_config_get_value() etc.) to accept a "NULL" as a "dest" for all callers, but let's avoid changing the behavior of existing API users. Having an "unused" value that we throw away internal to config.c is cheap. A "NULL as optional dest" pattern is also more fragile, as the intent of the caller might be misinterpreted if they were to accidentally pass "NULL", e.g. when "dest" is passed in from another function. Another name for this function could have been "_config_key_exists()", as suggested in [1]. That would work for all of these callers, and would currently be equivalent to this function, as the git_configset_get_value() API normalizes all non-zero return values to a "1". But adding that API would set us up to lose information, as e.g. 
if git_config_parse_key() in the underlying configset_find_element() fails we'd like to return -1, not 1. Let's change the underlying configset_find_element() function to support this use-case, we'll make further use of it in a subsequent commit where the git_configset_get_value_multi() function itself will expose this new return value. This still leaves various inconsistencies and clobbering or ignoring of the return value in place. E.g. here we're modifying configset_add_value(), but ever since it was added in [2] we've been ignoring its "int" return value, but as we're changing the configset_find_element() it uses, let's have it faithfully ferry that "ret" along. Let's also use the "RESULT_MUST_BE_USED" macro introduced in [3] to assert that we're checking the return value of configset_find_element(). We're leaving the same change to configset_add_value() for some future series. Once we start paying attention to its return value we'd need to ferry it up as deep as do_config_from(), and would need to make at least read_{,very_}early_config() and git_protected_config() return an "int" instead of "void". Let's leave that for now, and focus on the _get_*() functions. 1. https://lore.kernel.org/git/xmqqczadkq9f.fsf@gitster.g/ 2. `3c8687a73e` (add `config_set` API for caching config-like files, 2014-07-28) 3. `1e8697b5c4` (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
Ævar Arnfjörð Bjarmason	e7587a8f53	config tests: add "NULL" tests for *_get_value_multi() A less well known edge case in the config format is that keys can be value-less, a shorthand syntax for "true" boolean keys. I.e. these two are equivalent as far as "--type=bool" is concerned: [a]key [a]key = true But as far as our parser is concerned the values for these two are NULL, and "true". I.e. for a sequence like: [a]key=x [a]key [a]key=y We get a "struct string_list" with "string" members with ".string" values of: { "x", NULL, "y" } This behavior goes back to the initial implementation of git_config_bool() in `17712991a5` (Add ".git/config" file parser, 2005-10-10). When parts of the config_set API were tested for in [1] they didn't add coverage for 3/4 of the "(NULL)" cases handled in "t/helper/test-config.c". We'd test that case for "get_value", but not "get_value_multi", "configset_get_value" and "configset_get_value_multi". We now cover all of those cases, which in turn expose the details of how this part of the config API works. 1. `4c715ebb96` (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
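The point that a value-less key is "true" while an empty string is "false" can be made concrete with a small coercion helper. This is a simplified sketch, not git's parser: `value_to_bool()` and `values_to_bools()` are hypothetical names, matching is case-sensitive, and numeric values are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified boolean coercion: NULL (a value-less key) is true and the
 * empty string is false. Returns 1 or 0, or -1 if the text is not a
 * recognized boolean.
 */
int value_to_bool(const char *value)
{
	if (!value)
		return 1;
	if (!*value)
		return 0;
	if (!strcmp(value, "true") || !strcmp(value, "yes") ||
	    !strcmp(value, "on"))
		return 1;
	if (!strcmp(value, "false") || !strcmp(value, "no") ||
	    !strcmp(value, "off"))
		return 0;
	return -1;
}

/*
 * Hypothetical "multi" variant that coerces every value of a key.
 * Returns 0 on success, -1 at the first value that is not a boolean.
 */
int values_to_bools(const char **values, size_t nr, int *out)
{
	size_t i;

	assert(values && out);
	for (i = 0; i < nr; i++) {
		out[i] = value_to_bool(values[i]);
		if (out[i] < 0)
			return -1;
	}
	return 0;
}
```

For the values { "true", NULL, "" } this yields { 1, 1, 0 }, which is why replacing NULL with "" before a boolean coercion would silently flip the result.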
Ævar Arnfjörð Bjarmason	258902ce07	config tests: cover blind spots in git_die_config() tests There were no tests checking for the output of the git_die_config() function in the config API, added in `5a80e97c82` (config: add `git_die_config()` to the config-set API, 2014-08-07). We only tested "test_must_fail", but didn't assert the output. We need tests for this because a subsequent commit will alter the return value of git_config_get_value_multi(), which is used to get the config values in the git_die_config() function. This test coverage helps to build confidence in that subsequent change. These tests cover different interactions with git_die_config(): - The "notes.mergeStrategy" test in "t/t3309-notes-merge-auto-resolve.sh" is a case where a function outside of config.c (git_config_get_notes_strategy()) calls git_die_config(). - The "gc.pruneExpire" test in "t5304-prune.sh" is a case where git_config_get_expiry() calls git_die_config(), covering a different "type" than the "string" test for "notes.mergeStrategy". - The "fetch.negotiationAlgorithm" test in "t/t5552-skipping-fetch-negotiator.sh" is a case where git_config_get_string*() calls git_die_config(). We also cover both the "from command-line config" and "in file..at line" cases here. The clobbering of existing ".git/config" files here is so that we're not implicitly testing the line count of the default config. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
Michael J Gruber	3dc0b7f0dc	t3070: make chain lint tester happy `1f2e05f0b7` ("wildmatch: fix exponential behavior", 2023-03-20) introduced a new test with a background process. Backgrounding necessarily gives a result of 0, so that a seemingly broken && chain is not really broken. Adjust t3070 slightly so that our chain lint test recognizes the construct for what it is and does not raise a false positive. Signed-off-by: Michael J Gruber <git@grubix.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 17:02:38 -07:00
Kristoffer Haugsbakk	d3b3419f8f	config: tell the user that we expect an ASCII character Commit `50b54fd72a` (config: be strict on core.commentChar, 2014-05-17) notes that “multi-byte character encoding could also be misinterpreted”, and indeed a multi-byte codepoint (non-ASCII) is not accepted as a valid `core.commentChar`. Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 13:09:38 -07:00
Patrick Steinhardt	d3af1c193d	commit-graph: fix truncated generation numbers In `80c928d947` (commit-graph: simplify compute_generation_numbers(), 2023-03-20), the code to compute generation numbers was simplified to use the same infrastructure as is used to compute topological levels. This refactoring introduced a bug where the generation numbers are truncated when they exceed UINT32_MAX because we explicitly cast the computed generation number to `uint32_t`. This is not required though: both the computed value and the field of `struct commit_graph_data` are of the same type `timestamp_t` already, so casting to `uint32_t` will cause truncation. This cast can cause us to miscompute generation data overflows: 1. Given a commit with no parents and committer date `UINT32_MAX + 1`. 2. We compute its generation number as `UINT32_MAX + 1`, but truncate it to `1`. 3. We calculate the generation offset via `$generation - $date`, which is thus `1 - (UINT32_MAX + 1)`. The computation underflows and we thus end up with an offset that is bigger than the maximum allowed offset. As a result, we'd be writing generation data overflow information into the commit-graph that is bogus and ultimately not even required. Fix this bug by removing the needless cast. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 10:52:06 -07:00
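The effect of the needless cast can be reproduced with plain integer arithmetic. The sketch below assumes a 64-bit unsigned `timestamp_t`; the function names and the `OFFSET_MAX` limit are illustrative assumptions rather than git's definitions. The test value used afterwards, `UINT32_MAX + 6`, is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: timestamp_t is a 64-bit unsigned integer in this sketch. */
typedef uint64_t timestamp_t;
static_assert(sizeof(timestamp_t) > sizeof(uint32_t),
	      "truncation only matters if timestamp_t is wider than 32 bits");

/* Illustrative limit for a stored offset; the exact value is an assumption. */
#define OFFSET_MAX ((UINT64_C(1) << 31) - 1)

/* The buggy step: narrowing to 32 bits drops the high bits. */
timestamp_t truncate_generation(timestamp_t generation)
{
	return (uint32_t)generation;
}

/* Offset as described in the message: generation minus date (wraps). */
timestamp_t generation_offset(timestamp_t generation, timestamp_t date)
{
	return generation - date;
}

/* Would this offset need to be recorded as an overflow? */
int offset_overflows(timestamp_t offset)
{
	return offset > OFFSET_MAX;
}
```

For a parentless commit whose generation equals its date of `UINT32_MAX + 6`, the untruncated offset is 0. After truncation the generation becomes 5, the subtraction wraps around, and the offset is reported as overflowing.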
Johannes Schindelin	3457b50e8c	t3701: we don't need no Perl for `add -i` anymore This should have been removed in `ab/retire-scripted-add-p` but wasn't. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 10:40:12 -07:00
Johannes Schindelin	061dd722dc	unpack-trees: take care to propagate the split-index flag When copying the `split_index` structure from one index structure to another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it is set, otherwise Git might forget to write the shared index when that is actually needed. It just so _happens_ that in many instances when `unpack_trees()` is called, the result causes the shared index to be written anyway, but there are edge cases when that is not so. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:40:40 -07:00
Johannes Schindelin	be6b65b91b	fsmonitor: avoid overriding `cache_changed` bits As of `e636a7b4d0` (read-cache: be specific what part of the index has changed, 2014-06-13), the paradigm `cache_changed = 1` fell out of fashion and it became a bit field instead. This is important because some bits have specific meaning and should not be unset without care, e.g. `SPLIT_INDEX_ORDERED`. However, `b5a8169752` (mark_fsmonitor_valid(): mark the index as changed if needed, 2019-05-24) did use the `cache_changed` attribute as if it were a Boolean instead of a bit field. That not only would override the `SPLIT_INDEX_ORDERED` bit when marking index entries as valid via the FSMonitor, but worse: it would set the `SOMETHING_CHANGED` bit (whose value is 1). This means that Git would unnecessarily force a full index to be written out when a split index was asked for. Let's instead use the bit that is specifically intended to indicate FSMonitor-triggered changes, allowing the split-index feature to work as designed. Noticed-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:40:39 -07:00
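The difference between treating `cache_changed` as a Boolean and as a bit field can be shown with two small helpers. This is a sketch only: the flag names come from the commit messages in this log (the bit whose value is 1 is called `SOMETHING_CHANGED` elsewhere in it), while the bit positions of the other two flags and the helper names are assumptions made for illustration.

```c
#include <assert.h>

/* Flag names follow the commit messages; the positions are assumed. */
#define SOMETHING_CHANGED   (1u << 0)
#define SPLIT_INDEX_ORDERED (1u << 6)
#define FSMONITOR_CHANGED   (1u << 8)

static_assert((SOMETHING_CHANGED & SPLIT_INDEX_ORDERED) == 0 &&
	      (SOMETHING_CHANGED & FSMONITOR_CHANGED) == 0 &&
	      (SPLIT_INDEX_ORDERED & FSMONITOR_CHANGED) == 0,
	      "each flag must occupy its own bit");

/* Buggy pattern: assignment discards every bit that was already set. */
unsigned int mark_valid_as_boolean(unsigned int cache_changed)
{
	cache_changed = 1;
	return cache_changed;
}

/* Fixed pattern: OR in only the bit meant for FSMonitor changes. */
unsigned int mark_valid_as_bit(unsigned int cache_changed)
{
	cache_changed |= FSMONITOR_CHANGED;
	return cache_changed;
}
```

Starting from a value with only `SPLIT_INDEX_ORDERED` set, the first helper loses that bit and sets `SOMETHING_CHANGED` instead, whereas the second keeps it and adds `FSMONITOR_CHANGED`.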
Johannes Schindelin	3b7a4475b0	split-index; stop abusing the `base_oid` to strip the "link" extension When a split-index is in effect, the `$GIT_DIR/index` file needs to contain a "link" extension that contains all the information about the split-index, including the information about the shared index. However, in some cases Git needs to suppress writing that "link" extension (i.e. to fall back to writing a full index) even if the in-memory index structure _has_ a `split_index` configured. This is the case e.g. when "too many not shared" index entries exist. In such instances, the current code sets the `base_oid` field of said `split_index` structure to all-zero to indicate that `do_write_index()` should skip writing the "link" extension. This can lead to problems later on, when the in-memory index is still used to perform other operations and eventually wants to write a split-index, detects the presence of the `split_index` and reuses that, too (under the assumption that it has been initialized correctly and still has a non-null `base_oid`). Let's stop zeroing out the `base_oid` to indicate that the "link" extension should not be written. One might be tempted to simply call `discard_split_index()` instead, under the assumption that Git decided to write a non-split index and therefore the `split_index` structure might no longer be wanted. However, that is not possible because that would release index entries in `split_index->base` that are likely to still be in use. Therefore we cannot do that. The next best thing we _can_ do is to introduce a bit field to indicate specifically which index extensions (not) to write. So that's what we do here. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:40:39 -07:00
Johannes Schindelin	3704fed5ea	split-index & fsmonitor: demonstrate a bug This commit adds a new test case that demonstrates a bug in the split-index code that is triggered under certain circumstances when the FSMonitor is enabled, and its symptom manifests in the form of one of the following error messages: BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1) BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index error: invalid path '' error: The following untracked working tree files would be overwritten by reset: initial.t Which of these error messages appears depends on timing-dependent conditions. Technically the root cause lies with a bug in the split-index code that has nothing to do with FSMonitor, but for the sake of this new test case it was the easiest way to trigger the bug. The bug is this: Under specific conditions, Git needs to skip writing the "link" extension (which is the index extension containing the information pertaining to the split-index). To do that, the `base_oid` attribute of the `split_index` structure in the in-memory index is zeroed out, and `do_write_index()` specifically checks for a "null" `base_oid` to understand that the "link" extension should not be written. However, this violates the consistency of the in-memory index structure, but that does not cause problems in most cases because the process exits without using the in-memory index structure anymore, anyway. But: _When_ the in-memory index is still used (which is the case e.g. in `git rebase`), subsequent writes of `the_index` are at risk of writing out a bogus index file, one that _should_ have a "link" extension but does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be set for subsequent writes, forcing the shared index to be written, which re-initializes `base_oid` to a non-bogus state, and all is good. 
When it is _not_ set, however, all kinds of mayhem ensue, resulting in the above-mentioned error messages, and often enough putting worktrees in a totally broken state where the only recourse is to manually delete the `index` and the `index.lock` files and then call `git reset` manually. Not something to ask users to do. The reason why it is comparatively easy to trigger the bug with FSMonitor is that there is _another_ bug in the FSMonitor code: `mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that variable as a Boolean. But it is a bit field, and 1 happens to be the `SOMETHING_CHANGED` bit that forces the "link" extension to be skipped when writing the index, among other things. "Comparatively easy" is a relative term in this context, for sure. The essence of how the new test case triggers the bug is as follows: 1. The `git rebase` invocation will first reset the worktree to a commit that contains only the `one.t` file, and then execute a rebase script that starts with the following commands (commit hashes skipped): label onto reset initial pick two label two reset two pick three [...] 2. Before executing the `label` command, a split index is written, as well as the shared index. 3. The `reset initial` command in the rebase script writes out a new split index but skips writing the shared index, as intended. 4. The `pick two` command updates the worktree and refreshes the index, marking the `two.t` entry as valid via the FSMonitor, which sets the `SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the `base_oid` attribute to be zeroed out and a full (non-split) index to be written (making sure _not_ to write the "link" extension). 5. Now, the `reset two` command will leave the worktree alone, but still write out a new split index, not writing the shared index (because `base_oid` is still zeroed out, and there is no index entry update requiring it to be written, either). 6. 
When it is time to run `pick three`, the index is read, but it is too short: It only contains a single entry when there should be two, because the "link" extension is missing from the written-out index file. There are three bugs at play, actually, which will be fixed over the course of the next commits: - The `base_oid` attribute should not be zeroed out to indicate when the "link" extension should not be written, as it puts the in-memory index structure into an inconsistent state. - The FSMonitor should not overwrite bits in `cache_changed`. - The `unpack_trees()` function tries to reuse the `split_index` structure from the source index, if any, but does not propagate the `SPLIT_INDEX_ORDERED` flag. While a fix for the second bug would let this test case pass, there are other conditions where the `SOMETHING_CHANGED` bit is set. Therefore, the bug that most crucially needs to be fixed is the first one. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:40:39 -07:00
Alex Henrie	6605fb70cb	rebase: add a config option for --rebase-merges The purpose of the new option is to accommodate users who would like --rebase-merges to be on by default and to facilitate turning on --rebase-merges by default without configuration in a future version of Git. Name the new option rebase.rebaseMerges, even though it is a little redundant, for consistency with the name of the command line option and to be clear when scrolling through values in the [rebase] section of .gitconfig. Support setting rebase.rebaseMerges to the nonspecific value "true" for users who don't need to or don't want to learn about the difference between rebase-cousins and no-rebase-cousins. Make --rebase-merges without an argument on the command line override any value of rebase.rebaseMerges in the configuration, for consistency with other command line flags with optional arguments that have an associated config option. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:32:49 -07:00
Alex Henrie	33561f5170	rebase: deprecate --rebase-merges="" The unusual syntax --rebase-merges="" (that is, --rebase-merges with an empty string argument) has been an undocumented synonym of --rebase-merges without an argument. Deprecate that syntax to avoid confusion when a rebase.rebaseMerges config option is introduced, where rebase.rebaseMerges="" will be equivalent to --no-rebase-merges. It is not likely that anyone is actually using this syntax, but just in case, deprecate the empty string argument instead of dropping support for it immediately. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:32:49 -07:00
Alex Henrie	7e5dcec3ca	rebase: add documentation and test for --no-rebase-merges As far as I can tell, --no-rebase-merges has always worked, but has never been documented. It is especially important to document it before a rebase.rebaseMerges option is introduced so that users know how to override the config option on the command line. It's also important to clarify that --rebase-merges without an argument is not the same as --no-rebase-merges and not passing --rebase-merges is not the same as passing --rebase-merges=no-rebase-cousins. A test case is necessary to make sure that --no-rebase-merges keeps working after its code is refactored in the following patches of this series. The test case is a little contrived: It's unlikely that a user would type both --rebase-merges and --no-rebase-merges at the same time. However, if an alias is defined which includes --rebase-merges, the user might decide to add --no-rebase-merges to countermand that part of the alias but leave alone other flags set by the alias. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:32:49 -07:00
René Scharfe	1aaed69d11	t5000: use check_mtime() `fd2da4b1ea` (archive: add --mtime, 2023-02-18) added a helper function for checking the file modification time of an extracted entry. Use it for the older mtime test as well to shorten the code and piggyback on the archive extraction done to validate file contents. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:13:30 -07:00
Jacob Keller	1a3119ed06	blame: allow --contents to work with non-HEAD commit The --contents option can be used with git blame to blame the file as if it had the contents from the specified file. This is akin to copying the contents into the working tree and then running git blame. This option has been supported since `1cfe77333f` ("git-blame: no rev means start from the working tree file.") The --contents option always blames the file as if it was based on the current HEAD commit. If you try to pass a revision while using --contents, you get the following error: fatal: cannot use --contents with final commit object name This is because the blame process generates a fake working tree commit which always uses the HEAD object as its sole parent. Enhance fake_working_tree_commit to take the object ID to use for the parent instead of always using the HEAD object. Then, always generate a fake commit when we have contents provided, even if we have a final object. Remove the check to disallow --contents and a final revision. Note that the behavior of generating a fake working commit is still skipped when a revision is provided but --contents is not provided. Generating such a commit in that case would combine the currently checked out file contents with the provided revision, which breaks normal blame behavior and produces unexpected results. This enables use of --contents with an arbitrary revision, rather than forcing the use of the local HEAD commit. This makes the --contents option significantly more flexible, as it is no longer required to check out the working tree to the desired commit before using --contents. Reword the documentation so that it's clear that --contents can be used with <rev>. Add tests for the --contents option to the annotate-tests.sh test script. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-24 12:05:22 -07:00
Oswald Buddenhagen	54dbd0933b	sequencer: rewrite save_head() in terms of write_message() Saves some code duplication. Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-24 08:02:05 -07:00
Oswald Buddenhagen	2da2cc9b28	sequencer: remove pointless rollback_lock_file() The file is gone even if commit_lock_file() fails. Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-24 07:52:16 -07:00
Jeff King	4406522b76	pack-redundant: escalate deprecation warning to an error In `c3b58472be` (pack-redundant: gauge the usage before proposing its removal, 2020-08-25), we added a big, ugly warning when pack-redundant is run. The plan there indicated that we would ratchet that up to an error before finally removing it. Since it has been 2.5 years (and 9 releases) since then, let's continue with the plan. Note that we did get one bite on the warning, which was somebody asking about alternatives: https://lore.kernel.org/git/CAKvOHKAFXQwt4D8yUCCkf_TQL79mYaJ=KAKhtpDNTvHJFuX1NA@mail.gmail.com/ but we didn't undo the ugly warning (and the advice continues to be "use repack -d" instead). There was also some discussion around the time of the deprecation that pack-redundant was invoked by the bitbake tool, and it still seems to do so now: https://git.openembedded.org/bitbake That use should probably just go away in favor of an occasional repack (which probably even happens via auto-gc after fetch these days). But since neither of those data points caused us to cancel the deprecation plan by dropping the warning, it seems like we should proceed with the next step. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-23 13:56:02 -07:00
Mathias Krause	14b9a04479	grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34 Stephane is reporting[1] a regression introduced in git v2.40.0 that leads to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35. Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit `acabd2048e` ("grep: correctly identify utf-8 characters with \{b,w} in -P"). [1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/ Reported-by: Stephane Odul <stephane@clumio.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-23 11:19:34 -07:00
Jeff King	d051f1718e	fast-export: drop unused parameter from anonymize_commit_message() As the comment above the function indicates, we do not bother actually storing commit messages in our anonymization map. But we still take the message as a parameter, and just ignore it. Let's stop doing that, which will make -Wunused-parameter happier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:09 -07:00
Jeff King	65c756fff0	fast-export: drop data parameter from anonymous generators The anonymization code has a specific generator callback for each type of data (e.g., one for paths, one for oids, and so on). These all take a "data" parameter, but none of them use it for anything. Which is not surprising, as the point is to generate a new name independent of any input, and each function keeps its own static counter. We added the extra pointer in `d5bf91fde4` (fast-export: add a "data" callback parameter to anonymize_str(), 2020-06-23) to handle --anonymize-map parsing, but that turned out to be awkward itself, and was recently dropped. So let's get rid of this "data" parameter that nobody is using, both from the generators and from anonymize_str() which plumbed it through. This simplifies the code, and makes -Wunused-parameter happier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:09 -07:00
Jeff King	aa548459a0	fast-export: de-obfuscate --anonymize-map handling When we handle an --anonymize-map option, we parse the orig/anon pair, and then feed the "orig" string to anonymize_str(), along with a generator function that duplicates the "anon" string to be cached in the map. This works, because anonymize_str() says "ah, there is no mapping yet for orig; I'll add one from the generator". But there are some downsides: 1. It's a bit too clever, as it's not obvious what the code is trying to do or why it works. 2. It requires allowing generator functions to take an extra void pointer, which is not something any of the normal callers of anonymize_str() want. 3. It does the wrong thing if the same token is provided twice. When there are conflicting options, like: git fast-export --anonymize \ --anonymize-map=foo:one \ --anonymize-map=foo:two we usually let the second one override the first. But by using anonymize_str(), which has first-one-wins logic, we do the opposite. So instead of relying on anonymize_str(), let's directly add the entry ourselves. We can tweak the tests to show that we handle overridden options correctly now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:09 -07:00
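The override behavior described in `aa548459a0` can be illustrated with a minimal sketch. It assumes a tiny fixed-size array in place of Git's hashmap; `struct anon_map`, `anon_lookup_or_add()` and `anon_set()` are hypothetical names for illustration and are not the functions in builtin/fast-export.c. The first helper has first-one-wins semantics (what going through anonymize_str() amounted to), the second has the last-one-wins semantics expected of repeated options.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ANON_MAP_MAX 16

/* Hypothetical stand-in for the anonymization map; not Git's hashmap. */
struct anon_entry {
	char *orig;
	char *anon;
};

struct anon_map {
	struct anon_entry entries[ANON_MAP_MAX];
	size_t nr;
};

static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *out = malloc(len);
	assert(out);
	memcpy(out, s, len);
	return out;
}

static struct anon_entry *anon_find(struct anon_map *map, const char *orig)
{
	for (size_t i = 0; i < map->nr; i++)
		if (!strcmp(map->entries[i].orig, orig))
			return &map->entries[i];
	return NULL;
}

/* First-one-wins: an existing mapping is returned untouched. */
static const char *anon_lookup_or_add(struct anon_map *map, const char *orig,
				      const char *generated)
{
	struct anon_entry *e = anon_find(map, orig);
	if (!e) {
		assert(map->nr < ANON_MAP_MAX);
		e = &map->entries[map->nr++];
		e->orig = copy_string(orig);
		e->anon = copy_string(generated);
	}
	return e->anon;
}

/* Last-one-wins: a later value replaces (and frees) the earlier one. */
static void anon_set(struct anon_map *map, const char *orig, const char *anon)
{
	struct anon_entry *e = anon_find(map, orig);
	if (e) {
		free(e->anon);
		e->anon = copy_string(anon);
		return;
	}
	assert(map->nr < ANON_MAP_MAX);
	e = &map->entries[map->nr++];
	e->orig = copy_string(orig);
	e->anon = copy_string(anon);
}
```

With this sketch, calling anon_set() for "foo:one" and then "foo:two" leaves a single entry mapping "foo" to "two", whereas two anon_lookup_or_add() calls would have kept "one".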
Jeff King	dcc4e134aa	fast-export: factor out anonymized_entry creation When anonymizing output, there's only one spot where we generate new entries to add to our hashmap: when anonymize_str() doesn't find an entry, we use the generate() callback to make one and add it. Let's pull that into its own function in preparation for another caller. Note that we'll add one extra feature. In anonymize_str(), we know that we won't find an existing entry in the hashmap (since it will only try to add after failing to find one). But other callers won't have the same behavior, so we should catch this case and free the now-dangling entry. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:09 -07:00
Jeff King	d6484e9fab	fast-export: simplify initialization of anonymized hashmaps We take pains to avoid doing a lookup on a hashmap which has not been initialized with hashmap_init(). That was necessary back when this code was written. But hashmap_get() became safer in `b7879b0ba6` (hashmap: allow re-use after hashmap_free(), 2020-11-02). Since then it's OK to call functions on a zero-initialized table; it will just correctly return NULL, since there is no match. This simplifies the code a little, and also lets us keep the initialization line closer to when we add an entry (which is when the hashmap really does need to be totally initialized). That will help later refactoring. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:08 -07:00
Jeff King	76e50f7fbc	fast-export: drop const when storing anonymized values We store anonymized values as pointers to "const char *", since they are conceptually const to callers who use them. But they are actually allocated strings whose memory is owned by the struct. The ownership mismatch hasn't been a big deal since we never free() them (they are held until the program ends), but let's switch them to "char *" in preparation for changing that. Since most code only accesses them via anonymize_str(), it can continue to narrow them to "const char *" in its return value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-22 15:37:08 -07:00
Junio C Hamano	27d43aaaf5	The third batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 14:19:03 -07:00
Junio C Hamano	ba235249c0	Merge branch 'fc/test-aggregation-clean-up' Code clean-up for test framework. * fc/test-aggregation-clean-up: test: don't print aggregate-results command test: simplify counts aggregation	2023-03-21 14:18:56 -07:00
Junio C Hamano	ea09dff59a	Merge branch 'ps/receive-pack-unlock-before-die' "git receive-pack" that responds to "git push" requests failed to clean a stale lockfile when killed in the middle, which has been corrected. * ps/receive-pack-unlock-before-die: receive-pack: fix stale packfile locks when dying	2023-03-21 14:18:55 -07:00
Junio C Hamano	1071deae00	Merge branch 'aj/ls-files-format-fix' Fix for a "ls-files --format="%(path)" that produced nonsense output, which was a bug in 2.38. * aj/ls-files-format-fix: ls-files: fix "--format" output of relative paths	2023-03-21 14:18:55 -07:00
Junio C Hamano	15108de2fa	Merge branch 'jk/format-patch-ignore-noprefix' "git format-patch" honors the src/dst prefixes set to nonstandard values with configuration variables like "diff.noprefix", causing receiving end of the patch that expects the standard -p1 format to break. Teach "format-patch" to ignore end-user configuration and always use the standard prefixes. This is a backward compatibility breaking change. * jk/format-patch-ignore-noprefix: rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch format-patch: add format.noprefix option format-patch: do not respect diff.noprefix diff: add --default-prefix option t4013: add tests for diff prefix options diff: factor out src/dst prefix setup	2023-03-21 14:18:55 -07:00
Junio C Hamano	9b0c7f308a	am: refer to format-patch in the documentation There were two reasons we didn't do this. As "git am" is designed to grok e-mailed patches, not necessarily taken out of a Git repository or even if it came from a Git repository not necessarily produced with format-patch, we didn't want to single it out as the "blessed" input producer to the command. Also, in the original workflow that "git am" was invented for, the user of "am" was expected to be a different person than the users of "format-patch". But this is a very safe change to make in 2023. Thanks to the effort by many contributors, Git ended up becoming a bit more popular than we initially thought it would be, and "format-patch", which took me a few weeks to persuade Linus to take in 2005, seems to have become the de-facto standard tool to produce patch e-mails. Interestingly, the documentation for "git apply", which is listed in SEE ALSO section of "git am" documentation, does mention "am" and "format-patch" as two things that are related but different from "apply" in an early part. Suggested-by: Kai Grossjohann <kai.grossjohann@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 13:18:45 -07:00
Felipe Contreras	ee6ad78260	doc: remove GNU troff workaround In 2007 the docbook project made the mistake of converting ' to \' for man pages [1]. It's a problem because groff interprets \' as acute accent which is rendered as ' in ASCII, but as ´ in utf-8. This started a cascade of bug reports in git [2], debian [3], Arch Linux [4], docbook itself [5], and probably many others. A solution was to use the correct groff character: \(aq, which is always rendered as ', but the problem is that such character doesn't work in other troff programs. A portable solution required the use of a conditional character that is \(aq in groff, but ' in all others: .ie \n(.g .ds Aq \(aq .el .ds Aq ' The proper solution took time to be implemented in docbook, but in 2010 they did it [6]. So the docbook man page stylesheets were broken from 1.73 to 1.76. Unfortunately by that point many workarounds already existed. In the case of git, GNU_ROFF was introduced, and in the case of Arch Linux a mapping from \' to ' was added to groff's man.local. Other distributions might have done the same, or similar workarounds. Since 2010 there is no need for this workaround, which is fixed elsewhere, not just in docbook, but other layers as well. Let's remove it. [1] `ea2a0bac56` [2] https://lore.kernel.org/git/20091012102926.GA3937@debian.b2j/ [3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507673#65 [4] https://bugs.archlinux.org/task/9643 [5] https://sourceforge.net/p/docbook/bugs/1022/ [6] `fb55343426` Inspired-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 13:16:46 -07:00
Paul Eggert	370ddcbc89	git-compat-util: use gettimeofday(2) for time(2) Use gettimeofday instead of time(NULL) to get current time. This avoids clock skew on glibc 2.31+ on Linux, where in the first 1 to 2.5 ms of every second, time(NULL) returns a value that is one less than the tv_sec part of higher-resolution timestamps such as those returned by gettimeofday or timespec_get, or those in the file system. There are similar clock skew problems on AIX and MS-Windows, which have problems in the first 5 ms of every second. Without this patch, users can observe Git issuing a timestamp T+1 before it issues timestamp T, because Git sometimes uses time(NULL) or time(&t) and sometimes uses higher-res methods like gettimeofday. Although strictly speaking users should tolerate this behavior because a superuser can always change the clock back, this is a quality of implementation issue and users naturally expect Git to issue timestamps in increasing order unless the superuser has fiddled with the system clock. This patch always uses gettimeofday(...) instead of time(...), and I have verified that the resulting .o files never refer to the name 'time'. A trickier patch would change only those calls for which timestamp monotonicity is user-visible. Such a patch would require more expertise about Git internals, though, and would be harder to maintain later. Another possibility would be to change Git's documentation to warn users that Git does not always issue timestamps in increasing order. However, Git users would likely be either dismayed by this possibility, or confused by the level of detail that any such documentation would require. Yet another possibility would be to fix the Linux kernel so that the time syscall is consistent with the other timestamp syscalls. I suppose this has not been done due to performance implications. (Git's use of timestamps is rare enough that performance is not a significant consideration for git.) 
However, this wouldn't fix Git's problem on older Linux kernels, or on AIX or MS-Windows. Signed-off-by: Paul Eggert <eggert@cs.ucla.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 13:11:42 -07:00
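The approach in `370ddcbc89` can be sketched as a small wrapper that derives whole seconds from the same clock source as the higher-resolution timestamps. This is only an illustration of the idea; `seconds_now()` is a hypothetical name, and the actual change in git-compat-util.h may be shaped differently.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical replacement for time(NULL): take the seconds part of
 * gettimeofday(2), so the result never lags behind the tv_sec of a
 * gettimeofday() call made earlier in the same process (barring the
 * system clock being set backwards in between).
 */
static time_t seconds_now(void)
{
	struct timeval tv;
	int ret = gettimeofday(&tv, NULL);

	/* With a NULL timezone argument this call is not expected to fail. */
	assert(ret == 0);
	(void)ret;
	return tv.tv_sec;
}
```

A caller that previously mixed time(NULL) with gettimeofday() would use seconds_now() for the former, so both values come from one source.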
Derrick Stolee	cbfe360b14	commit-reach: add tips_reachable_from_bases() Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use the ref-filter machinery to select references or branches (respectively) that are reachable from a set of commits presented by one or more --merged arguments. This happens within reach_filter(), which uses the revision-walk machinery to walk history in a standard way. However, the commit-reach.c file is full of custom searches that are more efficient, especially for reachability queries that can terminate early when reachability is discovered. Add a new tips_reachable_from_bases() method to commit-reach.c and call it from within reach_filter() in ref-filter.c. This affects both 'git branch' and 'git for-each-ref' as tested in p1500-graph-walks.sh. For the Linux kernel repository, we take an already-fast algorithm and make it even faster: Test HEAD~1 HEAD ------------------------------------------------------------------- 1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6% 1500.6: contains: git branch --merged 0.14 0.02 -85.7% 1500.7: contains: git tag --merged 0.15 0.03 -80.0% (Note that we remove the iterative 'git rev-list' test from p1500 because it no longer makes sense as a comparison to 'git for-each-ref' and would just waste time running it for these comparisons.) The algorithm is implemented in commit-reach.c in the method tips_reachable_from_bases(). This method takes a string_list of tips and assigns the 'util' for each item with the value 1 if the base commit can reach those tips. Like other reachability queries in commit-reach.c, the fastest way to search for "can A reach B?" is to do a depth-first search up to the generation number of B, preferring to explore first parents before later parents. While we must walk all reachable commits up to that generation number when the answer is "no", the depth-first search can answer "yes" much faster than other approaches in most cases. 
This search becomes trickier when there are multiple targets for the depth-first search. The commits with lower generation number are more likely to be within the history of the start commit, but we don't want to waste time searching commits of low generation number if the commit target with lowest generation number has already been found. The trick here is to take the input commits and sort them by generation number in ascending order. Track the index within this order as min_generation_index. When we find a commit, if its index in the list is equal to min_generation_index, then we can increase the generation number boundary of our search to the next-lowest value in the list. With this mechanism, the number of commits to search is minimized with respect to the depth-first search heuristic. We will walk all commits up to the minimum generation number of a commit that is _not_ reachable from the start, but we will walk only the necessary portion of the depth-first search for the reachable commits of lower generation. Add extra tests for this behavior in t6600-test-reach.sh as the interesting data shape of that repository can sometimes demonstrate corner case bugs. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
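The generation-bounded depth-first search with a moving cutoff, as described for `cbfe360b14`, can be sketched on a toy graph. Everything here is an assumption made for illustration: `struct toy_commit`, `toy_tips_reachable()`, the array-index graph, and the single base are hypothetical and do not mirror the data structures in commit-reach.c. The caller passes tips already sorted by ascending generation; `min_index` plays the role of min_generation_index.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMMITS 32

/* Toy commit record; a hypothetical layout, not Git's struct commit. */
struct toy_commit {
	int generation;  /* one more than the largest parent generation */
	int parents[2];  /* indices into the graph array, -1 when absent */
};

/*
 * Set reachable[i] to 1 when the base can reach tips[i], else 0.
 * tips[] must be sorted by ascending generation number.
 */
static void toy_tips_reachable(const struct toy_commit *graph, size_t nr_commits,
			       int base, const int *tips, size_t nr_tips,
			       int *reachable)
{
	int stack[TOY_MAX_COMMITS];
	unsigned char seen[TOY_MAX_COMMITS] = { 0 };
	size_t top = 0, min_index = 0, i;

	assert(nr_commits <= TOY_MAX_COMMITS);
	assert(base >= 0 && (size_t)base < nr_commits);
	for (i = 0; i < nr_tips; i++)
		reachable[i] = 0;

	stack[top++] = base;
	seen[base] = 1;
	while (top) {
		int cur = stack[--top];
		int cutoff, p;

		for (i = 0; i < nr_tips; i++)
			if (tips[i] == cur)
				reachable[i] = 1;

		/* Skip past found tips; the cutoff rises to the next one. */
		while (min_index < nr_tips && reachable[min_index])
			min_index++;
		if (min_index == nr_tips)
			break; /* every tip was found */
		cutoff = graph[tips[min_index]].generation;

		/* Push the second parent first so the first parent is explored first. */
		for (p = 1; p >= 0; p--) {
			int parent = graph[cur].parents[p];

			if (parent < 0 || seen[parent])
				continue;
			if (graph[parent].generation < cutoff)
				continue; /* too old to reach any remaining tip */
			seen[parent] = 1;
			stack[top++] = parent;
		}
	}
}
```

Because the cutoff only ever increases, a parent skipped for being below it can never become relevant again, which is what lets the search stop early once the low-generation tips are found.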
Derrick Stolee	49abcd21da	for-each-ref: add ahead-behind format atom The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relevant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are interesting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. 
It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1500-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demonstrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	fd67d149bd	commit-reach: implement ahead_behind() logic Fully implement the commit-counting logic required to determine ahead/behind counts for a batch of commit pairs. This is a new library method within commit-reach.h. This method will be linked to the for-each-ref builtin in the next change. The interface for ahead_behind() uses two arrays. The first array of commits contains the list of all starting points for the walk. This includes all tip commits _and_ base commits. The second array specifies base/tip pairs by pointing to commits within the first array, by index. The second array also stores the resulting ahead/behind counts for each of these pairs. This implementation of ahead_behind() allows multiple bases, if desired. Even with multiple bases, there is only one commit walk used for counting the ahead/behind values, saving time when the base/tip ranges overlap significantly. This interface for ahead_behind() also makes it very easy to call ensure_generations_valid() on the entire array of bases and tips. This call is necessary because it is critical that the walk that counts ahead/behind values never walks a commit more than once. Without generation numbers on every commit, there is a possibility that a commit date skew could cause the walk to revisit a commit and then double-count it. For this reason, it is strongly recommended that 'git ahead-behind' is only run in a repository with a commit-graph file that covers most of the reachable commits, storing precomputed generation numbers. If no commit-graph exists, this walk will be much slower as it must walk all reachable commits in ensure_generations_valid() before performing the counting logic. It is possible to detect if generation numbers are available at run time and redirect the implementation to another algorithm that does not require this property. However, that implementation requires a commit walk per base/tip pair _and_ can be slower due to the commit date heuristics required. 
Such an implementation could be considered in the future if there is a reason to include it, but most Git hosts should already be generating a commit-graph file as part of repository maintenance. Most Git clients should also be generating commit-graph files as part of background maintenance or automatic GCs. Now, let's discuss the ahead/behind counting algorithm. The first array of commits are considered the starting commits. The index within that array will play a critical role. We create a new commit slab that maps commits to a bitmap. For a given commit (anywhere in the history), its bitmap stores information relative to which of the input commits can reach that commit. The ith bit will be on if the ith commit from the starting list can reach that commit. It is important to notice that these bitmaps are not the typical "reachability bitmaps" that are stored in .bitmap files. Instead of signalling which objects are reachable from the current commit, they instead signal "which starting commits can reach me?" It is also important to know that the bitmap is not necessarily "complete" until we walk that commit. We will perform a commit walk by generation number in such a way that we can guarantee the bitmap is correct when we visit that commit. At the beginning of the ahead_behind() method, we initialize the bitmaps for each of the starting commits. By enabling the ith bit for the ith starting commit, we signal "the ith commit can reach itself." We walk commits by popping the commit with maximum generation number out of the queue, guaranteeing that we will never walk a child of that commit in any future steps. As we walk, we load the bitmap for the current commit and perform two main steps. The _second_ step examines each parent of the current commit and adds the current commit's bitmap bits to each parent's bitmap. (We create a new bitmap for the parent if this is our first time seeing that parent.) 
After adding the bits to the parent's bitmap, the parent is added to the walk queue. Due to this passing of bits to parents, the current commit has a guarantee that the ith bit is enabled on its bitmap if and only if the ith commit can reach the current commit. The first step of the walk is to examine the bitmask on the current commit and decide which ranges the commit is in or not. Due to the "bit pushing" in the second step, we have a guarantee that the ith bit of the current commit's bitmap is on if and only if the ith starting commit can reach it. For each ahead_behind_count struct, check the base_index and tip_index to see if those bits are enabled on the current bitmap. If exactly one bit is enabled, then increment the corresponding 'ahead' or 'behind' count. This increment is the reason we _absolutely need_ to walk commits at most once. The only subtle thing to do with this walk is to check to see if a parent has all bits on in its bitmap, in which case it becomes "stale" and is marked with the STALE bit. This allows queue_has_nonstale() to be the terminating condition of the walk, which greatly reduces the number of commits walked if all of the commits are nearby in history. It avoids walking a large number of common commits when there is a deep history. We also use the helper method insert_no_dup() to add commits to the priority queue without adding them multiple times. This uses the PARENT2 flag. Thus, we must clear both the STALE and PARENT2 bits of all commits, in case ahead_behind() is called multiple times in the same process. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
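The counting walk described for `fd67d149bd` can be sketched on a toy graph. This is a simplified illustration under stated assumptions: `struct toy_commit`, `struct toy_count` and `toy_ahead_behind()` are hypothetical, a linear scan stands in for the priority queue, one `unsigned` word stands in for the per-commit bitmap slab, and the STALE early-termination and flag clearing are left out. It also assumes that 'ahead' counts commits reachable from the tip but not the base, and 'behind' the reverse.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMMITS 32

/* Toy commit record; a hypothetical layout, not Git's struct commit. */
struct toy_commit {
	int generation;  /* one more than the largest parent generation */
	int parents[2];  /* indices into the graph array, -1 when absent */
};

/* One tip/base pair, given as indices into the starts[] array. */
struct toy_count {
	size_t tip_index, base_index;
	int ahead, behind;
};

static void toy_ahead_behind(const struct toy_commit *graph, size_t nr_commits,
			     const int *starts, size_t nr_starts,
			     struct toy_count *counts, size_t nr_counts)
{
	unsigned bits[TOY_MAX_COMMITS] = { 0 };       /* "which starts reach me?" */
	unsigned char queued[TOY_MAX_COMMITS] = { 0 };
	size_t i;

	assert(nr_commits <= TOY_MAX_COMMITS);
	assert(nr_starts <= 8 * sizeof(unsigned));

	/* The ith starting commit can reach itself. */
	for (i = 0; i < nr_starts; i++) {
		bits[starts[i]] |= 1u << i;
		queued[starts[i]] = 1;
	}
	for (i = 0; i < nr_counts; i++)
		counts[i].ahead = counts[i].behind = 0;

	for (;;) {
		int cur = -1, p;

		/* Pop the queued commit with the maximum generation number. */
		for (i = 0; i < nr_commits; i++)
			if (queued[i] &&
			    (cur < 0 || graph[i].generation > graph[cur].generation))
				cur = (int)i;
		if (cur < 0)
			break;
		queued[cur] = 0;

		/* First step: the bitmap is complete, so classify this commit. */
		for (i = 0; i < nr_counts; i++) {
			int in_tip = !!(bits[cur] & (1u << counts[i].tip_index));
			int in_base = !!(bits[cur] & (1u << counts[i].base_index));

			if (in_tip && !in_base)
				counts[i].ahead++;
			else if (in_base && !in_tip)
				counts[i].behind++;
		}

		/* Second step: push this commit's bits to each parent. */
		for (p = 0; p < 2; p++) {
			int parent = graph[cur].parents[p];

			if (parent < 0)
				continue;
			bits[parent] |= bits[cur];
			queued[parent] = 1;
		}
	}
}
```

Since parents always have a strictly smaller generation than their children, a commit popped at the maximum generation can never be queued again, which is the "walk each commit at most once" property the counting relies on.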
Taylor Blau	c08645b353	commit-graph: introduce `ensure_generations_valid()` Use the just-introduced compute_reachable_generation_numbers_1() to implement a function which dynamically computes topological levels (or corrected commit dates) for out-of-graph commits. This will be useful for the ahead-behind algorithm we are about to introduce, which needs accurate topological levels on _all_ commits reachable from the tips in order to avoid over-counting. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	2ee11f7261	commit-graph: return generation from memory The commit_graph_generation() method used to report a value of GENERATION_NUMBER_INFINITY if the commit_graph_data_slab had an instance for the given commit but the graph_pos indicated the commit was not in the commit-graph file. However, an upcoming change will introduce the ability to set generation values in-memory without writing the commit-graph file. Thus, we can no longer trust 'graph_pos' to indicate whether or not the generation member can be trusted. Instead, trust the 'generation' member if the commit has a value in the slab _and_ the 'generation' member is non-zero. Otherwise, treat it as GENERATION_NUMBER_INFINITY. This only makes a difference for a very old case for the commit-graph: the very first Git release to write commit-graph files wrote zeroes in the topological level positions. If we are parsing a commit-graph with all zeroes, those commits will now appear to have GENERATION_NUMBER_INFINITY (as if they were not parsed from the commit-graph). I attempted several variations to work around the need for providing an uninitialized 'generation' member, but this was the best one I found. It does require a change to a verification test in t5318 because it reports a different error than the one about non-zero generation numbers. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	80c928d947	commit-graph: simplify compute_generation_numbers() The previous change introduced the generic algorithm compute_reachable_generation_numbers() and used it as the core functionality of compute_topological_levels(). Now, use it as the core functionality of compute_generation_numbers(). The main difference here is that we use generation version 2, which is used to toggle the logic in compute_generation_from_max() for computing the corrected commit date based on the corrected commit dates of the parent commits (and the commit date of the current commit). It also uses different methods for (get\|set)_generation in the vtable in order to store and access the value in the correct places. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	368d19b0b7	commit-graph: refactor compute_topological_levels() This patch extracts the common code used to compute topological levels and corrected committer dates into a common routine, compute_reachable_generation_numbers(). For ease of reading, it only modifies compute_topological_levels() to use this new routine, leaving compute_generation_numbers() to be modified in the next change. This new routine dispatches to call the necessary functions to get and set the generation number for a given commit through a vtable (the compute_generation_info struct). Computing the generation number itself is done in compute_generation_from_max(), which dispatches its implementation based on the generation version requested, or issuing a BUG() for unrecognized generation versions. This does not use a vtable because the logic depends only on the generation number version, not where the data is being loaded from or being stored to. This is a subtle point that will make more sense in a future change that modifies the in-memory generation values instead of just preparing values for writing to a commit-graph file. This change looks like it adds a lot of new code. However, two upcoming changes will be quite small due to the work being done in this change. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:33 -07:00
Derrick Stolee	b2c51b7590	for-each-ref: explicitly test no matches The for-each-ref builtin can take a list of ref patterns, but if none match, it still succeeds (but with no output). Add an explicit test that demonstrates that behavior. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:32 -07:00
Derrick Stolee	b73dec5530	for-each-ref: add --stdin option When a user wishes to input a large list of patterns to 'git for-each-ref' (likely a long list of exact refs) there are frequently system limits on the number of command-line arguments. Add a new --stdin option to instead read the patterns from standard input. Add tests that check that any unrecognized arguments are considered an error when --stdin is provided. Also, an empty pattern list is interpreted as the complete ref set. When reading from stdin, we populate the filter.name_patterns array dynamically as opposed to pointing to the 'argv' array directly. This is simple when using a strvec, as it is NULL-terminated in the same way. We then free the memory directly from the strvec. Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:17:32 -07:00
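The dynamic pattern list described for `b73dec5530` can be sketched as a growable, NULL-terminated string array filled from an input stream. The names `struct toy_vec`, `toy_vec_push()`, `toy_read_patterns()` and `toy_vec_clear()` are hypothetical stand-ins for Git's strvec API, and reading one pattern per line is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array; v[nr] is always NULL once non-empty. */
struct toy_vec {
	char **v;
	size_t nr, alloc;
};

static void toy_vec_push(struct toy_vec *vec, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	assert(copy);
	memcpy(copy, s, len);
	/* Keep room for the new element plus the terminating NULL. */
	if (vec->nr + 2 > vec->alloc) {
		vec->alloc = vec->alloc ? 2 * vec->alloc : 8;
		vec->v = realloc(vec->v, vec->alloc * sizeof(*vec->v));
		assert(vec->v);
	}
	vec->v[vec->nr++] = copy;
	vec->v[vec->nr] = NULL;
}

/*
 * Read one pattern per line. Lines longer than the buffer would be
 * split in this sketch; real code would use a growable line buffer.
 */
static void toy_read_patterns(FILE *in, struct toy_vec *vec)
{
	char line[4096];

	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0';
		toy_vec_push(vec, line);
	}
}

static void toy_vec_clear(struct toy_vec *vec)
{
	for (size_t i = 0; i < vec->nr; i++)
		free(vec->v[i]);
	free(vec->v);
	vec->v = NULL;
	vec->nr = vec->alloc = 0;
}
```

Because the array is NULL-terminated like 'argv', code that iterates over patterns does not need to know which source they came from; only the cleanup differs.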
SZEDER Gábor	353e6d4554	parse-options.h: use designated initializers in OPT_* macros Use designated initializers in the expansions of the OPT_* macros to make it more readable which one-letter macro parameter initializes which field in the resulting 'struct option'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-20 12:04:07 -07:00

69798 commits