development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 04:01:21 +00:00

Author	SHA1	Message	Date
Junio C Hamano	fc23c397c7	Merge branch 'tb/enable-cruft-packs-by-default' When "gc" needs to retain unreachable objects, packing them into cruft packs (instead of exploding them into loose object files) has been offered as a more efficient option for some time. Now the use of cruft packs has been made the default and no longer considered an experimental feature. * tb/enable-cruft-packs-by-default: repository.h: drop unused `gc_cruft_packs` builtin/gc.c: make `gc.cruftPacks` enabled by default t/t9300-fast-import.sh: prepare for `gc --cruft` by default t/t6500-gc.sh: add additional test cases t/t6500-gc.sh: refactor cruft pack tests t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default t/t5304-prune.sh: prepare for `gc --cruft` by default builtin/gc.c: ignore cruft packs with `--keep-largest-pack` builtin/repack.c: fix incorrect reference to '-C' pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-28 16:03:03 -07:00
Junio C Hamano	849c8b3dbf	Merge branch 'tb/pack-revindex-on-disk' The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-27 16:00:59 -07:00
Taylor Blau	e3e24de1bf	builtin/gc.c: make `gc.cruftPacks` enabled by default Back in `5b92477f89` (builtin/gc.c: conditionally avoid pruning objects via loose, 2022-05-20), `git gc` learned the `--cruft` option and `gc.cruftPacks` configuration to opt-in to writing cruft packs when collecting or pruning unreachable objects. Cruft packs were introduced with the merge in `a50036da1a` (Merge branch 'tb/cruft-packs', 2022-06-03). They address the problem of "loose object explosions", where Git will write out many individual loose objects when there is a large number of unreachable objects that have not yet aged past `--prune=<date>`. Instead of keeping track of those unreachable yet recent objects via their loose object file's mtime, cruft packs collect all unreachable objects into a single pack with a corresponding `*.mtimes` file that acts as a table to store the mtimes of all unreachable objects. This prevents the need to store unreachable objects as loose as they age out of the repository, and avoids the problem of loose object explosions. Beyond avoiding loose object explosions, cruft packs also act as a more efficient mechanism to store unreachable objects as they age out of a repository. This is because pairs of similar unreachable objects serve as delta bases for one another. In `5b92477f89`, the feature was introduced as experimental. Since then, GitHub has been running these patches in every repository generating hundreds of millions of cruft packs along the way. The feature is battle-tested, and avoids many pathological cases such as above. Users who either run `git gc` manually, or via `git maintenance` can benefit from having cruft packs. As such, enable cruft pack generation to take place by default (by making `gc.cruftPacks` have the default of "true" rather than "false). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-18 14:56:48 -07:00
Taylor Blau	05b9013b71	builtin/gc.c: ignore cruft packs with `--keep-largest-pack` When cruft packs were implemented, we never adjusted the code for `git gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft packs. This option and configuration option share a common implementation, but including cruft packs is wrong in both cases: - Running `git gc --keep-largest-pack` in a repository where the largest pack is the cruft pack itself will make it impossible for `git gc` to prune objects, since the cruft pack itself is kept. - The same is true for `gc.bigPackThreshold`, if the size of the cruft pack exceeds the limit set by the caller. In the future, it is possible that `gc.bigPackThreshold` could be used to write a separate cruft pack containing any new unreachable objects that entered the repository since the last time a cruft pack was written. There are some complexities to doing so, mainly around handling pruning objects that are in an existing cruft pack that is above the threshold (which would either need to be rewritten, or else delay pruning). Rewriting a substantially similar cruft pack isn't ideal, but it is significantly better than the status-quo. If users have large cruft packs that they don't want to rewrite, they can mark them as `*.keep` packs. But in general, if a repository has a cruft pack that is so large it is slowing down GC's, it should probably be pruned anyway. In the meantime, ignore cruft packs in the common implementation for both of these options, and add a pair of tests to prevent any future regressions here. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-18 14:56:47 -07:00
Taylor Blau	a8dd7e05b1	config: enable `pack.writeReverseIndex` by default Back in `e37d0b8730` (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index index. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms] Range (min … max): 290.3 ms … 331.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms] Range (min … max): 1.6 ms … 4.6 ms 1155 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' In the more than two years since `e37d0b8730` was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both from users enabling `pack.writeReverseIndexes`, and from GitHub's deployment of the feature. The latter has been running without incident for more than two years. This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack. (The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, that essentially boil down to a trade-off between time to generate, and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used). [1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Taylor Blau	dbcf611617	pack-revindex: introduce `pack.readReverseIndex` Since `1615c567b8` (Documentation/config/pack.txt: advertise 'pack.writeReverseIndex', 2021-01-25), we have had the `pack.writeReverseIndex` configuration option, which tells Git whether or not it is allowed to write a ".rev" file when indexing a pack. Introduce a complementary configuration knob, `pack.readReverseIndex` to control whether or not Git will read any ".rev" file(s) that may be available on disk. This option is useful for debugging, as well as disabling the effect of ".rev" files in certain instances. This is useful because of the trade-off[^1] between the time it takes to generate a reverse index (slow from scratch, fast when reading an existing ".rev" file), and the time it takes to access a record (the opposite). For example, even though it is faster to use the on-disk reverse index when computing the on-disk size of a packed object, it is slower to enumerate the same value for all objects. Here are a couple of examples from linux.git. When computing the above for a single object, using the on-disk reverse index is significantly faster: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 302.5 ms ± 12.5 ms [User: 258.7 ms, System: 43.6 ms] Range (min … max): 291.1 ms … 328.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 3.9 ms ± 0.3 ms [User: 1.6 ms, System: 2.4 ms] Range (min … max): 2.0 ms … 4.4 ms 801 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' , but when instead trying to compute the on-disk object size for all objects in the repository, using the ".rev" file is a disadvantage over creating the reverse index from scratch: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 8.258 s ± 0.035 s [User: 7.949 s, System: 0.308 s] Range (min … max): 8.199 s … 8.293 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 16.976 s ± 0.107 s [User: 16.706 s, System: 0.268 s] Range (min … max): 16.839 s … 17.105 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Luckily, the results when running `git cat-file` with `--unordered` are closer together: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s] Range (min … max): 4.943 s … 5.220 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s] Range (min … max): 6.145 s … 6.356 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran 1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Because the equilibrium point between these two is highly machine- and repository-dependent, allow users to configure whether or not they will read any ".rev" file(s) with this configuration knob. [^1]: Generating a reverse index in memory takes O(N) time (where N is the number of objects in the repository), since we use a radix sort. Reading an entry from an on-disk ".rev" file is slower since each operation is bound by disk I/O instead of memory I/O. In order to compute the on-disk size of a packed object, we need to find the offset of our object, and the adjacent object (the on-disk size difference of these two). Finding the first offset requires a binary search. Finding the latter involves a single .rev lookup. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Tao Klerks	42943b950e	mergetool: new config guiDefault supports auto-toggling gui by DISPLAY When no merge.tool or diff.tool is configured or manually selected, the selection of a default tool is sensitive to the DISPLAY variable; in a GUI session a gui-specific tool will be proposed if found, and otherwise a terminal-based one. This "GUI-optimizing" behavior is important because a GUI can make a huge difference to a user's ability to understand and correctly complete a non-trivial conflicting merge. Some time ago the merge.guitool and diff.guitool config options were introduced to enable users to configure both a GUI tool, and a non-GUI tool (with fallback if no GUI tool configured), in the same environment. Unfortunately, the --gui argument introduced to support the selection of the guitool is still explicit. When using configured tools, there is no equivalent of the no-tool-configured "propose a GUI tool if we are in a GUI environment" behavior. As proposed in <xmqqmtb8jsej.fsf@gitster.g>, introduce new configuration options, difftool.guiDefault and mergetool.guiDefault, supporting a special value "auto" which causes the corresponding tool or guitool to be selected depending on the presence of a non-empty DISPLAY value. Also support "true" to say "default to the guitool (unless --no-gui is passed on the commandline)", and "false" as the previous default behavior when these new configuration options are not specified. Signed-off-by: Tao Klerks <tao@klerks.biz> Acked-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-05 21:03:29 -07:00
Junio C Hamano	9142fce9b0	Merge branch 'ah/rebase-merges-config' Streamline --rebase-merges command line option handling and introduce rebase.merges configuration variable. * ah/rebase-merges-config: rebase: add a config option for --rebase-merges rebase: deprecate --rebase-merges="" rebase: add documentation and test for --no-rebase-merges	2023-04-04 14:28:28 -07:00
Alex Henrie	6605fb70cb	rebase: add a config option for --rebase-merges The purpose of the new option is to accommodate users who would like --rebase-merges to be on by default and to facilitate turning on --rebase-merges by default without configuration in a future version of Git. Name the new option rebase.rebaseMerges, even though it is a little redundant, for consistency with the name of the command line option and to be clear when scrolling through values in the [rebase] section of .gitconfig. Support setting rebase.rebaseMerges to the nonspecific value "true" for users who don't need to or don't want to learn about the difference between rebase-cousins and no-rebase-cousins. Make --rebase-merges without an argument on the command line override any value of rebase.rebaseMerges in the configuration, for consistency with other command line flags with optional arguments that have an associated config option. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 09:32:49 -07:00
Junio C Hamano	15108de2fa	Merge branch 'jk/format-patch-ignore-noprefix' "git format-patch" honors the src/dst prefixes set to nonstandard values with configuration variables like "diff.noprefix", causing receiving end of the patch that expects the standard -p1 format to break. Teach "format-patch" to ignore end-user configuration and always use the standard prefixes. This is a backward compatibility breaking change. * jk/format-patch-ignore-noprefix: rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch format-patch: add format.noprefix option format-patch: do not respect diff.noprefix diff: add --default-prefix option t4013: add tests for diff prefix options diff: factor out src/dst prefix setup	2023-03-21 14:18:55 -07:00
Jeff King	8d5213decf	format-patch: add format.noprefix option The previous commit dropped support for diff.noprefix in format-patch. While this will do the right thing in most cases (where sending patches without a prefix was an accidental side effect of the sender preferring to see their local patches without prefixes), it left no good option for a project or workflow where you really do want to send patches without prefixes. You'd be stuck using "--no-prefix" for every invocation. So let's add a config option specific to format-patch that enables this behavior. That gives people who have such a workflow a way to get what they want, but makes it hard to accidentally trigger it. A more backwards-compatible way of doing the transition would be to have format.noprefix default to diff.noprefix when it's not set. But that doesn't really help the "accidental" problem; people would have to manually set format.noprefix=false. And it's unlikely that anybody really wants format.noprefix=true in the first place. I'm adding it here mostly as an escape hatch, not because anybody has expressed any interest in it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-09 08:37:27 -08:00
Felipe Contreras	765071a8f2	advice: add diverging advice for novices The user might not necessarily know why ff only was configured, maybe an admin did it, or the installer (Git for Windows), or perhaps they just followed some online advice. This can happen not only on pull.ff=only, but merge.ff=only too. Even worse if the user has configured pull.rebase=false and merge.ff=only, because in those cases a diverging merge will constantly keep failing. There's no trivial way to get out of this other than `git merge --no-ff`. Let's not assume our users are experts in git who completely understand all their configurations. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-08 09:28:42 -08:00
Junio C Hamano	9a4e18b701	Merge branch 'gm/signature-format-doc' Doc update. * gm/signature-format-doc: signature-format.txt: note SSH and X.509 signature delimiters	2023-03-06 21:51:56 -08:00
Gwyneth Morgan	31a431b18b	signature-format.txt: note SSH and X.509 signature delimiters This document only explains PGP signatures, but Git now supports X.509 signatures as of `1e7adb9756` (gpg-interface: introduce new signature format "x509" using gpgsm, 2018-07-17), and SSH signatures as of `29b315778e` (ssh signing: add ssh key format and signing code, 2021-09-10). Additionally, explain that these signature formats are controlled `gpg.format`, linking to its documentation, and explain in said `gpg.format` documentation that the underlying signature format is documented in signature-format.txt. Signed-off-by: Gwyneth Morgan <gwymor@tilde.club> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-27 13:42:43 -08:00
Junio C Hamano	50bebf98d9	format.attach: allow empty value to disable multi-part messages When a lower precedence configuration file (e.g. /etc/gitconfig) defines format.attach in any way, there was no way to disable it in a more specific configuration file (e.g. $HOME/.gitconfig). Change the behaviour of setting it to an empty string. It used to mean that the result is still a multipart message with only dashes used as a multi-part separator, but now it resets the setting to the default (which would be to give an inline patch, unless other command line options are in effect). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-17 15:43:09 -08:00
Junio C Hamano	06bca9708a	Merge branch 'ab/retire-scripted-add-p' Finally retire the scripted "git add -p/-i" implementation and have everybody use the one reimplemented in C. * ab/retire-scripted-add-p: docs & comments: replace mentions of "git-add--interactive.perl" add API: remove run_add_interactive() wrapper function add: remove "add.interactive.useBuiltin" & Perl "git add--interactive"	2023-02-15 17:11:53 -08:00
Junio C Hamano	4f59836451	Merge branch 'ds/bundle-uri-5' The bundle-URI subsystem adds support for creation-token heuristics to help incremental fetches. * ds/bundle-uri-5: bundle-uri: test missing bundles with heuristic bundle-uri: store fetch.bundleCreationToken fetch: fetch from an external bundle URI bundle-uri: drop bundle.flag from design doc clone: set fetch.bundleURI if appropriate bundle-uri: download in creationToken order bundle-uri: parse bundle.<id>.creationToken values bundle-uri: parse bundle.heuristic=creationToken t5558: add tests for creationToken heuristic bundle: verify using check_connected() bundle: test unbundling with incomplete history	2023-02-15 17:11:52 -08:00
Ævar Arnfjörð Bjarmason	20b813d7d3	add: remove "add.interactive.useBuiltin" & Perl "git add--interactive" Since [1] first released with Git v2.37.0 the built-in version of "add -i" has been the default. That built-in implementation was added in [2], first released with Git v2.25.0. At this point enough time has passed to allow for finding any remaining bugs in this new implementation, so let's remove the fallback code. As with similar migrations for "stash"[3] and "rebase"[4] we're keeping a mention of "add.interactive.useBuiltin" in the documentation, but adding a warning() to notify any outstanding users that the built-in is now the default. As with [5] and [6] we should follow-up in the future and eventually remove that warning. 1. `0527ccb1b5` (add -i: default to the built-in implementation, 2021-11-30) 2. `f83dff60a7` (Start to implement a built-in version of `git add --interactive`, 2019-11-13) 3. `8a2cd3f512` (stash: remove the stash.useBuiltin setting, 2020-03-03) 4. `d03ebd411c` (rebase: remove the rebase.useBuiltin setting, 2019-03-18) 5. `deeaf5ee07` (stash: remove documentation for `stash.useBuiltin`, 2022-01-27) 6. `9bcde4d531` (rebase: remove transitory rebase.useBuiltin setting & env, 2021-03-23) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-06 15:03:34 -08:00
Derrick Stolee	c429bed102	bundle-uri: store fetch.bundleCreationToken When a bundle list specifies the "creationToken" heuristic, the Git client downloads the list and then starts downloading bundles in descending creationToken order. This process stops as soon as all downloaded bundles can be applied to the repository (because all required commits are present in the repository or in the downloaded bundles). When checking the same bundle list twice, this strategy requires downloading the bundle with the maximum creationToken again, which is wasteful. The creationToken heuristic promises that the client will not have a use for that bundle if its creationToken value is at most the previous creationToken value. To prevent these wasteful downloads, create a fetch.bundleCreationToken config setting that the Git client sets after downloading bundles. This value allows skipping that maximum bundle download when this config value is the same value (or larger). To test that this works correctly, we can insert some "duplicate" fetches into existing tests and demonstrate that only the bundle list is downloaded. The previous logic for downloading bundles by creationToken worked even if the bundle list was empty, but now we have logic that depends on the first entry of the list. Terminate early in the (non-sensical) case of an empty bundle list. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-31 08:57:48 -08:00
Derrick Stolee	4074d3c7e1	clone: set fetch.bundleURI if appropriate Bundle providers may organize their bundle lists in a way that is intended to improve incremental fetches, not just initial clones. However, they do need to state that they have organized with that in mind, or else the client will not expect to save time by downloading bundles after the initial clone. This is done by specifying a bundle.heuristic value. There are two types of bundle lists: those at a static URI and those that are advertised from a Git remote over protocol v2. The new fetch.bundleURI config value applies for static bundle URIs that are not advertised over protocol v2. If the user specifies a static URI via 'git clone --bundle-uri', then Git can set this config as a reminder for future 'git fetch' operations to check the bundle list before connecting to the remote(s). For lists provided over protocol v2, we will want to take a different approach and create a property of the remote itself by creating a remote.<id>.* type config key. That is not implemented in this change. Later changes will update 'git fetch' to consume this option. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-31 08:57:48 -08:00
Derrick Stolee	c93c3d2fa4	bundle-uri: parse bundle.heuristic=creationToken The bundle.heuristic value communicates that the bundle list is organized to make use of the bundle.<id>.creationToken values that may be provided in the bundle list. Those values will create a total order on the bundles, allowing the Git client to download them in a specific order and even remember previously-downloaded bundles by storing the maximum creation token value. Before implementing any logic that parses or uses the bundle.<id>.creationToken values, teach Git to parse the bundle.heuristic value from a bundle list. We can use 'test-tool bundle-uri' to print the heuristic value and verify that the parsing works correctly. As an extra precaution, create the internal 'heuristics' array to be a list of (enum, string) pairs so we can iterate through the array entries carefully, regardless of the enum values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-31 08:57:48 -08:00
Junio C Hamano	ffd9238685	Merge branch 'ds/omit-trailing-hash-in-index' Introduce an optional configuration to allow the trailing hash that protects the index file from bit flipping. * ds/omit-trailing-hash-in-index: features: feature.manyFiles implies fast index writes test-lib-functions: add helper for trailing hash read-cache: add index.skipHash config option hashfile: allow skipping the hash function	2023-01-16 12:07:47 -08:00
Derrick Stolee	17194b195d	features: feature.manyFiles implies fast index writes The recent addition of the index.skipHash config option allows index writes to speed up by skipping the hash computation for the trailing checksum. This is particularly critical for repositories with many files at HEAD, so add this config option to two cases where users in that scenario may opt-in to such behavior: 1. The feature.manyFiles config option enables some options that are helpful for repositories with many files at HEAD. 2. 'scalar register' and 'scalar reconfigure' set config options that optimize for large repositories. In both of these cases, set index.skipHash=true to gain this speedup. Add tests that demonstrate the proper way that index.skipHash=true can override feature.manyFiles=true. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-07 07:46:14 +09:00
Derrick Stolee	ee1f0c242e	read-cache: add index.skipHash config option The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path. One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster. This trade-off between file stability at rest and write-time performance is not easy to balance. The index is an interesting case for a couple reasons: 1. Writes block users. Writing the index takes place in many user- blocking foreground operations. The speed improvement directly impacts their use. Other file formats are typically written in the background (commit-graph, multi-pack-index) or are super-critical to correctness (pack-files). 2. Index files are short lived. It is rare that a user leaves an index for a long time with many staged changes. Outside of staged changes, the index can be completely destroyed and rewritten with minimal impact to the user. Following a similar approach to one used in the microsoft/git fork [1], add a new config option (index.skipHash) that allows disabling this hashing during the index write. The cost is that we can no longer validate the contents for corruption-at-rest using the trailing hash. [1] `21fed2d914` We load this config from the repository config given by istate->repo, with a fallback to the_repository if it is not set. While older Git versions will not recognize the null hash as a special case, the file format itself is still being met in terms of its structure. Using this null hash will still allow Git operations to function across older versions. The one exception is 'git fsck' which checks the hash of the index file. This used to be a check on every index read, but was split out to just the index in `a33fc72fe9` (read-cache: force_verify_index_checksum, 2017-04-14) and released first in Git 2.13.0. Document the versions that relaxed these restrictions, with the optimistic expectation that this change will be included in Git 2.40.0. Here, we disable this check if the trailing hash is all zeroes. We add a warning to the config option that this may cause undesirable behavior with older Git versions. As a quick comparison, I tested 'git update-index --force-write' with and without index.skipHash=true on a copy of the Linux kernel repository. Benchmark 1: with hash Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms] Range (min … max): 34.3 ms … 79.1 ms 82 runs Benchmark 2: without hash Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms] Range (min … max): 16.3 ms … 42.0 ms 69 runs Summary 'without hash' ran 1.78 ± 0.76 times faster than 'with hash' These performance benefits are substantial enough to allow users the ability to opt-in to this feature, even with the potential confusion with older 'git fsck' versions. Test this new config option, both at a command-line level and within a submodule. The confirmation is currently limited to confirm that 'git fsck' does not complain about the index. Future updates will make this test more robust. It is critical that this test is placed before the test_index_version tests, since those tests obliterate the .git/config file and hence lose the setting from GIT_TEST_DEFAULT_HASH, if set. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-07 07:46:14 +09:00
Junio C Hamano	e83d57e34a	Merge branch 'ew/format-patch-mboxrd' "git format-patch" learned to honor format.mboxrd even when sending patches to the standard output stream, * ew/format-patch-mboxrd: format-patch: support format.mboxrd with --stdout	2023-01-02 21:37:19 +09:00
Junio C Hamano	0903d8bbde	Merge branch 'ds/bundle-uri-4' Bundle URIs part 4. * ds/bundle-uri-4: clone: unbundle the advertised bundles bundle-uri: download bundles from an advertised list bundle-uri: allow relative URLs in bundle lists strbuf: introduce strbuf_strip_file_from_path() bundle-uri: serve bundle.* keys from config bundle-uri client: add helper for testing server transport: rename got_remote_heads bundle-uri client: add boolean transfer.bundleURI setting clone: request the 'bundle-uri' command when available t: create test harness for 'bundle-uri' command protocol v2: add server-side "bundle-uri" skeleton	2023-01-02 21:37:18 +09:00
Eric Wong	4810946f60	format-patch: support format.mboxrd with --stdout mboxrd is a more robust output format when used with --stdout and needs more exposure. Introducing this config knob lets users choose the more robust format for all their --stdout uses. Relying on --pretty=mboxrd and including all of pretty-formats.txt in the `git format-patch' documentation would likely be confusing to users. Furthermore, this setting is useful across multiple invocations. So introduce `format.mboxrd' as a boolean configuration knob that changes the default --pretty=email format to --pretty=mboxrd when (and only when) --stdout is in use. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-25 16:32:45 +09:00
Ævar Arnfjörð Bjarmason	7cce9074a7	bundle-uri client: add boolean transfer.bundleURI setting The yet-to-be introduced client support for bundle-uri will always fall back on a full clone, but we'd still like to be able to ignore a server's bundle-uri advertisement entirely. The new transfer.bundleURI config option defaults to 'false', but a user can set it to 'true' to enable checking for bundle URIs from the origin Git server using protocol v2. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-25 16:24:23 +09:00
Junio C Hamano	173fc54b00	Merge branch 'jt/submodule-on-demand' Push all submodules recursively with '--recurse-submodules=on-demand'. * jt/submodule-on-demand: Doc: document push.recurseSubmodules=only	2022-11-23 11:22:25 +09:00
Jonathan Tan	e62f779ae6	Doc: document push.recurseSubmodules=only Git learned pushing submodules without pushing the superproject by the user specifying --recurse-submodules=only through `6c656c3fe4` ("submodules: add RECURSE_SUBMODULES_ONLY value", 2016-12-20) and `225e8bf778` ("push: add option to push only submodules", 2016-12-20). For users who use this feature regularly, it is desirable to have an equivalent configuration. It turns out that such a configuration (push.recurseSubmodules=only) is already supported, even though it is neither documented nor mentioned in the commit messages, due to the way the --recurse-submodules=only feature was implemented (a function used to parse --recurse-submodules was updated to support "only", but that same function is used to parse push.recurseSubmodules too). What is left is to document it and test it, which is what this commit does. There is a possible point of confusion when recursing into a submodule that itself has the push.recurseSubmodules=only configuration, because if a repository has only its submodules pushed and not itself, its superproject can never be pushed. Therefore, treat such configurations as being "on-demand", and print a warning message. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-14 16:55:50 -05:00
Taylor Blau	4b6302c72f	Merge branch 'po/glossary-around-traversal' The glossary entries for "commit-graph file" and "reachability bitmap" have been added. * po/glossary-around-traversal: glossary: add reachability bitmap description glossary: add "commit graph" description doc: use 'object database' not ODB or abbreviation doc: use "commit-graph" hyphenation consistently	2022-11-08 17:14:51 -05:00
Taylor Blau	bdd42e34e3	Merge branch 'es/mark-gc-cruft-as-experimental' Enable gc.cruftpacks by default for those who opt into feature.experimental setting. * es/mark-gc-cruft-as-experimental: config: let feature.experimental imply gc.cruftPacks=true gc: add tests for --cruft and friends	2022-11-08 17:14:48 -05:00
Taylor Blau	1e230dfd6c	Merge branch 'jc/doc-fsck-msgids' Add documentation for message IDs in fsck error messages. * jc/doc-fsck-msgids: Documentation: add lint-fsck-msgids fsck: document msg-id fsck: remove the unused MISSING_TREE_OBJECT fsck: remove the unused BAD_TAG_OBJECT	2022-10-30 21:04:44 -04:00
Taylor Blau	d32dd8add5	Merge branch 'ds/bundle-uri-3' Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. * ds/bundle-uri-3: bundle-uri: suppress stderr from remote-https bundle-uri: quiet failed unbundlings bundle: add flags to verify_bundle() bundle-uri: fetch a list of bundles bundle: properly clear all revision flags bundle-uri: limit recursion depth for bundle lists bundle-uri: parse bundle list in config format bundle-uri: unit test "key=value" parsing bundle-uri: create "key=value" line parsing bundle-uri: create base key-value pair parsing bundle-uri: create bundle_list struct and helpers bundle-uri: use plain string in find_temp_filename()	2022-10-30 21:04:44 -04:00
Philip Oakley	776ba91a5e	doc: use "commit-graph" hyphenation consistently Note, historical release notes have not been updated. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-30 19:58:40 -04:00
Junio C Hamano	1b97c136cc	Merge branch 'so/diff-merges-cleanup' into maint-2.38 Code clean-up. * so/diff-merges-cleanup: diff-merges: clarify log.diffMerges documentation diff-merges: cleanup set_diff_merges() diff-merges: cleanup func_by_opt()	2022-10-27 15:24:11 -07:00
Emily Shaffer	c695592850	config: let feature.experimental imply gc.cruftPacks=true We are interested in exploring whether gc.cruftPacks=true should become the default value. To determine whether it is safe to do so, let's encourage more users to try it out. Users who have set feature.experimental=true have already volunteered to try new and possibly-breaking config changes, so let's try this new default with that set of users. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-26 14:39:31 -07:00
Junio C Hamano	b30a4435ed	Merge branch 'nb/doc-mergetool-typofix' into maint-2.38 Typofix. * nb/doc-mergetool-typofix: mergetool.txt: typofix 'overwriten' -> 'overwritten'	2022-10-25 17:11:38 -07:00
John Cai	f6534dbda4	fsck: document msg-id The documentation lacks mention of specific <msg-id> that are supported. While git-help --config will display a list of these options, often developers' first instinct is to consult the git docs to find valid config values. Add a list of fsck error messages, and link to it from the git-fsck documentation. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:44:18 -07:00
Junio C Hamano	9c32cfb49c	Git 2.38.1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAmM/rwcACgkQsLXohpav 5stHpQ/9Eqd0dVwVA6FijqRr6Nsdt8ufGh4OPZUWlNoQeJbp6N1IDGydAxfzNRNC fQTqGyL0ZdvLkWZUQ5ACL+157ArJGINE1f+EjOy+MDcyClPfJpk3r4O/qftmowQk l3vnAKBqYRn5ta2+fg6a0R6Q3cH5qZsucXwvspEU+TcqMV6QAQYsbINxnO+VNCSV tmqeVO8bvNR+zsZ6p8J1EduWpgvh6XsBpr56UxnOim+XEp+nAzPOILJTbYnMx0Am HD6WO7Ws3Wp9hj6cKYjcXyNmXT0T4EOhXtIBCKaXxAjXvvX77a9dpUQNI5n91DAi HQ/viM4hhrqBfs3jtr6qnDB/c1wcCLH+1QiOlB/2TE9l4zjR25lAtv901uey4yg6 A8he9nr1eEiPN0k3vrhYE01rUi9I1arAZ9lVF28NF+JMM25F8dZc2YZbc3UHoBMZ 7ilpydBqXe43ll4/J8XRcMPQeR7++ss0ROqVN/xXnVB0UWvCYhMFleJ1KA7LHjQd XaRi9Xsiki9OTXFrr7u8QZ94RinpHPUkuGxODO7Jqo8uL5+9JIdVuNbJbzQDK8s4 aU6nfSM7clNebrjaTOeiQB8hv0/uZt6QpUQzT4Q7OBOJzO4uLbkDxChIw/sflQWB rWRb63/KOtap78DVvMJMw5OQC4hXi7lJIchgZ8hfBKKs83p5Smk= =bTdb -----END PGP SIGNATURE----- Sync with v2.38.1	2022-10-17 15:46:09 -07:00
Junio C Hamano	7b8cfe34d9	Merge branch 'ed/fsmonitor-on-networked-macos' By default, use of fsmonitor on a repository on networked filesystem is disabled. Add knobs to make it workable on macOS. * ed/fsmonitor-on-networked-macos: fsmonitor: fix leak of warning message fsmonitor: add documentation for allowRemote and socketDir options fsmonitor: check for compatability before communicating with fsmonitor fsmonitor: deal with synthetic firmlinks on macOS fsmonitor: avoid socket location check if using hook fsmonitor: relocate socket file if .git directory is remote fsmonitor: refactor filesystem checks to common interface	2022-10-17 14:56:31 -07:00
Derrick Stolee	bff03c47f7	bundle-uri: create base key-value pair parsing There will be two primary ways to advertise a bundle list: as a list of packet lines in Git's protocol v2 and as a config file served from a bundle URI. Both of these fundamentally use a list of key-value pairs. We will use the same set of key-value pairs across these formats. Create a new bundle_list_update() method that is currently unusued, but will be used in the next change. It inspects each key to see if it is understood and then applies it to the given bundle_list. Here are the keys that we teach Git to understand: * bundle.version: This value should be an integer. Git currently understands only version 1 and will ignore the list if the version is any other value. This version can be increased in the future if we need to add new keys that Git should not ignore. We can add new "heuristic" keys without incrementing the version. * bundle.mode: This value should be one of "all" or "any". If this mode is not understood, then Git will ignore the list. This mode indicates whether Git needs all of the bundle list items to make a complete view of the content or if any single item is sufficient. The rest of the keys use a bundle identifier "<id>" as part of the key name. Keys using the same "<id>" describe a single bundle list item. * bundle.<id>.uri: This stores the URI of the bundle item. This currently is expected to be an absolute URI, but will be relaxed to be a relative URI in the future. While parsing, return an error if a URI key is repeated, since we can make that restriction with bundle lists. Make the git_parse_int() method global so we can parse the integer version value carefully. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-12 09:13:24 -07:00
Junio C Hamano	39c1578c5e	Merge branch 'nb/doc-mergetool-typofix' Typofix. * nb/doc-mergetool-typofix: mergetool.txt: typofix 'overwriten' -> 'overwritten'	2022-10-11 10:36:12 -07:00
Taylor Blau	f64d4ca8d6	Sync with 2.37.4 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 20:00:04 -04:00
Taylor Blau	f2798aa404	Sync with 2.36.3 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 19:58:16 -04:00
Taylor Blau	58612f82b6	Sync with 2.35.5 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:44:44 -04:00
Taylor Blau	ac8a1db867	Sync with 2.34.5 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:43:37 -04:00
Taylor Blau	478a426f14	Sync with 2.33.5 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:42:55 -04:00
Taylor Blau	3957f3c84e	Sync with 2.32.4 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:42:02 -04:00
Taylor Blau	9cbd2827c5	Sync with 2.31.5 Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-10-06 17:40:44 -04:00

1 2 3 4 5 ...

593 commits