development/git - git

mirror of https://github.com/git/git synced 2024-07-07 19:39:27 +00:00

Author	SHA1	Message	Date
Junio C Hamano	2744383cbd	Merge branch 'tb/geometric-repack' "git repack" so far has been only capable of repacking everything under the sun into a single pack (or split by size). A cleverer strategy to reduce the cost of repacking a repository has been introduced. * tb/geometric-repack: builtin/pack-objects.c: ignore missing links with --stdin-packs builtin/repack.c: reword comment around pack-objects flags builtin/repack.c: be more conservative with unsigned overflows builtin/repack.c: assign pack split later t7703: test --geometric repack with loose objects builtin/repack.c: do not repack single packs with --geometric builtin/repack.c: add '--geometric' option packfile: add kept-pack cache for find_kept_pack_entry() builtin/pack-objects.c: rewrite honor-pack-keep logic p5303: measure time to repack with keep p5303: add missing &&-chains builtin/pack-objects.c: add '--stdin-packs' option revision: learn '--no-kept-objects' packfile: introduce 'find_kept_pack_entry()'	2021-03-24 14:36:27 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Taylor Blau	c9fff00016	revision: learn '--no-kept-objects' A future caller will want to be able to perform a reachability traversal which terminates when visiting an object found in a kept pack. The closest existing option is '--honor-pack-keep', but this isn't quite what we want. Instead of halting the traversal midway through, a full traversal is always performed, and the results are only trimmed afterwords. Besides needing to introduce a new flag (since culling results post-facto can be different than halting the traversal as it's happening), there is an additional wrinkle handling the distinction in-core and on-disk kept packs. That is: what kinds of kept pack should stop the traversal? Introduce '--no-kept-objects[=<on-disk\|in-core>]' to specify which kinds of kept packs, if any, should stop a traversal. This can be useful for callers that want to perform a reachability analysis, but want to leave certain packs alone (for e.g., when doing a geometric repack that has some "large" packs which are kept in-core that it wants to leave alone). Note that this option is not guaranteed to produce exactly the set of objects that aren't in kept packs, since it's possible the traversal order may end up in a situation where a non-kept ancestor was "cut off" by a kept object (at which point we would stop traversing). But, we don't care about absolute correctness here, since this will eventually be used as a purely additive guide in an upcoming new repack mode. Explicitly avoid documenting this new flag, since it is only used internally. In theory we could avoid even adding it rev-list, but being able to spell this option out on the command-line makes some special cases easier to test without promising to keep it behaving consistently forever. Those tricky cases are exercised in t6114. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 23:30:52 -08:00
Junio C Hamano	8b4701ae4f	Merge branch 'ak/corrected-commit-date' The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters	2021-02-17 17:21:40 -08:00
Junio C Hamano	c9f94ab4fa	Merge branch 'ab/lose-grep-debug' Lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths. * ab/lose-grep-debug: grep/log: remove hidden --debug and --grep-debug options	2021-02-10 14:48:31 -08:00
Junio C Hamano	aac006aa99	Merge branch 'so/log-diff-merge' "git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...	2021-02-05 16:40:44 -08:00
Ævar Arnfjörð Bjarmason	15c9649730	grep/log: remove hidden --debug and --grep-debug options Remove the hidden "grep --debug" and "log --grep-debug" options added in `17bf35a3c7` (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in `e01b4dab01` (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is `c581e4a749` (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier `b65abcafc7` (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug $ -e foo --and -e bar $ --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 11:36:20 -08:00
Junio C Hamano	b69bed22c5	Merge branch 'jk/log-cherry-pick-duplicate-patches' When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries	2021-01-25 14:19:19 -08:00
Abhishek Kumar	d7f92784c6	commit-graph: return 64-bit generation number In a preparatory step for introducing corrected commit dates, let's return timestamp_t values from commit_graph_generation(), use timestamp_t for local variables and define GENERATION_NUMBER_INFINITY as (2 ^ 63 - 1) instead. We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to represent the largest topological level we can store in the commit data chunk. With corrected commit dates implemented, we will have two such *_MAX variables to denote the largest offset and largest topological level that can be stored. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Abhishek Kumar	2f9bbb6d91	revision: parse parent in indegree_walk_step() In indegree_walk_step(), we add unvisited parents to the indegree queue. However, parents are not guaranteed to be parsed. As the indegree queue sorts by generation number, let's parse parents before inserting them to ensure the correct priority order. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Jeff King	c9e3a4e76d	patch-ids: handle duplicate hashmap entries This fixes a bug introduced in `dfb7a1b4d0` (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to `dfb7a1b4d0`, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In `683f17ec44` (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit `b3dfeebb92` (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 11:13:32 -08:00
Derrick Stolee	90b666da60	revision: trace topo-walk statistics We trace statistics about the effectiveness of changed-path Bloom filters since `42e50e78` (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). Add similar tracing for the topo-walk algorithm that uses generation numbers to limit the walk size. This information can help investigate and describe benefits to heuristics and other changes. The information that is printed is in JSON format and can be formatted nicely to present as follows: { "count_explort_walked":2603, "count_indegree_walked":2603, "count_topo_walked":473 } Each of these values count the number of commits are visited by each of the three "stages" of the topo-walk as detailed in `b4542418` (revision.c: generation-based topo-order algorithm, 2018-11-01). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 15:18:22 -08:00
Sergey Organov	a6e66af923	diff-merges: get rid of now empty diff_merges_init_revs() After getting rid of 'ignore_merges' field, the diff_merges_init_revs() function became empty. Get rid of it. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:31 -08:00
Sergey Organov	18f09473bf	diff-merges: rename all functions to have common prefix Use the same "diff_merges" prefix for all the diff merges function names. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	a37eec6333	revision: move diff merges functions to its own diff-merges.c Create separate diff-merges.c and diff-merges.h files, and move all the code related to handling of diff merges there. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	3d4fd94363	revision: provide implementation for diff merges tweaks Use these implementations from show_setup_revisions_tweak() and log_setup_revisions_tweak() in builtin/log.c. This completes moving of management of diff merges parameters to a single place, where we can finally observe them simultaneously. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	027c4783d7	revision: factor out initialization of diff-merge related settings Move initialization code related to diffing merges into new init_diff_merge_revs() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	299a663440	revision: factor out setup of diff-merge related settings Move all the setting code related to diffing merges into new setup_diff_merge_revs() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Sergey Organov	891e417cbc	revision: factor out parsing of diff-merge related options Move all the parsing code related to diffing merges into new parse_diff_merge_opts() function. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-21 13:47:30 -08:00
Junio C Hamano	5dfb976460	Merge branch 'ma/grep-init-default' Code clean-up. * ma/grep-init-default: MyFirstObjectWalk: drop `init_walken_defaults()` grep: copy struct in one fell swoop grep: use designated initializers for `grep_defaults` grep: don't set up a "default" repo for grep	2020-12-08 15:11:20 -08:00
Martin Ågren	96313423a7	grep: use designated initializers for `grep_defaults` In `15fabd1bbd` ("builtin/grep.c: make configuration callback more reusable", 2012-10-09), we learned to fill a `static struct grep_opt grep_defaults` which we can use as a blueprint for other such structs. At the time, we didn't consider designated initializers to be widely useable, but these days, we do. (See, e.g., `cbc0f81d96` ("strbuf: use designated initializers in STRBUF_INIT", 2017-07-10).) Use designated initializers to let the compiler set up the struct and so that we don't need to remember to call `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-21 14:50:33 -08:00
Martin Ågren	1d3878799f	grep: don't set up a "default" repo for grep `init_grep_defaults()` fills a `static struct grep_opt grep_defaults`. This struct is then used by `grep_init()` as a blueprint for other such structs. Notably, `grep_init()` takes a `struct repo ` and assigns it into the target struct. As a result, it is unnecessary for us to take a `struct repo ` in `init_grep_defaults()` as well. We assign it into the default struct and never look at it again. And in light of how we return early if we have already set up the default struct, it's not just unnecessary, but is also a bit confusing: If we are called twice and with different repos, is it a bug or a feature that we ignore the second repo? Drop the repo parameter for `init_grep_defaults()`. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-21 14:50:29 -08:00
Elijah Newren	b19315d8ab	Use new HASHMAP_INIT macro to simplify hashmap initialization Now that hashamp has lazy initialization and a HASHMAP_INIT macro, hashmaps allocated on the stack can be initialized without a call to hashmap_init() and in some cases makes the code a bit shorter. Convert some callsites over to take advantage of this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 12:55:27 -08:00
Elijah Newren	6da1a25814	hashmap: provide deallocation function names hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Jeff King	e885a84f1b	drop unused argc parameters Many functions take an argv/argc pair, but never actually look at argc. This makes it useless at best (we use the NULL sentinel in argv to find the end of the array), and misleading at worst (what happens if the argc count does not match the argv NULL?). In each of these instances, the argv NULL does match the argc count, so there are no bugs here. But let's tighten the interfaces to make it harder to get wrong (and to reduce some -Wunused-parameter complaints). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:47 -07:00
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Junio C Hamano	4831c23f75	Merge branch 'mf/submodule-summary-with-correct-repository' "git diff/show" on a change that involves a submodule used to read the information on commits in the submodule from a wrong repository and gave a wrong information when the commit-graph is involved. * mf/submodule-summary-with-correct-repository: submodule: use submodule repository when preparing summary revision: use repository from rev_info when parsing commits	2020-09-18 17:58:05 -07:00
Taylor Blau	312cff5207	bloom: split 'get_bloom_filter()' in two 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Junio C Hamano	0df670bc0b	Merge branch 'jt/interpret-branch-name-fallback' "git status" has trouble showing where it came from by interpreting reflog entries that recordcertain events, e.g. "checkout @{u}", and gives a hard/fatal error. Even though it inherently is impossible to give a correct answer because the reflog entries lose some information (e.g. "@{u}" does not record what branch the user was on hence which branch 'the upstream' needs to be computed, and even if the record were available, the relationship between branches may have changed), at least hide the error to allow "status" show its output. * jt/interpret-branch-name-fallback: wt-status: tolerate dangling marks refs: move dwim_ref() to header file sha1-name: replace unsigned int with option struct	2020-09-09 13:53:09 -07:00
Junio C Hamano	c1ce30d364	Merge branch 'so/separate-field-for-m-and-diff-merges' Internal API clean-up to handle two options "diff-index" and "log" have, which happen to share the same short form, more sensibly. * so/separate-field-for-m-and-diff-merges: revision: add separate field for "-m" of "diff-index -m"	2020-09-09 13:53:07 -07:00
Taylor Blau	4f3644056a	commit-graph: introduce 'get_bloom_filter_settings()' Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since `a759bfa9ee` (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings ' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter '. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Jonathan Tan	a4f66a7876	sha1-name: replace unsigned int with option struct In preparation for a future patch adding a boolean parameter to repo_interpret_branch_name(), which might be easily confused with an existing unsigned int parameter, refactor repo_interpret_branch_name() to take an option struct instead of the unsigned int parameter. The static function interpret_branch_mark() is also updated to take the option struct in preparation for that future patch, since it will also make use of the to-be-introduced boolean parameter. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-02 14:39:17 -07:00
Junio C Hamano	cacab0c856	Merge branch 'jk/rev-input-given-fix' Feeding "$ZERO_OID" to "git log --ignore-missing --stdin", and running "git log --ignore-missing $ZERO_OID" fell back to start digging from HEAD; it has been corrected to become a no-op, like "git log --tags=no-tag-matches-this-pattern" does. * jk/rev-input-given-fix: revision: set rev_input_given in handle_revision_arg()	2020-08-31 15:49:52 -07:00
Sergey Organov	572fc9aa54	revision: add separate field for "-m" of "diff-index -m" Add separate 'match_missing' field for diff-index to use and set it when we encounter "-m" option. This field won't then be cleared when another meaning of "-m" is reverted (e.g., by "--no-diff-merges"), nor it will be affected by future option(s) that might drive 'ignore_merges' field. Use this new field from diff-lib:do_oneway_diff() instead of reusing 'ignore_merges' field. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-31 13:42:58 -07:00
Jeff King	04a0e98515	revision: set rev_input_given in handle_revision_arg() Commit `7ba826290a` (revision: add rev_input_given flag, 2017-08-02) added a flag to rev_info to tell whether we got any revision arguments. As explained there, this is necessary because some revision arguments may not produce any pending traversal objects, but should still inhibit default behaviors (e.g., a glob that matches nothing). However, it only set the flag in the globbing code, but not for revisions we get on the command-line or via stdin. This leads to two problems: - the command-line code keeps its own separate got_rev_arg flag; this isn't wrong, but it's confusing and an extra maintenance burden - even specifically-named rev arguments might end up not adding any pending objects: if --ignore-missing is set, then specifying a missing object is a noop rather than an error. And that leads to some user-visible bugs: - when deciding whether a default rev like "HEAD" should kick in, we check both got_rev_arg and rev_input_given. That means that "--ignore-missing $ZERO_OID" works on the command-line (where we set got_rev_arg) but not on --stdin (where we don't) - when rev-list decides whether it should complain that it wasn't given a starting point, it relies on rev_input_given. So it can't even get the command-line "--ignore-missing $ZERO_OID" right Let's consistently set the flag if we got any revision argument. That lets us clean up the redundant got_rev_arg, and fixes both of those bugs (but note there are three new tests: we'll confirm the already working git-log command-line case). A few implementation notes: - conceptually we want to set the flag whenever handle_revision_arg() finds an actual revision arg ("handles" it, you might say). But it covers a ton of cases with early returns. Rather than annotating each one, we just wrap it and use its success exit-code to set the flag in one spot. - the new rev-list test is in t6018, which is titled to cover globs. This isn't exactly a glob, but it made sense to stick it with the other tests that handle the "even though we got a rev, we have no pending objects" case, which are globs. - the tests check for the oid of a missing object, which it's pretty clear --ignore-missing should ignore. You can see the same behavior with "--ignore-missing a-ref-that-does-not-exist", because --ignore-missing treats them both the same. That's perhaps less clearly correct, and we may want to change that in the future. But the way the code and tests here are written, we'd continue to do the right thing even if it does. Reported-by: Bryan Turner <bturner@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-26 13:30:08 -07:00
Junio C Hamano	a555b514cd	Merge branch 'so/log-diff-merges-opt' Earlier, to countermand the implicit "-m" option when the "--first-parent" option is used with "git log", we added the "--[no-]diff-merges" option in the jk/log-fp-implies-m topic. To leave the door open to allow the "--diff-merges" option to take values that instructs how patches for merge commits should be computed (e.g. "cc"? "-p against first parent?"), redefine "--diff-merges" to take non-optional value, and implement "off" that means the same thing as "--no-diff-merges". * so/log-diff-merges-opt: t/t4013: add test for --diff-merges=off doc/git-log: describe --diff-merges=off revision: change "--diff-merges" option to require parameter	2020-08-17 17:02:50 -07:00
Junio C Hamano	eca8c62a50	Merge branch 'jk/log-fp-implies-m' "git log --first-parent -p" showed patches only for single-parent commits on the first-parent chain; the "--first-parent" option has been made to imply "-m". Use "--no-diff-merges" to restore the previous behaviour to omit patches for merge commits. * jk/log-fp-implies-m: doc/git-log: clarify handling of merge commit diffs doc/git-log: move "-t" into diff-options list doc/git-log: drop "-r" diff option doc/git-log: move "Diff Formatting" from rev-list-options log: enable "-m" automatically with "--first-parent" revision: add "--no-diff-merges" option to counteract "-m" log: drop "--cc implies -m" logic	2020-08-17 17:02:49 -07:00
Junio C Hamano	47f0f94bc7	Merge branch 'al/bisect-first-parent' "git bisect" learns the "--first-parent" option to find the first breakage along the first-parent chain. * al/bisect-first-parent: bisect: combine args passed to find_bisection() bisect: introduce first-parent flag cmd_bisect__helper: defer parsing no-checkout flag rev-list: allow bisect and first-parent flags t6030: modernize "git bisect run" tests	2020-08-17 17:02:45 -07:00
Sergey Organov	6501580ff8	revision: change "--diff-merges" option to require parameter --diff-merges=off is the only accepted form for now, a synonym for --no-diff-merges. This patch is a preparation for adding more values, as well as supporting --diff-merges=<parent>, where <parent> is single parent number to output diff against. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-11 14:20:24 -07:00
Junio C Hamano	abde3d39ec	Merge branch 'so/rev-parser-errormessage-fix' Error message fix. * so/rev-parser-errormessage-fix: revision: fix die() message for "--unpacked="	2020-08-10 10:24:03 -07:00
Junio C Hamano	1aa3dff4ba	Merge branch 'jk/compiler-fixes-and-workarounds' Small fixes and workarounds. * jk/compiler-fixes-and-workarounds: revision: avoid leak when preparing bloom filter for "/" revision: avoid out-of-bounds read/write on empty pathspec config: work around gcc-10 -Wstringop-overflow warning	2020-08-10 10:24:02 -07:00
Junio C Hamano	46b225f153	Merge branch 'jk/strvec' The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc	2020-08-10 10:23:57 -07:00
Aaron Lipman	0fe305a5d3	rev-list: allow bisect and first-parent flags Add first_parent_only parameter to find_bisection(), removing the barrier that prevented combining the --bisect and --first-parent flags when using git rev-list Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com> Signed-off-by: Aaron Lipman <alipman88@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-07 15:11:59 -07:00
Sergey Organov	f649aaaf82	revision: fix die() message for "--unpacked=" Get rid of the trailing dot and mark for translation. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-04 17:01:37 -07:00
Jeff King	398e659e1e	revision: avoid leak when preparing bloom filter for "/" If we're given an empty pathspec, we refuse to set up bloom filters, as described in `f3c2a36810` (revision: empty pathspecs should not use Bloom filters, 2020-07-01). But before the empty string check, we drop any trailing slash by allocating a new string without it. So a pathspec consisting only of "/" will allocate that string, but then still cause us to bail, leaking the new string. Let's make sure to free it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-04 09:31:57 -07:00
Jeff King	fd9a631c56	revision: avoid out-of-bounds read/write on empty pathspec Running t4216 with ASan results in it complaining of an out-of-bounds read in prepare_to_use_bloom_filter(). The issue is this code to strip a trailing slash: last_index = pi->len - 1; if (pi->match[last_index] == '/') { because we have no guarantee that pi->len isn't zero. This can happen if the pathspec is ".", as we translate that to an empty string. And if that read of random memory does trigger the conditional, we'd then do an out-of-bounds write: path_alloc = xstrdup(pi->match); path_alloc[last_index] = '\0'; Let's make sure to check the length before subtracting. Note that for an empty pathspec, we'd end up bailing from the function a few lines later, which makes it tempting to just: if (!pi->len) return; early here. But our code here is stripping a trailing slash, and we need to check for emptiness after stripping that slash, too. So we'd have two blocks, which would require repeating some cleanup code. Instead, just skip the trailing-slash for an empty string. Setting last_index at all in the case is awkward since it will have a nonsense value (and it uses an "int", which is a too-small type for a string anyway). So while we're here, let's: - drop last_index entirely; it's only used in two spots right next to each other and writing out "pi->len - 1" in both is actually easier to follow - use xmemdupz() to duplicate the string. This is slightly more efficient, but more importantly makes the intent more clear by allocating the correct-sized substring in the first place. It also eliminates any question of whether path_alloc is as long as pi->match (which it would not be if pi->match has any embedded NULs, though in practice this is probably impossible). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-04 09:31:02 -07:00
Jeff King	d70a9eb611	strvec: rename struct fields The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 19:18:06 -07:00
Junio C Hamano	70cdbbe3a7	Merge branch 'ds/commit-graph-bloom-updates' into master Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context	2020-07-30 13:20:31 -07:00
Junio C Hamano	de6dda0dc3	Merge branch 'sg/commit-graph-cleanups' into master The changed-path Bloom filter is improved using ideas from an independent implementation. * sg/commit-graph-cleanups: commit-graph: simplify write_commit_graph_file() #2 commit-graph: simplify write_commit_graph_file() #1 commit-graph: simplify parse_commit_graph() #2 commit-graph: simplify parse_commit_graph() #1 commit-graph: clean up #includes diff.h: drop diff_tree_oid() & friends' return value commit-slab: add a function to deep free entries on the slab commit-graph-format.txt: all multi-byte numbers are in network byte order commit-graph: fix parsing the Chunk Lookup table tree-walk.c: don't match submodule entries for 'submod/anything'	2020-07-30 13:20:30 -07:00
Jeff King	6fae74b418	revision: add "--no-diff-merges" option to counteract "-m" The "-m" option sets revs->ignore_merges to "0", but there's no way to undo it. This probably isn't something anybody overly cares about, since "1" is already the default, but it will serve as an escape hatch when we flip the default for ignore_merges to "0" in more situations. We'll also add a few extra niceties: - initialize the value to "-1" to indicate "not set", and then resolve it to the normal 0/1 bool in setup_revisions(). This lets any tweak functions, as well as setup_revisions() itself, avoid clobbering the user's preference (which until now they couldn't actually express). - since we now have --no-diff-merges, let's add the matching --diff-merges, which is just a synonym for "-m". Then we don't even need to document --no-diff-merges separately; it countermands the long form of "-m" in the usual way. The new test shows that this behaves just the same as the current behavior without "-m". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-29 13:43:57 -07:00
Jeff King	c972bf4cf5	strvec: convert remaining callers away from argv_array name We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the remaining files, as the resulting diff is reasonably sized. The conversion was done purely mechanically with: git ls-files '.c' '.h' \| xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Jeff King	dbbcd44fb4	strvec: rename files from argv-array to strvec This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '.c' '.h' \| xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:17 -07:00
Junio C Hamano	20d451c4da	Merge branch 'rs/line-log-until' into master "git log -Lx,y:path --before=date" lost track of where the range should be because it didn't take the changes made by the youngest commits that are omitted from the output into account. * rs/line-log-until: revision: disable min_age optimization with line-log	2020-07-09 14:00:42 -07:00
Junio C Hamano	645f63111b	Merge branch 'es/get-worktrees-unsort' API cleanup for get_worktrees() * es/get-worktrees-unsort: worktree: drop get_worktrees() unused 'flags' argument worktree: drop get_worktrees() special-purpose sorting option	2020-07-06 22:09:15 -07:00
René Scharfe	01faa91cb7	revision: disable min_age optimization with line-log If one of the options --before, --min-age or --until is given, limit_list() filters out younger commits early on. Line-log needs all those commits to trace the movement of line ranges, though. Skip this optimization if both are used together. Reported-by: Мария Долгополова <dolgopolovamariia@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-06 18:38:03 -07:00
SZEDER Gábor	c525ce95b4	commit-graph: check all leading directories in changed path Bloom filters The file 'dir/subdir/file' can only be modified if its leading directories 'dir' and 'dir/subdir' are modified as well. So when checking modified path Bloom filters looking for commits modifying a path with multiple path components, then check not only the full path in the Bloom filters, but all its leading directories as well. Take care to check these paths in "deepest first" order, because it's the full path that is least likely to be modified, and the Bloom filter queries can short circuit sooner. This can significantly reduce the average false positive rate, by about an order of magnitude or three(!), and can further speed up pathspec-limited revision walks. The table below compares the average false positive rate and runtime of git rev-list HEAD -- "$path" before and after this change for 5000+ randomly* selected paths from each repository: Average false Average Average positive rate runtime runtime before after before after difference ------------------------------------------------------------------ git 3.220% 0.7853% 0.0558s 0.0387s -30.6% linux 2.453% 0.0296% 0.1046s 0.0766s -26.8% tensorflow 2.536% 0.6977% 0.0594s 0.0420s -29.2% *Path selection was done with the following pipeline: git ls-tree -r --name-only HEAD \| sort -R \| head -n 5000 The improvements in runtime are much smaller than the improvements in average false positive rate, as we are clearly reaching diminishing returns here. However, all these timings depend on that accessing tree objects is reasonably fast (warm caches). If we had a partial clone and the tree objects had to be fetched from a promisor remote, e.g.: $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \ commit-graph write --reachable $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/ $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \ rev-list HEAD -- "$path" then checking all leading path component can reduce the runtime from over an hour to a few seconds (and this is with the clone and the promisor on the same machine). This adjusts the tracing values in t4216-log-bloom.sh, which provides a concrete way to notice the improvement. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Taylor Blau	f3c2a36810	revision: empty pathspecs should not use Bloom filters The prepare_to_use_bloom_filter() method was not intended to be called on an empty pathspec. However, 'git log -- .' and 'git log' are subtly different: the latter reports all commits while the former will simplify commits that do not change the root tree. This means that the path used to construct the bloom_key might be empty, and that value is not added to the Bloom filter during construction. That means that the results are likely incorrect! To resolve the issue, be careful about the length of the path and stop filling Bloom filters. To be completely sure we do not use them, drop the pointer to the bloom_filter_settings from the commit-graph. That allows our test to look at the trace2 logs to verify no Bloom filter statistics are reported. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Derrick Stolee	dc8e95ba7c	revision.c: fix whitespace Here, four spaces were used instead of tab characters. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Derrick Stolee	949197420e	bloom: fix logic in get_bloom_filter() The get_bloom_filter() method is a bit complicated in some parts where it does not need to be. In particular, it needs to return a NULL filter only when compute_if_not_present is zero AND the filter data cannot be loaded from a commit-graph file. This currently happens by accident because the commit-graph does not load changed-path Bloom filters from an existing commit-graph when writing a new one. This will change in a later patch. Also clean up some style issues while we are here. One side-effect of returning a NULL filter is that the filters that are reported as "too large" will now be reported as NULL insead of length zero. This case was not properly covered before, so add a test. Further, remote the counting of the zero-length filters from revision.c and the trace2 logs. Helped-by: René Scharfe <l.s.r@web.de> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-01 14:17:43 -07:00
Michael Forney	ea3f7e598c	revision: use repository from rev_info when parsing commits This is needed when repo_init_revisions() is called with a repository that is not the_repository to ensure appropriate repository is used in repo_parse_commit_internal(). If the wrong repository is used, a fatal error is the commit-graph machinery occurs: fatal: invalid commit position. commit-graph is likely corrupt Since revision.c was the only user of the parse_commit_gently compatibility define, remove it from commit.h. Signed-off-by: Michael Forney <mforney@mforney.org> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-24 09:13:04 -07:00
Eric Sunshine	03f2465bb1	worktree: drop get_worktrees() unused 'flags' argument get_worktrees() accepts a 'flags' argument, however, there are no existing flags (the lone flag GWT_SORT_LINKED was recently retired) and no behavior which can be tweaked. Therefore, drop the 'flags' argument. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-22 10:31:15 -07:00
Abhishek Kumar	c752ad09c4	commit-graph: minimize commit_graph_data_slab access In an earlier patch, multiple struct acccesses to `graph_pos` and `generation` were auto-converted to multiple method calls. Since the values are fixed and commit-slab access costly, we would be better off with storing the values as a local variable and reusing it. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:52 -07:00
Abhishek Kumar	c49c82aa4c	commit: move members graph_pos, generation to a slab We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data_slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:30 -07:00
Junio C Hamano	c3a02824cf	Merge branch 'ds/line-log-on-bloom' "git log -L..." now takes advantage of the "which paths are touched by this commit?" info stored in the commit-graph system. * ds/line-log-on-bloom: line-log: integrate with changed-path Bloom filters line-log: try to use generation number-based topo-ordering line-log: more responsive, incremental 'git log -L' t4211-line-log: add tests for parent oids line-log: remove unused fields from 'struct line_log_data'	2020-06-08 18:06:26 -07:00
SZEDER Gábor	0ee3cb888d	diff.h: drop diff_tree_oid() & friends' return value ll_diff_tree_oid() has only ever returned 0 [1], so it's return value is basically useless. It's only caller diff_tree_oid() has only ever returned the return value of ll_diff_tree_oid() as-is [2], so its return value is just as useless. Most of diff_tree_oid()'s callers simply ignore its return value, except: - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and returns with its return value, but all of diff_root_tree_oid()'s callers ignore its return value. - rev_compare_tree() and rev_same_tree_as_empty() do look at the return value in a condition, but, since the return value is always 0, the former's < 0 condition is never fulfilled, while the latter's >= 0 condition is always fulfilled. So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid() and diff_root_tree_oid(), and drop those conditions from rev_compare_tree() and rev_same_tree_as_empty() as well. [1] ll_diff_tree_oid() and its ancestors have been returning only 0 ever since it was introduced as diff_tree() in `9174026cfe` (Add "diff-tree" program to show which files have changed between two trees., 2005-04-09). [2] diff_tree_oid() traces back to diff-tree.c:main() in `9174026cfe` as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
Derrick Stolee	f32dde8c12	line-log: integrate with changed-path Bloom filters The previous changes to the line-log machinery focused on making the first result appear faster. This was achieved by no longer walking the entire commit history before returning the early results. There is still another way to improve the performance: walk most commits much faster. Let's use the changed-path Bloom filters to reduce time spent computing diffs. Since the line-log computation requires opening blobs and checking the content-diff, there is still a lot of necessary computation that cannot be replaced with changed-path Bloom filters. The part that we can reduce is most effective when checking the history of a file that is deep in several directories and those directories are modified frequently. In this case, the computation to check if a commit is TREESAME to its first parent takes a large fraction of the time. That is ripe for improvement with changed-path Bloom filters. We must ensure that prepare_to_use_bloom_filters() is called in revision.c so that the bloom_filter_settings are loaded into the struct rev_info from the commit-graph. Of course, some cases are still forbidden, but in the line-log case the pathspec is provided in a different way than normal. Since multiple paths and segments could be requested, we compute the struct bloom_key data dynamically during the commit walk. This could likely be improved, but adds code complexity that is not valuable at this time. There are two cases to care about: merge commits and "ordinary" commits. Merge commits have multiple parents, but if we are TREESAME to our first parent in every range, then pass the blame for all ranges to the first parent. Ordinary commits have the same condition, but each is done slightly differently in the process_ranges_[merge\|ordinary]_commit() methods. By checking if the changed-path Bloom filter can guarantee TREESAME, we can avoid that tree-diff cost. If the filter says "probably changed", then we need to run the tree-diff and then the blob-diff if there was a real edit. The Linux kernel repository is a good testing ground for the performance improvements claimed here. There are two different cases to test. The first is the "entire history" case, where we output the entire history to /dev/null to see how long it would take to compute the full line-log history. The second is the "first result" case, where we find how long it takes to show the first value, which is an indicator of how quickly a user would see responses when waiting at a terminal. To test, I selected the paths that were changed most frequently in the top 10,000 commits using this command (stolen from StackOverflow [1]): git log --pretty=format: --name-only -n 10000 \| sort \| \ uniq -c \| sort -rg \| head -10 which results in 121 MAINTAINERS 63 fs/namei.c 60 arch/x86/kvm/cpuid.c 59 fs/io_uring.c 58 arch/x86/kvm/vmx/vmx.c 51 arch/x86/kvm/x86.c 45 arch/x86/kvm/svm.c 42 fs/btrfs/disk-io.c 42 Documentation/scsi/index.rst (along with a bogus first result). It appears that the path arch/x86/kvm/svm.c was renamed, so we ignore that entry. This leaves the following results for the real command time: \| \| Entire History \| First Result \| \| Path \| Before \| After \| Before \| After \| \|------------------------------\|--------\|--------\|--------\|--------\| \| MAINTAINERS \| 4.26 s \| 3.87 s \| 0.41 s \| 0.39 s \| \| fs/namei.c \| 1.99 s \| 0.99 s \| 0.42 s \| 0.21 s \| \| arch/x86/kvm/cpuid.c \| 5.28 s \| 1.12 s \| 0.16 s \| 0.09 s \| \| fs/io_uring.c \| 4.34 s \| 0.99 s \| 0.94 s \| 0.27 s \| \| arch/x86/kvm/vmx/vmx.c \| 5.01 s \| 1.34 s \| 0.21 s \| 0.12 s \| \| arch/x86/kvm/x86.c \| 2.24 s \| 1.18 s \| 0.21 s \| 0.14 s \| \| fs/btrfs/disk-io.c \| 1.82 s \| 1.01 s \| 0.06 s \| 0.05 s \| \| Documentation/scsi/index.rst \| 3.30 s \| 0.89 s \| 1.46 s \| 0.03 s \| It is worth noting that the least speedup comes for the MAINTAINERS file which is * edited frequently, * low in the directory heirarchy, and * quite a large file. All of those points lead to spending more time doing the blob diff and less time doing the tree diff. Still, we see some improvement in that case and significant improvement in other cases. A 2-4x speedup is likely the more typical case as opposed to the small 5% change for that file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
SZEDER Gábor	002933f3fe	line-log: try to use generation number-based topo-ordering The previous patch made it possible to perform line-level filtering during history traversal instead of in an expensive preprocessing step, but it still requires some simpler preprocessing steps, notably topo-ordering. However, nowadays we have commit-graphs storing generation numbers, which make it possible to incrementally traverse the history in topological order, without the preparatory limit_list() and sort_in_topological_order() steps; see `b45424181e` (revision.c: generation-based topo-order algorithm, 2018-11-01). This patch combines the two, so we can do both the topo-ordering and the line-level filtering during history traversal, eliminating even those simpler preprocessing steps, and thus further reducing the delay before showing the first commit modifying the given line range. The 'revs->limited' flag plays the central role in this, because, due to limitations of the current implementation, the generation number-based topo-ordering is only enabled when this flag remains unset. Line-level log, however, always sets this flag in setup_revisions() ever since the feature was introduced in `12da1d1f6f` (Implement line-history search (git log -L), 2013-03-28). The reason for setting 'limited' is unclear, though, because the line-level log itself doesn't directly depend on it, and it doesn't affect how the limit_list() function limits the revision range. However, there is an indirect dependency: the line-level log requires topo-ordering, and the "traditional" sort_in_topological_order() requires an already limited commit list since `e6c3505b44` (Make sure we generate the whole commit list before trying to sort it topologically, 2005-07-06). The new, generation numbers-based topo-ordering doesn't require a limited commit list anymore. So don't set 'revs->limited' for line-level log, unless it is really necessary, namely: - The user explicitly requested parent rewriting, because that is still done in the line_log_filter() preprocessing step (see previous patch), which requires sort_in_topological_order() and in turn limit_list() as well. - A commit-graph file is not available or it doesn't yet contain generation numbers. In these cases we had to fall back on sort_in_topological_order() and in turn limit_list(). The existing condition with generation_numbers_enabled() has already ensured that the 'limited' flag is set in these cases; this patch just makes sure that the line-level log sets 'revs->topo_order' before that condition. While the reduced delay before showing the first commit is measurable in git.git, it takes a bigger repository to make it clearly noticable. In both cases below the line ranges were chosen so that they were modified rather close to the starting revisions, so the effect of this change is most noticable. # git.git $ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0 Before: real 0m0.107s user 0m0.091s sys 0m0.013s After: real 0m0.058s user 0m0.050s sys 0m0.005s # linux.git $ time git --no-pager log \ -L:build_restore_work_registers:arch/mips/mm/tlbex.c -1 v5.2 Before: real 0m1.129s user 0m1.061s sys 0m0.069s After: real 0m0.096s user 0m0.087s sys 0m0.009s Additional testing by Derrick Stolee: Since this patch improves the performance for the first result, I repeated the experiment from the previous patch on the Linux kernel repository, reporting real time here: Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null Before: 0.71 s After: 0.05 s Now, we have dropped the full topo-order of all ~910,000 commits before reporting the first result. The remaining performance improvements then are: 1. Update the parent-rewriting logic to be incremental similar to how "git log --graph" behaves. 2. Use changed-path Bloom filters to reduce the time spend in the tree-diff to see if the path(s) changed. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
SZEDER Gábor	3cb9d2b6f9	line-log: more responsive, incremental 'git log -L' The current line-level log implementation performs a preprocessing step in prepare_revision_walk(), during which the line_log_filter() function filters and rewrites history to keep only commits modifying the given line range. This preprocessing affects both responsiveness and correctness: - Git doesn't produce any output during this preprocessing step. Checking whether a commit modified the given line range is somewhat expensive, so depending on the size of the given revision range this preprocessing can result in a significant delay before the first commit is shown. - Limiting the number of displayed commits (e.g. 'git log -3 -L...') doesn't limit the amount of work during preprocessing, because that limit is applied during history traversal. Alas, by that point this expensive preprocessing step has already churned through the whole revision range to find all commits modifying the revision range, even though only a few of them need to be shown. - It rewrites parents, with no way to turn it off. Without the user explicitly requesting parent rewriting any parent object ID shown should be that of the immediate parent, just like in case of a pathspec-limited history traversal without parent rewriting. However, after that preprocessing step rewrote history, the subsequent "regular" history traversal (i.e. get_revision() in a loop) only sees commits modifying the given line range. Consequently, it can only show the object ID of the last ancestor that modified the given line range (which might happen to be the immediate parent, but many-many times it isn't). This patch addresses both the correctness and, at least for the common case, the responsiveness issues by integrating line-level log filtering into the regular revision walking machinery: - Make process_ranges_arbitrary_commit(), the static function in 'line-log.c' deciding whether a commit modifies the given line range, public by removing the static keyword and adding the 'line_log_' prefix, so it can be called from other parts of the revision walking machinery. - If the user didn't explicitly ask for parent rewriting (which, I believe, is the most common case): - Call this now-public function during regular history traversal, namely from get_commit_action() to ignore any commits not modifying the given line range. Note that while this check is relatively expensive, it must be performed before other, much cheaper conditions, because the tracked line range must be adjusted even when the commit will end up being ignored by other conditions. - Skip the line_log_filter() call, i.e. the expensive preprocessing step, in prepare_revision_walk(), because, thanks to the above points, the revision walking machinery is now able to filter out commits not modifying the given line range while traversing history. This way the regular history traversal sees the unmodified history, and is therefore able to print the object ids of the immediate parents of the listed commits. The eliminated preprocessing step can greatly reduce the delay before the first commit is shown, see the numbers below. - However, if the user did explicitly ask for parent rewriting via '--parents' or a similar option, then stick with the current implementation for now, i.e. perform that expensive filtering and history rewriting in the preprocessing step just like we did before, leaving the initial delay as long as it was. I tried to integrate line-level log filtering with parent rewriting into the regular history traversal, but, unfortunately, several subtleties resisted... :) Maybe someday we'll figure out how to do that, but until then at least the simple and common (i.e. without parent rewriting) 'git log -L:func:file' commands can benefit from the reduced delay. This change makes the failing 'parent oids without parent rewriting' test in 't4211-line-log.sh' succeed. The reduced delay is most noticable when there's a commit modifying the line range near the tip of a large-ish revision range: # no parent rewriting requested, no commit-graph present $ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0 Before: real 0m9.570s user 0m9.494s sys 0m0.076s After: real 0m0.718s user 0m0.674s sys 0m0.044s A significant part of the remaining delay is spent reading and parsing commit objects in limit_list(). With the help of the commit-graph we can eliminate most of that reading and parsing overhead, so here are the timing results of the same command as above, but this time using the commit-graph: Before: real 0m8.874s user 0m8.816s sys 0m0.057s After: real 0m0.107s user 0m0.091s sys 0m0.013s The next patch will further reduce the remaining delay. To be clear: this patch doesn't actually optimize the line-level log, but merely moves most of the work from the preprocessing step to the history traversal, so the commits modifying the line range can be shown as soon as they are processed, and the traversal can be terminated as soon as the given number of commits are shown. Consequently, listing the full history of a line range, potentially all the way to the root commit, will take the same time as before (but at least the user might start reading the output earlier). Furthermore, if the most recent commit modifying the line range is far away from the starting revision, then that initial delay will still be significant. Additional testing by Derrick Stolee: In the Linux kernel repository, the MAINTAINERS file was changed ~3,500 times across the ~915,000 commits. In addition to that edit frequency, the file itself is quite large (~18,700 lines). This means that a significant portion of the computation is taken up by computing the patch-diff of the file. This patch improves the real time it takes to output the first result quite a bit: Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null Before: 3.88 s After: 0.71 s If we drop the "-n 1" in the command, then there is no change in end-to-end process time. This is because the command still needs to walk the entire commit history, which negates the point of this patch. This is expected. As a note for future reference, the ~4.3 seconds in the old code spends ~2.6 seconds computing the patch-diffs, and the rest of the time is spent walking commits and computing diffs for which paths changed at each commit. The changed-path Bloom filters could improve the end-to-end computation time (i.e. no "-n 1" in the command). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
Junio C Hamano	6d56d4c7dc	Merge branch 'ds/blame-on-bloom' "git blame" learns to take advantage of the "changed-paths" Bloom filter stored in the commit-graph file. * ds/blame-on-bloom: test-bloom: check that we have expected arguments test-bloom: fix some whitespace issues blame: drop unused parameter from maybe_changed_path blame: use changed-path Bloom filters tests: write commit-graph with Bloom filters revision: complicated pathspecs disable filters	2020-05-01 13:39:54 -07:00
Junio C Hamano	9b6606f43d	Merge branch 'gs/commit-graph-path-filter' Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS	2020-05-01 13:39:53 -07:00
Junio C Hamano	9af3a7cb4d	Merge branch 'ds/revision-show-pulls' "git log" learned "--show-pulls" that helps pathspec limited history views; a merge commit that takes the whole change from a side branch, which is normally omitted from the output, is shown in addition to the commits that introduce real changes. * ds/revision-show-pulls: revision: --show-pulls adds helpful merges	2020-04-22 13:42:57 -07:00
Derrick Stolee	8918e379aa	revision: complicated pathspecs disable filters The changed-path Bloom filters work only when we can compute an explicit Bloom filter key in advance. When a pathspec is given that allows case-insensitive checks or wildcard matching, we must disable the Bloom filter performance checks. By checking the pathspec in prepare_to_use_bloom_filters(), we avoid setting up the Bloom filter data and thus revert to the usual logic. Before this change, the following tests would fail: t6004-rev-list-path-optim.sh (Tests 6-7) t6130-pathspec-noglob.sh (Tests 3-6) t6131-pathspec-icase.sh (Tests 3-5) These tests would fail when using GIT_TEST_COMMIT_GRAPH and GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS except that the latter environment variable was not set up correctly to write the changed- path Bloom filters in the test suite. That will be fixed in the next change. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-16 15:38:02 -07:00
Derrick Stolee	8d049e182e	revision: --show-pulls adds helpful merges The default file history simplification of "git log -- <path>" or "git rev-list -- <path>" focuses on providing the smallest set of commits that first contributed a change. The revision walk greatly restricts the set of walked commits by visiting only the first TREESAME parent of a merge commit, when one exists. This means that portions of the commit-graph are not walked, which can be a performance benefit, but can also "hide" commits that added changes but were ignored by a merge resolution. The --full-history option modifies this by walking all commits and reporting a merge commit as "interesting" if it has _any_ parent that is not TREESAME. This tends to be an over-representation of important commits, especially in an environment where most merge commits are created by pull request completion. Suppose we have a commit A and we create a commit B on top that changes our file. When we merge the pull request, we create a merge commit M. If no one else changed the file in the first-parent history between M and A, then M will not be TREESAME to its first parent, but will be TREESAME to B. Thus, the simplified history will be "B". However, M will appear in the --full-history mode. However, suppose that a number of topics T1, T2, ..., Tn were created based on commits C1, C2, ..., Cn between A and M as follows: A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn \ \ \ \ / / / / \ \__.. \ \/ ..__T1 / Tn \ \__.. /\ ..__T2 / \_____________________B \____________________/ If the commits T1, T2, ... Tn did not change the file, then all of P1 through Pn will be TREESAME to their first parent, but not TREESAME to their second. This means that all of those merge commits appear in the --full-history view, with edges that immediately collapse into the lower history without introducing interesting single-parent commits. The --simplify-merges option was introduced to remove these extra merge commits. By noticing that the rewritten parents are reachable from their first parents, those edges can be simplified away. Finally, the commits now look like single-parent commits that are TREESAME to their "only" parent. Thus, they are removed and this issue does not cause issues anymore. However, this also ends up removing the commit M from the history view! Even worse, the --simplify-merges option requires walking the entire history before returning a single result. Many Git users are using Git alongside a Git service that provides code storage alongside a code review tool commonly called "Pull Requests" or "Merge Requests" against a target branch. When these requests are accepted and merged, they typically create a merge commit whose first parent is the previous branch tip and the second parent is the tip of the topic branch used for the request. This presents a valuable order to the parents, but also makes that merge commit slightly special. Users may want to see not only which commits changed a file, but which pull requests merged those commits into their branch. In the previous example, this would mean the users want to see the merge commit "M" in addition to the single- parent commit "C". Users are even more likely to want these merge commits when they use pull requests to merge into a feature branch before merging that feature branch into their trunk. In some sense, users are asking for the "first" merge commit to bring in the change to their branch. As long as the parent order is consistent, this can be handled with the following rule: Include a merge commit if it is not TREESAME to its first parent, but is TREESAME to a later parent. These merges look like the merge commits that would result from running "git pull <topic>" on a main branch. Thus, the option to show these commits is called "--show-pulls". This has the added benefit of showing the commits created by closing a pull request or merge request on any of the Git hosting and code review platforms. To test these options, extend the standard test example to include a merge commit that is not TREESAME to its first parent. It is surprising that that option was not already in the example, as it is instructive. In particular, this extension demonstrates a common issue with file history simplification. When a user resolves a merge conflict using "-Xours" or otherwise ignoring one side of the conflict, they create a TREESAME edge that probably should not be TREESAME. This leads users to become frustrated and complain that "my change disappeared!" In my experience, showing them history with --full-history and --simplify-merges quickly reveals the problematic merge. As mentioned, this option is expensive to compute. The --show-pulls option _might_ show the merge commit (usually titled "resolving conflicts") more quickly. Of course, this depends on the user having the correct parent order, which is backwards when using "git pull master" from a topic branch. There are some special considerations when combining the --show-pulls option with --simplify-merges. This requires adding a new PULL_MERGE object flag to store the information from the initial TREESAME comparisons. This helps avoid dropping those commits in later filters. This is covered by a test, including how the parents can be simplified. Since "struct object" has already ruined its 32-bit alignment by using 33 bits across parsed, type, and flags member, let's not make it worse. PULL_MERGE is used in revision.c with the same value (1u<<15) as REACHABLE in commit-graph.c. The REACHABLE flag is only used when writing a commit-graph file, and a revision walk using --show-pulls does not happen in the same process. Care must be taken in the future to ensure this remains the case. Update Documentation/rev-list-options.txt with significant details around this option. This requires updating the example in the History Simplification section to demonstrate some of the problems with TREESAME second parents. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-10 09:58:55 -07:00
Emma Brooks	19d097e3d7	format-patch: teach --no-encode-email-headers When commit subjects or authors have non-ASCII characters, git format-patch Q-encodes them so they can be safely sent over email. However, if the patch transfer method is something other than email (web review tools, sneakernet), this only serves to make the patch metadata harder to read without first applying it (unless you can decode RFC 2047 in your head). git am as well as some email software supports non-Q-encoded mail as described in RFC 6531. Add --[no-]encode-email-headers and format.encodeEmailHeaders to let the user control this behavior. Signed-off-by: Emma Brooks <me@pluvano.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-07 22:37:18 -07:00
Garima Singh	42e50e78c6	revision.c: add trace2 stats around Bloom filter usage Add trace2 statistics around Bloom filter usage and behavior for 'git log -- path' commands that are hoping to benefit from the presence of computed changed paths Bloom filters. These statistics are great for performance analysis work and for formal testing, which we will see in the commit following this one. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Garima Singh	a56b9464cd	revision.c: use Bloom filters to speed up path based revision walks Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file. We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future. While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is '`:maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit. We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series. Performance Gains: We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: - we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: - we observed 10x to 20x speed ups, with some paths going up to 28 times faster. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Junio C Hamano	17066bea38	Merge branch 'dl/format-patch-notes-config-fixup' "git format-patch" can take a set of configured format.notes values to specify which notes refs to use in the log message part of the output. The behaviour of this was not consistent with multiple --notes command line options, which has been corrected. * dl/format-patch-notes-config-fixup: notes.h: fix typos in comment notes: break set_display_notes() into smaller functions config/format.txt: clarify behavior of multiple format.notes format-patch: move git_config() before repo_init_revisions() format-patch: use --notes behavior for format.notes notes: extract logic into set_display_notes() notes: create init_display_notes() helper notes: rename to load_display_notes()	2019-12-25 11:21:58 -08:00
Denton Liu	1d7297513d	notes: break set_display_notes() into smaller functions In `8164c961e1` (format-patch: use --notes behavior for format.notes, 2019-12-09), we introduced set_display_notes() which was a monolithic function with three mutually exclusive branches. Break the function up into three small and simple functions that each are only responsible for one task. This family of functions accepts an `int show_notes` instead of returning a value suitable for assignment to `show_notes`. This is for two reasons. First of all, this guarantees that the external `show_notes` variable changes in lockstep with the `struct display_notes_opt`. Second, this prompts future developers to be careful about doing something meaningful with this value. In fact, a NULL check is intentionally omitted because causing a segfault here would tell the future developer that they are meant to use the value for something meaningful. One alternative was making the family of functions accept a `struct rev_info ` instead of the `struct display_notes_opt *`, since the former contains the `show_notes` field as well. This does not work because we have to call git_config() before repo_init_revisions(). However, if we had a `struct rev_info`, we'd need to initialize it before it gets assigned values from git_config(). As a result, we break the circular dependency by having standalone `int show_notes` and `struct display_notes_opt notes_opt` variables which temporarily hold values from git_config() until the values are copied over to `rev`. To implement this change, we need to get a pointer to `rev_info::show_notes`. Unfortunately, this is not possible with bitfields and only direct-assignment is possible. Change `rev_info::show_notes` to a non-bitfield int so that we can get its address. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-13 11:07:15 -08:00
Junio C Hamano	d37cfe3b5c	Merge branch 'dl/pretty-reference' "git log" family learned "--pretty=reference" that gives the name of a commit in the format that is often used to refer to it in log messages. * dl/pretty-reference: SubmittingPatches: use `--pretty=reference` pretty: implement 'reference' format pretty: add struct cmt_fmt_map::default_date_mode_type pretty: provide short date format t4205: cover `git log --reflog -z` blindspot pretty.c: inline initalize format_context revision: make get_revision_mark() return const pointer completion: complete `tformat:` pretty format SubmittingPatches: remove dq from commit reference pretty-formats.txt: use generic terms for hash SubmittingPatches: use generic terms for hash	2019-12-10 13:11:43 -08:00
Denton Liu	452538c358	notes: extract logic into set_display_notes() Instead of open coding the logic that tweaks the variables in `struct display_notes_opt` within handle_revision_opt(), abstract away the logic into set_display_notes() so that it can be reused. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-09 13:36:45 -08:00
Denton Liu	e6e230eeae	notes: create init_display_notes() helper We currently open code the initialization for revs->notes_opt. Abstract this away into a helper function so that the logic can be reused in a future commit. This is slightly wasteful as we memset the struct twice but this is only run once so it shouldn't have any major effect. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-09 13:36:44 -08:00
Junio C Hamano	fd952307ec	Merge branch 'mh/clear-topo-walk-upon-reset' The revision walking machinery uses resources like per-object flag bits that need to be reset before a new iteration of walking begins, but the resources related to topological walk were not cleared correctly, which has been corrected. * mh/clear-topo-walk-upon-reset: revision: free topo_walk_info before creating a new one in init_topo_walk revision: clear the topo-walk flags in reset_revision_walk	2019-12-05 12:52:48 -08:00
Mike Hommey	0aa0c2b2ec	revision: free topo_walk_info before creating a new one in init_topo_walk init_topo_walk doesn't reuse an existing topo_walk_info, and currently leaks the one that might exist on the current rev_info if it was already used for a topo walk beforehand. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-25 11:48:48 +09:00
Mike Hommey	ffa1f28fea	revision: clear the topo-walk flags in reset_revision_walk Not doing so can lead to wrong topo-walks when using the revision walk API consecutively. Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-25 11:48:47 +09:00
Denton Liu	4982516451	revision: make get_revision_mark() return const pointer get_revision_mark() used to return a `char `, even though all of the strings it was returning were string literals. Make get_revision_mark() return a `const char ` so that callers won't be tempted to modify the returned string. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-20 13:33:36 +09:00
Junio C Hamano	5efabc7ed9	Merge branch 'ew/hashmap' Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use _entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry " hashmap: use _entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry " introduce container_of macro hashmap_put takes "struct hashmap_entry " hashmap_remove takes "const struct hashmap_entry " hashmap_get takes "const struct hashmap_entry " hashmap_add takes "struct hashmap_entry " hashmap_get_next takes "const struct hashmap_entry " hashmap_entry_init takes "struct hashmap_entry " packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent	2019-10-15 13:48:02 +09:00
Junio C Hamano	a73f91774c	Merge branch 'ab/pcre-jit-fixes' A few simplification and bugfixes to PCRE interface. * ab/pcre-jit-fixes: grep: under --debug, show whether PCRE JIT is enabled grep: do not enter PCRE2_UTF mode on fixed matching grep: stess test PCRE v2 on invalid UTF-8 data grep: create a "is_fixed" member in "grep_pat" grep: consistently use "p->fixed" in compile_regexp() grep: stop using a custom JIT stack with PCRE v1 grep: stop "using" a custom JIT stack with PCRE v2 grep: remove overly paranoid BUG(...) code grep: use PCRE v2 for optimized fixed-string search grep: remove the kwset optimization grep: drop support for \0 in --fixed-strings <pattern> grep: make the behavior for NUL-byte in patterns sane grep tests: move binary pattern tests into their own file grep tests: move "grep binary" alongside the rest grep: inline the return value of a function call used only once t4210: skip more command-line encoding tests on MinGW grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" log tests: test regex backends in "--encode=<enc>" tests	2019-10-11 14:24:47 +09:00
Junio C Hamano	ed6822896b	Merge branch 'rs/simplify-by-deco-with-deco-refs-exclude' "git log --decorate-refs-exclude=<pattern>" was incorrectly overruled when the "--simplify-by-decoration" option is used, which has been corrected. * rs/simplify-by-deco-with-deco-refs-exclude: log-tree: call load_ref_decorations() in get_name_decoration() log: test --decorate-refs-exclude with --simplify-by-decoration	2019-10-07 11:32:54 +09:00
Eric Wong	404ab78e39	hashmap: remove type arg from hashmap_{get,put,remove}_entry Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:12 +09:00
Eric Wong	23dee69f53	OFFSETOF_VAR macro to simplify hashmap iterators While we cannot rely on a `__typeof__' operator being portable to use with `offsetof'; we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	c8e424c9c9	hashmap: introduce hashmap_free_entries `hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	87571c3f71	hashmap: use *_entry APIs for iteration Inspired by list_for_each_entry in the Linux kernel. Once again, these are somewhat compromised usability-wise by compilers lacking __typeof__ support. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	939af16eac	hashmap_cmp_fn takes hashmap_entry params Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f23a465132	hashmap_get{,_from_hash} return "struct hashmap_entry *" Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	26b455f21e	hashmap_put takes "struct hashmap_entry " This is less error-prone than "void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b6c5241606	hashmap_get takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	d22245a2e3	hashmap_entry_init takes "struct hashmap_entry " C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry " to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
Junio C Hamano	cf861cd7a0	Merge branch 'rs/get-tagged-oid' Code cleanup. * rs/get-tagged-oid: use get_tagged_oid() tag: factor out get_tagged_oid()	2019-09-30 13:19:29 +09:00
René Scharfe	0cc7380d88	log-tree: call load_ref_decorations() in get_name_decoration() Load a default set of ref name decorations at the first lookup. This frees direct and indirect callers from doing so. They can still do it if they want to use a filter or are interested in full decorations instead of the default short ones -- the first load_ref_decorations() call wins. This means that the load in builtin/log.c::cmd_log_init_finish() is respected even if --simplify-by-decoration is given, as the previously dominating earlier load in handle_revision_opt() is gone. So a filter given with --decorate-refs-exclude is used for simplification in that case, as expected. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-09 11:16:40 -07:00
René Scharfe	dad3f0607b	tag: factor out get_tagged_oid() Add a function for accessing the ID of the object referenced by a tag safely, i.e. without causing a segfault when encountering a broken tag where ->tagged is NULL. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:10:18 -07:00
Jeff King	19e8789b23	revision: allow --end-of-options to end option parsing There's currently no robust way to tell Git that a particular option is meant to be a revision, and not an option. So if you have a branch "refs/heads/--foo", you cannot just say: git rev-list --foo You can say: git rev-list refs/heads/--foo But that breaks down if you don't know the refname, and in particular if you're a script passing along a value from elsewhere. In most programs, you can use "--" to end option parsing, like this: some-prog -- "$revision" But that doesn't work for the revision parser, because "--" is already meaningful there: it separates revisions from pathspecs. So we need some other marker to separate options from revisions. This patch introduces "--end-of-options", which serves that purpose: git rev-list --oneline --end-of-options "$revision" will work regardless of what's in "$revision" (well, if you say "--" it may fail, but it won't do something dangerous, like triggering an unexpected option). The name is verbose, but that's probably a good thing; this is meant to be used for scripted invocations where readability is more important than terseness. One alternative would be to introduce an explicit option to mark a revision, like: git rev-list --oneline --revision="$revision" That's slightly _more_ informative than this commit (because it makes even something silly like "--" unambiguous). But the pattern of using a separator like "--" is well established in git and in other commands, and it makes some scripting tasks simpler like: git rev-list --end-of-options "$@" There's no documentation in this patch, because it will make sense to describe the feature once it is available everywhere (and support will be added in further patches). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-06 13:05:39 -07:00
Junio C Hamano	68e65ded5b	Merge branch 'jk/check-connected-with-alternates' The tips of refs from the alternate object store can be used as starting point for reachability computation now. * jk/check-connected-with-alternates: check_everything_connected: assume alternate ref tips are valid object-store.h: move for_each_alternate_ref() from transport.h	2019-07-19 11:30:21 -07:00
Jeff King	39b44ba771	check_everything_connected: assume alternate ref tips are valid When we receive a remote ref update to sha1 "X", we want to check that we have all of the objects needed by "X". We can assume that our repository is not currently corrupted, and therefore if we have a ref pointing at "Y", we have all of its objects. So we can stop our traversal from "X" as soon as we hit "Y". If we make the same non-corruption assumption about any repositories we use to store alternates, then we can also use their ref tips to shorten the traversal. This is especially useful when cloning with "--reference", as we otherwise do not have any local refs to check against, and have to traverse the whole history, even though the other side may have sent us few or no objects. Here are results for the included perf test (which shows off more or less the maximal savings, getting one new commit and sharing the whole history): Test HEAD^ HEAD -------------------------------------------------------------------- [on git.git] 5600.3: clone --reference 2.94(2.86+0.08) 0.09(0.08+0.01) -96.9% [on linux.git] 5600.3: clone --reference 45.74(45.34+0.41) 0.36(0.30+0.08) -99.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 10:11:09 -07:00
Ævar Arnfjörð Bjarmason	44570188a0	grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" Fix a bug introduced in `18547aacf5` ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in `94da9193a6` ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code. We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding. Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe. But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-28 09:11:09 -07:00
Derrick Stolee	1d8e31a3f6	revision: keep topo-walk free of unintersting commits When updating the topo-order walk in `b454241` (revision.c: generation-based topo-order algorithm, 2018-11-01), the logic was a huge rewrite of the walk logic. In that massive change, we accidentally included the UNINTERESTING commits in expand_topo_walk(). This means that a simple query like git rev-list --topo-order HEAD~1..HEAD will expand the topo walk for all commits reachable from HEAD, and not just one commit. This change should speed up these cases, but there is still a need for corrected commit-date for some A..B queries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-28 10:39:49 -07:00
Derrick Stolee	1b4d8827ff	revision: use generation for A..B --topo-order queries If a commit-graph exists with computed generation numbers, then a 'git rev-list --topo-order -n <N> <rev>' query will use those generation numbers to reduce the number of commits walked before writing N commits. One caveat put in `b454241` (revision.c: generation-based topo-order algorithm, 2018-11-01) was to not enable the new algorithm for queries with a revision range "A..B". The logic was placed to walk from "A" and mark those commits as uninteresting, but the performance was actually worse than the existing logic in some cases. The root cause of this performance degradation is that generation numbers _increase_ the number of commits we walk relative to the existing heuristic of walking by commit date. While generation numbers actually guarantee that the algorithm is correct, the existing logic is very rarely wrong and that added requirement is not worth the cost. This motivates the planned "corrected commit date" to replace generation numbers in a future version of Git. The current change enables the logic to use whatever reachability index is currently in the commit-graph (generation numbers or corrected commit date). The limited flag in struct rev_info forces a full walk of the commit history (after discovering the A..B range). Previosuly, it is enabled whenever we see an uninteresting commit. We prevent enabling the parameter when we are planning to use the reachability index for a topo-order. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-28 10:39:47 -07:00
Junio C Hamano	d9d65e9f6a	Merge branch 'jk/revision-rewritten-parents-in-prio-queue' Performance fix for "rev-list --parents -- pathspec". * jk/revision-rewritten-parents-in-prio-queue: revision: use a prio_queue to hold rewritten parents	2019-04-25 16:41:18 +09:00
Junio C Hamano	4284497396	Merge branch 'jk/unused-params-even-more' Code cleanup. * jk/unused-params-even-more: parse_opt_ref_sorting: always use with NONEG flag pretty: drop unused strbuf from parse_padding_placeholder() pretty: drop unused "type" parameter in needs_rfc2047_encoding() parse-options: drop unused ctx parameter from show_gitcomp() fetch_pack(): drop unused parameters report_path_error(): drop unused prefix parameter unpack-trees: drop unused error_type parameters unpack-trees: drop name_entry from traverse_by_cache_tree() test-date: drop unused "now" parameter from parse_dates() update-index: drop unused prefix_length parameter from do_reupdate() log: drop unused "len" from show_tagger() log: drop unused rev_info from early output revision: drop some unused "revs" parameters	2019-04-25 16:41:12 +09:00
Junio C Hamano	31df2c1019	Merge branch 'jk/line-log-with-patch' "git log -L<from>,<to>:<path>" with "-s" did not suppress the patch output as it should. This has been corrected. * jk/line-log-with-patch: line-log: detect unsupported formats line-log: suppress diff output with "-s"	2019-04-10 02:14:23 +09:00
Jeff King	8320b1dbe7	revision: use a prio_queue to hold rewritten parents This patch fixes a quadratic list insertion in rewrite_one() when pathspec limiting is combined with --parents. What happens is something like this: 1. We see that some commit X touches the path, so we try to rewrite its parents. 2. rewrite_one() loops forever, rewriting parents, until it finds a relevant parent (or hits the root and decides there are none). The heavy lifting is done by process_parent(), which uses try_to_simplify_commit() to drop parents. 3. process_parent() puts any intermediate parents into the &revs->commits list, inserting by commit date as usual. So if commit X is recent, and then there's a large chunk of history that doesn't touch the path, we may add a lot of commits to &revs->commits. And insertion by commit date is O(n) in the worst case, making the whole thing quadratic. We tried to deal with this long ago in `fce87ae538` (Fix quadratic performance in rewrite_one., 2008-07-12). In that scheme, we cache the oldest commit in the list; if the new commit to be added is older, we can start our linear traversal there. This often works well in practice because parents are older than their descendants, and thus we tend to add older and older commits as we traverse. But this isn't guaranteed, and in fact there's a simple case where it is not: merges. Imagine we look at the first parent of a merge and see a very old commit (let's say 3 years old). And on the second parent, as we go back 3 years in history, we might have many commits. That one first-parent commit has polluted our oldest-commit cache; it will remain the oldest while we traverse a huge chunk of history, during which we have to fall back to the slow, linear method of adding to the list. Naively, one might imagine that instead of caching the oldest commit, we'd start at the last-added one. But that just makes some cases faster while making others slower (and indeed, while it made a real-world test case much faster, it does quite poorly in the perf test include here). Fundamentally, these are just heuristics; our worst case is still quadratic, and some cases will approach that. Instead, let's use a data structure with better worst-case performance. Swapping out revs->commits for something else would have repercussions all over the code base, but we can take advantage of one fact: for the rewrite_one() case, nobody actually needs to see those commits in revs->commits until we've finished generating the whole list. That leaves us with two obvious options: 1. We can generate the list _unordered_, which should be O(n), and then sort it afterwards, which would be O(n log n) total. This is "sort-after" below. 2. We can insert the commits into a separate data structure, like a priority queue. This is "prio-queue" below. I expected that sort-after would be the fastest (since it saves us the extra step of copying the items into the linked list), but surprisingly the prio-queue seems to be a bit faster. Here are timings for the new p0001.6 for all three techniques across a few repositories, as compared to master: master cache-last sort-after prio-queue -------------------------------------------------------------------------------------------- GIT_PERF_REPO=git.git 0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8% GIT_PERF_REPO=linux.git 20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6% GIT_PERF_REPO=llvm-project.git 83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4% A few items to note: - the cache-list tweak does improve the bad case for llvm-project.git that started my digging into this problem. But it performs terribly on linux.git, barely helping at all. - the sort-after and prio-queue techniques work well. They approach the timing for running without --parents at all, which is what you'd expect (see below for more data). - prio-queue just barely outperforms sort-after. As I said, I'm not really sure why this is the case, but it is. You can see it even more prominently in this real-world case on llvm-project.git: git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll where prio-queue routinely outperforms sort-after by about 7%. One guess is that the prio-queue may just be more efficient because it uses a compact array. There are three new perf tests: - "rev-list --parents" gives us a baseline for running with --parents. This isn't sped up meaningfully here, because the bad case is triggered only with simplification. But it's good to make sure we don't screw it up (now, or in the future). - "rev-list -- dummy" gives us a baseline for just traversing with pathspec limiting. This gives a lower bound for the next test (and it's also a good thing for us to be checking in general for regressions, since we don't seem to have any existing tests). - "rev-list --parents -- dummy" shows off the problem (and our fix) Here are the timings for those three on llvm-project.git, before and after the fix: Test master prio-queue ------------------------------------------------------------------------------ 0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9% 0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0% 0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4% Changes in the first two are basically noise, and you can see we approach our lower bound in the final one. Note that we can't fully get rid of the list argument from process_parents(). Other callers do have lists, and it would be hard to convert them. They also don't seem to have this problem (probably because they actually remove items from the list as they loop, meaning it doesn't grow so large in the first place). So this basically just drops the "cache_ptr" parameter (which was used only by the one caller we're fixing here) and replaces it with a prio_queue. Callers are free to use either data structure, depending on what they're prepared to handle. Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-04 18:21:54 +09:00
Jeff King	9163399535	revision: drop some unused "revs" parameters There are several internal helpers that take a rev_info struct but don't actually look at it. While one could argue that all helpers in revision.c should take a rev_info struct for consistency, dropping the unused parameter makes it clear that they don't actually depend on any other rev options. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-20 18:34:08 +09:00
Jeff King	05314efaea	line-log: detect unsupported formats If you use "log -L" with an output format like "--raw" or "--stat", we'll silently ignore the format and just output the normal patch. Let's detect and complain about this, which at least tells the user what's going on. The tests here aren't exhaustive over the set of all formats, but it should at least let us know if somebody breaks the format-checking. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-11 16:31:07 +09:00
Junio C Hamano	c425d361f5	Merge branch 'en/combined-all-paths' Output from "diff --cc" did not show the original paths when the merge involved renames. A new option adds the paths in the original trees to the output. * en/combined-all-paths: log,diff-tree: add --combined-all-paths option	2019-03-07 09:59:54 +09:00
Elijah Newren	d76ce4f734	log,diff-tree: add --combined-all-paths option The combined diff format for merges will only list one filename, even if rename or copy detection is active. For example, with raw format one might see: ::100644 100644 100644 `fabadb8` `cc95eb0` `4866510` MM describe.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR phooey.c This doesn't let us know what the original name of bar.sh was in the first parent, and doesn't let us know what either of the original names of phooey.c were in either of the parents. In contrast, for non-merge commits, raw format does provide original filenames (and a rename score to boot). In order to also provide original filenames for merge commits, add a --combined-all-paths option (which must be used with either -c or --cc, and is likely only useful with rename or copy detection active) so that we can print tab-separated filenames when renames are involved. This transforms the above output to: ::100644 100644 100644 `fabadb8` `cc95eb0` `4866510` MM desc.c desc.c desc.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM foo.sh bar.sh bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR fooey.c fuey.c phooey.c Further, in patch format, this changes the from/to headers so that instead of just having one "from" header, we get one for each parent. For example, instead of having --- a/phooey.c +++ b/phooey.c we would see --- a/fooey.c --- a/fuey.c +++ b/phooey.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-07 20:15:25 -08:00
Junio C Hamano	5fda343321	Merge branch 'ds/push-sparse-tree-walk' "git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse	2019-02-06 22:05:25 -08:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Junio C Hamano	09b2e40944	Merge branch 'jt/get-reference-with-commit-graph' Micro-optimize the code that prepares commit objects to be walked by "git rev-list" when the commit-graph is available. * jt/get-reference-with-commit-graph: revision: use commit graph in get_reference()	2019-02-05 14:26:10 -08:00
Junio C Hamano	371820d5f1	Merge branch 'bc/tree-walk-oid' The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use	2019-01-29 12:47:56 -08:00
Derrick Stolee	d5d2e93577	revision: implement sparse algorithm When enumerating objects to place in a pack-file during 'git pack-objects --revs', we discover the "frontier" of commits that we care about and the boundary with commit we find uninteresting. From that point, we walk trees to discover which trees and blobs are uninteresting. Finally, we walk trees from the interesting commits to find the interesting objects that are placed in the pack. This commit introduces a new, "sparse" way to discover the uninteresting trees. We use the perspective of a single user trying to push their topic to a large repository. That user likely changed a very small fraction of the paths in their working directory, but we spend a lot of time walking all reachable trees. The way to switch the logic to work in this sparse way is to start caring about which paths introduce new trees. While it is not possible to generate a diff between the frontier boundary and all of the interesting commits, we can simulate that behavior by inspecting all of the root trees as a whole, then recursing down to the set of trees at each path. We already had taken the first step by passing an oidset to mark_trees_uninteresting_sparse(). We now create a dictionary whose keys are paths and values are oidsets. We consider the set of trees that appear at each path. While we inspect a tree, we add its subtrees to the oidsets corresponding to the tree entry's path. We also mark trees as UNINTERESTING if the tree we are parsing is UNINTERESTING. To actually improve the performance, we need to terminate our recursion. If the oidset contains only UNINTERESTING trees, then we do not continue the recursion. This avoids walking trees that are likely to not be reachable from interesting trees. If the oidset contains only interesting trees, then we will walk these trees in the final stage that collects the intersting objects to place in the pack. Thus, we only recurse if the oidset contains both interesting and UNINITERESTING trees. There are a few ways that this is not a universally better option. First, we can pack extra objects. If someone copies a subtree from one tree to another, the first tree will appear UNINTERESTING and we will not recurse to see that the subtree should also be UNINTERESTING. We will walk the new tree and see the subtree as a "new" object and add it to the pack. A test is modified to demonstrate this behavior and to verify that the new logic is being exercised. Second, we can have extra memory pressure. If instead of being a single user pushing a small topic we are a server sending new objects from across the entire working directory, then we will gain very little (the recursion will rarely terminate early) but will spend extra time maintaining the path-oidset dictionaries. Despite these potential drawbacks, the benefits of the algorithm are clear. By adding a counter to 'add_children_by_path' and 'mark_tree_contents_uninteresting', I measured the number of parsed trees for the two algorithms in a variety of repos. For git.git, I used the following input: v2.19.0 ^v2.19.0~10 Objects to pack: 550 Walked (old alg): 282 Walked (new alg): 130 For the Linux repo, I used the following input: v4.18 ^v4.18~10 Objects to pack: 518 Walked (old alg): 4,836 Walked (new alg): 188 The two repos above are rather "wide and flat" compared to other repos that I have used in the past. As a comparison, I tested an old topic branch in the Azure DevOps repo, which has a much deeper folder structure than the Linux repo. Objects to pack: 220 Walked (old alg): 22,804 Walked (new alg): 129 I used the number of walked trees the main metric above because it is consistent across multiple runs. When I ran my tests, the performance of the pack-objects command with the same options could change the end-to-end time by 10x depending on the file system being warm. However, by repeating the same test on repeat I could get more consistent timing results. The git.git and Linux tests were too fast overall (less than 0.5s) to measure an end-to-end difference. The Azure DevOps case was slow enough to see the time improve from 15s to 1s in the warm case. The cold case was 90s to 9s in my testing. These improvements will have even larger benefits in the super- large Windows repository. In our experiments, we see the "Enumerate objects" phase of pack-objects taking 60-80% of the end-to-end time of non-trivial pushes, taking longer than the network time to send the pack and the server time to verify the pack. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-17 13:44:42 -08:00
Derrick Stolee	f1f5de442f	revision: add mark_tree_uninteresting_sparse In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to recursively walk the trees and blobs. This will walk the same number of trees as the old mechanism. To ensure that mark_tree_uninteresting walks the tree, we need to remove the UNINTERESTING flag before calling the method. This implementation will be replaced entirely in a later commit. There is one new assumption in this approach: we are also given the oids of the interesting trees. This implementation does not use those trees at the moment, but we will use them in a later rewrite of this method. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-17 13:44:37 -08:00
brian m. carlson	ea82b2a085	tree-walk: store object_id in a separate member When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-15 09:57:41 -08:00
Junio C Hamano	f2b6aa98be	Merge branch 'nd/indentation-fix' Code cleanup. * nd/indentation-fix: Indent code with TABs	2019-01-14 15:29:32 -08:00
Junio C Hamano	6e5be1f2d5	Merge branch 'md/exclude-promisor-objects-fix-cleanup' Code clean-up. * md/exclude-promisor-objects-fix-cleanup: revision.c: put promisor option in specialized struct	2019-01-14 15:29:31 -08:00
Junio C Hamano	d6f05a435f	Merge branch 'nd/attr-pathspec-in-tree-walk' The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree() take 'struct repository '	2019-01-14 15:29:28 -08:00
Junio C Hamano	c333fe7368	Merge branch 'md/list-lazy-objects-fix' "git rev-list --exclude-promisor-objects" had to take an object that does not exist locally (and is lazily available) from the command line without barfing, but the code dereferenced NULL. * md/list-lazy-objects-fix: list-objects.c: don't segfault for missing cmdline objects	2019-01-14 15:29:28 -08:00
Nguyễn Thái Ngọc Duy	3a7a698e93	sha1-name.c: remove implicit dependency on the_index This kills the_index dependency in get_oid_with_context() but for get_oid() and friends, they still assume the_repository (which also means the_index). Unfortunately the widespread use of get_oid() will make it hard to make the conversion now. We probably will add repo_get_oid() at some point and limit the use of get_oid() in builtin/ instead of forcing all get_oid() call sites to carry struct repository. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-14 12:13:04 -08:00
Nguyễn Thái Ngọc Duy	e1ff0a32e4	read-cache.c: kill read_index() read_index() shares the same problem as hold_locked_index(): it assumes $GIT_DIR/index. Move all call sites to repo_read_index() instead. read_index_preload() and read_index_unmerged() are also killed as a consequence. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-14 12:13:04 -08:00
Jonathan Tan	ec0c5798ee	revision: use commit graph in get_reference() When fetching into a repository, a connectivity check is first made by check_exist_and_connected() in builtin/fetch.c that runs: git rev-list --objects --stdin --not --all --quiet <(list of objects) If the client repository has many refs, this command can be slow, regardless of the nature of the server repository or what is being fetched. A profiler reveals that most of the time is spent in setup_revisions() (approx. 60/63), and of the time spent in setup_revisions(), most of it is spent in parse_object() (approx. 49/60). This is because setup_revisions() parses the target of every ref (from "--all"), and parse_object() reads the buffer of the object. Reading the buffer is unnecessary if the repository has a commit graph and if the ref points to a commit (which is typically the case). This patch uses the commit graph wherever possible; on my computer, when I run the above command with a list of 1 object on a many-ref repository, I get a speedup from 1.8s to 1.0s. Another way to accomplish this effect would be to modify parse_object() to use the commit graph if possible; however, I did not want to change parse_object()'s current behavior of always checking the object signature of the returned object. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-28 10:39:48 -08:00
Nguyễn Thái Ngọc Duy	ec36c42a63	Indent code with TABs We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-09 12:37:32 +09:00
Matthew DeVore	bbcde41a70	revision.c: put promisor option in specialized struct Put the allow_exclude_promisor_objects flag in setup_revision_opt. When it was in rev_info, it was unclear when it was used, since rev_info is passed to functions that don't use the flag. This resulted in unnecessary setting of the flag in prune.c, so fix that as well. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-06 14:52:56 +09:00
Matthew DeVore	4cf67869b2	list-objects.c: don't segfault for missing cmdline objects When a command is invoked with both --exclude-promisor-objects, --objects-edge-aggressive, and a missing object on the command line, the rev_info.cmdline array could get a NULL pointer for the value of an 'item' field. Prevent dereferencing of a NULL pointer in that situation. Properly handle --ignore-missing. If it is not passed, die when an object is missing. Otherwise, just silently ignore it. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-06 10:10:13 +09:00
Nguyễn Thái Ngọc Duy	67022e0214	tree-walk.c: make tree_entry_interesting() take an index In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in `2abf350385` (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-19 10:50:33 +09:00
Junio C Hamano	62ca33e02a	Merge branch 'ds/reachable-topo-order' The revision walker machinery learned to take advantage of the commit generation numbers stored in the commit-graph file. * ds/reachable-topo-order: t6012: make rev-list tests more interesting revision.c: generation-based topo-order algorithm commit/revisions: bookkeeping before refactoring revision.c: begin refactoring --topo-order logic test-reach: add rev-list tests test-reach: add run_three_modes method prio-queue: add 'peek' operation	2018-11-18 18:23:52 +09:00
Junio C Hamano	f22838aa7a	Merge branch 'jk/misc-unused-fixes' Assorted fixes for bugs found while auditing -Wunused-parameter warnings. * jk/misc-unused-fixes: approxidate: fix NULL dereference in date_time() pathspec: handle non-terminated strings with :(attr) approxidate: handle pending number for "specials" rev-list: handle flags for --indexed-objects	2018-11-13 22:37:26 +09:00
Junio C Hamano	e146cc97be	Merge branch 'nd/per-worktree-ref-iteration' The code to traverse objects for reachability, used to decide what objects are unreferenced and expendable, have been taught to also consider per-worktree refs of other worktrees as starting points to prevent data loss. * nd/per-worktree-ref-iteration: git-worktree.txt: correct linkgit command name reflog expire: cover reflog from all worktrees fsck: check HEAD and reflog from other worktrees fsck: move fsck_head_link() to get_default_heads() to avoid some globals revision.c: better error reporting on ref from different worktrees revision.c: correct a parameter name refs: new ref types to make per-worktree refs visible to all worktrees Add a place for (not) sharing stuff between worktrees refs.c: indent with tabs, not spaces	2018-11-13 22:37:26 +09:00
Junio C Hamano	a29b8bcf62	Merge branch 'md/exclude-promisor-objects-fix' Operations on promisor objects make sense in the context of only a small subset of the commands that internally use the revisions machinery, but the "--exclude-promisor-objects" option were taken and led to nonsense results by commands like "log", to which it didn't make much sense. This has been corrected. * md/exclude-promisor-objects-fix: exclude-promisor-objects: declare when option is allowed Documentation/git-log.txt: do not show --exclude-promisor-objects	2018-11-06 15:50:21 +09:00
Jeff King	b4cfcde4db	rev-list: handle flags for --indexed-objects When a traversal sees the --indexed-objects option, it adds all blobs and valid cache-trees from the index to the traversal using add_index_objects_to_pending(). But that function totally ignores its flags parameter! That means that doing: git rev-list --objects --indexed-objects and git rev-list --objects --not --indexed-objects produce the same output, because we ignore the UNINTERESTING flag when walking the index in the second example. Nobody noticed because this feature was added as a way for tools like repack to increase their coverage of reachable objects, meaning it would only be used like the first example above. But since it's user facing (and because the documentation describes it "as if the objects are listed on the command line"), we should make sure the negative case behaves sensibly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 20:49:52 +09:00
Derrick Stolee	b45424181e	revision.c: generation-based topo-order algorithm The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. This can dramatically reduce the computation time to write a fixed number of commits, such as when limiting with "-n <N>" or filling the first page of a pager. When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING, then it may terminate without walking all reachable commits. This does not occur if we do not specify UNINTERESTING commits. 2. Run sort_in_topological_order(), which is an implementation of Kahn's algorithm. It first iterates through the entire set of important commits and computes the in-degree of each (plus one, as we use 'zero' as a special value here). Then, we walk the commits in priority order, adding them to the priority queue if and only if their in-degree is one. As we remove commits from this priority queue, we decrement the in-degree of their parents. 3. While we are peeling commits for output, get_revision_1() uses pop_commit on the full list of commits computed by sort_in_topological_order(). In the new algorithm, these three steps correspond to three different commit walks. We run these walks simultaneously, and advance each only as far as necessary to satisfy the requirements of the 'higher order' walk. We know when we can pause each walk by using generation numbers from the commit- graph feature. Recall that the generation number of a commit satisfies: * If the commit has at least one parent, then the generation number is one more than the maximum generation number among its parents. * If the commit has no parent, then the generation number is one. There are two special generation numbers: * GENERATION_NUMBER_INFINITY: this value is 0xffffffff and indicates that the commit is not stored in the commit-graph and the generation number was not previously calculated. * GENERATION_NUMBER_ZERO: this value (0) is a special indicator to say that the commit-graph was generated by a version of Git that does not compute generation numbers (such as v2.18.0). Since we use generation_numbers_enabled() before using the new algorithm, we do not need to worry about GENERATION_NUMBER_ZERO. However, the existence of GENERATION_NUMBER_INFINITY implies the following weaker statement than the usual we expect from generation numbers: If A and B are commits with generation numbers gen(A) and gen(B) and gen(A) < gen(B), then A cannot reach B. Thus, we will walk in each of our stages until the "maximum unexpanded generation number" is strictly lower than the generation number of a commit we are about to use. The walks are as follows: 1. EXPLORE: using the explore_queue priority queue (ordered by maximizing the generation number), parse each reachable commit until all commits in the queue have generation number strictly lower than needed. During this walk, update the UNINTERESTING flags as necessary. 2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in- degree of each parent for each commit that is walked. Since we walk in order of decreasing generation number, we know that discovering an in-degree value of 0 means the value for that commit was not initialized, so should be initialized to two. (Recall that in-degree value "1" is what we use to say a commit is ready for output.) As we iterate the parents of a commit during this walk, ensure the EXPLORE walk has walked beyond their generation numbers. 3. TOPO: using the topo_queue priority queue (ordered based on the sort_order given, which could be commit-date, author- date, or typical topo-order which treats the queue as a LIFO stack), remove a commit from the queue and decrement the in-degree of each parent. If a parent has an in-degree of one, then we add it to the topo_queue. Before we decrement the in-degree, however, ensure the INDEGREE walk has walked beyond that generation number. The implementations of these walks are in the following methods: * explore_walk_step and explore_to_depth * indegree_walk_step and compute_indegrees_to_depth * next_topo_commit and expand_topo_walk These methods have some patterns that may seem strange at first, but they are probably carry-overs from their equivalents in limit_list and sort_in_topological_order. One thing that is missing from this implementation is a proper way to stop walking when the entire queue is UNINTERESTING, so this implementation is not enabled by comparisions, such as in 'git rev-list --topo-order A..B'. This can be updated in the future. In my local testing, I used the following Git commands on the Linux repository in three modes: HEAD~1 with no commit-graph, HEAD~1 with a commit-graph, and HEAD with a commit-graph. This allows comparing the benefits we get from parsing commits from the commit-graph and then again the benefits we get by restricting the set of commits we walk. Test: git rev-list --topo-order -100 HEAD HEAD~1, no commit-graph: 6.80 s HEAD~1, w/ commit-graph: 0.77 s HEAD, w/ commit-graph: 0.02 s Test: git rev-list --topo-order -100 HEAD -- tools HEAD~1, no commit-graph: 9.63 s HEAD~1, w/ commit-graph: 6.06 s HEAD, w/ commit-graph: 0.06 s This speedup is due to a few things. First, the new generation- number-enabled algorithm walks commits on order of the number of results output (subject to some branching structure expectations). Since we limit to 100 results, we are running a query similar to filling a single page of results. Second, when specifying a path, we must parse the root tree object for each commit we walk. The previous benefits from the commit-graph are entirely from reading the commit-graph instead of parsing commits. Since we need to parse trees for the same number of commits as before, we slow down significantly from the non-path-based query. For the test above, I specifically selected a path that is changed frequently, including by merge commits. A less-frequently-changed path (such as 'README') has similar end-to-end time since we need to walk the same number of commits (before determining we do not have 100 hits). However, get the benefit that the output is presented to the user as it is discovered, much the same as a normal 'git log' command (no '--topo-order'). This is an improved user experience, even if the command has the same runtime. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	5284fc5cc9	commit/revisions: bookkeeping before refactoring There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in commit.c. 2. Moving these methods to commit.h requires adding an author_date_slab declaration to commit.h. Consumers will need their own implementation. 3. The add_parents_to_list() method in revision.c performs logic around the UNINTERESTING flag and other special cases depending on the struct rev_info. Allow this method to ignore a NULL 'list' parameter, as we will not be populating the list for our walk. Also rename the method to the slightly more generic name process_parents() to make clear that this method does more than add to a list (and no list is required anymore). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	f0d9cc4196	revision.c: begin refactoring --topo-order logic When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire reachable set. There are some short-cuts here, such as if we perform a range query like 'git rev-list COMPARE..HEAD' and we can stop limit_list() when all queued commits are uninteresting. * revs->topo_order implies we run sort_in_topological_order(). See the implementation of that method in commit.c. It implies that the full set of commits to order is in the given commit_list. These two methods imply that a 'git rev-list --topo-order HEAD' command must walk the entire reachable set of commits _twice_ before returning a single result. If we have a commit-graph file with generation numbers computed, then there is a better way. This patch introduces some necessary logic redirection when we are in this situation. In v2.18.0, the commit-graph file contains zero-valued bytes in the positions where the generation number is stored in v2.19.0 and later. Thus, we use generation_numbers_enabled() to check if the commit-graph is available and has non-zero generation numbers. When setting revs->limited only because revs->topo_order is true, only do so if generation numbers are not available. There is no reason to use the new logic as it will behave similarly when all generation numbers are INFINITY or ZERO. In prepare_revision_walk(), if we have revs->topo_order but not revs->limited, then we trigger the new logic. It breaks the logic into three pieces, to fit with the existing framework: 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info struct. We use the presence of this struct as a signal to use the new methods during our walk. In this patch, this method simply calls limit_list() and sort_in_topological_order(). In the future, this method will set up a new data structure to perform that logic in-line. 2. next_topo_commit() provides get_revision_1() with the next topo- ordered commit in the list. Currently, this simply pops the commit from revs->commits. 3. expand_topo_walk() provides get_revision_1() with a way to signal walking beyond the latest commit. Currently, this calls add_parents_to_list() exactly like the old logic. While this commit presents method redirection for performing the exact same logic as before, it allows the next commit to focus only on the new logic. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Junio C Hamano	77d503757d	Merge branch 'md/filter-trees' The "rev-list --filter" feature learned to exclude all trees via "tree:0" filter. * md/filter-trees: list-objects: support for skipping tree traversal filter-trees: code clean-up of tests list-objects-filter: implement filter tree:0 list-objects-filter-options: do not over-strbuf_init list-objects-filter: use BUG rather than die revision: mark non-user-given objects instead rev-list: handle missing tree objects properly list-objects: always parse trees gently list-objects: refactor to process_tree_contents list-objects: store common func args in struct	2018-10-30 15:43:39 +09:00
Matthew DeVore	669b1d2aae	exclude-promisor-objects: declare when option is allowed The --exclude-promisor-objects option causes some funny behavior in at least two commands: log and blame. It causes a BUG crash: $ git log --exclude-promisor-objects BUG: revision.c:2143: exclude_promisor_objects can only be used when fetch_if_missing is 0 Aborted [134] Fix this such that the option is treated like any other unknown option. The commands that must support it are limited, so declare in those commands that the flag is supported. In particular: pack-objects prune rev-list The commands were found by searching for logic which parses --exclude-promisor-objects outside of revision.c. Extra logic outside of revision.c is needed because fetch_if_missing must be turned on before revision.c sees the option or it will BUG-crash. The above list is supported by the fact that no other command is introspectively invoked by another command passing --exclude-promisor-object. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-23 13:52:57 +09:00
Nguyễn Thái Ngọc Duy	ab3e1f78ae	revision.c: better error reporting on ref from different worktrees Make use of the new ref aliases to pass refs from another worktree around and access them from the current ref store instead. This does not change any functionality, but when a problem arises, we would like the reported messages to mention full ref aliases, like this: fatal: bad object worktrees/ztemp/HEAD warning: reflog of 'main-worktree/HEAD' references pruned commits instead of fatal: bad object HEAD warning: reflog of 'HEAD' references pruned commits which does not really tell where the refs are from. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-22 13:32:29 +09:00
Nguyễn Thái Ngọc Duy	061e420a4d	revision.c: correct a parameter name This function is a callback of for_each_reflog() which will pass a ref name as the first argument, not a path (to a reflog file). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-22 13:32:29 +09:00
Junio C Hamano	11877b9ebe	Merge branch 'nd/the-index' Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...	2018-10-19 13:34:02 +09:00
Matthew DeVore	99c9aa9579	revision: mark non-user-given objects instead Currently, list-objects.c incorrectly treats all root trees of commits as USER_GIVEN. Also, it would be easier to mark objects that are non-user-given instead of user-given, since the places in the code where we access an object through a reference are more obvious than the places where we access an object that was given by the user. Resolve these two problems by introducing a flag NOT_USER_GIVEN that marks blobs and trees that are non-user-given, replacing USER_GIVEN. (Only blobs and trees are marked because this mark is only used when filtering objects, and filtering of other types of objects is not supported yet.) This fixes a bug in that git rev-list behaved differently from git pack-objects. pack-objects would not filter objects given explicitly on the command line and rev-list would filter. This was because the two commands used a different function to add objects to the rev_info struct. This seems to have been an oversight, and pack-objects has the correct behavior, so I added a test to make sure that rev-list now behaves properly. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-07 08:55:00 +09:00
Nguyễn Thái Ngọc Duy	b3c7eef9b0	revision.c: reduce implicit dependency the_repository Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:51:19 -07:00
Nguyễn Thái Ngọc Duy	2abf350385	revision.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:51:19 -07:00
Nguyễn Thái Ngọc Duy	a7edadda59	patch-ids.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:48:11 -07:00
Nguyễn Thái Ngọc Duy	e675765235	diff.c: remove implicit dependency on the_index A new variant repo_diff_setup() is added that takes 'struct repository *' and diff_setup() becomes a thin macro around it that is protected by NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to NO_THE_INDEX_.... The plan is these macros will always be defined for all library files and the macros are only accessible in builtin/ Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:48:10 -07:00
Nguyễn Thái Ngọc Duy	38bbc2ea39	grep.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:48:10 -07:00
Junio C Hamano	769af0fd9e	Merge branch 'jk/cocci' spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion	2018-09-17 13:53:57 -07:00
Junio C Hamano	1b7a91da71	Merge branch 'ds/reachable' The code for computing history reachability has been shuffled, obtained a bunch of new tests to cover them, and then being improved. * ds/reachable: commit-reach: correct accidental #include of C file commit-reach: use can_all_from_reach commit-reach: make can_all_from_reach... linear commit-reach: replace ref_newer logic test-reach: test commit_contains test-reach: test can_all_from_reach_with_flags test-reach: test reduce_heads test-reach: test get_merge_bases_many test-reach: test is_descendant_of test-reach: test in_merge_bases test-reach: create new test tool for ref_newer commit-reach: move can_all_from_reach_with_flags upload-pack: generalize commit date cutoff upload-pack: refactor ok_to_give_up() upload-pack: make reachable() more generic commit-reach: move commit_contains from ref-filter commit-reach: move ref_newer from remote.c commit.h: remove method declarations commit-reach: move walk methods from commit.c	2018-09-17 13:53:52 -07:00
Junio C Hamano	8b6f6075be	Merge branch 'jk/rev-list-stdin-noop-is-ok' "git rev-list --stdin </dev/null" used to be an error; it now shows no output without an error. "git rev-list --stdin --default HEAD" still falls back to the given default when nothing is given on the standard input. * jk/rev-list-stdin-noop-is-ok: rev-list: make empty --stdin not an error	2018-09-17 13:53:48 -07:00
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-29 11:32:49 -07:00
Jeff King	a12cbe23ef	rev-list: make empty --stdin not an error When we originally did the series that contains `7ba826290a` (revision: add rev_input_given flag, 2017-08-02) the intent was that "git rev-list --stdin </dev/null" would similarly become a successful noop. However, an attempt at the time to do that did not work[1]. The problem is that rev_input_given serves two roles: - it tells rev-list.c that it should not error out - it tells revision.c that it should not have the "default" ref kick (e.g., "HEAD" in "git log") We want to trigger the former, but not the latter. This is technically possible with a single flag, if we set the flag only after revision.c's revs->def check. But this introduces a rather subtle ordering dependency. Instead, let's keep two flags: one to denote when we got actual input (which triggers both roles) and one for when we read stdin (which triggers only the first). This does mean a caller interested in the first role has to check both flags, but there's only one such caller. And any future callers might want to make the distinction anyway (e.g., if they care less about erroring out, and more about whether revision.c soaked up our stdin). In fact, we already keep such a flag internally in revision.c for this purpose, so this is really just exposing that to the caller (and the old function-local flag can go away in favor of our new one). [1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/ Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 14:44:50 -07:00
Nguyễn Thái Ngọc Duy	6d2df284e7	dir.c: remove an implicit dependency on the_index in pathspec code Make the match_patchspec API and friends take an index_state instead of assuming the_index in dir.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:42 -07:00
Junio C Hamano	09ca613049	Merge branch 'jt/tags-to-promised-blobs-fix' The lazy clone support had a few places where missing but promised objects were not correctly tolerated, which have been fixed. * jt/tags-to-promised-blobs-fix: tag: don't warn if target is missing but promised revision: tolerate promised targets of tags	2018-08-02 15:30:46 -07:00
Junio C Hamano	3a2a1dc170	Merge branch 'sb/object-store-lookup' lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...	2018-08-02 15:30:42 -07:00
Junio C Hamano	8fa8a4f1ec	Merge branch 'jt/partial-clone-fsck-connectivity' Partial clone support of "git clone" has been updated to correctly validate the objects it receives from the other side. The server side has been corrected to send objects that are directly requested, even if they may match the filtering criteria (e.g. when doing a "lazy blob" partial clone). * jt/partial-clone-fsck-connectivity: clone: check connectivity even if clone is partial upload-pack: send refs' objects despite "filter"	2018-07-24 14:50:47 -07:00
Derrick Stolee	6404355657	commit.h: remove method declarations These methods are now declared in commit-reach.h. Remove them from commit.h and add new include statements in all files that require these declarations. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-20 15:38:54 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Jonathan Tan	dc0a13f681	revision: tolerate promised targets of tags In handle_commit(), it is fatal for an annotated tag to point to a non-existent object. --exclude-promisor-objects should relax this rule and allow non-existent objects that are promisor objects, but this is not the case. Update handle_commit() to tolerate this situation. This was observed when cloning from a repository with an annotated tag pointing to a blob. The test included in this patch demonstrates this case. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-16 12:56:14 -07:00
Jonathan Tan	a0c9016abd	upload-pack: send refs' objects despite "filter" A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-09 12:37:38 -07:00
Stefan Beller	2122f6754c	commit: add repository argument to lookup_commit_reference Add a repository argument to allow callers of lookup_commit_reference to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:39 -07:00
Stefan Beller	f86bcc7b2c	tree: add repository argument to lookup_tree Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	da14a7ff99	blob: add repository argument to lookup_blob Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Stefan Beller	109cd76dd3	object: add repository argument to parse_object Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:38 -07:00
Junio C Hamano	b16b60f71b	Merge branch 'sb/object-store-grafts' into sb/object-store-lookup * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-06-29 10:43:28 -07:00
Junio C Hamano	b3b2aaf0fd	Merge branch 'nd/commit-util-to-slab' The in-core "commit" object had an all-purpose "void util" field, which was tricky to use especially in library-ish part of the code. All of the existing uses of the field has been migrated to a more dedicated "commit-slab" mechanism and the field is eliminated. nd/commit-util-to-slab: commit.h: delete 'util' field in struct commit merge: use commit-slab in merge remote desc instead of commit->util log: use commit-slab in prepare_bases() instead of commit->util show-branch: note about its object flags usage show-branch: use commit-slab for commit-name instead of commit->util name-rev: use commit-slab for rev-name instead of commit->util bisect.c: use commit-slab for commit weight instead of commit->util revision.c: use commit-slab for show_source sequencer.c: use commit-slab to associate todo items to commits sequencer.c: use commit-slab to mark seen commits shallow.c: use commit-slab for commit depth instead of commit->util describe: use commit-slab for commit names instead of commit->util blame: use commit-slab for blame suspects instead of commit->util commit-slab: support shared commit-slab commit-slab.h: code split	2018-06-25 13:22:35 -07:00
Junio C Hamano	42c8ce1c49	Merge branch 'bc/object-id' Conversion from uchar[20] to struct object_id continues. * bc/object-id: (42 commits) merge-one-file: compute empty blob object ID add--interactive: compute the empty tree value Update shell scripts to compute empty tree object ID sha1_file: only expose empty object constants through git_hash_algo dir: use the_hash_algo for empty blob object ID sequencer: use the_hash_algo for empty tree object ID cache-tree: use is_empty_tree_oid sha1_file: convert cached object code to struct object_id builtin/reset: convert use of EMPTY_TREE_SHA1_BIN builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX wt-status: convert two uses of EMPTY_TREE_SHA1_HEX submodule: convert several uses of EMPTY_TREE_SHA1_HEX sequencer: convert one use of EMPTY_TREE_SHA1_HEX merge: convert empty tree constant to the_hash_algo builtin/merge: switch tree functions to use object_id builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo sha1-file: add functions for hex empty tree and blob OIDs builtin/receive-pack: avoid hard-coded constants for push certs diff: specify abbreviation size in terms of the_hash_algo upload-pack: replace use of several hard-coded constants ...	2018-05-30 14:04:10 +09:00
Junio C Hamano	cf315793c1	Merge branch 'jk/unavailable-can-be-missing' Code clean-up to turn history traversal more robust in a semi-corrupt repository. * jk/unavailable-can-be-missing: mark_parents_uninteresting(): avoid most allocation mark_parents_uninteresting(): replace list with stack mark_parents_uninteresting(): drop missing object check mark_tree_contents_uninteresting(): drop missing object check	2018-05-30 14:04:08 +09:00
Junio C Hamano	50f08db594	Merge branch 'js/use-bug-macro' Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths	2018-05-30 14:04:07 +09:00
Junio C Hamano	c89b6e136e	Merge branch 'ds/lazy-load-trees' The code has been taught to use the duplicated information stored in the commit-graph file to learn the tree object name for a commit to avoid opening and parsing the commit object when it makes sense to do so. * ds/lazy-load-trees: coccinelle: avoid wrong transformation suggestions from commit.cocci commit-graph: lazy-load trees for commits treewide: replace maybe_tree with accessor methods commit: create get_commit_tree() method treewide: rename tree to maybe_tree	2018-05-23 14:38:13 +09:00
Nguyễn Thái Ngọc Duy	87be252333	revision.c: use commit-slab for show_source Instead of relying on commit->util to store the source string, let the user provide a commit-slab to store the source strings in. It's done so that commit->util can be removed. See more explanation in the commit that removes commit->util. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-21 14:07:20 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
Jeff King	8702b30fd7	mark_parents_uninteresting(): avoid most allocation Commit `941ba8db57` (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) used a clever double-loop to avoid allocations for single-parent chains of history. However, it did so only when following parents of parents (which was an uncommon case), and _always_ incurred at least one allocation to populate the list of pending parents in the first place. We can turn this into zero-allocation in the common case by iterating directly over the initial parent list, and then following up on any pending items we might have discovered. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-13 11:33:09 +09:00
Jeff King	43fc643b75	mark_parents_uninteresting(): replace list with stack The mark_parents_uninteresting() function uses two loops: an outer one to process our queue of pending parents, and an inner one to process first-parent chains. This is a clever optimization from `941ba8db57` (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) to limit the number of linked-list allocations when following single-parent chains. Unfortunately, this makes the result a little hard to read. Let's replace the list with a stack. Then we don't have to worry about doing this double-loop optimization, as we'll just reuse the top element of the stack as we pop/push. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-13 11:23:59 +09:00
Jeff King	577dd0d29b	mark_parents_uninteresting(): drop missing object check We allow UNINTERESTING objects in a traversal to be unavailable. As part of this, mark_parents_uninteresting() checks whether we have a particular uninteresting parent; if not, we will mark it as "parsed" so that later code skips it. This code is redundant and even a little bit harmful, so let's drop it. It's redundant because when our parse_object() call in add_parents_to_list() fails, we already quietly skip UNINTERESTING parents. This redundancy is a historical artifact. The mark_parents_uninteresting() protection is from `454fbbcde3` (git-rev-list: allow missing objects when the parent is marked UNINTERESTING, 2005-07-10). Much later, `aeeae1b771` (revision traversal: allow UNINTERESTING objects to be missing, 2009-01-27) covered more cases by making the actual parse more gentle. As an aside, even if this weren't redundant, it would be insufficient. The gentle parsing handles both missing and corrupted objects, whereas the has_object_file() check we're getting rid of covers only missing ones. And the code we're dropping is harmful for two reasons: 1. We spend extra time on the object lookup, even though we don't actually need the information at this point (and will just repeat that lookup later when we parse for the common case that we _do_ have the object). 2. It "lies" about the commit by setting the parsed flag, even though we didn't load any useful data into the struct. This shouldn't matter for the UNINTERESTING case, but we may later clear our flags and do another traversal in the same process. While pretty unlikely, it's possible that we could then look at the same commit without the UNINTERESTING flag, in which case we'd produce the wrong result (we'd think it's a commit with no parents, when in fact we should probably die due to the missing object). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-13 11:08:58 +09:00
Jeff King	a5411df0d9	mark_tree_contents_uninteresting(): drop missing object check It's generally acceptable for UNINTERESTING objects in a traversal to be unavailable (e.g., see `aeeae1b771`). When marking trees UNINTERESTING, we access the object database twice: once to check if the object is missing (and return quietly if it is), and then again to actually parse it. We can instead just try to parse; if that fails, we can then return quietly. That halves the effort we spend on locating the object. Note that this isn't _exactly_ the same as the original behavior, as the parse failure could be due to other problems than a missing object: it could be corrupted, in which case the original code would have died. But the new behavior is arguably better, as it covers the object being unavailable for any reason. We'll also still issue a warning to stderr in such a case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-13 11:08:56 +09:00
Johannes Schindelin	033abf97fc	Replace all die("BUG: ...") calls by BUG() ones In `d8193743e0` (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in `588a538ae5` (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-06 19:06:13 +09:00
brian m. carlson	fd521245e6	revision: replace use of hard-coded constants Replace two uses of the hard-coded constant 40 with references to the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:51 +09:00
brian m. carlson	14c3c80c81	packfile: convert has_sha1_pack to object_id Convert this function to take a pointer to struct object_id and rename it has_object_pack for consistency with has_object_file. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-02 13:59:49 +09:00
Stefan Beller	23a3f0cb16	refs: add repository argument to get_main_ref_store Add a repository argument to allow the get_main_ref_store caller to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-12 11:38:56 +09:00
Derrick Stolee	2e27bd7731	treewide: replace maybe_tree with accessor methods In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
Derrick Stolee	891435d55d	treewide: rename tree to maybe_tree Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
Junio C Hamano	2ee13a780b	Merge branch 'jk/cached-commit-buffer' into HEAD * jk/cached-commit-buffer: revision: drop --show-all option commit: drop uses of get_cached_commit_buffer() Git 2.16.2	2018-03-13 13:35:25 -07:00
Junio C Hamano	d5120daba4	Merge branch 'ds/mark-parents-uninteresting-optim' Micro optimization in revision traversal code. * ds/mark-parents-uninteresting-optim: revision.c: reduce object database queries	2018-03-08 12:36:27 -08:00
Junio C Hamano	e33c3322b6	Merge branch 'jk/cached-commit-buffer' Code clean-up. * jk/cached-commit-buffer: revision: drop --show-all option commit: drop uses of get_cached_commit_buffer()	2018-03-06 14:54:05 -08:00
Derrick Stolee	ebbed3ba04	revision.c: reduce object database queries In mark_parents_uninteresting(), we check for the existence of an object file to see if we should treat a commit as parsed. The result is to set the "parsed" bit on the commit. Modify the condition to only check has_object_file() if the result would change the parsed bit. When a local branch is different from its upstream ref, "git status" will compute ahead/behind counts. This uses paint_down_to_common() and hits mark_parents_uninteresting(). On a copy of the Linux repo with a local instance of "master" behind the remote branch "origin/master" by ~60,000 commits, we find the performance of "git status" went from 1.42 seconds to 1.32 seconds, for a relative difference of -7.0%. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-27 15:04:52 -08:00
Jeff King	f74bbc8dd2	revision: drop --show-all option This was an undocumented debugging aid that does not seem to have come in handy in the past decade, judging from its lack of mentions on the mailing list. Let's drop it in the name of simplicity. This is morally a revert of `3131b71301` (Add "--show-all" revision walker flag for debugging, 2008-02-09), but note that I did leave in the mapping of UNINTERESTING to "^" in get_revision_mark(). I don't think this would be possible to trigger with the current code, but it's the only sensible marker. We'll skip the usual deprecation period because this was explicitly a debugging aid that was never documented. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-22 12:15:25 -08:00
Junio C Hamano	e17cec27d1	Merge branch 'rs/lose-leak-pending' into maint API clean-up around revision traversal. * rs/lose-leak-pending: commit: remove unused function clear_commit_marks_for_object_array() revision: remove the unused flag leak_pending checkout: avoid using the rev_info flag leak_pending bundle: avoid using the rev_info flag leak_pending bisect: avoid using the rev_info flag leak_pending object: add clear_commit_marks_all() ref-filter: use clear_commit_marks_many() in do_merge_filter() commit: use clear_commit_marks_many() in remove_redundant() commit: avoid allocation in clear_commit_marks_many()	2018-02-15 15:18:11 -08:00
Junio C Hamano	e75c862125	Merge branch 'tg/split-index-fixes' The split-index mode had a few corner case bugs fixed. * tg/split-index-fixes: travis: run tests with GIT_TEST_SPLIT_INDEX split-index: don't write cache tree with null oid entries read-cache: fix reading the shared index for other repos	2018-02-13 13:39:13 -08:00
Junio C Hamano	f3d618d2bf	Merge branch 'jh/fsck-promisors' In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension	2018-02-13 13:39:03 -08:00
Junio C Hamano	c0d75f0e2e	Merge branch 'sb/diff-blobfind-pickaxe' "diff" family of commands learned "--find-object=<object-id>" option to limit the findings to changes that involve the named object. * sb/diff-blobfind-pickaxe: diff: use HAS_MULTI_BITS instead of counting bits manually diff: properly error out when combining multiple pickaxe options diffcore: add a pickaxe option to find a specific blob diff: introduce DIFF_PICKAXE_KINDS_MASK diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit diff.h: make pickaxe_opts an unsigned bit field	2018-01-23 13:16:37 -08:00
Junio C Hamano	0bbab7d2ab	Merge branch 'rs/lose-leak-pending' API clean-up around revision traversal. * rs/lose-leak-pending: commit: remove unused function clear_commit_marks_for_object_array() revision: remove the unused flag leak_pending checkout: avoid using the rev_info flag leak_pending bundle: avoid using the rev_info flag leak_pending bisect: avoid using the rev_info flag leak_pending object: add clear_commit_marks_all() ref-filter: use clear_commit_marks_many() in do_merge_filter() commit: use clear_commit_marks_many() in remove_redundant() commit: avoid allocation in clear_commit_marks_many()	2018-01-23 13:16:36 -08:00
Thomas Gummerer	a125a22334	read-cache: fix reading the shared index for other repos read_index_from() takes a path argument for the location of the index file. For reading the shared index in split index mode however it just ignores that path argument, and reads it from the gitdir of the current repository. This works as long as an index in the_repository is read. Once that changes, such as when we read the index of a submodule, or of a different working tree than the current one, the gitdir of the_repository will no longer contain the appropriate shared index, and git will fail to read it. For example t3007-ls-files-recurse-submodules.sh was broken with GIT_TEST_SPLIT_INDEX set in `188dce131f` ("ls-files: use repository object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also broken in a similar manner, probably by introducing struct repository there, although I didn't track down the exact commit for that. `be489d02d2` ("revision.c: --indexed-objects add objects from all worktrees", 2017-08-23) breaks with split index mode in a similar manner, not erroring out when it can't read the index, but instead carrying on with pruning, without taking the index of the worktree into account. Fix this by passing an additional gitdir parameter to read_index_from, to indicate where it should look for and read the shared index from. read_cache_from() defaults to using the gitdir of the_repository. As it is mostly a convenience macro, having to pass get_git_dir() for every call seems overkill, and if necessary users can have more control by using read_index_from(). Helped-by: Brandon Williams <bmwill@google.com> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-19 10:36:34 -08:00
Stefan Beller	15af58c1ad	diffcore: add a pickaxe option to find a specific blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) One might be tempted to extend git-describe to also work with blobs, such that `git describe <blob-id>` gives a description as '<commit-ish>:<path>'. This was implemented at [2]; as seen by the sheer number of responses (>110), it turns out this is tricky to get right. The hard part to get right is picking the correct 'commit-ish' as that could be the commit that (re-)introduced the blob or the blob that removed the blob; the blob could exist in different branches. Junio hinted at a different approach of solving this problem, which this patch implements. Teach the diff machinery another flag for restricting the information to what is shown. For example: $ ./git log --oneline --find-object=v2.0.0:Makefile `b2feb64309` Revert the whole "ask curl-config" topic for now `47fbfded53` i18n: only extract comments marked with "TRANSLATORS:" we observe that the Makefile as shipped with 2.0 was appeared in v1.9.2-471-g47fbfded53 and in v2.0.0-rc1-5-gb2feb6430b. The reason why these commits both occur prior to v2.0.0 are evil merges that are not found using this new mechanism. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob [2] https://public-inbox.org/git/20171028004419.10139-1-sbeller@google.com/ Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-04 15:02:40 -08:00
Stefan Beller	cf63051ada	diff: introduce DIFF_PICKAXE_KINDS_MASK Currently the check whether to perform pickaxing is done via checking `diffopt->pickaxe`, which contains the command line argument that we want to pickaxe for. Soon we'll introduce a new type of pickaxing, that will not store anything in the `.pickaxe` field, so let's migrate the check to be dependent on pickaxe_opts. It is not enough to just replace the check for pickaxe by pickaxe_opts, because flags might be set, but pickaxing was not requested ('-i'). To cope with that, introduce a mask to check only for the bits indicating the modes of operation. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-04 15:02:40 -08:00
Stefan Beller	c1ddc4610c	diff: migrate diff_flags.pickaxe_ignore_case to a pickaxe_opts bit Currently flags for pickaxing are found in different places. Unify the flags into the `pickaxe_opts` field, which will contain any pickaxe related flags. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-01-04 15:02:40 -08:00

... 2 3 4 5 6 ...

1003 Commits