development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Taylor Blau	312cff5207	bloom: split 'get_bloom_filter()' in two 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Jeff King	ef8d7ac42a	strvec: convert more callers away from argv_array name We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts remaining files from the first half of the alphabet, to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '.c' '.h' \| xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add '[abcdefghjkl]*'". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Jeff King	dbbcd44fb4	strvec: rename files from argv-array to strvec This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '.c' '.h' \| xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:17 -07:00
Junio C Hamano	c3a02824cf	Merge branch 'ds/line-log-on-bloom' "git log -L..." now takes advantage of the "which paths are touched by this commit?" info stored in the commit-graph system. * ds/line-log-on-bloom: line-log: integrate with changed-path Bloom filters line-log: try to use generation number-based topo-ordering line-log: more responsive, incremental 'git log -L' t4211-line-log: add tests for parent oids line-log: remove unused fields from 'struct line_log_data'	2020-06-08 18:06:26 -07:00
Derrick Stolee	f32dde8c12	line-log: integrate with changed-path Bloom filters The previous changes to the line-log machinery focused on making the first result appear faster. This was achieved by no longer walking the entire commit history before returning the early results. There is still another way to improve the performance: walk most commits much faster. Let's use the changed-path Bloom filters to reduce time spent computing diffs. Since the line-log computation requires opening blobs and checking the content-diff, there is still a lot of necessary computation that cannot be replaced with changed-path Bloom filters. The part that we can reduce is most effective when checking the history of a file that is deep in several directories and those directories are modified frequently. In this case, the computation to check if a commit is TREESAME to its first parent takes a large fraction of the time. That is ripe for improvement with changed-path Bloom filters. We must ensure that prepare_to_use_bloom_filters() is called in revision.c so that the bloom_filter_settings are loaded into the struct rev_info from the commit-graph. Of course, some cases are still forbidden, but in the line-log case the pathspec is provided in a different way than normal. Since multiple paths and segments could be requested, we compute the struct bloom_key data dynamically during the commit walk. This could likely be improved, but adds code complexity that is not valuable at this time. There are two cases to care about: merge commits and "ordinary" commits. Merge commits have multiple parents, but if we are TREESAME to our first parent in every range, then pass the blame for all ranges to the first parent. Ordinary commits have the same condition, but each is done slightly differently in the process_ranges_[merge\|ordinary]_commit() methods. By checking if the changed-path Bloom filter can guarantee TREESAME, we can avoid that tree-diff cost. If the filter says "probably changed", then we need to run the tree-diff and then the blob-diff if there was a real edit. The Linux kernel repository is a good testing ground for the performance improvements claimed here. There are two different cases to test. The first is the "entire history" case, where we output the entire history to /dev/null to see how long it would take to compute the full line-log history. The second is the "first result" case, where we find how long it takes to show the first value, which is an indicator of how quickly a user would see responses when waiting at a terminal. To test, I selected the paths that were changed most frequently in the top 10,000 commits using this command (stolen from StackOverflow [1]): git log --pretty=format: --name-only -n 10000 \| sort \| \ uniq -c \| sort -rg \| head -10 which results in 121 MAINTAINERS 63 fs/namei.c 60 arch/x86/kvm/cpuid.c 59 fs/io_uring.c 58 arch/x86/kvm/vmx/vmx.c 51 arch/x86/kvm/x86.c 45 arch/x86/kvm/svm.c 42 fs/btrfs/disk-io.c 42 Documentation/scsi/index.rst (along with a bogus first result). It appears that the path arch/x86/kvm/svm.c was renamed, so we ignore that entry. This leaves the following results for the real command time: \| \| Entire History \| First Result \| \| Path \| Before \| After \| Before \| After \| \|------------------------------\|--------\|--------\|--------\|--------\| \| MAINTAINERS \| 4.26 s \| 3.87 s \| 0.41 s \| 0.39 s \| \| fs/namei.c \| 1.99 s \| 0.99 s \| 0.42 s \| 0.21 s \| \| arch/x86/kvm/cpuid.c \| 5.28 s \| 1.12 s \| 0.16 s \| 0.09 s \| \| fs/io_uring.c \| 4.34 s \| 0.99 s \| 0.94 s \| 0.27 s \| \| arch/x86/kvm/vmx/vmx.c \| 5.01 s \| 1.34 s \| 0.21 s \| 0.12 s \| \| arch/x86/kvm/x86.c \| 2.24 s \| 1.18 s \| 0.21 s \| 0.14 s \| \| fs/btrfs/disk-io.c \| 1.82 s \| 1.01 s \| 0.06 s \| 0.05 s \| \| Documentation/scsi/index.rst \| 3.30 s \| 0.89 s \| 1.46 s \| 0.03 s \| It is worth noting that the least speedup comes for the MAINTAINERS file which is * edited frequently, * low in the directory heirarchy, and * quite a large file. All of those points lead to spending more time doing the blob diff and less time doing the tree diff. Still, we see some improvement in that case and significant improvement in other cases. A 2-4x speedup is likely the more typical case as opposed to the small 5% change for that file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
SZEDER Gábor	3cb9d2b6f9	line-log: more responsive, incremental 'git log -L' The current line-level log implementation performs a preprocessing step in prepare_revision_walk(), during which the line_log_filter() function filters and rewrites history to keep only commits modifying the given line range. This preprocessing affects both responsiveness and correctness: - Git doesn't produce any output during this preprocessing step. Checking whether a commit modified the given line range is somewhat expensive, so depending on the size of the given revision range this preprocessing can result in a significant delay before the first commit is shown. - Limiting the number of displayed commits (e.g. 'git log -3 -L...') doesn't limit the amount of work during preprocessing, because that limit is applied during history traversal. Alas, by that point this expensive preprocessing step has already churned through the whole revision range to find all commits modifying the revision range, even though only a few of them need to be shown. - It rewrites parents, with no way to turn it off. Without the user explicitly requesting parent rewriting any parent object ID shown should be that of the immediate parent, just like in case of a pathspec-limited history traversal without parent rewriting. However, after that preprocessing step rewrote history, the subsequent "regular" history traversal (i.e. get_revision() in a loop) only sees commits modifying the given line range. Consequently, it can only show the object ID of the last ancestor that modified the given line range (which might happen to be the immediate parent, but many-many times it isn't). This patch addresses both the correctness and, at least for the common case, the responsiveness issues by integrating line-level log filtering into the regular revision walking machinery: - Make process_ranges_arbitrary_commit(), the static function in 'line-log.c' deciding whether a commit modifies the given line range, public by removing the static keyword and adding the 'line_log_' prefix, so it can be called from other parts of the revision walking machinery. - If the user didn't explicitly ask for parent rewriting (which, I believe, is the most common case): - Call this now-public function during regular history traversal, namely from get_commit_action() to ignore any commits not modifying the given line range. Note that while this check is relatively expensive, it must be performed before other, much cheaper conditions, because the tracked line range must be adjusted even when the commit will end up being ignored by other conditions. - Skip the line_log_filter() call, i.e. the expensive preprocessing step, in prepare_revision_walk(), because, thanks to the above points, the revision walking machinery is now able to filter out commits not modifying the given line range while traversing history. This way the regular history traversal sees the unmodified history, and is therefore able to print the object ids of the immediate parents of the listed commits. The eliminated preprocessing step can greatly reduce the delay before the first commit is shown, see the numbers below. - However, if the user did explicitly ask for parent rewriting via '--parents' or a similar option, then stick with the current implementation for now, i.e. perform that expensive filtering and history rewriting in the preprocessing step just like we did before, leaving the initial delay as long as it was. I tried to integrate line-level log filtering with parent rewriting into the regular history traversal, but, unfortunately, several subtleties resisted... :) Maybe someday we'll figure out how to do that, but until then at least the simple and common (i.e. without parent rewriting) 'git log -L:func:file' commands can benefit from the reduced delay. This change makes the failing 'parent oids without parent rewriting' test in 't4211-line-log.sh' succeed. The reduced delay is most noticable when there's a commit modifying the line range near the tip of a large-ish revision range: # no parent rewriting requested, no commit-graph present $ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0 Before: real 0m9.570s user 0m9.494s sys 0m0.076s After: real 0m0.718s user 0m0.674s sys 0m0.044s A significant part of the remaining delay is spent reading and parsing commit objects in limit_list(). With the help of the commit-graph we can eliminate most of that reading and parsing overhead, so here are the timing results of the same command as above, but this time using the commit-graph: Before: real 0m8.874s user 0m8.816s sys 0m0.057s After: real 0m0.107s user 0m0.091s sys 0m0.013s The next patch will further reduce the remaining delay. To be clear: this patch doesn't actually optimize the line-level log, but merely moves most of the work from the preprocessing step to the history traversal, so the commits modifying the line range can be shown as soon as they are processed, and the traversal can be terminated as soon as the given number of commits are shown. Consequently, listing the full history of a line range, potentially all the way to the root commit, will take the same time as before (but at least the user might start reading the output earlier). Furthermore, if the most recent commit modifying the line range is far away from the starting revision, then that initial delay will still be significant. Additional testing by Derrick Stolee: In the Linux kernel repository, the MAINTAINERS file was changed ~3,500 times across the ~915,000 commits. In addition to that edit frequency, the file itself is quite large (~18,700 lines). This means that a significant portion of the computation is taken up by computing the patch-diff of the file. This patch improves the real time it takes to output the first result quite a bit: Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null Before: 3.88 s After: 0.71 s If we drop the "-n 1" in the command, then there is no change in end-to-end process time. This is because the command still needs to walk the entire commit history, which negates the point of this patch. This is expected. As a note for future reference, the ~4.3 seconds in the old code spends ~2.6 seconds computing the patch-diffs, and the rest of the time is spent walking commits and computing diffs for which paths changed at each commit. The changed-path Bloom filters could improve the end-to-end computation time (i.e. no "-n 1" in the command). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
Jonathan Tan	1c37e86ab2	diff: make diff_populate_filespec_options struct The behavior of diff_populate_filespec() currently can be customized through a bitflag, but a subsequent patch requires it to support a non-boolean option. Replace the bitflag with an options struct. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-07 16:09:29 -07:00
Junio C Hamano	de67293e74	Merge branch 'sg/line-log-tree-diff-optim' Optimize unnecessary full-tree diff away from "git log -L" machinery. * sg/line-log-tree-diff-optim: line-log: avoid unnecessary full tree diffs line-log: extract pathspec parsing from line ranges into a helper function	2019-09-18 11:50:09 -07:00
SZEDER Gábor	a2bb801f6a	line-log: avoid unnecessary full tree diffs With rename detection enabled the line-level log is able to trace the evolution of line ranges across whole-file renames [1]. Alas, to achieve that it uses the diff machinery very inefficiently, making the operation very slow [2]. And since rename detection is enabled by default, the line-level log is very slow by default. When the line-level log processes a commit with rename detection enabled, it currently does the following (see queue_diffs()): 1. Computes a full tree diff between the commit and (one of) its parent(s), i.e. invokes diff_tree_oid() with an empty 'diffopt->pathspec'. 2. Checks whether any paths in the line ranges were modified. 3. Checks whether any modified paths in the line ranges are missing in the parent commit's tree. 4. If there is such a missing path, then calls diffcore_std() to figure out whether the path was indeed renamed based on the previously computed full tree diff. 5. Continues doing stuff that are unrelated to the slowness. So basically the line-level log computes a full tree diff for each commit-parent pair in step (1) to be used for rename detection in step (4) in the off chance that an interesting path is missing from the parent. Avoid these expensive and mostly unnecessary full tree diffs by limiting the diffs to paths in the line ranges. This is much cheaper, and makes step (2) unnecessary. If it turns out that an interesting path is missing from the parent, then fall back and compute a full tree diff, so the rename detection will still work. Care must be taken when to update the pathspec used to limit the diff in case of renames. A path might be renamed on one branch and modified on several parallel running branches, and while processing commits on these branches the line-level log might have to alternate between looking at a path's new and old name. However, at any one time there is only a single 'diffopt->pathspec'. So add a step (0) to the above to ensure that the paths in the pathspec match the paths in the line ranges associated with the currently processed commit, and re-parse the pathspec from the paths in the line ranges if they differ. The new test cases include a specially crafted piece of history with two merged branches and two files, where each branch modifies both files, renames on of them, and then modifies both again. Then two separate 'git log -L' invocations check the line-level log of each of those two files, which ensures that at least one of those invocations have to do that back-and-forth between the file's old and new name (no matter which branch is traversed first). 't/t4211-line-log.sh' already contains two tests involving renames, they don't don't trigger this back-and-forth. Avoiding these unnecessary full tree diffs can have huge impact on performance, especially in big repositories with big trees and mergy history. Tracing the evolution of a function through the whole history: # git.git $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0 Before: real 0m8.874s user 0m8.816s sys 0m0.057s After: real 0m2.516s user 0m2.456s sys 0m0.060s # linux.git $ time ~/src/git/git --no-pager log \ -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2 Before: real 3m50.033s user 3m48.041s sys 0m0.300s After: real 0m2.599s user 0m2.466s sys 0m0.157s That's just over 88x speedup. [1] Line-level log's rename following is quite similar to 'git log --follow path', with the notable differences that it does handle multiple paths at once as well, and that it doesn't show the commit performing the rename if it's an exact rename. [2] This slowness might not have been apparent initially, because back when the line-level log feature was introduced rename detection was not yet enabled by default; `12da1d1f6f` (Implement line-history search (git log -L), 2013-03-28) and `5404c116aa` (diff: activate diff.renames by default, 2016-02-25). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-21 10:17:54 -07:00
SZEDER Gábor	eef5204190	line-log: extract pathspec parsing from line ranges into a helper function A helper function to parse the paths involved in the line ranges and to turn them into a pathspec will be useful in the next patch. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-21 10:17:52 -07:00
Nguyễn Thái Ngọc Duy	50ddb089ff	tree-walk.c: remove the_repo from get_tree_entry() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-27 12:45:17 -07:00
Junio C Hamano	96379f043f	Merge branch 'en/merge-directory-renames' "git merge-recursive" backend recently learned a new heuristics to infer file movement based on how other files in the same directory moved. As this is inherently less robust heuristics than the one based on the content similarity of the file itself (rather than based on what its neighbours are doing), it sometimes gives an outcome unexpected by the end users. This has been toned down to leave the renamed paths in higher/conflicted stages in the index so that the user can examine and confirm the result. * en/merge-directory-renames: merge-recursive: switch directory rename detection default merge-recursive: give callers of handle_content_merge() access to contents merge-recursive: track information associated with directory renames t6043: fix copied test description to match its purpose merge-recursive: switch from (oid,mode) pairs to a diff_filespec merge-recursive: cleanup handle_rename_* function signatures merge-recursive: track branch where rename occurred in rename struct merge-recursive: remove ren[12]_other fields from rename_conflict_info merge-recursive: shrink rename_conflict_info merge-recursive: move some struct declarations together merge-recursive: use 'ci' for rename_conflict_info variable name merge-recursive: rename locals 'o' and 'a' to 'obuf' and 'abuf' merge-recursive: rename diff_filespec 'one' to 'o' merge-recursive: rename merge_options argument from 'o' to 'opt' Use 'unsigned short' for mode, like diff_filespec does	2019-05-09 00:37:22 +09:00
Elijah Newren	5ec1e72823	Use 'unsigned short' for mode, like diff_filespec does struct diff_filespec defines mode to be an 'unsigned short'. Several other places in the API which we'd like to interact with using a diff_filespec used a plain unsigned (or unsigned int). This caused problems when taking addresses, so switch to unsigned short. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-08 16:02:07 +09:00
Jeff King	9f607cd09c	line-log: suppress diff output with "-s" When "-L" is in use, we ignore any diff output format that the user provides to us, and just always print a patch (with extra context lines covering the whole area of interest). It's not entirely clear what we should do with all formats (e.g., should "--stat" show just the diffstat of the touched lines, or the stat for the whole file?). But "-s" is pretty clear: the user probably wants to see just the commits that touched those lines, without any diff at all. Let's at least make that work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-08 10:27:01 +09:00
Nguyễn Thái Ngọc Duy	363df5572c	line-log.c: remove the_repository reference Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:50:06 +09:00
Nguyễn Thái Ngọc Duy	80e0385541	line-range.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:51:18 -07:00
Nguyễn Thái Ngọc Duy	b78ea5fc35	diff.c: reduce implicit dependency on the_index diff and textconv code has so widespread use that it's hard to simply update their api and all call sites at once because it would result in a big patch. For now reduce the_index references to two places: diff_setup() and fill_textconv(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:48:10 -07:00
Junio C Hamano	3a2a1dc170	Merge branch 'sb/object-store-lookup' lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...	2018-08-02 15:30:42 -07:00
Junio C Hamano	6566a917d8	Merge branch 'is/parsing-line-range' Parsing of -L[<N>][,[<M>]] parameters "git blame" and "git log" take has been tweaked. * is/parsing-line-range: log: prevent error if line range ends past end of file blame: prevent error if range ends past end of file	2018-08-02 15:30:41 -07:00
Stefan Beller	a74093da5e	tag: add repository argument to deref_tag Add a repository argument to allow the callers of deref_tag to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-29 10:43:39 -07:00
Isabella Stephens	7f81c00f3b	log: prevent error if line range ends past end of file If the -L option is used to specify a line range in git log, and the end of the range is past the end of the file, git will fail with a fatal error. This commit prevents such behaviour - instead we perform the log for existing lines within the specified range. This commit also fixes a corner case where -L ,-n:file would be treated as a log over the whole file. Now we treat this as -L 1,-n:file and blame the first line of the file instead. Signed-off-by: Isabella Stephens <istephens@atlassian.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-06-15 10:29:14 -07:00
Junio C Hamano	c89b6e136e	Merge branch 'ds/lazy-load-trees' The code has been taught to use the duplicated information stored in the commit-graph file to learn the tree object name for a commit to avoid opening and parsing the commit object when it makes sense to do so. * ds/lazy-load-trees: coccinelle: avoid wrong transformation suggestions from commit.cocci commit-graph: lazy-load trees for commits treewide: replace maybe_tree with accessor methods commit: create get_commit_tree() method treewide: rename tree to maybe_tree	2018-05-23 14:38:13 +09:00
Derrick Stolee	2e27bd7731	treewide: replace maybe_tree with accessor methods In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
Derrick Stolee	891435d55d	treewide: rename tree to maybe_tree Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-11 10:47:16 +09:00
brian m. carlson	916bc35b29	tree-walk: convert tree entry functions to object_id Convert get_tree_entry and find_tree_entry to take pointers to struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:50 -07:00
Brandon Williams	3ce9149619	line-log: rename 'new' variables Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-02-22 10:08:05 -08:00
Ramsay Jones	071bcaab64	ALLOC_GROW: avoid -Wsign-compare warnings Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-09-22 13:21:11 +09:00
Junio C Hamano	50f03c6676	Merge branch 'ab/free-and-null' A common pattern to free a piece of memory and assign NULL to the pointer that used to point at it has been replaced with a new FREE_AND_NULL() macro. * ab/free-and-null: *.[ch] refactoring: make use of the FREE_AND_NULL() macro coccinelle: make use of the "expression" FREE_AND_NULL() rule coccinelle: add a rule to make "expression" code use FREE_AND_NULL() coccinelle: make use of the "type" FREE_AND_NULL() rule coccinelle: add a rule to make "type" code use FREE_AND_NULL() git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL	2017-06-24 14:28:41 -07:00
Ævar Arnfjörð Bjarmason	88ce3ef636	*.[ch] refactoring: make use of the FREE_AND_NULL() macro Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by the coccinelle rule. These fall into two categories: - free/NULL assignments one after the other which coccinelle all put on one line, which is functionally equivalent code, but very ugly. - manually spotted occurrences where the NULL assignment isn't right after the free() call. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-16 12:44:09 -07:00
Ævar Arnfjörð Bjarmason	6a83d90207	coccinelle: make use of the "type" FREE_AND_NULL() rule Apply the result of the just-added coccinelle rule. This manually excludes a few occurrences, mostly things that resulted in many FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-16 12:44:03 -07:00
Brandon Williams	66f414f885	diff-tree: convert diff_tree_sha1 to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-05 11:23:58 +09:00
Brandon Williams	f9704c2d82	diff: convert fill_filespec to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-02 09:36:07 +09:00
Johannes Schindelin	6cf034e7ed	line-log: avoid memory leak Discovered by Coverity. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 12:18:20 +09:00
Junio C Hamano	6c621015f2	Merge branch 'vn/line-log-memcpy-size-fix' The command-line parsing of "git log -L" copied internal data structures using incorrect size on ILP32 systems. * vn/line-log-memcpy-size-fix: line-log: use COPY_ARRAY to fix mis-sized memcpy	2017-03-12 23:21:35 -07:00
Junio C Hamano	36d5286f68	Merge branch 'ax/line-log-range-merge-fix' The code to parse "git log -L..." command line was buggy when there are many ranges specified with -L; overrun of the allocated buffer has been fixed. * ax/line-log-range-merge-fix: line-log.c: prevent crash during union of too many ranges	2017-03-12 23:21:34 -07:00
Vegard Nossum	07f546cda5	line-log: use COPY_ARRAY to fix mis-sized memcpy This memcpy meant to get the sizeof a "struct range", not a "range_set", as the former is what our array holds. Rather than swap out the types, let's convert this site to COPY_ARRAY, which avoids the problem entirely (and confirms that the src and dst types match). Note for curiosity's sake that this bug doesn't trigger on I32LP64 systems, but does on ILP32 systems. The mistaken "struct range_set" has two ints and a pointer. That's 16 bytes on LP64, or 12 on ILP32. The correct "struct range" type has two longs, which is also 16 on LP64, but only 8 on ILP32. Likewise an IL32P64 system would experience the bug. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-06 12:01:02 -08:00
Allan Xavier	aaae0bf787	line-log.c: prevent crash during union of too many ranges The existing implementation of range_set_union does not correctly reallocate memory, leading to a heap overflow when it attempts to union more than 24 separate line ranges. For struct range_set *out to grow correctly it must have out->nr set to the current size of the buffer when it is passed to range_set_grow. However, the existing implementation of range_set_union only updates out->nr at the end of the function, meaning that it is always zero before this. This results in range_set_grow never growing the buffer, as well as some of the union logic itself being incorrect as !out->nr is always true. The reason why 24 is the limit is that the first allocation of size 1 ends up allocating a buffer of size 24 (due to the call to alloc_nr in ALLOC_GROW). This goes some way to explain why this hasn't been caught before. Fix the problem by correctly updating out->nr after reallocating the range_set. As this results in out->nr containing the same value as the variable o, replace o with out->nr as well. Finally, add a new test to help prevent the problem reoccurring in the future. Thanks to Vegard Nossum for writing the test. Signed-off-by: Allan Xavier <allan.x.xavier@oracle.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-03 11:16:20 -08:00
René Scharfe	9ed0d8d6e6	use QSORT Apply the semantic patch contrib/coccinelle/qsort.cocci to the code base, replacing calls of qsort(3) with QSORT. The resulting code is shorter and supports empty arrays with NULL pointers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-29 15:42:18 -07:00
Junio C Hamano	a63d31b4d3	Merge branch 'bc/cocci' Conversion from unsigned char sha1[20] to struct object_id continues. * bc/cocci: diff: convert prep_temp_blob() to struct object_id merge-recursive: convert merge_recursive_generic() to object_id merge-recursive: convert leaf functions to use struct object_id merge-recursive: convert struct merge_file_info to object_id merge-recursive: convert struct stage_data to use object_id diff: rename struct diff_filespec's sha1_valid member diff: convert struct diff_filespec to struct object_id coccinelle: apply object_id Coccinelle transformations coccinelle: convert hashcpy() with null_sha1 to hashclr() contrib/coccinelle: add basic Coccinelle transforms hex: add oid_to_hex_r()	2016-07-19 13:22:16 -07:00
Junio C Hamano	63641fb071	Merge branch 'js/log-to-diffopt-file' The commands in the "log/diff" family have had an FILE* pointer in the data structure they pass around for a long time, but some codepaths used to always write to the standard output. As a preparatory step to make "git format-patch" available to the internal callers, these codepaths have been updated to consistently write into that FILE* instead. * js/log-to-diffopt-file: mingw: fix the shortlog --output=<file> test diff: do not color output when --color=auto and --output=<file> is given t4211: ensure that log respects --output=<file> shortlog: respect the --output=<file> setting format-patch: use stdout directly format-patch: avoid freopen() format-patch: explicitly switch off color when writing to files shortlog: support outputting to streams other than stdout graph: respect the diffopt.file setting line-log: respect diffopt's configured output file stream log-tree: respect diffopt's configured output file stream log: prepare log/log-tree to reuse the diffopt.close_file attribute	2016-07-19 13:22:15 -07:00
brian m. carlson	41c9560ee5	diff: rename struct diff_filespec's sha1_valid member Now that this struct's sha1 member is called "oid", update the comment and the sha1_valid member to be called "oid_valid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1_valid + o.oid_valid @@ struct diff_filespec *p; @@ - p->sha1_valid + p->oid_valid Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-28 11:39:02 -07:00
brian m. carlson	a0d12c4433	diff: convert struct diff_filespec to struct object_id Convert struct diff_filespec's sha1 member to use a struct object_id called "oid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1 + o.oid.hash @@ struct diff_filespec *p; @@ - p->sha1 + p->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-28 11:39:02 -07:00
Junio C Hamano	2a5618ec78	Merge branch 'jc/deref-tag' Code clean-up. * jc/deref-tag: blame, line-log: do not loop around deref_tag()	2016-06-27 09:56:50 -07:00
Johannes Schindelin	179795e511	line-log: respect diffopt's configured output file stream The diff machinery can optionally output to a file stream other than stdout, by overriding diffopt.file. In such a case, the rest of the log tree machinery should also write to that stream. Currently, there is no user of the line level log that wants to redirect output to a file. Therefore, one might argue that it is superfluous to support that now. However, it is better to be consistent now, rather than to face hard-to-debug problems later. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-24 13:59:58 -07:00
Junio C Hamano	31da121f2d	blame, line-log: do not loop around deref_tag() These callers appear to expect that deref_tag() is to peel one layer of a tag, but the function does not work that way; it has its own loop to unwrap tags until an object that is not a tag appears. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-14 13:38:14 -07:00
Jeff King	b32fa95fd8	convert trivial cases to ALLOC_ARRAY Each of these cases can be converted to use ALLOC_ARRAY or REALLOC_ARRAY, which has two advantages: 1. It automatically checks the array-size multiplication for overflow. 2. It always uses sizeof(*array) for the element-size, so that it can never go out of sync with the declared type of the array. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:51:09 -08:00
Jeff King	850d2fec53	convert manual allocations to argv_array There are many manual argv allocations that predate the argv_array API. Switching to that API brings a few advantages: 1. We no longer have to manually compute the correct final array size (so it's one less thing we can screw up). 2. In many cases we had to make a separate pass to count, then allocate, then fill in the array. Now we can do it in one pass, making the code shorter and easier to follow. 3. argv_array handles memory ownership for us, making it more obvious when things should be free()d and and when not. Most of these cases are pretty straightforward. In some, we switch from "run_command_v" to "run_command" which lets us directly use the argv_array embedded in "struct child_process". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-02-22 14:50:32 -08:00
brian m. carlson	ed1c9977cb	Remove get_object_hash. Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	7999b2cf77	Add several uses of get_object_hash. Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00

1 2

84 commits