development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Elijah Newren	d1cbe1e6d8	hash-ll.h: split out of hash.h to remove dependency on repository.h hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-24 12:47:32 -07:00
Elijah Newren	41227cb138	hash.h: move some oid-related declarations from cache.h These defines and enum are all oid-related and as such seem to make more sense being included in hash.h. Further, moving them there allows us to remove some includes of cache.h in other files. The change to line-log.h might look unrelated, but line-log.h includes diffcore.h, which previously included cache.h, which included the kitchen sink. Since this patch makes diffcore.h no longer include cache.h, the compiler complains about the 'struct string_list *' function parameter. Add a forward declaration for struct string_list to address this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:28 -08:00
SZEDER Gábor	04ae00062d	line-log: free diff queue when processing non-merge commits When processing a non-merge commit, the line-level log first asks the tree-diff machinery whether any of the files in the given line ranges were modified between the current commit and its parent, and if some of them were, then it loads the contents of those files from both commits to see whether their line ranges were modified and/or need to be adjusted. Alas, it doesn't free() the diff queue holding the results of that query and the contents of those files once its done. This can add up to a substantial amount of leaked memory, especially when the file in question is big and is frequently modified: a user reported "Out of memory, malloc failed" errors with a 2MB text file that was modified ~2800 times [1] (I estimate the leak would use up almost 11GB memory in that case). Free that diff queue to plug this memory leak. However, instead of simply open-coding the necessary three lines, add them as a helper function to the diff API, because it will be useful elsewhere as well. [1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/ Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-02 20:16:34 -04:00
Elijah Newren	f239fff4c1	merge-ort: store filepairs and filespecs in our mem_pool For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 198.1 ms ± 2.6 ms 198.5 ms ± 3.4 ms mega-renames: 715.8 ms ± 4.0 ms 679.1 ms ± 5.6 ms just-one-mega: 276.8 ms ± 4.2 ms 271.9 ms ± 2.8 ms Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-30 09:01:19 -07:00
Elijah Newren	a8791ef649	diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc We want to be able to allocate filespecs and filepairs using a mem_pool. However, filespec data will still remain outside the pool (perhaps in the future we could plumb the pool through the various diff APIs to allocate the filespec data too, but for now we are limiting the scope). Add some extra functions to allocate these appropriately based on the non-NULL-ness of opt->priv->pool, as well as some extra functions to handle correctly deallocating the relevant parts of them. A future commit will make use of these new functions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-30 09:01:19 -07:00
Junio C Hamano	169914ede2	Merge branch 'en/ort-perf-batch-11' Optimize out repeated rename detection in a sequence of mergy operations. * en/ort-perf-batch-11: merge-ort, diffcore-rename: employ cached renames when possible merge-ort: handle interactions of caching and rename/rename(1to1) cases merge-ort: add helper functions for using cached renames merge-ort: preserve cached renames for the appropriate side merge-ort: avoid accidental API mis-use merge-ort: add code to check for whether cached renames can be reused merge-ort: populate caches of rename detection results merge-ort: add data structures for in-memory caching of rename detection t6429: testcases for remembering renames fast-rebase: write conflict state to working tree, index, and HEAD fast-rebase: change assert() to BUG() Documentation/technical: describe remembering renames optimization t6423: rename file within directory that other side renamed	2021-06-14 13:33:27 +09:00
Elijah Newren	25e65b6dd5	merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 15:40:39 +09:00
Junio C Hamano	e2e1a03f6b	Merge branch 'en/ort-perf-batch-10' Various rename detection optimization to help "ort" merge strategy backend. * en/ort-perf-batch-10: diffcore-rename: determine which relevant_sources are no longer relevant merge-ort: record the reason that we want a rename for a file diffcore-rename: add computation of number of unknown renames diffcore-rename: check if we have enough renames for directories early on diffcore-rename: only compute dir_rename_count for relevant directories merge-ort: record the reason that we want a rename for a directory merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type diffcore-rename: take advantage of "majority rules" to skip more renames	2021-04-16 13:53:33 -07:00
Junio C Hamano	1b31224e59	Merge branch 'en/ort-perf-batch-9' The ort merge backend has been optimized by skipping irrelevant renames. * en/ort-perf-batch-9: diffcore-rename: avoid doing basename comparisons for irrelevant sources merge-ort: skip rename detection entirely if possible merge-ort: use relevant_sources to filter possible rename sources merge-ort: precompute whether directory rename detection is needed merge-ort: introduce wrappers for alternate tree traversal merge-ort: add data structures for an alternate tree traversal merge-ort: precompute subset of sources for which we need rename detection diffcore-rename: enable filtering possible rename sources	2021-04-08 13:23:26 -07:00
Junio C Hamano	dd4048d1c7	Merge branch 'en/ort-perf-batch-8' Rename detection rework continues. * en/ort-perf-batch-8: diffcore-rename: compute dir_rename_guess from dir_rename_counts diffcore-rename: limit dir_rename_counts computation to relevant dirs diffcore-rename: compute dir_rename_counts in stages diffcore-rename: extend cleanup_dir_rename_info() diffcore-rename: move dir_rename_counts into dir_rename_info struct diffcore-rename: add function for clearing dir_rename_count Move computation of dir_rename_count from merge-ort to diffcore-rename diffcore-rename: add a mapping of destination names to their indices diffcore-rename: provide basic implementation of idx_possible_rename() diffcore-rename: use directory rename guided basename comparisons	2021-03-22 14:00:24 -07:00
Elijah Newren	ec59da6015	merge-ort: record the reason that we want a rename for a file There are two different reasons we might want a rename for a file -- for three-way content merging or as part of directory rename detection. Record the reason. diffcore-rename will potentially be able to filter some of the ones marked as needed only for directory rename detection, if it can determine those directory renames based solely on renames found via exact rename detection and basename-guided rename detection. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:56 -07:00
Elijah Newren	fb52938eec	merge-ort: record the reason that we want a rename for a directory When one side of history renames a directory, and the other side of history added files to the old directory, directory rename detection is used to warn about the location of the added files so the user can move them to the old directory or keep them with the new one. This sets up three different types of directories: * directories that had new files added to them * directories underneath a directory that had new files added to them * directories where no new files were added to it or any leading path Save this information in dirs_removed; the next several commits will make use of this information. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:55 -07:00
Elijah Newren	a49b55d52e	merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type As noted in the previous commit, we want to be able to take advantage of the "majority rules" portion of directory rename detection to avoid detecting more renames than necessary. However, for diffcore-rename to take advantage of that, it needs to know whether a rename source file was needed for just directory rename detection reasons, or if it is wanted for potential three-way content merging. Modify relevant_sources from a strset to a strintmap, so we can encode additional information. We also modify dirs_removed from a strset to a strintmap at the same time because trying to determine what files are needed for directory rename detection will require us tracking a bit more information for each directory. This commit only changes the types of the two variables from strset to strintmap; it does not actually store any special values yet and for now only checks for presence of entries in the strintmap. Thus, the code is functionally identical to how it behaved before. Future commits will start associating values with each key for these two maps. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-18 14:32:55 -07:00
Elijah Newren	9799889f2e	diffcore-rename: enable filtering possible rename sources Add the ability to diffcore_rename_extended() to allow external callers to declare that they only need renames detected for a subset of source files, and use that information to skip detecting renames for them. There are two important pieces to this optimization that may not be obvious at first glance: * We do not require callers to just filter the filepairs out to remove the non-relevant sources, because exact rename detection is fast and when it finds a match it can remove both a source and a destination whereas the relevant_sources filter can only remove a source. * We need to filter out the source pairs in a preliminary pass instead of adding a strset_contains(relevant_sources, one->path) check within the nested matrix loop. The reason for that is if we have 30k renames, doing 30k * 30k = 900M strset_contains() calls becomes extraordinarily expensive and defeats the performance gains from this change; we only want to do 30k such calls instead. If callers pass NULL for relevant_sources, that is special cases to treat all sources as relevant. Since all callers currently pass NULL, this optimization does not yet have any effect. Subsequent commits will have merge-ort compute a set of relevant_sources to restrict which sources we detect renames for, and have merge-ort pass that set of relevant_sources to diffcore_rename_extended(). A note about filtering order: Some may be curious why we don't filter out irrelevant sources at the same time we filter out exact renames. While that technically could be done at this point, there are two reasons to defer it: First, was to reinforce a lesson that was too easy to forget. As I mentioned above, in the past I filtered irrelevant sources out before exact rename checking, and then discovered that exact renames' ability to remove both sources and destinations was an important consideration and thus doing the filtering after exact rename checking would speed things up. Then at some point I realized that basename matching could also remove both sources and destinations, and decided to put irrelevant source filtering after basename filtering. That slowed things down a lot. But, despite learning about this important ordering, in later restructuring I forgot and made the same mistake of putting the filtering after basename guided rename detection again. So, I have this series of patches structured to do the irrelevant filtering last to start to show how much extra it costs, and then add relevant filtering in to find_basename_matches() to show how much it speeds things up. Basically, it's a way to reinforce something that apparently was too easy to forget, and make sure the commit messages record this lesson. Second, the items in the "relevant_sources" in this patch series will include all sources that might be relevant. It has to be conservative and catch anything that might need a rename, but in the patch series after this one we'll find ways to weed out more of the might be relevant ones. Unfortunately, merge-ort does not have sufficient information to weed those ones out, and there isn't enough information at the time of filtering exact renames out to remove the extra ones either. It has to be deferred. So the deferral is in part to simplify some later additions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-10 22:18:04 -08:00
Elijah Newren	cd52e0050f	diffcore-rename: add function for clearing dir_rename_count As we adjust the usage of dir_rename_count we want to have a function for clearing, or partially clearing it out. Add a partial_clear_dir_rename_count() function for this purpose. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 17:53:11 -08:00
Elijah Newren	0c4fd732f0	Move computation of dir_rename_count from merge-ort to diffcore-rename Move the computation of dir_rename_count from merge-ort.c to diffcore-rename.c, making slight adjustments to the data structures based on the move. While the diffstat looks large, viewing this commit with --color-moved makes it clear that only about 20 lines changed. With this patch, the computation of dir_rename_count is still only done after inexact rename detection, but subsequent commits will add a preliminary computation of dir_rename_count after exact rename detection, followed by some updates after inexact rename detection. Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-26 17:53:11 -08:00
Junio C Hamano	1eb4136ac2	diff: --{rotate,skip}-to=<path> In the implementation of "git difftool", there is a case where the user wants to start viewing the diffs at a specific path and continue on to the rest, optionally wrapping around to the beginning. Since it is somewhat cumbersome to implement such a feature as a post-processing step of "git diff" output, let's support it internally with two new options. - "git diff --rotate-to=C", when the resulting patch would show paths A B C D E without the option, would "rotate" the paths to shows patch to C D E A B instead. It is an error when there is no patch for C is shown. - "git diff --skip-to=C" would instead "skip" the paths before C, and shows patch to C D E. Again, it is an error when there is no patch for C is shown. - "git log [-p]" also accepts these two options, but it is not an error if there is no change to the specified path. Instead, the set of output paths are rotated or skipped to the specified path or the first path that sorts after the specified path. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-16 09:30:42 -08:00
Jonathan Tan	95acf11a3d	diff: restrict when prefetching occurs Commit `7fbbcb21b1` ("diff: batch fetching of missing blobs", 2019-04-08) optimized "diff" by prefetching blobs in a partial clone, but there are some cases wherein blobs do not need to be prefetched. In these cases, any command that uses the diff machinery will unnecessarily fetch blobs. diffcore_std() may read blobs when it calls the following functions: (1) diffcore_skip_stat_unmatch() (controlled by the config variable diff.autorefreshindex) (2) diffcore_break() and diffcore_merge_broken() (for break-rewrite detection) (3) diffcore_rename() (for rename detection) (4) diffcore_pickaxe() (for detecting addition/deletion of specified string) Instead of always prefetching blobs, teach diffcore_skip_stat_unmatch(), diffcore_break(), and diffcore_rename() to prefetch blobs upon the first read of a missing object. This covers (1), (2), and (3): to cover the rest, teach diffcore_std() to prefetch if the output type is one that includes blob data (and hence blob data will be required later anyway), or if it knows that (4) will be run. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-07 16:09:29 -07:00
Jonathan Tan	1c37e86ab2	diff: make diff_populate_filespec_options struct The behavior of diff_populate_filespec() currently can be customized through a bitflag, but a subsequent patch requires it to support a non-boolean option. Replace the bitflag with an options struct. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-07 16:09:29 -07:00
Heba Waly	13c4d7eb22	diff: move doc to diff.h and diffcore.h Move the documentation from Documentation/technical/api-diff.txt to both diff.h and diffcore.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-diff.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header files. There are three members documented in the doc file that weren't found in the header files, assuming the doc wasn't up to date and the members no longer exist: touched_flags, COLOR_DIFF_WORDS and QUIET. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:28 +09:00
Nguyễn Thái Ngọc Duy	b78ea5fc35	diff.c: reduce implicit dependency on the_index diff and textconv code has so widespread use that it's hard to simply update their api and all call sites at once because it would result in a big patch. For now reduce the_index references to two places: diff_setup() and fill_textconv(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:48:10 -07:00
Junio C Hamano	5ade034464	Merge branch 'en/incl-forward-decl' Code hygiene improvement for the header files. * en/incl-forward-decl: Remove forward declaration of an enum compat/precompose_utf8.h: use more common include guard style urlmatch.h: fix include guard Move definition of enum branch_track from cache.h to branch.h alloc: make allocate_alloc_state and clear_alloc_state more consistent Add missing includes and forward declarations	2018-08-20 12:41:32 -07:00
Elijah Newren	ef3ca95475	Add missing includes and forward declarations I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-15 11:52:09 -07:00
Nguyễn Thái Ngọc Duy	78d70d9b10	diffcore.h: drop extern from function declaration Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-03 10:42:55 -07:00
Brandon Williams	f9704c2d82	diff: convert fill_filespec to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-02 09:36:07 +09:00
Junio C Hamano	6d40812e4b	Merge branch 'tk/diffcore-delta-remove-unused' Code cleanup. * tk/diffcore-delta-remove-unused: diffcore-delta: remove unused parameter to diffcore_count_changes()	2016-11-17 13:45:22 -08:00
Tobias Klauser	974e0044d6	diffcore-delta: remove unused parameter to diffcore_count_changes() The delta_limit parameter to diffcore_count_changes() has been unused since commit `ba23bbc8e` ("diffcore-delta: make change counter to byte oriented again.", 2006-03-04). Remove the parameter and adjust all callers. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-11-14 09:24:04 -08:00
brian m. carlson	41c9560ee5	diff: rename struct diff_filespec's sha1_valid member Now that this struct's sha1 member is called "oid", update the comment and the sha1_valid member to be called "oid_valid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1_valid + o.oid_valid @@ struct diff_filespec *p; @@ - p->sha1_valid + p->oid_valid Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-28 11:39:02 -07:00
brian m. carlson	a0d12c4433	diff: convert struct diff_filespec to struct object_id Convert struct diff_filespec's sha1 member to use a struct object_id called "oid" instead. The following Coccinelle semantic patch was used to implement this, followed by the transformations in object_id.cocci: @@ struct diff_filespec o; @@ - o.sha1 + o.oid.hash @@ struct diff_filespec *p; @@ - p->sha1 + p->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-06-28 11:39:02 -07:00
Nguyễn Thái Ngọc Duy	6bf3b81348	diff --stat: mark any file larger than core.bigfilethreshold binary Too large files may lead to failure to allocate memory. If it happens here, it could impact quite a few commands that involve diff. Moreover, too large files are inefficient to compare anyway (and most likely non-text), so mark them binary and skip looking at their content. Noticed-by: Dale R. Worley <worley@alum.mit.edu> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-18 10:16:45 -07:00
Nguyễn Thái Ngọc Duy	8e5dd3d654	diff.c: allow to pass more flags to diff_populate_filespec Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-18 10:16:35 -07:00
Junio C Hamano	9b347673a1	Merge branch 'jk/diff-filespec-cleanup' Portability fix to a topic already in v1.9 * jk/diff-filespec-cleanup: diffcore.h: be explicit about the signedness of is_binary	2014-03-18 13:48:50 -07:00
Junio C Hamano	6376463c37	Merge branch 'ks/combine-diff' Teach combine-diff to honour the path-output-order imposed by diffcore-order, and optimize how matching paths are found in the N-way diffs made with parents. * ks/combine-diff: tests: add checking that combine-diff emits only correct paths combine-diff: simplify intersect_paths() further combine-diff: combine_diff_path.len is not needed anymore combine-diff: optimize combine_diff_path sets intersection diff test: add tests for combine-diff with orderfile diffcore-order: export generic ordering interface	2014-03-05 15:06:26 -08:00
Junio C Hamano	1e745453fe	Merge branch 'nd/diff-quiet-stat-dirty' "git diff --quiet -- pathspec1 pathspec2" sometimes did not return correct status value. * nd/diff-quiet-stat-dirty: diff: do not quit early on stat-dirty files diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later	2014-02-27 14:01:21 -08:00
Nguyễn Thái Ngọc Duy	f34b205f6c	diff: do not quit early on stat-dirty files When QUICK is set (i.e. with --quiet) we try to do as little work as possible, stopping after seeing the first change. stat-dirty is considered a "change" but it may turn out not, if no actual content is changed. The actual content test is performed too late in the process and the shortcut may be taken prematurely, leading to incorrect return code. Assume we do "git diff --quiet". If we have a stat-dirty file "a" and a really dirty file "b". We break the loop in run_diff_files() and stop after "a" because we have got a "change". Later in diffcore_skip_stat_unmatch() we find out "a" is actually not changed. But there's nothing else in the diff queue, we incorrectly declare "no change", ignoring the fact that "b" is changed. This also happens to "git diff --quiet HEAD" when it hits diff_can_quit_early() in oneway_diff(). This patch does the content test earlier in order to keep going if "a" is unchanged. The test result is cached so that when diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on re-testing "a". Reported-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:50:14 -08:00
Kirill Smelkov	1df4320fa2	diffcore-order: export generic ordering interface diffcore_order() interface only accepts a queue of `struct diff_filepair`. In the next patches, we'll want to order `struct combine_diff_path` by path, so let's first rework diffcore-order to also provide generic low-level interface for ordering arbitrary objects, provided they have path accessors. The new interface is: - `struct obj_order` for describing objects to ordering routine, and - order_objects() for actually doing the ordering work. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 14:44:57 -08:00
Richard Lowe	7d0a9a752b	diffcore.h: be explicit about the signedness of is_binary Bitfields need to specify their signedness explicitly or the compiler is free to default as it sees fit. With compilers that default 'unsigned' (SUNWspro 12 seems to do this) the tri-state nature of is_binary vanishes and all files are treated as binary. Signed-off-by: Richard Lowe <richlowe@richlowe.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:52:44 -08:00
Jeff King	cbfe47b67f	diff_filespec: use only 2 bits for is_binary flag The is_binary flag needs only three values: -1, 0, and 1. However, we use a whole 32-bit int for it on most systems (both 32- and 64- bit). Instead, we can mark it to use only 2 bits. On 32-bit systems, this lets it end up as part of the bitfield above (saving 4 bytes). On 64-bit systems, we don't see any change (because the savings end up as padding), but it does leave room for another "free" 32-bit value to be added later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-17 10:50:14 -08:00
Jeff King	b38f70a82b	diff_filespec: reorder is_binary field The middle of the diff_filespec struct contains a mixture of ints, shorts, and bit-fields, followed by a pointer. On an x86-64 system with an LP64 or LLP64 data model (i.e., most of them), the integers and flags end up being padded out by 41 bits to put the pointer at an 8-byte boundary. After the pointer, we have the "int is_binary" field, which is only 32 bits. We end up wasting another 32 bits to pad the struct size up to a multiple of 64 bits. We can move the is_binary field before the pointer, which lets the compiler store it where we used to have padding. This shrinks the top padding to only 9 bits (from the bit-fields), and eliminates the bottom padding entirely, dropping the struct size from 88 to 80 bytes. On a 32-bit system, there is no benefit, but nor should there be any harm (we only need 4-byte alignment there, so we were already using only 9 bits of padding). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-17 10:50:13 -08:00
Jeff King	428d52a5a5	diff_filespec: drop xfrm_flags field The only mention of this field in the code is by some debugging code which prints it out (and it will always be zero, since we never touch it otherwise). It was obsoleted very early on by `25d5ea4` ([PATCH] Redo rename/copy detection logic., 2005-05-24). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-17 10:50:11 -08:00
Jeff King	5b711b207f	diff_filespec: drop funcname_pattern_ident field This struct field was obsoleted by `be58e70` (diff: unify external diff and funcname parsing code, 2008-10-05), but we forgot to remove it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-17 10:50:10 -08:00
Jeff King	b837f5d68d	diff_filespec: reorder dirty_submodule macro definitions diff_filespec has a 2-bit "dirty_submodule" field and defines two flags as macros. Originally these were right next to each other, but a new field was accidentally added in between in commit `4682d85`. This patch puts the field and its flags back together. Using an enum like: enum { DIRTY_SUBMODULE_UNTRACKED = 1, DIRTY_SUBMODULE_MODIFIED = 2 } dirty_submodule; would be more obvious, but it bloats the structure. Limiting the enum size like: } dirty_submodule : 2; might work, but it is not portable. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-17 10:50:03 -08:00
Junio C Hamano	3b753148b6	Merge branch 'jk/maint-null-in-trees' We do not want a link to 0{40} object stored anywhere in our objects. * jk/maint-null-in-trees: fsck: detect null sha1 in tree entries do not write null sha1s to on-disk index diff: do not use null sha1 as a sentinel value	2012-08-27 11:54:28 -07:00
Jeff King	e54501004a	diff: do not use null sha1 as a sentinel value The diff code represents paths using the diff_filespec struct. This struct has a sha1 to represent the sha1 of the content at that path, as well as a sha1_valid member which indicates whether its sha1 field is actually useful. If sha1_valid is not true, then the filespec represents a working tree file (e.g., for the no-index case, or for when the index is not up-to-date). The diff_filespec is only used internally, though. At the interfaces to the diff subsystem, callers feed the sha1 directly, and we create a diff_filespec from it. It's at that point that we look at the sha1 and decide whether it is valid or not; callers may pass the null sha1 as a sentinel value to indicate that it is not. We should not typically see the null sha1 coming from any other source (e.g., in the index itself, or from a tree). However, a corrupt tree might have a null sha1, which would cause "diff --patch" to accidentally diff the working tree version of a file instead of treating it as a blob. This patch extends the edges of the diff interface to accept a "sha1_valid" flag whenever we accept a sha1, and to use that flag when creating a filespec. In some cases, this means passing the flag through several layers, making the code change larger than would be desirable. One alternative would be to simply die() upon seeing corrupted trees with null sha1s. However, this fix more directly addresses the problem (while bogus sha1s in a tree are probably a bad thing, it is really the sentinel confusion sending us down the wrong code path that is what makes it devastating). And it means that git is more capable of examining and debugging these corrupted trees. For example, you can still "diff --raw" such a tree to find out when the bogus entry was introduced; you just cannot do a "--patch" diff (just as you could not with any other corrupted tree, as we do not have any content to diff). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 15:04:32 -07:00
Junio C Hamano	d7afe648dc	Merge branch 'jc/refactor-diff-stdin' Due to the way "git diff --no-index" is bolted onto by touching the low level code that is shared with the rest of the "git diff" code, even though it has to work in a very different way, any comparison that involves a file "-" at the root level incorrectly tried to read from the standard input. This cleans up the no-index codepath further to remove code that reads from the standard input from the core side, which is never necessary when git is running its usual diff operation. * jc/refactor-diff-stdin: diff-index.c: "git diff" has no need to read blob from the standard input diff-index.c: unify handling of command line paths diff-index.c: do not pretend paths are pathspecs	2012-07-13 15:38:05 -07:00
Junio C Hamano	4682d8521c	diff-index.c: "git diff" has no need to read blob from the standard input Only "diff --no-index -" does. Bolting the logic into the low-level function diff_populate_filespec() was a layering violation from day one. Move populate_from_stdin() function out of the generic diff.c to its only user, diff-index.c. Also make sure "-" from the command line stays a special token "read from the standard input", even if we later decide to sanitize the result from prefix_filename() function in a few obvious ways, e.g. removing unnecessary "./" prefix, duplicated slashes "//" in the middle, etc. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-06-28 16:18:19 -07:00
Junio C Hamano	25e5e2bf85	combine-diff: support format_callback This teaches combine-diff machinery to feed a combined merge to a callback function when DIFF_FORMAT_CALLBACK is specified. So far, format callback functions are not used for anything but 2-way diffs. A callback is given a diff_queue_struct, which is an array of diff_filepair. As its name suggests, a diff_filepair is a _pair_ of diff_filespec that represents a single preimage and a single postimage. Since "diff -c" is to compare N parents with a single merge result and filter out any paths whose result match one (or more) of the parent(s), its output has to be able to represent N preimages and 1 postimage. For this reason, a callback function that inspects a diff_filepair that results from this new infrastructure can and is expected to view the preimage side (i.e. pair->one) as an array of diff_filespec. Each element in the array, except for the last one, is marked with "has_more_entries" bit, so that the same callback function can be used for 2-way diffs and combined diffs. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-20 23:03:06 -07:00
Junio C Hamano	382f013bc4	diff: pass the entire diff-options to diffcore_pickaxe() That would make it easier to give enhanced feature to the pickaxe transformation. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-08-31 14:30:28 -07:00
Junio C Hamano	44c48a909a	diff --follow: do call diffcore_std() as necessary Usually, diff frontends populate the output queue with filepairs without any rename information and call diffcore_std() to sort the renames out. When --follow is in effect, however, diff-tree family of frontend has a hack that looks like this: diff-tree frontend -> diff_tree_sha1() . populate diff_queued_diff . if --follow is in effect and there is only one change that creates the target path, then -> try_to_follow_renames() -> diff_tree_sha1() with no pathspec but with -C -> diffcore_std() to find renames . if rename is found, tweak diff_queued_diff and put a single filepair that records the found rename there -> diffcore_std() . tweak elements on diff_queued_diff by - rename detection - path ordering - pickaxe filtering We need to skip parts of the second call to diffcore_std() that is related to rename detection, and do so only when try_to_follow_renames() did find a rename. Earlier `1da6175` (Make diffcore_std only can run once before a diff_flush, 2010-05-06) tried to deal with this issue incorrectly; it unconditionally disabled any second call to diffcore_std(). This hopefully fixes the breakage. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-08-13 12:17:45 -07:00
Jonathan Nieder	987460611a	Standardize do { ... } while (0) style Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-08-12 15:44:51 -07:00

1 2 3

102 commits