development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Stefan Beller	8d0017daa1	patch-ids.c: drop hashmap_cmp_fn cast Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-07-05 13:53:12 -07:00
Stefan Beller	3da492f808	patch-ids.c: use hashmap correctly As alluded to in the previous patch, the code in patch-ids.c is using the hashmaps API wrong. Luckily we do not have a bug, as all hashmap functionality that we use here (hashmap_get) passes through the keydata. If hashmap_get_next were to be used, a bug would occur as that passes NULL for the key_data. So instead use the hashmap API correctly and provide the caller required data in the compare function via the first argument that always gets passed and was setup via the hashmap_init function. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 13:11:54 -07:00
Stefan Beller	7663cdc86c	hashmap.h: compare function has access to a data field When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-30 12:49:28 -07:00
Brandon Williams	66f414f885	diff-tree: convert diff_tree_sha1 to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-05 11:23:58 +09:00
Brandon Williams	7b8dea0c75	tree-diff: convert diff_root_tree_sha1 to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-02 09:36:30 +09:00
Brandon Williams	bd25f28876	diff: convert diff_flush_patch_id to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-02 09:36:07 +09:00
Brandon Williams	34f3c0ebfb	patch-ids: convert to struct object_id Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-02 09:36:07 +09:00
Johannes Schindelin	5748693b91	add_commit_patch_id(): avoid allocating memory unnecessarily It would appear that we allocate (and forget to release) memory if the patch ID is not even defined. Reported by the Coverity tool. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-08 12:18:19 +09:00
brian m. carlson	cd02599c48	Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ Since we will likely be introducing a new hash function at some point, and that hash function might be longer than 20 bytes, use the constant GIT_MAX_RAWSZ, which is designed to be suitable for allocations, instead of GIT_SHA1_RAWSZ. This will ease the transition down the line by distinguishing between places where we need to allocate memory suitable for the largest hash from those where we need to handle the current hash. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-03-26 22:08:21 -07:00
Jeff King	7c81040792	patch-ids: refuse to compute patch-id for merge commit The patch-id code which powers "log --cherry-pick" doesn't look at whether each commit is a merge or not. It just feeds the commit's first parent to the diff, and ignores any additional parents. In theory, this might be useful if you wanted to find equivalence between, say, a merge commit and a squash-merge that does the same thing. But it also promotes a false equivalence between distinct merges. For example, every "merge -s ours" would look identical to an empty commit (which is true in a sense, but presumably there was a value in merging in the discarded history). Since patch-ids are meant for throwing away duplicates, we should err on the side of _not_ matching such merges. Moreover, we may spend a lot of extra time computing these merge diffs. In the case that inspired this patch, a "git format-patch --cherry-pick" dropped from over 3 minutes to less than 3 seconds. This seems pretty drastic, but is easily explained. The command was invoked by a "git rebase" of an older topic branch; there had been tens of thousands of commits on the upstream branch in the meantime. In addition, this project used a topic-branch workflow with occasional "back-merges" from "master" to each topic (to resolve conflicts on the topics rather than in the merge commits). So there were not only extra merges, but the diffs for these back-merges were generally quite large (because they represented _everything_ that had been merged to master since the topic branched). This patch treats a merge fed to commit_patch_id() or add_commit_patch_id() as an error, and a lookup for such a merge via has_commit_patch_id() will always return NULL. An earlier version of the patch tried to distinguish between "error" and "patch id for merges not defined", but that becomes unnecessarily complicated. The only callers are: 1. revision traversals which want to do --cherry-pick; they call add_commit_patch_id(), but do not care if it fails. They only want to add what we can, look it up later with has_commit_patch_id(), and err on the side of not-matching. 2. format-patch --base, which calls commit_patch_id(). This _does_ notice errors, but should never feed a merge in the first place (and if it were to do so accidentally, then this patch is a strict improvement; we notice the bug rather than generating a bogus patch-id). So in both cases, this does the right thing. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-12 13:45:01 -07:00
Jeff King	5a29cbc6e9	patch-ids: turn off rename detection The patch-id code may be running inside another porcelain like "git log" or "git format-patch", and therefore may have set diff_detect_rename_default, either via the diff-ui config, or by default since `5404c11` (diff: activate diff.renames by default, 2016-02-25). This is the case even if a command is run with `--no-renames`, as that is applied only to the diff-options used by the command itself. Rename detection doesn't help the patch-id results. It _may_ actually hurt, as minor differences in the files that would be overlooked by patch-id's canonicalization might result in different renames (though I'd doubt that it ever comes up in practice). But mostly it is just a waste of CPU to compute these renames. Note that this does have one user-visible impact: the prerequisite patches listed by "format-patch --base". There may be some confusion between different versions of git as older ones will enable renames, but newer ones will not. However, this was already a problem, as people with different settings for the "diff.renames" config would get different results. After this patch, everyone should get the same results, regardless of their config. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-09-09 14:13:53 -07:00
Kevin Willford	b3dfeebb92	rebase: avoid computing unnecessary patch IDs The `rebase` family of Git commands avoid applying patches that were already integrated upstream. They do that by using the revision walking option that computes the patch IDs of the two sides of the rebase (local-only patches vs upstream-only ones) and skipping those local patches whose patch ID matches one of the upstream ones. In many cases, this causes unnecessary churn, as already the set of paths touched by a given commit would suffice to determine that an upstream patch has no local equivalent. This hurts performance in particular when there are a lot of upstream patches, and/or large ones. Therefore, let's introduce the concept of a "diff-header-only" patch ID, compare those first, and only evaluate the "full" patch ID lazily. Please note that in contrast to the "full" patch IDs, those "diff-header-only" patch IDs are prone to collide with one another, as adjacent commits frequently touch the very same files. Hence we now have to be careful to allow multiple hash entries with the same hash. We accomplish that by using the hashmap_add() function that does not even test for hash collisions. This also allows us to evaluate the full patch ID lazily, i.e. only when we found commits with matching diff-header-only patch IDs. We add a performance test that demonstrates ~1-6% improvement. In practice this will depend on various factors such as how many upstream changes and how big those changes are along with whether file system caches are cold or warm. As Git's test suite has no way of catching performance regressions, we also add a regression test that verifies that the full patch ID computation is skipped when the diff-header-only computation suffices. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 14:39:16 -07:00
Kevin Willford	3e8e32c32e	patch-ids: add flag to create the diff patch id using header only data This will allow a diff patch id to be created using only the header data so that the contents of the file will not have to be loaded. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 14:10:01 -07:00
Kevin Willford	683f17ec44	patch-ids: replace the seen indicator with a commit pointer The cherry_pick_list was looping through the original side checking the seen indicator and setting the cherry_flag on the commit. If we save off the commit in the patch_id we can set the cherry_flag on the correct commit when running through the other side when a patch_id match is found. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 13:23:03 -07:00
Kevin Willford	dfb7a1b4d0	patch-ids: stop using a hand-rolled hashmap implementation This change will use the hashmap from the hashmap.h to keep track of the patch_ids that have been encountered instead of using an internal implementation. This simplifies the implementation of the patch ids. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 13:23:03 -07:00
Xiaolong Ye	ded2c097ba	patch-ids: make commit_patch_id() a public helper function Make commit_patch_id() available to other builtins. Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-04-26 10:49:57 -07:00
brian m. carlson	ed1c9977cb	Remove get_object_hash. Convert all instances of get_object_hash to use an appropriate reference to the hash member of the oid member of struct object. This provides no functional change, as it is essentially a macro substitution. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
brian m. carlson	7999b2cf77	Add several uses of get_object_hash. Convert most instances where the sha1 member of struct object is dereferenced to use get_object_hash. Most instances that are passed to functions that have versions taking struct object_id, such as get_sha1_hex/get_oid_hex, or instances that can be trivially converted to use struct object_id instead, are not converted. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>	2015-11-20 08:02:05 -05:00
Dmitry S. Dolzhenko	104fb26a1e	patch-ids.c: use ALLOC_GROW() in add_commit() Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 14:49:12 -08:00
Thomas Rast	28452655af	diff_setup_done(): return void diff_setup_done() has historically returned an error code, but lost the last nonzero return in `943d5b7` (allow diff.renamelimit to be set regardless of -M/-C, 2006-08-09). The callers were in a pretty confused state: some actually checked for the return code, and some did not. Let it return void, and patch all callers to take this into account. This conveniently also gets rid of a handful of different(!) error messages that could never be triggered anyway. Note that the function can still die(). Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-03 12:11:07 -07:00
Christian Couder	5289bae17f	patch-ids: use the new generic "sha1_pos" function to lookup sha1 instead of the specific one from which the new one has been copied. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-04-04 22:57:42 -07:00
Pierre Habouzit	8f67f8aefb	Make the diff_options bitfields be an unsigned with explicit masks. reverse_diff was a bit-value in disguise, it's merged in the flags now. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-11-11 16:54:15 -08:00
Junio C Hamano	5d23e133d2	Refactor patch-id filtering out of git-cherry and git-format-patch. This implements the patch-id computation and recording library, patch-ids.c, and rewrites the get_patch_ids() function used in cherry and format-patch to use it, so that they do not pollute the object namespace. Earlier code threw non-objects into the in-core object database, and hoped for not getting bitten by SHA-1 collisions. While it may be practically Ok, it still was an ugly hack. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-11 20:02:03 -07:00

23 commits
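
The two-stage comparison described in b3dfeebb92 (compare a cheap "diff-header-only" patch ID first, and compute the "full" patch ID only when the cheap IDs match) can be illustrated with a small standalone sketch. This is not Git's code: every name below (toy_commit, toy_patch_id, toy_hash, toy_find, and so on) is hypothetical, the hash is an FNV-1a style placeholder rather than the SHA-1 based IDs Git uses, and a linear scan stands in for the hashmap. It only assumes that a commit can be reduced to the paths it touches plus its diff text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit: touched paths and the diff text. */
struct toy_commit {
	const char *paths; /* input for the cheap, header-only ID */
	const char *diff;  /* input for the expensive, full ID */
};

/* Hypothetical patch-ID entry; the full ID is filled in on demand. */
struct toy_patch_id {
	const struct toy_commit *commit;
	unsigned long header_id;
	unsigned long full_id;
	int has_full_id;
};

/* Counts how many times the expensive computation actually ran. */
int full_id_computations;

/* FNV-1a style placeholder hash (an assumption, not what Git uses). */
static unsigned long toy_hash(const char *s)
{
	unsigned long h = 2166136261UL;
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619UL;
	}
	return h;
}

/* Computes only the cheap ID up front. */
void toy_patch_id_init(struct toy_patch_id *id, const struct toy_commit *c)
{
	assert(id && c);
	id->commit = c;
	id->header_id = toy_hash(c->paths);
	id->full_id = 0;
	id->has_full_id = 0;
}

/* Computes the full ID at most once per entry and caches it. */
unsigned long toy_full_id(struct toy_patch_id *id)
{
	if (!id->has_full_id) {
		id->full_id = toy_hash(id->commit->diff);
		id->has_full_id = 1;
		full_id_computations++;
	}
	return id->full_id;
}

/* Returns 1 if both entries describe an equivalent patch, 0 otherwise. */
int toy_patch_id_equal(struct toy_patch_id *a, struct toy_patch_id *b)
{
	if (a->header_id != b->header_id)
		return 0; /* different paths: the full ID is never needed */
	return toy_full_id(a) == toy_full_id(b);
}

/* Returns the first upstream entry equivalent to `local`, or NULL. */
struct toy_patch_id *toy_find(struct toy_patch_id *upstream, size_t n,
			      struct toy_patch_id *local)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (toy_patch_id_equal(&upstream[i], local))
			return &upstream[i];
	return NULL;
}
```

In this sketch, looking up a local entry whose paths match no upstream entry leaves full_id_computations at zero, which mirrors the regression test the commit message mentions (checking that the full computation is skipped when the header-only one suffices). Because several entries may share a header-only ID, the lookup has to tolerate duplicates; the commit message notes that Git handles this by adding entries with hashmap_add().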