rerere: add documentation for conflict normalization

Add some documentation for the logic behind the conflict normalization
in rerere.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thomas Gummerer 2018-08-05 18:20:31 +01:00 committed by Junio C Hamano
parent 2373b65059
commit fb90dca34c
2 changed files with 140 additions and 4 deletions


@@ -0,0 +1,140 @@
Rerere
======

This document describes the rerere logic.

Conflict normalization
----------------------

To ensure recorded conflict resolutions can be looked up in the rerere
database, even when branches are merged in a different order, when
different branches are merged that result in the same conflict, or
when different conflict style settings are used, rerere normalizes the
conflicts before writing them to the rerere database.

Different conflict styles and branch names are normalized by stripping
the labels from the conflict markers, and removing the common ancestor
version from the `diff3` conflict style. Branches that are merged
in a different order are normalized by sorting the two sides of each
conflict hunk. More on each of those steps in the following sections.

Once these two normalization operations are applied, a conflict ID is
calculated based on the normalized conflict, which is later used by
rerere to look up the conflict in the rerere database.

Removing the common ancestor version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Say we have three branches AB, AC and AC2. The common ancestor of
these branches has a file with a line containing the string "A" (for
brevity this is called "line A" in the rest of the document). In
branch AB this line is changed to "B", in AC it is changed to "C", and
branch AC2 is forked off of AC after the line was changed to "C".

Forking a branch ABAC off of branch AB and then merging AC into it, we
get a conflict like the following:

    <<<<<<< HEAD
    B
    =======
    C
    >>>>>>> AC

Doing the same with AC2 (forking a branch ABAC2 off of branch AB
and then merging branch AC2 into it), using the diff3 conflict style,
we get a conflict like the following:

    <<<<<<< HEAD
    B
    ||||||| merged common ancestors
    A
    =======
    C
    >>>>>>> AC2

By resolving this conflict, to leave line D, the user declares:

    After examining what branches AB and AC did, I believe that making
    line A into line D is the best thing to do that is compatible with
    what AB and AC wanted to do.

As branch AC2 refers to the same commit as AC, the above implies that
this is also compatible with what AB and AC2 wanted to do.

By extension, this means that rerere should recognize that the above
conflicts are the same. To do this, the labels on the conflict
markers are stripped, and the common ancestor version is removed. The above
examples would both result in the following normalized conflict:

    <<<<<<<
    B
    =======
    C
    >>>>>>>
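
To make the two steps above concrete, here is a minimal C sketch that normalizes a single conflict region. It is not the code in `rerere.c`. The helper names `is_marker` and `normalize_conflict` are hypothetical, and the marker length is fixed at the default of 7. Marker recognition is simplified: a marker is assumed to be exactly seven marker characters followed by a space (and a label), a newline, or the end of the string. Each input line is assumed to carry its trailing newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MARKER_SIZE 7 /* assumption: default length of a conflict marker */

/* Simplified check: is `line` MARKER_SIZE copies of `ch`, followed by a
 * space (label), a newline or the end of the string? */
static int is_marker(const char *line, char ch)
{
    int i;

    for (i = 0; i < MARKER_SIZE; i++)
        if (line[i] != ch)
            return 0; /* also stops safely at a short line's '\0' */
    return line[MARKER_SIZE] == ' ' || line[MARKER_SIZE] == '\n' ||
           line[MARKER_SIZE] == '\0';
}

/* Hypothetical helper: normalize one conflict region into `out`
 * (capacity `outsz`). Marker labels are dropped and the diff3
 * common-ancestor section is skipped. Returns 0 on success, -1 if
 * `out` is too small. */
static int normalize_conflict(const char *const *lines, size_t n,
                              char *out, size_t outsz)
{
    size_t i, len = 0;
    int in_base = 0; /* between "|||||||" and "=======" */

    assert(outsz > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *piece;
        size_t k;

        if (is_marker(lines[i], '<')) {
            piece = "<<<<<<<\n";          /* label stripped */
        } else if (is_marker(lines[i], '|')) {
            in_base = 1;                  /* ancestor version starts */
            continue;
        } else if (is_marker(lines[i], '=')) {
            in_base = 0;
            piece = "=======\n";
        } else if (is_marker(lines[i], '>')) {
            piece = ">>>>>>>\n";          /* label stripped */
        } else if (in_base) {
            continue;                     /* drop a common-ancestor line */
        } else {
            piece = lines[i];             /* keep a line of either side */
        }
        k = strlen(piece);
        if (len + k + 1 > outsz)
            return -1;
        memcpy(out + len, piece, k + 1); /* copies the '\0' too */
        len += k;
    }
    return 0;
}
```

Given the plain-merge conflict and the diff3 conflict from the examples above, both calls produce the same text, `<<<<<<<\nB\n=======\nC\n>>>>>>>\n`.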

Sorting hunks
~~~~~~~~~~~~~

As before, let's imagine that a common ancestor had a file with line A
in its early part, and line X in its late part. Four branches are then
forked from it that do these things:

- AB: changes A to B
- AC: changes A to C
- XY: changes X to Y
- XZ: changes X to Z

Now, forking a branch ABAC off of branch AB and then merging AC into
it, and forking a branch ACAB off of branch AC and then merging AB
into it, would yield the conflict in a different order. The former
would say "A became B or C, what now?" while the latter would say "A
became C or B, what now?"

As a reminder, the act of merging AC into ABAC and resolving the
conflict to leave line D means that the user declares:

    After examining what branches AB and AC did, I believe that
    making line A into line D is the best thing to do that is
    compatible with what AB and AC wanted to do.

So the conflict we would see when merging AB into ACAB should be
resolved the same way, because that resolution is the one in line
with that declaration.

Imagine that, similarly, a branch XYXZ was previously forked from XY,
XZ was merged into it, and "X became Y or Z" was resolved into "X
became W".

Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
would have line B in its early part and line Y in its later part.
Such a merge would be quite clean. We can construct 4 combinations
using these four branches ((AB, AC) x (XY, XZ)).

Merging ABXY and ACXZ would produce the conflict "an early A became B
or C, a late X became Y or Z", while merging ACXY and ABXZ would
produce "an early A became C or B, a late X became Y or Z". We can see
there are 4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y").

By sorting, the conflict is given its canonical name, namely, "an
early part became B or C, a late part became Y or Z", and whenever
any of these four patterns appears, we can get to the same conflict
and resolution that we saw earlier.

Without the sorting, we would have to somehow find a previous
resolution among a combinatorial explosion of equivalent conflicts.
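
Here is the sorting step in isolation, as a minimal C sketch. It assumes each conflict hunk has already been reduced to its two sides, with labels and the ancestor version removed. The names `struct hunk`, `append` and `canonicalize` are hypothetical. Plain `strcmp` is used as the byte-wise ordering of the two sides. Hunks keep their file order, and only the two sides inside each hunk are swapped when needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one conflict hunk reduced to its two sides
 * (each side is one or more lines ending in '\n'). */
struct hunk {
    const char *one;
    const char *two;
};

/* Append `s` to the NUL-terminated string in `out`; -1 on overflow. */
static int append(char *out, size_t outsz, const char *s)
{
    size_t used = strlen(out), add = strlen(s);

    if (used + add + 1 > outsz)
        return -1;
    memcpy(out + used, s, add + 1);
    return 0;
}

/* Write the canonical form of `n` hunks to `out`. Hunks stay in file
 * order; within each hunk the smaller side is written first, so
 * "C or B" and "B or C" give identical text. Returns 0, or -1 on
 * overflow. */
static int canonicalize(const struct hunk *h, size_t n,
                        char *out, size_t outsz)
{
    size_t i;

    assert(outsz > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *first = h[i].one, *second = h[i].two;

        if (strcmp(first, second) > 0) { /* swap into sorted order */
            const char *tmp = first;
            first = second;
            second = tmp;
        }
        if (append(out, outsz, "<<<<<<<\n") || append(out, outsz, first) ||
            append(out, outsz, "=======\n") || append(out, outsz, second) ||
            append(out, outsz, ">>>>>>>\n"))
            return -1;
    }
    return 0;
}
```

Feeding in the four combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y") produces one and the same canonical text. That shared text is what lets a single recorded resolution cover all of them.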
Conflict ID calculation
~~~~~~~~~~~~~~~~~~~~~~~

Once the conflict normalization is done, the conflict ID is calculated
as the sha1 hash of the sides of the conflict hunks appended to each
other, each followed by a <NUL> character. The conflict markers are
stripped out before the sha1 is calculated. So in the example above,
where we merge branch AC, which changes line A to line C, into branch
AB, which changes line A to line B, the conflict ID would be
SHA1('B<NUL>C<NUL>').

If there are multiple conflicts in one file, the sha1 is calculated
the same way with all hunks appended to each other, in the order in
which they appear in the file, each side followed by a <NUL> character.
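
This minimal C sketch shows exactly which bytes go into the hash. It builds the SHA-1 input from hunks whose sides are already sorted, and leaves the hashing itself to any SHA-1 implementation (not shown). The names `struct hunk` and `conflict_id_input` are hypothetical. One assumption needs flagging: the sketch keeps each line's trailing newline as part of a side's text. Under that assumption, the example above is hashed as `B\n<NUL>C\n<NUL>`, and the `'B<NUL>C<NUL>'` notation in the text leaves the newlines out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a conflict hunk whose two sides are already sorted. */
struct hunk {
    const char *one; /* e.g. "B\n" (trailing newline kept: assumption) */
    const char *two; /* e.g. "C\n" */
};

/* Fill `buf` with the bytes that would be fed to SHA-1 for the conflict
 * ID. For each hunk, in file order, write side one plus a NUL, then
 * side two plus a NUL. Marker lines are not included. Returns the
 * number of bytes written, or 0 if `buf` is too small (or n == 0). */
static size_t conflict_id_input(const struct hunk *h, size_t n,
                                char *buf, size_t bufsz)
{
    size_t i, len = 0;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        const char *sides[2];
        int s;

        sides[0] = h[i].one;
        sides[1] = h[i].two;
        for (s = 0; s < 2; s++) {
            size_t k = strlen(sides[s]) + 1; /* text plus its NUL */

            if (len + k > bufsz)
                return 0;
            memcpy(buf + len, sides[s], k);
            len += k;
        }
    }
    return len;
}
```

For the single AB/AC conflict the buffer holds the 6 bytes `B\n\0C\n\0`. With a second hunk (Y, Z) further down the file, the 6 bytes `Y\n\0Z\n\0` follow, giving one 12-byte input for the whole file.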


@@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int marker_size)
 * and NUL concatenated together.
 *
 * Return the number of conflict hunks found.
 *
 * NEEDSWORK: the logic and theory of operation behind this conflict
 * normalization may deserve to be documented somewhere, perhaps in
 * Documentation/technical/rerere.txt.
 */
static int handle_path(unsigned char *sha1, struct rerere_io *io, int marker_size)
{