2018-10-17 22:13:26 +00:00
#ifndef COMMIT_REACH_H
#define COMMIT_REACH_H
2018-07-20 16:33:02 +00:00

commit-reach.h: add missing declarations (hdr-check)
Add the necessary #includes and forward declarations to allow the header
file to pass the 'hdr-check' target.
Note that, since this header includes the commit-slab implementation
header file (indirectly via commit-slab.h), some of the commit-slab
inline functions (e.g. contains_cache_at_peek()) will not compile without
the complete type of 'struct commit'. Hence, we replace the forward
declaration of 'struct commit' with an #include of the 'commit.h'
header file.
It is possible, using the 'commit-slab-{decl,impl}.h' files, to avoid
this inclusion of the 'commit.h' header. Commit a9f1f1f9f8 ("commit-slab.h:
code split", 2018-05-19) separated the commit-slab interface from its
implementation, to allow for the definition of a public commit-slab data
structure. This enabled us to avoid including the commit-slab implementation
in a header file, which could result in the replication of the commit-slab
functions in each compilation unit in which it was included.
Indeed, if you compile with optimizations disabled, then run this script:
$ cat -n dup-static.sh
1 #!/bin/sh
2
3 nm $1 | grep ' t ' | cut -d' ' -f3 | sort | uniq -c |
4 sort -rn | grep -v ' 1'
$
$ ./dup-static.sh git | grep contains
24 init_contains_cache_with_stride
24 init_contains_cache
24 contains_cache_peek
24 contains_cache_at_peek
24 contains_cache_at
24 clear_contains_cache
$
you will find 24 copies of the commit-slab routines for the contains_cache.
Of course, when you enable optimizations again, these duplicate static
functions (mostly) disappear. Compiling with gcc at -O2 leaves two static
functions, thus:
$ nm commit-reach.o | grep contains_cache
0000000000000870 t contains_cache_at_peek.isra.1.constprop.6
$ nm ref-filter.o | grep contains_cache
00000000000002b0 t clear_contains_cache.isra.14
$
However, using a shared 'contains_cache' would result in all six of the
above functions as external public functions in the git binary. At present,
only three of these functions are actually called, so the trade-off
seems to favour letting the compiler inline the commit-slab functions.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-27 01:53:57 +00:00
#include "commit.h"
2018-07-20 16:33:08 +00:00
#include "commit-slab.h"

2018-07-20 16:33:02 +00:00
struct commit_list;
2018-07-20 16:33:08 +00:00
struct ref_filter;
2018-10-27 01:53:57 +00:00
struct object_id;
struct object_array;
2018-07-20 16:33:02 +00:00

2024-02-28 09:44:14 +00:00
int repo_get_merge_bases(struct repository *r,
                         struct commit *rev1,
                         struct commit *rev2,
                         struct commit_list **result);
2024-02-28 09:44:16 +00:00
int repo_get_merge_bases_many(struct repository *r,
                              struct commit *one, int n,
                              struct commit **twos,
                              struct commit_list **result);
2018-07-20 16:33:02 +00:00
/* To be used only when object flags after this call no longer matter */
2024-02-28 09:44:17 +00:00
int repo_get_merge_bases_many_dirty(struct repository *r,
                                    struct commit *one, int n,
                                    struct commit **twos,
                                    struct commit_list **result);
2018-11-14 00:12:55 +00:00

2024-02-28 09:44:15 +00:00
int get_octopus_merge_bases(struct commit_list *in, struct commit_list **result);
2018-07-20 16:33:02 +00:00

2020-06-23 18:42:22 +00:00
int repo_is_descendant_of(struct repository *r,
                          struct commit *commit,
                          struct commit_list *with_commit);
2018-11-14 00:12:56 +00:00
int repo_in_merge_bases(struct repository *r,
                        struct commit *commit,
                        struct commit *reference);
int repo_in_merge_bases_many(struct repository *r,
                             struct commit *commit,
commit-reach(repo_in_merge_bases_many): optionally expect missing commits
Currently this function treats unrelated commit histories the same way
as commit histories with missing commit objects.
Typically, missing commit objects constitute a corrupt repository,
though, and should be reported as such. The next commits will make it
so, but there is one exception: In `git fetch --update-shallow` we
_expect_ commit objects to be missing, and we do want to treat the
now-incomplete commit histories as unrelated.
To allow for that, let's introduce an additional parameter that is
passed to `repo_in_merge_bases_many()` to trigger this behavior, and use
it in the two callers in `shallow.c`.
This commit changes behavior slightly: unless called from the
`shallow.c` functions that set the `ignore_missing_commits` bit, any
non-existing tip commit that is passed to `repo_in_merge_bases_many()`
will now result in an error.
Note: When encountering missing commits while traversing the commit
history in search of merge bases, this commit does not change behavior
just yet: their children will still be interpreted as root commits. This
bug will get fixed by follow-up commits.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 09:44:08 +00:00
                             int nr_reference, struct commit **reference,
                             int ignore_missing_commits);
2018-07-20 16:33:02 +00:00

/*
 * Takes a list of commits and returns a new list where those
 * have been removed that can be reached from other commits in
 * the list. It is useful for, e.g., reducing the commits
 * randomly thrown at the git-merge command and removing
 * redundant commits that the user shouldn't have given to it.
 *
 * This function destroys the STALE bit of the commit objects'
 * flags.
 */
struct commit_list *reduce_heads(struct commit_list *heads);

/*
 * Like `reduce_heads()`, except it replaces the list. Use this
 * instead of `foo = reduce_heads(foo);` to avoid memory leaks.
 */
void reduce_heads_replace(struct commit_list **heads);

2018-07-20 16:33:06 +00:00
int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);

2018-07-20 16:33:08 +00:00
/*
 * Unknown has to be "0" here, because that's the default value for
 * contains_cache slab entries that have not yet been assigned.
 */
enum contains_result {
	CONTAINS_UNKNOWN = 0,
	CONTAINS_NO,
	CONTAINS_YES
};

define_commit_slab(contains_cache, enum contains_result);

int commit_contains(struct ref_filter *filter, struct commit *commit,
                    struct commit_list *list, struct contains_cache *cache);

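The comment on `enum contains_result` depends on cache entries starting out zeroed. A minimal sketch of that idea follows, using a plain `calloc()`'d array in place of a real commit slab. The `toy_cache` type and its helpers are hypothetical names for illustration only and are not part of commit-slab.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as the enum above: the "unknown" state must be 0. */
enum toy_result {
	TOY_UNKNOWN = 0,
	TOY_NO,
	TOY_YES
};

/* Hypothetical stand-in for a commit slab: one entry per commit index. */
struct toy_cache {
	enum toy_result *entries;
	size_t nr;
};

/* calloc() zero-fills, so every entry starts out as TOY_UNKNOWN. */
int toy_cache_init(struct toy_cache *c, size_t nr)
{
	c->entries = calloc(nr, sizeof(*c->entries));
	c->nr = c->entries ? nr : 0;
	return c->entries ? 0 : -1;
}

/* Entries that were never assigned (or are out of range) read as unknown. */
enum toy_result toy_cache_get(const struct toy_cache *c, size_t i)
{
	return i < c->nr ? c->entries[i] : TOY_UNKNOWN;
}

void toy_cache_set(struct toy_cache *c, size_t i, enum toy_result v)
{
	assert(i < c->nr);
	c->entries[i] = v;
}

void toy_cache_clear(struct toy_cache *c)
{
	free(c->entries);
	c->entries = NULL;
	c->nr = 0;
}
```

Because the zero value means "not computed yet", a lookup never needs a separate "is this entry initialized?" flag.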
2018-07-20 16:33:13 +00:00
/*
 * Determine if every commit in 'from' can reach at least one commit
 * that is marked with 'with_flag'. As we traverse, use 'assign_flag'
 * as a marker for commits that are already visited. Do not walk
2018-07-20 16:33:28 +00:00
 * commits with date below 'min_commit_date' or generation below
 * 'min_generation'.
2018-07-20 16:33:13 +00:00
 */
int can_all_from_reach_with_flag(struct object_array *from,
                                 unsigned int with_flag,
                                 unsigned int assign_flag,
2018-07-20 16:33:28 +00:00
                                 time_t min_commit_date,
2021-01-16 18:11:13 +00:00
                                 timestamp_t min_generation);
2018-07-20 16:33:23 +00:00
int can_all_from_reach(struct commit_list *from, struct commit_list *to,
                       int commit_date_cutoff);
2018-07-20 16:33:13 +00:00

2018-11-02 13:14:45 +00:00

/*
 * Return a list of commits containing the commits in the 'to' array
 * that are reachable from at least one commit in the 'from' array.
 * Also add the given 'flag' to each of the commits in the returned list.
 *
 * This method uses the PARENT1 and PARENT2 flags during its operation,
 * so be sure these flags are not set before calling the method.
 */
struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
                                         struct commit **to, int nr_to,
                                         unsigned int reachable_flag);

commit-reach: implement ahead_behind() logic
Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.
The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.
This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.
This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.
It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.
Now, let's discuss the ahead/behind counting algorithm.
The first array of commits are considered the starting commits. The
index within that array will play a critical role.
We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.
At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."
We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.
As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.
The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count. This increment is the reason we _absolutely need_ to
walk commits at most once.
The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
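The two-step walk described in this commit message can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not the code in commit-reach.c: commits are array indices, generation numbers are supplied by the caller, a linear scan stands in for the priority queue, and the STALE early-termination check is left out. All `ab_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AB_MAX_COMMITS 32

/* Hypothetical toy commit: parents are graph indices, -1 if absent. */
struct ab_commit {
	int parent[2];
	unsigned int generation; /* strictly greater than every parent's */
};

struct ab_count {
	size_t tip_index;  /* positions in the 'starts' array */
	size_t base_index;
	unsigned int ahead;
	unsigned int behind;
};

void ab_ahead_behind(const struct ab_commit *graph, size_t graph_nr,
		     const int *starts, size_t starts_nr,
		     struct ab_count *counts, size_t counts_nr)
{
	uint32_t bits[AB_MAX_COMMITS] = { 0 };
	int state[AB_MAX_COMMITS] = { 0 }; /* 0 unseen, 1 queued, 2 walked */
	size_t i, k;

	assert(graph_nr <= AB_MAX_COMMITS && starts_nr <= 32);

	/* The ith starting commit can reach itself. */
	for (i = 0; i < starts_nr; i++) {
		bits[starts[i]] |= (uint32_t)1 << i;
		state[starts[i]] = 1;
	}
	for (i = 0; i < counts_nr; i++)
		counts[i].ahead = counts[i].behind = 0;

	for (;;) {
		int cur = -1;

		/* Pop the queued commit with maximum generation. */
		for (i = 0; i < graph_nr; i++)
			if (state[i] == 1 &&
			    (cur < 0 || graph[i].generation > graph[cur].generation))
				cur = (int)i;
		if (cur < 0)
			break;
		state[cur] = 2;

		/* First step: exactly one of the two bits set => count it. */
		for (i = 0; i < counts_nr; i++) {
			int tip = (bits[cur] >> counts[i].tip_index) & 1;
			int base = (bits[cur] >> counts[i].base_index) & 1;

			if (tip && !base)
				counts[i].ahead++;
			else if (base && !tip)
				counts[i].behind++;
		}

		/* Second step: push our bits to each parent and queue it once. */
		for (k = 0; k < 2; k++) {
			int p = graph[cur].parent[k];

			if (p < 0)
				continue;
			bits[p] |= bits[cur];
			if (!state[p])
				state[p] = 1;
		}
	}
}
```

Popping by maximum generation means every child of a commit has already pushed its bits before that commit is counted, which is why each commit is counted at most once.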
2023-03-20 11:26:53 +00:00
struct ahead_behind_count {
	/**
	 * As input, the *_index members indicate which positions in
	 * the 'tips' array correspond to the tip and base of this
	 * comparison.
	 */
	size_t tip_index;
	size_t base_index;

	/**
	 * These values store the computed counts for each side of the
	 * symmetric difference:
	 *
	 * 'ahead' stores the number of commits reachable from the tip
	 * and not reachable from the base.
	 *
	 * 'behind' stores the number of commits reachable from the base
	 * and not reachable from the tip.
	 */
	unsigned int ahead;
	unsigned int behind;
};

/*
 * Given an array of commits and an array of ahead_behind_count pairs,
 * compute the ahead/behind counts for each pair.
 */
void ahead_behind(struct repository *r,
                  struct commit **commits, size_t commits_nr,
                  struct ahead_behind_count *counts, size_t counts_nr);

commit-reach: add tips_reachable_from_bases()
Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.
However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.
For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:
Test                                            HEAD~1  HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
1500.6: contains: git branch --merged           0.14    0.02 -85.7%
1500.7: contains: git tag --merged              0.15    0.03 -80.0%
(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)
The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_bases(). This method takes a string_list of tips and
assigns the 'util' for each item with the value 1 if the base commit can
reach those tips.
Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.
This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.
The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.
With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.
Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
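The generation-bounded depth-first search with a moving cutoff can be sketched on a toy graph. This is an illustration under stated assumptions rather than the code in commit-reach.c: there is a single base, commits are array indices, the caller passes the tips already sorted by ascending generation, and results go into a `found` array instead of commit flags. All `tr_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TR_MAX_COMMITS 32

/* Hypothetical toy commit: parents are graph indices, -1 if absent. */
struct tr_commit {
	int parent[2];
	unsigned int generation;
};

/*
 * 'tips' must be sorted by ascending generation. found[i] is set to 1
 * when tips[i] is reachable from 'base', and to 0 otherwise.
 */
void tr_tips_reachable(const struct tr_commit *graph, size_t graph_nr,
		       int base, const int *tips, size_t tips_nr, int *found)
{
	int stack[TR_MAX_COMMITS];
	int seen[TR_MAX_COMMITS] = { 0 };
	size_t top = 0, min_generation_index = 0, i;

	assert(graph_nr <= TR_MAX_COMMITS);
	for (i = 0; i < tips_nr; i++)
		found[i] = 0;

	stack[top++] = base;
	seen[base] = 1;
	while (top && min_generation_index < tips_nr) {
		int cur = stack[--top];
		int k;

		/* The cutoff may have risen since this commit was pushed. */
		if (graph[cur].generation <
		    graph[tips[min_generation_index]].generation)
			continue;

		for (i = 0; i < tips_nr; i++)
			if (tips[i] == cur)
				found[i] = 1;

		/* Raise the cutoff past every lowest-generation tip found. */
		while (min_generation_index < tips_nr &&
		       found[min_generation_index])
			min_generation_index++;
		if (min_generation_index == tips_nr)
			break;

		/* Push parent[1] first so parent[0] is explored first. */
		for (k = 1; k >= 0; k--) {
			int p = graph[cur].parent[k];

			if (p < 0 || seen[p])
				continue;
			if (graph[p].generation <
			    graph[tips[min_generation_index]].generation)
				continue;
			seen[p] = 1;
			stack[top++] = p;
		}
	}
}
```

The cutoff only ever increases, so a commit skipped for being below it can never become interesting again later in the walk.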
2023-03-20 11:26:55 +00:00
/*
 * For all tip commits, add 'mark' to their flags if and only if they
 * are reachable from one of the commits in 'bases'.
 */
void tips_reachable_from_bases(struct repository *r,
                               struct commit_list *bases,
                               struct commit **tips, size_t tips_nr,
                               int mark);

commit-reach: add get_branch_base_for_tip
Add a new reachability algorithm that intends to discover (from a heuristic)
which branch was used as the starting point for a given commit. Add focused
tests using the 'test-tool reach' command.
In repositories that use pull requests (or merge requests) to advance one or
more "protected" branches, the history of that reference can be recovered by
following the first-parent history in most cases. Most are completed using
no-fast-forward merges, though squash merges are quite common. Less common
is rebase-and-merge, which still validates this assumption. Finally, the
case that breaks this assumption is the fast-forward update (with potential
rebasing). Even in this case, the previous commit commonly appears in the
first-parent history of the branch.
Similar assumptions can be made for a topic branch created by a single user
with the intention to merge back into another branch. Using 'git commit',
'git merge', and 'git cherry-pick' from HEAD will default to having the
first-parent commit be the previous commit at HEAD. This history changes
only with commands such as 'git reset' or 'git rebase', where the command
names also imply that the branch is starting from a new location.
With this movement of branches in mind, the following heuristic is proposed
as a way to determine the base branch for a given source branch:
Among a list of candidate base branches, select the candidate that
minimizes the number of commits in the first-parent history of the source
that are not in the first-parent history of the candidate.
Prior third-party solutions to this problem have used this optimization
criterion, but have relied upon extracting the first-parent history and
comparing those lists as tables instead of using commit-graph walks.
Given current command-line interface options, this optimization criterion is
not easy to detect directly. Even using the command
  git rev-list --count --first-parent <base>..<source>
does not measure this count, as it uses full reachability from <base> to
determine which commits to remove from the range '<base>..<source>'. This
may lead to one asking if we should instead be using the full reachability
of the candidate and only the first-parent history of the source. This,
unfortunately, does not work for repositories that use long-lived branches
and automation to merge across those branches.
In extremely large repositories, merging into a single trunk may not be
feasible. This is usually due to the desired frequency of updates
(thousands of engineers doing daily work) combined with the time required to
perform a validation build. These factors combine to create significant
risk of semantic merge conflicts, leading to build breaks on the trunk. In
response, repository maintainers can create a single Level Zero (L0) trunk
and multiple Level One (L1) branches. By partitioning the engineers by
organization, these engineers may see lower risk of semantic merge conflicts
as well as be protected against build breaks in other L1 branches. The key
to making this system work is a semi-automated process of merging L1
branches into the L0 trunk and vice-versa. In a large enough organization,
these L1 branches may further split into L2 or L3 branches, but the same
principles apply for merging across deeper levels.
If these automated merges use a typical merge with the second parent
bringing in the "new" content, then each L0 and L1 branch can track its
previous positions by following first-parent history; these histories appear
as parallel paths (until reaching the first place where the branches diverged).
If we also walk to second parents, then the histories overlap significantly
and cannot be distinguished except for very-recent changes.
For this reason, the first-parent condition should be symmetrical across the
base and source branches.
Another common case for desiring the result of this optimization method is
the use of release branches. When releasing a version of a repository, a
branch can be used to track that release. Any updates that are worth fixing
in that release can be merged to the release branch and shipped with only
the necessary fixes without any new features introduced in the trunk branch.
The 'maint-2.<X>' branches represent this pattern in the Git project. The
microsoft/git fork uses 'vfs-2.<X>.<Y>' branches to track the changes that
are custom to that fork on top of each upstream Git release 2.<X>.<Y>. This
application doesn't need the symmetrical first-parent condition, but the use
of first-parent histories does not change the results for these branches.
To determine the base branch from a list of candidates, create a new method
in commit-reach.c that performs a single* commit-graph walk. The core
concept is to walk first-parents starting at the candidate bases and the
source, tracking the "best" base to reach a given commit. Use generation
numbers to ensure that a commit is walked at most once and all children have
been explored before visiting it. When reaching a commit that is reachable
from both a base and the source, we will then have a guarantee that this is
the closest intersection of first-parent histories. Track the best base to
reach that commit and return it as a result. In rare cases involving
multiple root commits, the first-parent history of the source may never
intersect any of the candidates and thus a null result is returned.
* There are up to two walks, since we require all commits to have a computed
generation number in order to avoid incorrect results. This is similar to
the need for computed generation numbers in ahead_behind() as implemented
in fd67d149bde (commit-reach: implement ahead_behind() logic, 2023-03-20).
In order to track the "best" base, use a new commit slab that stores an
integer. This value defaults to zero upon initialization, so use -1 to
track that the source commit can reach this commit and use 'i + 1' to track
that the ith base can reach this commit. When multiple bases can reach a
commit, minimize the index to break ties. This allows the caller to specify
an order to the bases that determines some amount of preference when the
heuristic does not result in a unique result.
The trickiest part of the integer slab is what happens when reaching a
collision among the histories of the bases and the history of the source.
This is noticed when viewing the first parent and seeing that it has a slab
value that differs in sign (negative or positive). In this case, the
collision commit is stored in the method variable 'branch_point' and its
slab value is set to -1. The index of the best base (so far) is stored in
the method variable 'best_index'. It is possible that there are multiple
commits that have the branch_point as their first parent, leading to multiple
updates of best_index. The result is determined when 'branch_point' is
visited in the commit walk, giving the guarantee that all commits that could
reach 'branch_point' were visited.
Several interesting cases of collisions and different results are tested in
the t6600-test-reach.sh script. Recall that this script also tests the
algorithm in three possible states involving the commit-graph file and how
many commits are written in the file. This provides some coverage of the
need (and lack of need) for the ensure_generations_valid() method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
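The selection criterion itself (minimize the number of commits in the first-parent history of the source that are not in the first-parent history of the candidate) can be evaluated naively on a toy graph. The sketch below does exactly that, one candidate at a time. It is closer to the list-comparison approach this message attributes to earlier third-party tools than to the single commit-graph walk implemented in commit-reach.c. The `bb_` names and the `first_parent` array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BB_MAX_COMMITS 32

/*
 * Hypothetical toy graph: first_parent[c] is the index of c's first
 * parent, or -1 for a root commit. Returns the index of the best base,
 * or -1 if no candidate shares first-parent history with 'tip'.
 */
int bb_base_for_tip(const int *first_parent, size_t nr,
		    int tip, const int *bases, size_t bases_nr)
{
	int best_index = -1;
	size_t best_count = 0, i;

	assert(nr <= BB_MAX_COMMITS);
	for (i = 0; i < bases_nr; i++) {
		int on_base_path[BB_MAX_COMMITS] = { 0 };
		size_t count = 0;
		int c;

		/* Mark the candidate's first-parent history. */
		for (c = bases[i]; c >= 0; c = first_parent[c])
			on_base_path[c] = 1;

		/* Count source-only commits until the histories meet. */
		for (c = tip; c >= 0 && !on_base_path[c]; c = first_parent[c])
			count++;
		if (c < 0)
			continue; /* never intersects this candidate */

		/* Strict '<' keeps the lowest index when counts tie. */
		if (best_index < 0 || count < best_count) {
			best_index = (int)i;
			best_count = count;
		}
	}
	return best_index;
}
```

Since first-parent histories are simple chains, two of them coincide from their first shared commit onward, so counting up to the first intersection gives the full count. Preferring the lowest index on ties mirrors the tie-breaking rule described above.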
2024-08-14 10:31:27 +00:00
/*
 * Given a 'tip' commit and a list of potential 'bases', return the index 'i'
 * that minimizes the number of commits in the first-parent history of 'tip'
 * and not in the first-parent history of 'bases[i]'.
 *
 * Among a list of long-lived branches that are updated only by merges (with the
 * first parent being the previous position of the branch), this would inform
 * which branch was used to create the tip reference.
 *
 * Returns -1 if no common point is found in first-parent histories, which is
 * rare, but possible with multiple root commits.
 */
int get_branch_base_for_tip(struct repository *r,
                            struct commit *tip,
                            struct commit **bases,
                            size_t bases_nr);

2018-07-20 16:33:02 +00:00
#endif