2012-03-31 22:10:11 +00:00
|
|
|
#include "cache.h"
|
|
|
|
#include "mergesort.h"
|
|
|
|
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
/* Combine two sorted lists. Take from `list` on equality. */
|
|
|
|
static void *llist_merge(void *list, void *other,
|
|
|
|
void *(*get_next_fn)(const void *),
|
|
|
|
void (*set_next_fn)(void *, void *),
|
|
|
|
int (*compare_fn)(const void *, const void *))
|
2012-03-31 22:10:11 +00:00
|
|
|
{
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
void *result = list, *tail;
|
2012-03-31 22:10:11 +00:00
|
|
|
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
if (compare_fn(list, other) > 0) {
|
|
|
|
result = other;
|
|
|
|
goto other;
|
|
|
|
}
|
|
|
|
for (;;) {
|
|
|
|
do {
|
|
|
|
tail = list;
|
|
|
|
list = get_next_fn(list);
|
|
|
|
if (!list) {
|
|
|
|
set_next_fn(tail, other);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
} while (compare_fn(list, other) <= 0);
|
|
|
|
set_next_fn(tail, other);
|
|
|
|
other:
|
|
|
|
do {
|
|
|
|
tail = other;
|
|
|
|
other = get_next_fn(other);
|
|
|
|
if (!other) {
|
|
|
|
set_next_fn(tail, list);
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
} while (compare_fn(list, other) > 0);
|
|
|
|
set_next_fn(tail, list);
|
|
|
|
}
|
2012-03-31 22:10:11 +00:00
|
|
|
}
|
|
|
|
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
/*
|
|
|
|
* Perform an iterative mergesort using an array of sublists.
|
|
|
|
*
|
|
|
|
* n is the number of items.
|
|
|
|
* ranks[i] is undefined if n & 2^i == 0, and assumed empty.
|
|
|
|
* ranks[i] contains a sublist of length 2^i otherwise.
|
|
|
|
*
|
|
|
|
* The number of bits in a void pointer limits the number of objects
|
|
|
|
* that can be created, and thus the number of array elements necessary
|
|
|
|
* to be able to sort any valid list.
|
|
|
|
*
|
|
|
|
* Adding an item to this array is like incrementing a binary number;
|
|
|
|
* positional values for set bits correspond to sublist lengths.
|
|
|
|
*/
|
2012-04-17 18:07:01 +00:00
|
|
|
void *llist_mergesort(void *list,
|
|
|
|
void *(*get_next_fn)(const void *),
|
|
|
|
void (*set_next_fn)(void *, void *),
|
|
|
|
int (*compare_fn)(const void *, const void *))
|
2012-03-31 22:10:11 +00:00
|
|
|
{
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
void *ranks[bitsizeof(void *)];
|
|
|
|
size_t n = 0;
|
|
|
|
int i;
|
2012-03-31 22:10:11 +00:00
|
|
|
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
while (list) {
|
|
|
|
void *next = get_next_fn(list);
|
|
|
|
if (next)
|
|
|
|
set_next_fn(list, NULL);
|
2021-11-15 23:19:38 +00:00
|
|
|
for (i = 0; n & ((size_t)1 << i); i++)
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
list = llist_merge(ranks[i], list, get_next_fn,
|
|
|
|
set_next_fn, compare_fn);
|
|
|
|
n++;
|
|
|
|
ranks[i] = list;
|
|
|
|
list = next;
|
|
|
|
}
|
2012-03-31 22:10:11 +00:00
|
|
|
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
for (i = 0; n; i++, n >>= 1) {
|
|
|
|
if (!(n & 1))
|
|
|
|
continue;
|
|
|
|
if (list)
|
|
|
|
list = llist_merge(ranks[i], list, get_next_fn,
|
|
|
|
set_next_fn, compare_fn);
|
2012-03-31 22:10:11 +00:00
|
|
|
else
|
mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on. At the end we
merge all the sublists to get the result.
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 09:22:39 +00:00
|
|
|
list = ranks[i];
|
2012-03-31 22:10:11 +00:00
|
|
|
}
|
|
|
|
return list;
|
|
|
|
}
|