The `oidset_parse_file_carefully()` function implicitly depends on
`the_repository` when parsing object IDs. Fix this by having callers
pass in the hash algorithm to use.
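Roughly, the prototype gains a hash-algorithm parameter and callers pass the
algorithm of the repository they are operating on. A sketch (the exact
parameter order is an assumption):

    void oidset_parse_file_carefully(struct oidset *set, const char *path,
                                     const struct git_hash_algo *algop,
                                     oidset_parse_tweak_fn fn, void *cbdata);

    /* e.g. a caller would now do something like: */
    oidset_parse_file_carefully(&set, path, the_repository->hash_algo,
                                NULL, NULL);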
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we will need to add all the oids from a set into
another set. In "list-objects-filter.c", there is already a static
function called add_all() to do that.
Let's rename this function to oidset_insert_from_set() and move it into
oidset.{c,h} to make it generally available.
While at it, let's remove a useless `!= NULL`.
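With the move, the helper in oidset.c ends up looking roughly like this:

    void oidset_insert_from_set(struct oidset *dest, struct oidset *src)
    {
        struct oidset_iter iter;
        struct object_id *src_oid;

        oidset_iter_init(src, &iter);
        while ((src_oid = oidset_iter_next(&iter)))
            oidset_insert(dest, src_oid);
    }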
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
oidset_size() just reads a single word from memory and returns it.
Avoid the function call overhead for this trivial operation by turning
it into an inline function.
While we're at it, declare its parameter const to allow it to be used
on read-only oidsets.
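The header definition then becomes roughly (a sketch; the exact return type
is assumed):

    static inline unsigned int oidset_size(const struct oidset *set)
    {
        return kh_size(&set->set);
    }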
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command reads a list of object names to place on the ignore list,
either from the command line or from a file, but their object types are
not checked (those read from the file are not even checked for object
existence).
Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles the ignore list.
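A sketch of the extended interface: the callback sees each object name as
it is parsed and can rewrite it in place or signal that it should be
rejected (the exact return-value convention is an assumption):

    typedef int (*oidset_parse_tweak_fn)(struct object_id *, void *);

    void oidset_parse_file_carefully(struct oidset *set, const char *path,
                                     oidset_parse_tweak_fn fn, void *cbdata);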
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jk/oid-array-cleanups:
oidset: stop referring to sha1-array
ref-filter: stop referring to "sha1 array"
bisect: stop referring to sha1_array
test-tool: rename sha1-array to oid-array
oid_array: rename source file from sha1-array
oid_array: use size_t for iteration
oid_array: use size_t for count and allocation
Occasionally, it may be useful for callers to know the number of object
IDs in an oidset. Right now, the only way to compute this is to call
'kh_size' on the internal 'kh_oid_set_t'.
Similar to how we wrap other 'kh_*' functions over the 'oidset' type,
let's allow callers to compute this value by introducing 'oidset_size'.
We will add its first caller in the subsequent commit.
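For illustration, a caller can then ask for the count directly instead of
reaching into the khash internals (the snippet below is hypothetical):

    struct oidset seen = OIDSET_INIT;

    oidset_insert(&seen, the_hash_algo->empty_tree);
    fprintf(stderr, "collected %d object(s)\n", oidset_size(&seen));
    oidset_clear(&seen);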
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This has been oid_array for some time, though the source only recently
moved from sha1-array.[ch] to oid-array.[ch]. In either case, we should
say "oid-array" here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit 8b2f8cbcb1 ("oidset: use khash", 2018-10-04) moved from
using oidmap to khash, it replaced the oidmap.h include with includes of
both hashmap.h and khash.h. Since the hashmap.h header is unnecessary,
and the point of the patch was to switch from hashmap (used by oidmap)
to khash.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame" learned to "ignore" commits in the history, whose
effects (as well as their presence) get ignored.
* br/blame-ignore:
t8014: remove unnecessary braces
blame: drop some unused function parameters
blame: add a test to cover blame_coalesce()
blame: use the fingerprint heuristic to match ignored lines
blame: add a fingerprint heuristic to match ignored lines
blame: optionally track line fingerprints during fill_blame_origin()
blame: add config options for the output of ignored or unblamable lines
blame: add the ability to ignore commits and their changes
blame: use a helper function in blame_chunk()
Move oidset_parse_file() to oidset.c
fsck: rename and touch up init_skiplist()
khash lets us define a hash as either a map or a set (i.e., with no
"value" type). For the oid maps we define, "oid" is the set and
"oid_map" is the map. As the bug in the previous commit shows, it's easy
to pick the wrong one.
So let's make the names more distinct: "oid_set" and "oid_map".
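With khash, the distinction boils down to the flag passed to KHASH_INIT; a
rough sketch of the two declarations (the hash/equality helper names here
are placeholders):

    KHASH_INIT(oid_set, struct object_id, int, 0, oid_hash, oid_equal)    /* set */
    KHASH_INIT(oid_map, struct object_id, void *, 1, oid_hash, oid_equal) /* map */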
An alternative naming scheme would be to actually name the type after
the key/value types. So e.g., "oid" _would_ be the set, since it has no
value type. And "oid_map" would become "oid_void" or similar (and
"oid_pos" becomes "oid_int"). That's better in some ways: it's more
regular, and a given map type can be more easily reused in multiple
contexts (e.g., something storing an "int" that isn't a "pos"). But it's
also slightly less descriptive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the oid khash table definition to khash.h and define a typedef for
it, similar to the one we have for unsigned char pointers. Define
variants that are maps as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to inline oidset_init(), as it's typically only called
twice in the lifetime of an oidset (once at the beginning and once at the
end, by oidset_clear()), and kh_resize_* is quite big, so move its definition
to oidset.c. Document it while we're at it.
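After the move, the out-of-line definition looks roughly like this (the
kh_resize_* helper name follows whatever the khash type is called):

    void oidset_init(struct oidset *set, size_t initial_size)
    {
        memset(&set->set, 0, sizeof(set->set));
        if (initial_size)
            kh_resize_oid_set(&set->set, initial_size);
    }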
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement oidset using khash.h in order to reduce its memory footprint
and make it faster.
Performance of a command that mainly checks for duplicate objects using
an oidset, with master and Clang 6.0.1:
$ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'"
$ /usr/bin/time $cmd >/dev/null
0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k
0inputs+0outputs (0major+11204minor)pagefaults 0swaps
$ hyperfine "$cmd"
Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
Time (mean ± σ): 250.0 ms ± 6.0 ms [User: 225.9 ms, System: 23.6 ms]
Range (min … max): 242.0 ms … 261.1 ms
And with this patch:
$ /usr/bin/time $cmd >/dev/null
0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k
0inputs+0outputs (0major+8318minor)pagefaults 0swaps
$ hyperfine "$cmd"
Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'
Time (mean ± σ): 151.9 ms ± 4.9 ms [User: 130.5 ms, System: 21.2 ms]
Range (min … max): 148.2 ms … 170.4 ms
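The oidset itself becomes a thin wrapper around a khash set keyed directly
on struct object_id; a sketch, with the type and helper names as they appear
in today's tree (the original patch used slightly different khash names):

    struct oidset {
        kh_oid_set_t set;
    };
    #define OIDSET_INIT { { 0 } }

    int oidset_contains(const struct oidset *set, const struct object_id *oid)
    {
        khiter_t pos = kh_get_oid_set(&set->set, *oid);
        return pos != kh_end(&set->set);
    }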
Initial-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
c3a9ad3117 ("oidset: add iterator methods to oidset", 2017-11-21)
introduced an 'oidset_init()' function in oidset.h, which has void as
return type, but returns an expression.
This makes the solaris compiler fail with:
"oidset.h", line 30: void function cannot return value
As the return type is void, and even the return type of the expression
we're trying to return (oidmap_init) is void, just remove the return
statement to fix the compiler error.
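A minimal illustration of the problem and the fix (the oidmap-backed
definition sketched here reflects what oidset.h wrapped at the time):

    /* before: a void function returning a (void) expression,
     * which the Solaris compiler rejects */
    static inline void oidset_init(struct oidset *set, size_t initial_size)
    {
        return oidmap_init(&set->map, initial_size);
    }

    /* after: just make the call */
    static inline void oidset_init(struct oidset *set, size_t initial_size)
    {
        oidmap_init(&set->map, initial_size);
    }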
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the usual iterator methods to oidset.
Add oidset_remove().
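A usage sketch of the new calls (the set contents and 'some_oid' are
assumed to come from elsewhere):

    struct oidset_iter iter;
    struct object_id *oid;

    oidset_iter_init(&set, &iter);
    while ((oid = oidset_iter_next(&iter)))
        printf("%s\n", oid_to_hex(oid));

    oidset_remove(&set, &some_oid);   /* drop a single entry */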
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to using the hashmap in hashmap.c, but with an
easier-to-use API. In particular, custom entry comparisons no longer
need to be written, and lookups can be done without constructing a
temporary entry structure.
This is implemented as a thin wrapper over the hashmap API. In
particular, this means that there is an additional 4-byte overhead due
to the fact that the first 4 bytes of the hash are redundantly stored.
For now, I'm taking the simpler approach, but if need be, we can
reimplement oidmap without affecting the callers significantly.
oidset has been updated to use oidmap.
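For reference, the entry type that users embed looks roughly like this;
'my_entry' and its payload are hypothetical:

    struct oidmap_entry {
        struct hashmap_entry internal_entry; /* carries the redundant 4-byte hash */
        struct object_id oid;
    };

    struct my_entry {
        struct oidmap_entry entry; /* must come first so lookups can cast */
        int payload;               /* hypothetical per-object data */
    };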
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to many of our uses of sha1-array, but it
overcomes one limitation of a sha1-array: when you are
de-duplicating a large input with relatively few unique
entries, sha1-array uses 20 bytes per non-unique entry,
whereas this set will use memory linear in the number of
unique entries (albeit a few more than 20 bytes each due to
hashmap overhead).
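A de-duplication sketch ('oid' and the handler are hypothetical):

    struct oidset seen = OIDSET_INIT;

    /* oidset_insert() reports whether the oid was already present */
    if (!oidset_insert(&seen, &oid))
        handle_first_occurrence(&oid);

    oidset_clear(&seen);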
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>