Commit graph

34 commits

Author SHA1 Message Date
Junio C Hamano
0df82d99da Merge branch 'jk/object-filter-with-bitmap'
The object reachability bitmap machinery and the partial cloning
machinery were not prepared to work well together, because some
object-filtering criteria that partial clones use inherently rely
on object traversal, but the bitmap machinery is an optimization
to bypass that object traversal.  There however are some cases
where they can work together, and they were taught about them.

* jk/object-filter-with-bitmap:
  rev-list --count: comment on the use of count_right++
  pack-objects: support filters with bitmaps
  pack-bitmap: implement BLOB_LIMIT filtering
  pack-bitmap: implement BLOB_NONE filtering
  bitmap: add bitmap_unset() function
  rev-list: use bitmap filters for traversal
  pack-bitmap: basic noop bitmap filter infrastructure
  rev-list: allow commit-only bitmap traversals
  t5310: factor out bitmap traversal comparison
  rev-list: allow bitmaps when counting objects
  rev-list: make --count work with --objects
  rev-list: factor out bitmap-optimized routines
  pack-bitmap: refuse to do a bitmap traversal with pathspecs
  rev-list: fallback to non-bitmap traversal when filtering
  pack-bitmap: fix leak of haves/wants object lists
  pack-bitmap: factor out type iterator initialization
2020-03-02 15:07:18 -08:00
Jeff King
cc4aa28506 bitmap: add bitmap_unset() function
We've never needed to unset an individual bit in a bitmap until now.
Typically they start with all bits unset and we bitmap_set() them, or we
are applying another bitmap as a mask. But the easiest way to apply an
object filter to a bitmap result will be to unset the individual bits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14 10:46:22 -08:00
Jeff King
14fbd26044 ewah/bitmap: introduce bitmap_word_alloc()
In a following commit we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.

Note that we have to adjust the block growth in bitmap_set(),
since a caller could now use an initial size of "0" (we don't
plan to do that, but it doesn't hurt to be defensive).

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23 10:51:50 -08:00
Ramsay Jones
3a457a08f2 ewok_rlw.h: add missing 'inline' to function definition
The 'ewok_rlw.h' header file contains the rlw_get_run_bit() function
definition, which is marked as 'static' but not 'inline'. At least when
compiled by gcc, with the default -O2 optimization level, the function
is actually inlined and leaves no static version in the ewah_bitmap.o
and ewah_rlw.o object files. Despite this, add the missing 'inline'
keyword to better describe the intended behaviour.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-29 10:14:19 +09:00
Ramsay Jones
9623514aa8 ewah/ewok_rlw.h: add missing include (hdr-check)
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-20 11:50:00 -07:00
Junio C Hamano
c806278e0c ewah: delete unused 'rlwit_discharge_empty()'
Complete the removal of unused 'ewah bitmap' code by removing the now
unused 'rlwit_discharge_empty()' function. Also, the 'ewah_clear()'
function can now be made a file-scope static symbol.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-21 09:39:48 -07:00
Jeff King
26675087a5 ewah: drop ewah_serialize_native function
We don't call this function, and never have. The on-disk
bitmap format uses network-byte-order integers, meaning that
we cannot use the native-byte-order format written here.

Let's drop it in the name of simplicity.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Jeff King
caa88140d4 ewah: drop ewah_deserialize function
We don't call this function, and in fact never have since it
was added (at least not in iterations of the ewah patches
that got merged). Instead we use ewah_read_mmap().

Let's drop the unused code.

Note to anybody who later wants to resurrect this: it does
not check for integer overflow in the ewah data size,
meaning it may be possible to convince the code to allocate
a too-small buffer and read() into it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
83ea4e1e59 ewah_io: delete unused 'ewah_serialize()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
a9fda811fc ewah_bitmap: delete unused 'ewah_or()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
44301d2b76 ewah_bitmap: delete unused 'ewah_not()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
19436fe03a ewah_bitmap: delete unused 'ewah_and_not()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
01b4a63f55 ewah_bitmap: delete unused 'ewah_and()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
48dc98344f ewah/bitmap.c: delete unused 'bitmap_each_bit()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:19 -07:00
Derrick Stolee
b36c3134bb ewah/bitmap.c: delete unused 'bitmap_clear()'
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 10:16:18 -07:00
Jeff King
9d2e330b17 ewah_read_mmap: bounds-check mmap reads
The on-disk ewah format tells us how big the ewah data is,
and we blindly read that much from the buffer without
considering whether the mmap'd data is long enough, which
can lead to out-of-bound reads.

Let's make sure we have data available before reading it,
both for the ewah header/footer as well as for the bit data
itself. In particular:

  - keep our ptr/len pair in sync as we move through the
    buffer, and check it before each read

  - check the size for integer overflow (this should be
    impossible on 64-bit, as the size is given as a 32-bit
    count of 8-byte words, but is possible on a 32-bit
    system)

  - return the number of bytes read as an ssize_t instead of
    an int, again to prevent integer overflow

  - compute the return value using a pointer difference;
    this should yield the same result as the existing code,
    but makes it more obvious that we got our computations
    right

The included test is far from comprehensive, as it just
picks a static point at which to truncate the generated
bitmap. But in practice this will hit in the middle of an
ewah and make sure we're at least exercising this code.

Reported-by: Luat Nguyen <root@l4w.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-18 09:13:57 -07:00
Todd Zullinger
484257925f Replace Free Software Foundation address in license notices
The mailing address for the FSF has changed over the years.  Rather than
updating the address across all files, refer readers to gnu.org, as the
GNU GPL documentation now suggests for license notices.  The mailing
address is retained in the full license files (COPYING and LGPL-2.1).

The old address is still present in t/diff-lib/COPYING.  This is
intentional, as the file is used in tests and the contents are not
expected to change.

Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-09 13:21:21 +09:00
René Scharfe
42c78a216e use DIV_ROUND_UP
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10 14:24:36 -07:00
Junio C Hamano
cb36508ac5 Merge branch 'jk/ewah-use-right-type-in-sizeof'
Code clean-up.

* jk/ewah-use-right-type-in-sizeof:
  ewah: fix eword_t/uint64_t confusion
2017-03-12 23:21:35 -07:00
Jeff King
3255e512a8 ewah: fix eword_t/uint64_t confusion
The ewah subsystem typedefs eword_t to be uint64_t, but some
code uses a bare uint64_t. This isn't a bug now, but it's a
potential maintenance problem if the definition of eword_t
ever changes. Let's use the correct type.

Note that we can't use COPY_ARRAY() here because the source
and destination point to objects of different sizes. For
that reason we'll also skip the usual "sizeof(*dst)" and use
the real type, which should make it more clear that there's
something tricky going on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-06 12:03:40 -08:00
Jeff King
08c95df8fa ewah: convert to REALLOC_ARRAY, etc
Now that we're built around xmalloc and friends, we can use
helpers like REALLOC_ARRAY, ALLOC_GROW, and so on to make
the code shorter and protect against integer overflow.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King
fb7dbf3e7a convert ewah/bitmap code to use xmalloc
This code was originally written with the idea that it could
be spun off into its own ewah library, and uses the
overrideable ewah_malloc to do allocations.

We plug in xmalloc as our ewah_malloc, of course. But over
the years the ewah code itself has become more entangled
with git, and the return value of many ewah_malloc sites is
not checked.

Let's just drop the level of indirection and use xmalloc and
friends directly. This saves a few lines, and will let us
adapt these sites to our more advanced malloc helpers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano
6da9f888da Merge branch 'es/osx-header-pollutes-mask-macro'
* es/osx-header-pollutes-mask-macro:
  ewah: use less generic macro name
  ewah/bitmap: silence warning about MASK macro redefinition
2015-06-24 12:21:44 -07:00
Jeff King
34b935c01f ewah: use less generic macro name
The ewah/ewok.h header pollutes the global namespace with
"BITS_IN_WORD", without any specific notion that we are
talking about the bits in an eword_t. We can give this the
more specific name "BITS_IN_EWORD".

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-03 00:04:01 -07:00
Eric Sunshine
414382fb00 ewah/bitmap: silence warning about MASK macro redefinition
On PowerPC Mac OS X (10.5.8 "Leopard" with Xcode 3.1),
system header /usr/include/ppc/param.h[1] pollutes the
preprocessor namespace with a macro generically named MASK.
This conflicts with the same-named macro in ewah/bitmap.c.
We can avoid this conflict by using a more specific name.

[1]: Included indirectly via:
     git-compat-util.h ->
     sys/sysctl.h ->
     sys/ucred.h ->
     sys/param.h ->
     machine/param.h ->
     ppc/param.h

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-03 00:03:03 -07:00
Nguyễn Thái Ngọc Duy
be0d9d5323 ewah: add convenient wrapper ewah_serialize_strbuf()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:16 -07:00
Junio C Hamano
2c1f554d0c Merge branch 'jk/pack-bitmap'
The pack bitmap support did not build with older versions of GCC.

* jk/pack-bitmap:
  ewah: fix building with gcc < 3.4.0
2015-02-18 11:45:00 -08:00
Tom G. Christensen
bd4e8822da ewah: fix building with gcc < 3.4.0
The __builtin_ctzll function was added in gcc 3.4.0.
This extends the check for gcc so that use of __builtin_ctzll is only
enabled if gcc >= 3.4.0.

Signed-off-by: Tom G. Christensen <tgc@statsbiblioteket.dk>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-04 10:45:31 -08:00
Nguyễn Thái Ngọc Duy
16fc2b7a9c ewah: delete unused ewah_read_mmap_native declaration
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29 12:43:16 -07:00
Nguyễn Thái Ngọc Duy
a0a2f7d79c ewah: fix constness of ewah_read_mmap
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-29 12:43:15 -07:00
Kyle J. McKay
68f4e1fc6a ewah_bitmap.c: do not assume size_t and eword_t are the same size
When buffer_grow changes the size of the buffer using realloc,
it first computes and saves the rlw pointer's offset into the
buffer using (uint8_t *) math before the realloc but then
restores it using (eword_t *) math.

In order to do this it's necessary to convert the (uint8_t *)
offset into an (eword_t *) offset.  It was doing this by
dividing by the sizeof(size_t).  Unfortunately sizeof(size_t)
is not same as sizeof(eword_t) on all platforms.

This causes illegal memory accesses and other bad things to
happen when attempting to use bitmaps on those platforms.

Fix this by dividing by the sizeof(eword_t) instead which
will always be correct for all platforms.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-22 16:21:16 -07:00
Jeff King
6b5b3a27b7 ewah: unconditionally ntohll ewah data
Commit a201c20 tried to optimize out a loop like:

  for (i = 0; i < len; i++)
	  data[i] = ntohll(data[i]);

in the big-endian case, because we know that ntohll is a
noop, and we do not need to pay the cost of the loop at all.
However, it mistakenly assumed that __BYTE_ORDER was always
defined, whereas it may not be on systems which do not
define it by default, and where we did not need to define it
to set up the ntohll macro. This includes OS X and Windows.

We could muck with the ordering in compat/bswap.h to make
sure it is defined unconditionally, but it is simpler to
still to just execute the loop unconditionally. That avoids
the application code knowing anything about these magic
macros, and lets it depend only on having ntohll defined.

And since the resulting loop looks like (on a big-endian
system):

  for (i = 0; i < len; i++)
	  data[i] = data[i];

any decent compiler can probably optimize it out.

Original report and analysis by Brian Gernhardt.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-12 11:21:29 -08:00
Vicent Marti
a201c20b41 ewah: support platforms that require aligned reads
The caller may hand us an unaligned buffer (e.g., because it
is an mmap of a file with many ewah bitmaps). On some
platforms (like SPARC) this can cause a bus error. We can
fix it with a combination of get_be32 and moving the data
into an aligned buffer (which we would do anyway, but we can
move it before fixing the endianness).

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 14:05:05 -08:00
Vicent Marti
e1273106f6 ewah: compressed bitmap implementation
EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

	https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

	https://github.com/lemire/javaewah

[jc: stripped debug-only code per Peff's $gmane/239768]

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:17:20 -08:00