Commit graph

3 commits

Author SHA1 Message Date
Jeff King
6b5b3a27b7 ewah: unconditionally ntohll ewah data
Commit a201c20 tried to optimize out a loop like:

  for (i = 0; i < len; i++)
	  data[i] = ntohll(data[i]);

in the big-endian case, because we know that ntohll is a
noop, and we do not need to pay the cost of the loop at all.
However, it mistakenly assumed that __BYTE_ORDER was always
defined, whereas it may not be on systems which do not
define it by default, and where we did not need to define it
to set up the ntohll macro. This includes OS X and Windows.

We could muck with the ordering in compat/bswap.h to make
sure it is defined unconditionally, but it is simpler to
still to just execute the loop unconditionally. That avoids
the application code knowing anything about these magic
macros, and lets it depend only on having ntohll defined.

And since the resulting loop looks like (on a big-endian
system):

  for (i = 0; i < len; i++)
	  data[i] = data[i];

any decent compiler can probably optimize it out.

Original report and analysis by Brian Gernhardt.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-12 11:21:29 -08:00
Vicent Marti
a201c20b41 ewah: support platforms that require aligned reads
The caller may hand us an unaligned buffer (e.g., because it
is an mmap of a file with many ewah bitmaps). On some
platforms (like SPARC) this can cause a bus error. We can
fix it with a combination of get_be32 and moving the data
into an aligned buffer (which we would do anyway, but we can
move it before fixing the endianness).

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 14:05:05 -08:00
Vicent Marti
e1273106f6 ewah: compressed bitmap implementation
EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

	https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

	https://github.com/lemire/javaewah

[jc: stripped debug-only code per Peff's $gmane/239768]

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:17:20 -08:00